GSoC (Partial) Week 7 report
An exciting week.
This week was exciting: I spent it learning about the go runtime. As insightful as that was, however, it also confused me a little. Before this goes any further, I should state that this is a partial report on my research and findings. My aims for this week were the following: to investigate the behavior of go programs under the Hurd, to study the go runtime, and possibly to modify it to see whether the goroutine issues are libpthread’s fault or the go runtime’s.
Presenting my findings.
Most of my time was spent studying the gcc go frontend, libgo and the go runtime. I can gladly say that it was time well spent. What I got out of it were some nice pieces of insight, but also some slight confusion and doubts.
The first interesting thing in my findings was this:
struct G
{
	Defer* defer;
	Panic* panic;
	void* exception; // current exception being thrown
	bool is_foreign; // whether current exception from other language
	void *gcstack; // if status==Gsyscall, gcstack = stackbase to use during gc
	uintptr gcstack_size;
	void* gcnext_segment;
	void* gcnext_sp;
	void* gcinitial_sp;
	ucontext_t gcregs;
	byte* entry; // initial function
	G* alllink; // on allg
	void* param; // passed parameter on wakeup
	bool fromgogo; // reached from gogo
	int16 status;
	int64 goid;
	uint32 selgen; // valid sudog pointer
	const char* waitreason; // if status==Gwaiting
	G* schedlink;
	bool readyonstop;
	bool ispanic;
	bool issystem;
	int8 raceignore; // ignore race detection events
	M* m; // for debuggers, but offset not hard-coded
	M* lockedm;
	M* idlem;
	int32 sig;
	int32 writenbuf;
	byte* writebuf;
	// DeferChunk *dchunk;
	// DeferChunk *dchunknext;
	uintptr sigcode0;
	uintptr sigcode1;
	// uintptr sigpc;
	uintptr gopc; // pc of go statement that created this goroutine
	int32 ncgo;
	CgoMal* cgomal;
	Traceback* traceback;
	ucontext_t context;
	void* stack_context[10];
};
Yep. This is the struct that represents (yeah, you guessed it) a goroutine. I was pretty surprised at first to see a thread represented as a struct, but taking a closer look at it, it makes perfect sense. The next one, though, was a lot trickier:
struct M
{
	G* g0; // goroutine with scheduling stack
	G* gsignal; // signal-handling G
	G* curg; // current running goroutine
	int32 id;
	int32 mallocing;
	int32 throwing;
	int32 gcing;
	int32 locks;
	int32 nomemprof;
	int32 waitnextg;
	int32 dying;
	int32 profilehz;
	int32 helpgc;
	uint32 fastrand;
	uint64 ncgocall; // number of cgo calls in total
	Note havenextg;
	G* nextg;
	M* alllink; // on allm
	M* schedlink;
	MCache *mcache;
	G* lockedg;
	G* idleg;
	Location createstack[32]; // Stack that created this thread.
	M* nextwaitm; // next M waiting for lock
	uintptr waitsema; // semaphore for parking on locks
	uint32 waitsemacount;
	uint32 waitsemalock;
	GCStats gcstats;
	bool racecall;
	void* racepc;
	uintptr settype_buf[1024];
	uintptr settype_bufsize;
	uintptr end[];
};
This was a source of endless confusion at the beginning. It does contain some hints confirming that G’s are indeed goroutines, but nothing that really describes what an M is. Its structure is similar to that of a G, however, which suggests that it too has something to do with a thread. And indeed it does. Further study of the source code led me to speculate that M’s must be the real, kernel-scheduled operating system threads, while G’s (goroutines) must be the lightweight threads managed by the go runtime.
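To make that distinction concrete for myself, here is a little toy model of the M:N idea. To be clear, this is my own sketch and not code from libgo: a couple of kernel threads (the “M’s”) pull lightweight tasks (the “G’s”) off a shared run list and run them to completion.

/* Toy model of M:N scheduling -- my own sketch, not libgo code.
 * A fixed set of "M's" (kernel threads) pull "G's" (lightweight
 * tasks) off a shared run list and execute them to completion. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct ToyG {
	void (*entry)(void *);	/* initial function, like G.entry */
	void *param;		/* argument, like G.param */
	struct ToyG *schedlink;	/* next G on the run list */
} ToyG;

static ToyG *ghead;		/* G's waiting to run */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push a ready G onto the run list. */
static void
toy_put(ToyG *gp)
{
	pthread_mutex_lock(&sched_lock);
	gp->schedlink = ghead;
	ghead = gp;
	pthread_mutex_unlock(&sched_lock);
}

/* Pop a ready G off the run list, or return NULL if it is empty. */
static ToyG *
toy_get(void)
{
	ToyG *gp;

	pthread_mutex_lock(&sched_lock);
	gp = ghead;
	if (gp != NULL)
		ghead = gp->schedlink;
	pthread_mutex_unlock(&sched_lock);
	return gp;
}

/* One "M": keep running G's until no work is left. */
static void *
toy_m(void *arg)
{
	ToyG *gp;

	(void) arg;
	while ((gp = toy_get()) != NULL) {
		gp->entry(gp->param);
		free(gp);	/* the real runtime recycles G's instead */
	}
	return NULL;
}

static void
say_hello(void *p)
{
	printf("toy goroutine %ld\n", (long) p);
}

int
main(void)
{
	pthread_t m[2];
	long i;

	for (i = 0; i < 8; i++) {
		ToyG *gp = malloc(sizeof *gp);
		gp->entry = say_hello;
		gp->param = (void *) i;
		toy_put(gp);
	}
	for (i = 0; i < 2; i++)
		pthread_create(&m[i], NULL, toy_m, NULL);
	for (i = 0; i < 2; i++)
		pthread_join(m[i], NULL);
	return 0;
}

The real runtime is of course far more elaborate (G’s can block, yield and be rescheduled, and as we will see below they are recycled rather than freed), but the basic shape is the same.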
I was more than happy to find comments in the source that confirmed this position of mine:
// The go scheduler's job is to match ready-to-run goroutines (`g's)
// with waiting-for-work schedulers (`m's)
Another cool finding was the go runtime’s scheduler itself, from which the above comment originates:
struct Sched {
	Lock;
	G *gfree; // available g's (status == Gdead)
	int64 goidgen;
	G *ghead; // g's waiting to run
	G *gtail;
	int32 gwait; // number of g's waiting to run
	int32 gcount; // number of g's that are alive
	int32 grunning; // number of g's running on cpu or in syscall
	M *mhead; // m's waiting for work
	int32 mwait; // number of m's waiting for work
	int32 mcount; // number of m's that have been created
	volatile uint32 atomic; // atomic scheduling word (see below)
	int32 profilehz; // cpu profiling rate
	bool init; // running initialization
	bool lockmain; // init called runtime.LockOSThread
	Note stopped; // one g can set waitstop and wait here for m's to stop
};
From that particular piece of code, without a doubt the most interesting line is G *gfree. That is a pool of goroutines that are available to be reused.
There are also helper scheduling functions, of which the most interesting (for my purposes) is static void gfput(G*), which releases a goroutine by putting it on the gfree list:
// Put on gfree list. Sched must be locked.
static void
gfput(G *gp)
{
	gp->schedlink = runtime_sched.gfree;
	runtime_sched.gfree = gp;
}
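Its counterpart gfget takes a G back off that free list so it can be reused for a new goroutine. Paraphrasing slightly from memory, it is essentially the mirror image of gfput:

// Get from gfree list (paraphrased). Sched must be locked.
static G*
gfget(void)
{
	G *gp;

	gp = runtime_sched.gfree;
	if(gp)
		runtime_sched.gfree = gp->schedlink;
	return gp;
}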
There are loads of other extremely interesting functions there, but for the sake of space I will not expand on them further. I will, however, expand on what is confusing me.
The source of confusion
My tests at this point are meant to include checking whether removing thread destruction from the go runtime makes any difference in behavior. There are, however, two kinds of threads as far as the go runtime is concerned: goroutines (G’s) and the kernel-schedulable threads (M’s). Neither of them seems to really be destroyed. From my understanding so far, G’s are never totally destroyed (I may be wrong here, I am still researching this bit). Whenever they are about to be “destroyed”, they are instead added to the scheduler’s list of free G’s to allow for reuse, as evidenced by the gfput and gfget functions.
M’s (the kernel threads), on the other hand, also seem never to be destroyed. A comment in go’s scheduler supports this (// For now, m's never go away.), and as a matter of fact I could not find any code that destroys M’s (I am still researching this bit as well).
Since neither of the two actually gets destroyed, and seeing as thread creation alone should not be buggy, how come we are facing the specific bugs we are facing? I will try to offer an interpretation: either I am simply wrong and M’s (or G’s, or both) actually do get destroyed somewhere (possible, and very probable), or I am looking for clues regarding the issue in the wrong place (possible, but I don’t see it as very probable).
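One way I intend to narrow this down, sketched roughly below, is to stress libpthread directly, with no go runtime in the picture. If plain thread creation and destruction already misbehaves under the Hurd, the problem lies below the go runtime; if it does not, the runtime (or the way it drives libpthread) becomes the prime suspect.

/* Rough sketch: exercise libpthread's thread creation/destruction
 * directly, without the go runtime, to see whether it misbehaves
 * under the Hurd on its own. */
#include <pthread.h>
#include <stdio.h>

static void *
work(void *arg)
{
	return arg;	/* do nothing, exit immediately */
}

int
main(void)
{
	int i;

	for (i = 0; i < 100000; i++) {
		pthread_t t;

		if (pthread_create(&t, NULL, work, NULL) != 0) {
			fprintf(stderr, "pthread_create failed at iteration %d\n", i);
			return 1;
		}
		pthread_join(t, NULL);
		if (i % 10000 == 0)
			printf("%d threads created and joined\n", i);
	}
	return 0;
}

Nothing about this little test is go-specific; that is precisely the point.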