• GSOC Week 9 (Partial) report

    This week revolved around print debugging in the gccgo runtime, in search of clues regarding the creation of new threads under the go runtime, to see whether something is wrong with the runtime itself or with the way the runtime interacts with libpthread.

    (partial presentation of) findings

    While print debugging the gccgo runtime, I have not noticed anything abnormal or unusual so far. For example, the code that triggers the assertion failure seems to work at least once, since pthread_create() returns 0 at least once.

    This is expected behavior, since we have already stated that at least one M (kernel thread) is created at the initialisation of the program’s runtime.

    If, however, we use a go statement in our program to start a goroutine, the runtime still fails with the usual assertion failure, and the output of the program is this:

    root@debian:~/Software/Experiments/go# ./a.out
    [DEBUG] pthread_create returned 0
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
    __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
    Aborted
    

    The above output can give us some pieces of information:

    • pthread_create() is called at least once.
    • It executes successfully and without errors: the libpthread code suggests that 0 is returned upon successful creation of a thread.
    • However, the assertion is still triggered, and we know that it is triggered during thread creation.

    The second bullet point is also supported by the fact that even if you execute something as simple as hello world in go, a new M is created, so you get something along these lines as output:

    root@debian:~/Software/Experiments/go# ./a.out
    [DEBUG] pthread_create returned 0
    Hello World!
    root@debian:~/Software/Experiments/go#

    There is, however, something that the above output doesn’t tell us, but that would be useful to know: how many times did we try to create a new thread? So we modify our gcc source code to see how many times the runtime attempts to create a new kernel thread (M). This is what we get out of it:

    root@debian:~/Software/Experiments/go# ./a.out
    [DEBUG] Preparing to create a new thread.
    [DEBUG] pthread_create returned 0
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
    __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
    [DEBUG] Preparing to create a new thread.
    aborted.
    

    The code at this point in the runtime is this:

    // Create a new m.  It will start off with a call to runtime_mstart.
    M*
    runtime_newm(void)
    {
    	M *mp;
    	pthread_attr_t attr;
    	pthread_t tid;
    	size_t stacksize;
    	sigset_t clear;
    	sigset_t old;
    	int ret;
    
    #if 0
    	static const Type *mtype;  // The Go type M
    	if(mtype == nil) {
    		Eface e;
    		runtime_gc_m_ptr(&e);
    		mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
    	}
    #endif
    
    	// XXX: Added by fotis for print debugging.
    	printf("[DEBUG] Preparing to create a new thread.\n");
    
    	mp = runtime_mal(sizeof *mp);
    	mcommoninit(mp);
    	mp->g0 = runtime_malg(-1, nil, nil);
    
    	if(pthread_attr_init(&attr) != 0)
    		runtime_throw("pthread_attr_init");
    	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
    		runtime_throw("pthread_attr_setdetachstate");
    
    	// <http://www.gnu.org/software/hurd/open_issues/libpthread_set_stack_size.html>
    #ifdef __GNU__
    	stacksize = StackMin;
    #else
    	stacksize = PTHREAD_STACK_MIN;
    
    	// With glibc before version 2.16 the static TLS size is taken
    	// out of the stack size, and we get an error or a crash if
    	// there is not enough stack space left.  Add it back in if we
    	// can, in case the program uses a lot of TLS space.  FIXME:
    	// This can be disabled in glibc 2.16 and later, if the bug is
    	// indeed fixed then.
    	stacksize += tlssize;
    #endif
    
    	if(pthread_attr_setstacksize(&attr, stacksize) != 0)
    		runtime_throw("pthread_attr_setstacksize");
    
    	// Block signals during pthread_create so that the new thread
    	// starts with signals disabled.  It will enable them in minit.
    	sigfillset(&clear);
    
    #ifdef SIGTRAP
    	// Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
    	sigdelset(&clear, SIGTRAP);
    #endif
    
    	sigemptyset(&old);
    	sigprocmask(SIG_BLOCK, &clear, &old);
    	ret = pthread_create(&tid, &attr, runtime_mstart, mp);
    
    	/* XXX: added for debug printing */
    	printf("[DEBUG] pthread_create() returned %d\n", ret);
    
    	sigprocmask(SIG_SETMASK, &old, nil);
    
    	if (ret != 0)
    		runtime_throw("pthread_create");
    
    	return mp;
    }

    We can deduce two things about our situation right now:

    • There is at least one thread successfully created, and there is an attempt to create another one.
    • The second time, the "Preparing" message is printed but no return value is ever reported, so the failure happens before pthread_create returns.

    Continuation of work.

    I have been following this path for the last week. I have presented some of my findings here, and hope to soon be able to write an exhaustive report on what exactly causes the bug.

  • GSOC Week 8 (Partial) report

    This week was spent studying the go language’s runtime and studying the behaviour of various go programs when executed under the Hurd. I learnt a variety of new things, and got some new clues about the problem.

    The new libgo clues

    I already know that M’s are the “real” kernel schedulable threads and G’s are the go runtime managed ones (goroutines). The last time I went through the go runtime’s code, I noticed that neither of them gets destroyed, so the issue must be with thread creation. But since at least one of each is created during the program’s initialization, how come most programs are able to run, and issues only present themselves when we manually attempt to run a goroutine?

    I will admit that the situation looks strange, so I decided to look into it some more. Before we go any further, I have to show the error I got when I ran goroutine-powered programs under the Hurd.

    root@debian:~/Software/Experiments/go# ./a.out
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
    __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
    Aborted
    

    __pthread_create_internal is a libpthread function that gets called when a new POSIX thread is instantiated. So we know that when we start a goroutine, at least one kernel thread is created in addition to the goroutine itself. Otherwise, if only a new goroutine had been created and not a new kernel thread (M), why wasn’t it matched with an existing kernel thread (remember, there is at least one)?

    That made me look into the go runtime some more. I found a lot of things that I cannot enumerate here, but among the most interesting ones was the following piece of code:

    // Create a new m.  It will start off with a call to runtime_mstart.
    M*
    runtime_newm(void)
    {
    	M *mp;
    	pthread_attr_t attr;
    	pthread_t tid;
    	size_t stacksize;
    	sigset_t clear;
    	sigset_t old;
    	int ret;
    
    #if 0
    	static const Type *mtype;  // The Go type M
    	if(mtype == nil) {
    		Eface e;
    		runtime_gc_m_ptr(&e);
    		mtype = ((const PtrType*)e.__type_descriptor)->__element_type;
    	}
    #endif
    
    	mp = runtime_mal(sizeof *mp);
    	mcommoninit(mp);
    	mp->g0 = runtime_malg(-1, nil, nil);
    
    	if(pthread_attr_init(&attr) != 0)
    		runtime_throw("pthread_attr_init");
    	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
    		runtime_throw("pthread_attr_setdetachstate");
    
    	stacksize = PTHREAD_STACK_MIN;
    
    	// With glibc before version 2.16 the static TLS size is taken
    	// out of the stack size, and we get an error or a crash if
    	// there is not enough stack space left.  Add it back in if we
    	// can, in case the program uses a lot of TLS space.  FIXME:
    	// This can be disabled in glibc 2.16 and later, if the bug is
    	// indeed fixed then.
    	stacksize += tlssize;
    
    	if(pthread_attr_setstacksize(&attr, stacksize) != 0)
    		runtime_throw("pthread_attr_setstacksize");
    
    	// Block signals during pthread_create so that the new thread
    	// starts with signals disabled.  It will enable them in minit.
    	sigfillset(&clear);
    
    #ifdef SIGTRAP
    	// Blocking SIGTRAP reportedly breaks gdb on Alpha GNU/Linux.
    	sigdelset(&clear, SIGTRAP);
    #endif
    
    	sigemptyset(&old);
    	sigprocmask(SIG_BLOCK, &clear, &old);
    	ret = pthread_create(&tid, &attr, runtime_mstart, mp);
    	sigprocmask(SIG_SETMASK, &old, nil);
    
    	if (ret != 0)
    		runtime_throw("pthread_create");
    
    	return mp;
    }

    This is the code that creates a new kernel thread. Notice the line ret = pthread_create(&tid, &attr, runtime_mstart, mp);. It obviously creates a new kernel thread, which explains why we get this specific error. What it does not explain is why, given that at least one M is created at program startup, the error is only triggered when we manually create a goroutine.

    Go programs under the Hurd

    Apart from studying Go’s runtime source code, I also ran some experiments under the Hurd. I got some very weird results that I am still investigating, but that I would like to share nonetheless. Consider the following piece of code:

    package main
    
    import "fmt"
    
    func say(s string) {
        for i := 0; i < 5; i++ {
            fmt.Println(s)
        }
    }
    
    func main() {
        say("world")
        say("hello")
    }

    A very basic example that we can use to demonstrate goroutines. Now, if we change one of the say calls inside main to a goroutine, this happens:

    root@debian:~/Software/Experiments/go# ./a.out
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
    __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
    Aborted
    

    BUT if we change BOTH of these functions to goroutines (go say("world"), go say("hello")), this happens:

    root@debian:~/Software/Experiments/go# ./a.out
    root@debian:~/Software/Experiments/go# 

    Wait a minute. It can’t be! Did it execute correctly? Where is the output?

    root@debian:~/Software/Experiments/go# echo $?
    0
    root@debian:~/Software/Experiments/go#

    It reports that it has executed correctly. But there is no output.
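
    A likely explanation for the missing output (not verified on the Hurd here): a Go program exits as soon as main returns, without waiting for any other goroutine. With both calls turned into goroutines, main has nothing left to do and can return before either of them prints anything. The sketch below makes main wait with sync.WaitGroup. runBoth is a hypothetical helper introduced only for this illustration, and whether this version gets past the assertion on the Hurd is a separate question.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    )

    // say prints s five times, as in the example above.
    func say(s string) {
    	for i := 0; i < 5; i++ {
    		fmt.Println(s)
    	}
    }

    // runBoth (hypothetical helper) starts both calls as goroutines,
    // blocks until both have finished, and returns how many finished.
    func runBoth() int {
    	var wg sync.WaitGroup
    	var mu sync.Mutex
    	done := 0
    	for _, s := range []string{"world", "hello"} {
    		wg.Add(1)
    		go func(s string) {
    			defer wg.Done()
    			say(s)
    			mu.Lock()
    			done++
    			mu.Unlock()
    		}(s)
    	}
    	wg.Wait() // without this, main could return before anything is printed
    	return done
    }

    func main() {
    	fmt.Println("goroutines finished:", runBoth())
    }
    ```

    The order in which the "world" and "hello" lines appear is not fixed, since the two goroutines run independently.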

    What I am doing next

    I will continue reading through the go runtime for some clues. On the more active side, I am writing a custom test case for goroutine testing under the Hurd, while also doing some analysis on the programs that run there (currently studying the assembly generated for these programs) to see how they differ and why we get this particular behavior.
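
    As an illustration of what a goroutine test case can look like, here is a minimal sketch. It is not the test case mentioned above: probe is a hypothetical helper that starts n goroutines and checks that every one of them actually ran, so that increasing n shows at which point a platform starts to fail.

    ```go
    package main

    import "fmt"

    // probe starts n goroutines; goroutine i sends i on a channel.
    // It returns the sum of everything received, so a correct run
    // yields 0+1+...+(n-1).
    func probe(n int) int {
    	results := make(chan int)
    	for i := 0; i < n; i++ {
    		go func(i int) { results <- i }(i)
    	}
    	sum := 0
    	for i := 0; i < n; i++ {
    		sum += <-results
    	}
    	return sum
    }

    func main() {
    	for _, n := range []int{1, 2, 8} {
    		got := probe(n)
    		want := n * (n - 1) / 2
    		fmt.Printf("n=%d sum=%d ok=%v\n", n, got, got == want)
    	}
    }
    ```

    Because main receives exactly n values before returning, the program cannot exit early, which rules out the "main returned first" case when interpreting the result.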

  • GSOC (Partial) Week 7 report

    An exciting week.

    This week was exciting, mostly because I spent it learning about the go runtime. As insightful as it was, however, it also confused me a little bit. Before going any further, I should state that this is a partial report on my research and my findings. My aims for this week were the following: to investigate the behavior of go programs under the Hurd, to study the go runtime, and possibly to modify it to see whether the goroutine issues are libpthread’s fault or the go runtime’s.

    Presenting my findings.

    Most of my time was spent studying the gcc go frontend, libgo and the go runtime. Fortunately, I can gladly say that it was time well spent. What I got from it were some nice pieces of insight, but also some slight confusion and doubts.

    The first interesting thing in my findings was this:

    struct	G
    {
    	Defer*	defer;
    	Panic*	panic;
    	void*	exception;	// current exception being thrown
    	bool	is_foreign;	// whether current exception from other language
    	void	*gcstack;	// if status==Gsyscall, gcstack = stackbase to use during gc
    	uintptr	gcstack_size;
    	void*	gcnext_segment;
    	void*	gcnext_sp;
    	void*	gcinitial_sp;
    	ucontext_t gcregs;
    	byte*	entry;		// initial function
    	G*	alllink;	// on allg
    	void*	param;		// passed parameter on wakeup
    	bool	fromgogo;	// reached from gogo
    	int16	status;
    	int64	goid;
    	uint32	selgen;		// valid sudog pointer
    	const char*	waitreason;	// if status==Gwaiting
    	G*	schedlink;
    	bool	readyonstop;
    	bool	ispanic;
    	bool	issystem;
    	int8	raceignore; // ignore race detection events
    	M*	m;		// for debuggers, but offset not hard-coded
    	M*	lockedm;
    	M*	idlem;
    	int32	sig;
    	int32	writenbuf;
    	byte*	writebuf;
    	// DeferChunk	*dchunk;
    	// DeferChunk	*dchunknext;
    	uintptr	sigcode0;
    	uintptr	sigcode1;
    	// uintptr	sigpc;
    	uintptr	gopc;	// pc of go statement that created this goroutine
    
    	int32	ncgo;
    	CgoMal*	cgomal;
    
    	Traceback* traceback;
    
    	ucontext_t	context;
    	void*		stack_context[10];
    };

    Yep. This is the code that represents (yeah, you guessed it) a goroutine. I was pretty surprised at first to see that a thread is represented as a struct. But then again, taking a closer look at it, it makes perfect sense. The next one though was a lot trickier:

    struct	M
    {
    	G*	g0;		// goroutine with scheduling stack
    	G*	gsignal;	// signal-handling G
    	G*	curg;		// current running goroutine
    	int32	id;
    	int32	mallocing;
    	int32	throwing;
    	int32	gcing;
    	int32	locks;
    	int32	nomemprof;
    	int32	waitnextg;
    	int32	dying;
    	int32	profilehz;
    	int32	helpgc;
    	uint32	fastrand;
    	uint64	ncgocall;	// number of cgo calls in total
    	Note	havenextg;
    	G*	nextg;
    	M*	alllink;	// on allm
    	M*	schedlink;
    	MCache	*mcache;
    	G*	lockedg;
    	G*	idleg;
    	Location createstack[32];	// Stack that created this thread.
    	M*	nextwaitm;	// next M waiting for lock
    	uintptr	waitsema;	// semaphore for parking on locks
    	uint32	waitsemacount;
    	uint32	waitsemalock;
    	GCStats	gcstats;
    	bool	racecall;
    	void*	racepc;
    
    	uintptr	settype_buf[1024];
    	uintptr	settype_bufsize;
    
    	uintptr	end[];
    };

    This was a source of endless confusion at the beginning. It does have some hints confirming that G’s are indeed goroutines, but nothing that really helps to describe what an M is. Its structure is similar to that of the G, however, which suggests that it might have something to do with a thread. And indeed it does. Further study of the source code made me speculate that M’s must be the real operating system scheduled (kernel) threads, while G’s (goroutines) must be the lightweight threads managed by the go runtime.

    I was more than happy to find comments that confirmed my speculation.

    // The go scheduler's job is to match ready-to-run goroutines (`g's)
    // with waiting-for-work schedulers (`m's)
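
    The matching described in that comment can be pictured with a toy model. The sketch below is not libgo code: task stands in for a G, each worker stands in for an M, and schedule is a hypothetical function in which a few workers drain a queue of ready tasks. The workers here are themselves goroutines, so this only models the idea of many G’s sharing a few M’s.

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    )

    // task stands in for a G: a unit of work that is ready to run.
    type task func()

    // schedule is a toy model, not libgo code. nM workers (standing in
    // for M's) pull ready tasks from a shared queue until it is empty.
    // It returns how many tasks each worker ran.
    func schedule(nM int, ready []task) []int {
    	queue := make(chan task, len(ready))
    	for _, t := range ready {
    		queue <- t
    	}
    	close(queue)

    	ran := make([]int, nM)
    	var wg sync.WaitGroup
    	for m := 0; m < nM; m++ {
    		wg.Add(1)
    		go func(m int) {
    			defer wg.Done()
    			for t := range queue {
    				t()
    				ran[m]++ // each worker only touches its own slot
    			}
    		}(m)
    	}
    	wg.Wait()
    	return ran
    }

    func main() {
    	var mu sync.Mutex
    	finished := 0
    	ready := make([]task, 10)
    	for i := range ready {
    		ready[i] = func() {
    			mu.Lock()
    			finished++
    			mu.Unlock()
    		}
    	}
    	perM := schedule(3, ready)
    	fmt.Println("tasks finished:", finished, "per worker:", perM)
    }
    ```

    All ten tasks finish even though only three workers exist. How the tasks are split between the workers varies from run to run.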
    

    Another cool finding was the go (runtime) scheduler - from which the above comment originates:

    struct Sched {
    	Lock;
    
    	G *gfree;	// available g's (status == Gdead)
    	int64 goidgen;
    
    	G *ghead;	// g's waiting to run
    	G *gtail;
    	int32 gwait;	// number of g's waiting to run
    	int32 gcount;	// number of g's that are alive
    	int32 grunning;	// number of g's running on cpu or in syscall
    
    	M *mhead;	// m's waiting for work
    	int32 mwait;	// number of m's waiting for work
    	int32 mcount;	// number of m's that have been created
    
    	volatile uint32 atomic;	// atomic scheduling word (see below)
    
    	int32 profilehz;	// cpu profiling rate
    
    	bool init;  // running initialization
    	bool lockmain;  // init called runtime.LockOSThread
    
    	Note	stopped;	// one g can set waitstop and wait here for m's to stop
    };

    From that particular piece of code, without a doubt the most interesting line is G *gfree. That is a pool of the goroutines that are available for reuse. There are also helper scheduling functions, of which the most interesting (for my purposes) was static void gfput(G*);, which releases a goroutine (puts it on the gfree list):

    // Put on gfree list.  Sched must be locked.
    static void
    gfput(G *gp)
    {
    	gp->schedlink = runtime_sched.gfree;
    	runtime_sched.gfree = gp;
    }
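
    To make the free list idea concrete, here is a simplified model written in Go. It is only a sketch: G and sched are stripped-down stand-ins for the runtime structs, locking is left out (the runtime requires Sched to be locked), and the body of gfget is an assumption, written as the simplest counterpart to gfput.

    ```go
    package main

    import "fmt"

    // G is a stripped-down stand-in for the runtime's G struct.
    type G struct {
    	goid      int64
    	schedlink *G // next G on the free list
    }

    // sched holds only the field used here.
    type sched struct {
    	gfree *G // head of the list of reusable G's
    }

    // gfput pushes gp onto the free list, mirroring the C code above.
    func (s *sched) gfput(gp *G) {
    	gp.schedlink = s.gfree
    	s.gfree = gp
    }

    // gfget pops the most recently freed G, or returns nil if none is left.
    // This body is an assumption: the simplest counterpart to gfput.
    func (s *sched) gfget() *G {
    	gp := s.gfree
    	if gp != nil {
    		s.gfree = gp.schedlink
    		gp.schedlink = nil
    	}
    	return gp
    }

    func main() {
    	var s sched
    	s.gfput(&G{goid: 1})
    	s.gfput(&G{goid: 2})
    	for gp := s.gfget(); gp != nil; gp = s.gfget() {
    		fmt.Println("reusing G", gp.goid)
    	}
    }
    ```

    Since gfput always inserts at the head, the list behaves like a stack: the G that was freed last (goid 2) is the first one handed out again.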

    There are loads of other extremely interesting functions there, but for the sake of space I will not go into them here. I will, however, expand on what it is that is confusing me:

    The source of confusion

    My tests at this point are meant to check whether removing thread destruction from the go runtime results in a difference in behavior. There are, however, two kinds of threads in the go runtime: goroutines (G’s) and the kernel schedulable threads (M’s).

    Neither of them seems to really be destroyed. From my understanding so far, G’s are never totally destroyed (I may be wrong here, I am still researching this bit). Whenever they are about to be “destroyed”, they are added to the scheduler’s list of free G’s to allow for reuse, as evidenced by the gfput and gfget functions. M’s (the kernel threads), on the other hand, also seem not to be destroyed. A comment in go’s scheduler seems to support this (// For now, m's never go away.) and as a matter of fact I could not find any code that destroyed M’s (I am still researching this bit).

    Since neither of the two actually gets destroyed, and seeing as thread creation alone should not be buggy, how come we are facing the specific bugs we are facing? I will try to offer an interpretation: either I am fairly wrong and M’s (or G’s, or both) actually do get destroyed somewhere (possible and very much probable), or I am looking for clues regarding the issue in the wrong place (possible, but I don’t see it being very probable).

  • GSOC: Week 6 report

    First of all, I would like to apologize for this report being late. But unfortunately this happened: I Accidentally 93 MB

    Only that, in my case, it was not exactly 93 MB, rather it was about 1.5GB. Yeah, I accidentally obliterated my GCC repository on the Hurd, so I had to reclone and rebuild everything, something that took considerable amounts of time. How this happened is a long story that involved me wanting to rebuild my gcc, and cd-ing 2 directories above the build folder, and ending up rm -rf * from my gcc folder (that included the source, and the build folder) rather than my gcc_build folder. Thank god, that was only a minor setback, and the (small scale) crisis was soon averted.

    Further research

    This week was mostly spent reading source code, primarily looking for clues about the previous situation, and secondarily to get a better understanding of the systems I am working on. This proved to be fruitful, as I got a firmer grip on libpthread and the GNU Mach system. However, while this week was mostly spent reading documentation, that doesn’t mean that I didn’t do anything practical. I also used my time to do some further research into what specifically triggered the assertion failure. That required playing a little bit with our newly built compiler on the Hurd to see what we can do with go there.

    Testing gccgo under the Hurd

    If you recall, last time I reported that an assertion in libpthread’s code was failing, and that this was the root cause of the failures in both the gccgo tests and the libgo tests. That assertion was failing at two different places in the code, the first being __pthread_create_internal, a libpthread function located in libpthread/pthread/pt-create.c that is invoked when an application wants to create a new POSIX thread. That function is of course not called directly; rather, it is invoked by pthread_create, which is the function that user space applications use to create a new thread. (For reference, you can find the code here)

    The second place where that assertion was failing was in __sem_timedwait_internal, in the file libpthread/sysdeps/generic/sem-timedwait.c, where it gets inlined in the place of self = _pthread_self ();. (For more information, check out last week’s report.)

    So I was curious to test out the execution of some sample programs under the compiler we built on the Hurd. Beginning with some very simple hello world like programs, we could see that they were compiling successfully, and also ran successfully without any issues at all. Seeing as the assertion failure is generated when we attempt to create a new thread, I figured I might want to start playing with go routines under the Hurd.

    So we started playing with a simple hello world like goroutine example (the one available in the tour of go on the golang.org website):

    package main
    
    import (
        "fmt"
        "time"
    )
    
    func say(s string) {
        for i := 0; i < 5; i++ {
            time.Sleep(100 * time.Millisecond)
            fmt.Println(s)
        }
    }
    
    func main() {
        go say("world")
        say("hello")
    }

    This gets compiled without any issues at all, but when we try to run it…

    a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
    
    
    goroutine 1 [sleep]:
    time.Sleep
    	../../../gcc_source/libgo/runtime/time.goc:26
    
    goroutine 3 [sleep]:
    time.Sleep
    	../../../gcc_source/libgo/runtime/time.goc:26

    Bam! It exploded right in front of our face. Let’s see if this might become friendlier if we alter it a little bit. To do this we removed the go from say to avoid running it as a goroutine, and we also removed time.Sleep (along with the time import), whose job is to pause a goroutine.

    When you do this, the code becomes a plain hello world like for loop sample, which prints:

    root@debian:~/Software/Experiments/go# ./a.out
    world
    world
    world
    world
    world
    hello
    hello
    hello
    hello
    hello
    

    Hmm. Let’s play with it some more. Changing our code a little bit to make say("world") run as a goroutine gives us the following code:

    package main
    
    import "fmt"
    
    func say(s string) {
        for i := 0; i < 5; i++ {
            fmt.Println(s)
        }
    }
    
    func main() {
        go say("world")
        say("hello")
    }

    Which, when executed results in this:

    root@debian:~/Software/Experiments/go# ./a.out
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
    __mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
    Aborted
    

    So we can see that even the simplest go programs that use goroutines do not run. Let’s try some more programs that invoke goroutines to see if our assumptions are correct. Below is the code of a very simple web server in go (found on the golang website).

    package main
    
    import (
        "fmt"
        "net/http"
    )
    
    type Hello struct{}
    
    func (h Hello) ServeHTTP(
        w http.ResponseWriter,
        r *http.Request) {
        fmt.Fprint(w, "Hello!")
    }
    
    func main() {
        var h Hello
        http.ListenAndServe("localhost:4000", h)
    }

    The (unsurprising) result is the following:

    a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
    
    goroutine 1 [syscall]:
    no stack trace available
    

    Hmm. The last time we saw this failure, it was caused by time.Sleep. So let’s take a closer look at the code of the ListenAndServe function. The code for this function, which lives in the net/http package, is this:

    // ListenAndServe listens on the TCP network address srv.Addr and then
    // calls Serve to handle requests on incoming connections.  If
    // srv.Addr is blank, ":http" is used.
    func (srv *Server) ListenAndServe() error {
    	addr := srv.Addr
    	if addr == "" {
    		addr = ":http"
    	}
    	l, e := net.Listen("tcp", addr)
    	if e != nil {
    		return e
    	}
    	return srv.Serve(l)
    }

    This calls the function Serve. The interesting part in this one is line 1271:

    
     time.Sleep(tempDelay)
    
    

    Serve calls time.Sleep when accept fails. time.Sleep is known to pause goroutines, and is therefore most likely the ultimate cause of the result we are seeing.
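
    The name tempDelay suggests that the sleep is part of a retry loop: after a failed accept the server waits a little and tries again, waiting longer each time. The sketch below shows that backoff pattern in isolation. nextDelay is a hypothetical helper, and the 5ms starting value and 1s cap are assumptions for this sketch, so it should be read as an illustration of the idea rather than as the library’s code.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // nextDelay returns how long to sleep before retrying a failed accept.
    // The 5ms starting value and 1s cap are assumptions for this sketch.
    func nextDelay(prev time.Duration) time.Duration {
    	const max = 1 * time.Second
    	if prev == 0 {
    		return 5 * time.Millisecond
    	}
    	if next := prev * 2; next < max {
    		return next
    	}
    	return max
    }

    func main() {
    	var d time.Duration
    	for i := 0; i < 10; i++ {
    		d = nextDelay(d)
    		fmt.Println("retry", i, "after", d)
    		// A server loop would call time.Sleep(d) here before retrying.
    	}
    }
    ```

    The point for this investigation is that any such loop ends up in time.Sleep, which is exactly the call that was already seen failing in the earlier goroutine example.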

    Final thoughts - Work for next week

    So pretty much everything that has anything to do with a goroutine is failing. Richard Braun on #hurd suggested that since creation and destruction of threads is buggy in libpthread, maybe we should try a workaround until a proper fix is in place. Apart from that, my mentor Thomas Schwinge suggested making thread destruction in go’s runtime a no-op to see if that makes any difference. If it does, that should mean that there is nothing wrong in the go runtime itself, and that the offending code is in libpthread. This is also my very next course of action, which I shall report on very soon.

  • GSOC: Week 5 report

    A clue!

    Last week we were left with the compiler test logs and the build result logs, which we had to go through to find the root cause of all the failures in the gccgo test results and, more importantly, in the libgo tests. So I went through the gccgo logs in search of a clue about why this may have happened. Here is the list of all the failures I compiled from the logs:

    
    spawn [open ...]^M
    doubleselect.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    FAIL: go.test/test/chan/doubleselect.go execution,  -O2 -g
    
    ==========================================================
    
    spawn [open ...]^M
    nonblock.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    FAIL: go.test/test/chan/nonblock.go execution,  -O2 -g
    
    ==========================================================
    
    Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -fsplit-stack -c  -o split_stack376.o split_stack376.c    (timeout = 300)
    spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -fsplit-stack -c -o split_stack376.o split_stack376.c^M
    cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
    cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M
    compiler exited with status 1
    output is:
     cc1: error: '-fsplit-stack' currently only supported on GNU/Linux^M
     cc1: error: '-fsplit-stack' is not supported by this compiler configuration^M 
    
    UNTESTED: go.test/test/chan/select2.go
    
    ==========================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    select3.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
     
    FAIL: go.test/test/chan/select3.go execution,  -O2 -g
    
    ==========================================================
    
    Executing on host: /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go  -fno-diagnostics-show-caret -fdiagnostics-color=never  -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo  -O  -w  -pedantic-errors  -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs  -lm   -o select5.exe    (timeout = 300)
    spawn /root/gcc_new/gccbuild/gcc/testsuite/go/../../gccgo -B/root/gcc_new/gccbuild/gcc/testsuite/go/../../ /root/gcc_new/gcc/gcc/testsuite/go.test/test/chan/select5.go -fno-diagnostics-show-caret -fdiagnostics-color=never -I/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -O -w -pedantic-errors -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo -L/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs -lm -o select5.exe^M
    PASS: go.test/test/chan/select5.go -O (test for excess errors)
    FAIL: go.test/test/chan/select5.go execution
    
    ==========================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    bug147.x: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
     
    FAIL: go.test/test/fixedbugs/bug147.go execution,  -O2 -g
    
    =========================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    BUG: bug347: cannot find caller
    Aborted
     
     
    FAIL: go.test/test/fixedbugs/bug347.go execution,  -O0 -g
    
    ========================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    BUG: bug348: cannot find caller
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal 0xb code=0x2 addr=0x0]
     
    goroutine 1 [running]:
    FAIL: go.test/test/fixedbugs/bug348.go execution,  -O0 -g
    
    ========================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    mallocfin.x: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    FAIL: go.test/test/mallocfin.go execution,  -O2 -g
    
    =======================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    Aborted
     
     
    FAIL: go.test/test/nil.go execution,  -O2 -g
    
    ======================================================
    
    Setting LD_LIBRARY_PATH to .:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:.:/root/gcc_new/gccbuild/i686-unknown-gnu0.3/./libgo/.libs:/root/gcc_new/gccbuild/gcc:/root/gcc_new/gccbuild/./gmp/.libs:/root/gcc_new/gccbuild/./prev-gmp/.libs:/root/gcc_new/gccbuild/./mpfr/.libs:/root/gcc_new/gccbuild/./prev-mpfr/.libs:/root/gcc_new/gccbuild/./mpc/.libs:/root/gcc_new/gccbuild/./prev-mpc/.libs
    spawn [open ...]^M
    Aborted
     
     
    FAIL: go.test/test/recover3.go execution,  -O2 -g
    
    

    See a pattern there? I certainly do. On several occasions, the root cause of the failure is this:

    Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    

    Hmm… That’s interesting. Let us go through the libgo results too.

    
    Test Run By root on Fri Jul 12 17:56:44 UTC 2013
    Native configuration is i686-unknown-gnu0.3
    
    		=== libgo tests ===
    
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 10005 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: bufio
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (10005) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 10637 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: bytes
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (10637) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 10757 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: errors
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (10757) - No such process
    a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
    
    
    goroutine 1 [syscall]:
    no stack trace available
    FAIL: expvar
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (10886) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 11058 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: flag
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (11058) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 11475 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: fmt
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (11475) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 11584 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: html
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (11584) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 11747 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: image
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (11747) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 11999 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: io
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (11999) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 12116 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: log
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (12116) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 13107 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: math
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (13107) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 13271 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: mime
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (13271) - No such process
    a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    Aborted
    
    
    goroutine 1 [chan receive]:
    a.out: ./pthread/../sysdeps/generic/sem-timedwait.c:50: __sem_timedwait_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    panic during panic
    testing.RunTestsFAIL: net
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (14234) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 14699 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: os
    timed out in gotest
    ../../../gcc/libgo/testsuite/gotest: line 484: kill: (14699) - No such process
    a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid; __mach_port_deallocate ((__mach_task_self_ + 0), ktid); ok; })' failed.
    ../../../gcc/libgo/testsuite/gotest: line 486: 14860 Aborted                 ./a.out -test.short -test.timeout=${timeout}s "$@"
    FAIL: path
    timed out in gotest
    
    ...
    
    
    runtest completed at Fri Jul 12 18:09:07 UTC 2013
    

    That’s certainly even more interesting. In case you haven’t noticed, it’s the same assertion that caused the failures in the gccgo test suite. Let us find the offending code, shall we?

    /* Set the new thread's signal mask and set the pending signals to
         empty.  POSIX says: "The signal mask shall be inherited from the
         creating thread.  The set of signals pending for the new thread
         shall be empty."  If the currnet thread is not a pthread then we
         just inherit the process' sigmask.  */
      if (__pthread_num_threads == 1)
        err = sigprocmask (0, 0, &sigset);
      else
        err = __pthread_sigstate (_pthread_self (), 0, 0, &sigset, 0);
      assert_perror (err);

    This seems to be the code that the logs point to, but there is no sign of the assertion in it. After discussing this with my peers in #hurd, I was told that the failing assertion comes from the _pthread_self () macro, which is expanded in place at this call site and is actually defined in libpthread/sysdeps/mach/hurd/pt-sysdep.h.

    extern __thread struct __pthread *___pthread_self;
    #define _pthread_self()                                             \
      ({                                                                \
        struct __pthread *thread;                                       \
                                                                        \
        assert (__pthread_threads);                                     \
        thread = ___pthread_self;                                       \
                                                                        \
        assert (thread);                                                \
        assert (({ mach_port_t ktid = __mach_thread_self ();            \
                   int ok = thread->kernel_thread == ktid;              \
                   __mach_port_deallocate (__mach_task_self (), ktid);  \
                   ok; }));                                             \
        thread;                                                         \
      })

    So the assertion inside _pthread_self () is what I was looking for. When I discussed it further in the weekly IRC meeting, braunr provided me with some more clues:

    08:38:15 braunr> nlightnfotis: did i answer that ?
    08:38:24 nlightnfotis> braunr: which one?
    08:38:30 nlightnfotis> hello btw :)
    08:38:33 braunr> the problems you’re seeing are the pthread resources leaks i’ve been trying to fix lately
    08:38:58 braunr> they’re not only leaks
    08:39:08 braunr> creation and destruction are buggy
    08:39:37 nlightnfotis> I have read so in http://www.gnu.org/software/hurd/libpthread.html. I believe it’s under Thread’s Death right?
    08:40:15 braunr> nlightnfotis: yes but it’s buggy
    08:40:22 braunr> and the description doesn’t describe the bugs
    08:41:02 nlightnfotis> so we will either have to find a temporary workaround, or better yet work on a fix, right?
    08:41:12 braunr> nlightnfotis: i also told you the work around
    08:41:16 braunr> nlightnfotis: create a thread pool

    Work for next week

    This leaves us with next week’s work, which is to hack on libpthread’s code and attempt to create a thread pool, so that we avoid some of the issues present in the current implementation of the Hurd libpthread code.

    Samuel Thibault (youpi) also suggested that I run the libgo tests by hand to look for more clues, such as stack traces. That sounds like a good idea to me, so I will look into it too.