Introduction

This week was spent investigating the runtime and debugging executables with gdb. It was interesting in the sense that it provided me with some interesting pieces of information. Without any further ado, let’s present our findings:

My findings

Before starting out playing with libpthread, and glibc, I wanted to make sure that the goruntime behaved the way I believed it behaved, and make some further assurances about the goruntime. These assurances had to do with the total number of goroutines and the total number of machine threads at various checkpoints in the language runtime.

The first thread in the program is initialised during runtime_schedinit.
The number of m’s (kernel threads) is dependent on the number of goroutines. The runtime basically attempts to create an equal amount of m’s to run the goroutines. We can observe everytime a new goroutine is created, there is a number of calls to initiate a new kernel thread.
There are at least two kernel threads. One that supports the runtime (mainly the garbage collector) and one that executes the code of the go program.

There is only one small piece of code in the goruntime that creates some sort of confusion for me, and that is the code for a new m initialisation. Let me first present the code that confuses me:

M*
runtime_newm(void)
{

    ...
	mp = runtime_mal(sizeof *mp);

    ...
	mcommoninit(mp);
	mp->g0 = runtime_malg(-1, nil, nil);

    ...
	if(pthread_attr_init(&attr) != 0)
		runtime_throw("pthread_attr_init");
	if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0)
		runtime_throw("pthread_attr_setdetachstate");

    ...
}

I purposely compacted the function for brevity, as it only serves as a demonstration for a point. Now, my confusion lies in the line mp->g0 = runtime_malg(-1, nil, nil). It is a piece of code that allocates memory for a new goroutine. Now I am ok with that, but what I do not understand is that new kernel threads (m’s) are supposed to be pick and run a goroutine from the global goroutine pool - that is run an existing one, and not create a new one. Now, the runtime_malg is given parameters that don’t initialise a new goroutine properly, but still, new memory is allocated for a new goroutine, and is returned to mp->g0 from runtime_malg.

Assuming I have not misunderstood something, and I am not mistaken (which is kind of likely), this is behavior that could lead to a number of questions and/or problems. For instance, what happens to the goroutine created by runtime_malg? Is it killed after the m is assigned a new goroutine to execute? Is it parked on the goroutine global list? Is it just ignored? Does it affect the runtime scheduler’s goroutine count? This is the last thing I feel I wanna clear out regarding gccgo’s runtime.

gdb

For this week, I also run the executables created by gccgo through gdb. It was a fertile attempt that, most of the time, confirmed my findings in the goruntime. It also provided us with some other nice pieces of information regarding the crashing of goroutines, but also left me with a question.

The code in question that I run through gdb is this:

package main

import "fmt"

func say(s string) {
    for i := 0; i < 5; i++ {
        fmt.Println(s)
    }
}

func main() {
    fmt.Println("[!!] right before a go statement")
    go say("world")
    say ("hello")
}

Your very typical hello world like goroutine program. Now, setting a break point in main (not the program’s main, that’s main.main. main as far as the runtime is concerned is the runtime entry point, in go-main.c) and running it through gdb yields the following results:

Breakpoint 1, main () at ../../../gcc_source/libgo/runtime/go-main.c:52
52 runtime_check ();
2:  __pthread_total = 1
1: runtime_sched.mcount = 0
(gdb) next
53 runtime_args (argc, (byte **) argv);
2: __pthread_total = 1
1: runtime_sched.mcount = 0
54 runtime_osinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 0
63: runtime_schedinit ();
2: __pthread_total = 1
1: runtime_sched.mcount = 1

Up until now, nothing unexpected. The kernel thread is registered with the runtime scheduler during its initialisation process in runtime_schedinit and that’ why the runtime_sched.mcount is reported to be zero many times before schedinit is run.

68 __go_go (mainstart, NULL);
2: __pthread_total = 1
1: runtime_sched.mcount = 1
(gdb) display runtime_sched.gcount
3: runtime_sched.gcount = 0

That too is ok, because a new goroutine is registered with the scheduler during the call to __go_go. Now I am gonna fast forward a bit, to a more interesting point.

...
[DEBUG] (in runtime_gogo) new goroutine's status is 2
[DEBUG] (in runtime_gogo) number of goroutines now is 2
[New Thread 629.30]

Program received SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) info threads
 Id   Target  Id       Frame
 6    Thread  629.30   0x08048eb7 in main.main () at goroutine.go:12
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
```
This is getting weird. I mean, libpthread is reporting that 2 threads are active,
but gdb reports that 3 are active. Anyway, let's continue:

```
[DEBUG] (in runtime_stoptheworld) stopped the garbage collector
[DEBUG] (in runtime_starttheworld) starting the garbage collector
[DEBUG] (in runtime_starttheworld) number of m's now is: 2
[DEBUG] (in runtime_starttheworld) [note] there is already one gc thread
[!!] right before a go statement

Program received signal SIGTRAP, Trace/breakpoint trap.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 2
2: __pthread_total = 2
1: runtime_sched.mcount = 2
(gdb) continue
... (output omitted by me for brevity)

[DEBUG] (in runtime_newm) Right before the call to pthread_create.
a.out: ./pthread/pt-create.c:167: __pthread_create_internal: Assertion `({ mach_port_t ktid = __mach_thread_self (); int ok = thread->kernel_thread == ktid;
__mach_port_deallocate ((__mach_task_self + 0), ktid); ok; })' failed.
[New Thread 629.31]

Program received signal SIGABRT, Aborted.
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3

Oh my goodness. From a first glance, this seems to be a very serious inconsistency between libpthread and the goruntime. At this point, the go scheduler reports 3 threads (3 registered threads, that means that flow of execution has passed mcommoninit, the kernel thread initialisation function which also registers the kernel thread with the runtime_scheduler) whereas libpthread reports 2 threads.

But WAIT! Where are you going? Things are about to get even more interesting!

(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

GDB reports 4 threads. Yes, 4 threads ladies and gentlemen. Now take a look closely. 3 threads are in the same frame, with the one with id 4 being the one currently executed. And there is also a pattern. 0x01da48ec is the value of the eip register for all 3 of them.

That’s one thing that is for certain. Now I already have an idea. Why not change the current thread to the one with id 7? I’m sold to the idea, let’s do this:

(gdb) thread 7
[Switching to thread 7 (Thread 629.31)]
#0  0x01f4da00 in entry_point () from /lib/i386-gnu/libpthread.so.0.3
(gdb) continue
Continuing.

Program received signal SIGABRT, Aborted.
[Switching to Thread 629.28]
0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
3: runtime_sched.gcount = 3
2: __pthread_total = 2
1: runtime_sched.mcount = 3
(gdb) info threads
 Id   Target  Id       Frame
 7    Thread  629.31   0x01dc08b0 in ?? () from /lib/i386-gnu/libc.so.0.3
 6    Thread  629.30   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
 5    Thread  629.29   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3
*4    Thread  629.28   0x01da48ec in ?? () from /lib/i386-gnu/libc.so.0.3

Damn. But I am curious. What’s the next value to be executed?

(gdb) x/i $eip
=> 0x1da48ec: ret

And what is the next value to be executed for the thread with id 7?

(gdb) x/i $eip
=> 0x1dc08b0: call *%edx

Conclusion

Apparently, there is still much debugging left to checkout what is really happening. But we have got some leads in the right direction, that hopefully will lead us to finally finding out where the problem lies, and correct it.

Most importantly, in my immediate plans, before iI start playing around with libpthread is to attempt the same debugging run on the same code, under linux (x86). Seeing as go is clean on linux, it would provide some clues as to what the expected results should be, and where the execution differentiates substantially, a clue that might be vital to finding the problem.