When I write multithreaded programs, debugging a hung thread is always painful. That's because:
- You can't see what's going on. I don't put logging in when I write a library, and I don't usually use raw threads.
- The problem goes away once I add a debug print.
- And/or it happens only once in a hundred runs.
So I started thinking that if I want to resolve this, it has to be a fundamental solution rather than an ad hoc one.
Now, what could a fundamental solution be? Before forming ideas, which may not even be applicable, I need to think about what is required to debug hung threads. The things I want to see are:
- The state of the thread: whether it's running, sleeping, waiting, or whatever
- Where it is, i.e. which procedure is being called
- Who the caller of that procedure is
Looking at these, it's basically thread introspection. This means I need to keep track of every thread, even though right now, once a thread is created, we just release it into dark space, hoping it can look after itself (I feel like a parent watching my own child leave the house 😢). If you're going to be a worrying parent, you need to give your children a GPS so you can track where they are. Okay then, let Sagittarius be a good parent to its own children (threads).
The idea that came to mind is to change the VM architecture. Currently a VM (Virtual Machine, as we all know) is a thread, so whenever a thread is created, we get a new VM. Now there must be a manager that holds and monitors the threads. I don't dare to change existing names, so let's call the manager the kernel. The architecture should then look like this:
+------------------------------------+
|               kernel               |
+---+--------------------------------+
    | threads
  +-+-+------+---+
  |   | main | * |
  +---+------+-|-+
               |
             +-+-+-------+---+
             | * | child | * |
             +---+-------+-|-+
                           |
                         +-+-+-------+---+
                         | * | child | * |
                         +---+-------+---+
Threads should be stored in a doubly linked list, and each thread should hold a reference to the kernel to make my life easier. With this architecture, it seems I can add an extra thread that monitors its siblings from the Scheme world.
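To make the idea a bit more concrete, here is a minimal sketch in C of what such a kernel could look like. Every name in it (`kernel_t`, `vm_t`, `kernel_add`, `kernel_remove`, `kernel_dump`, and the fields) is hypothetical and made up for illustration; none of it is the actual Sagittarius code. It assumes POSIX threads, and the mutex only protects the list itself. A real implementation would also have to publish `state`, `procedure` and `caller` safely from the running thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not Sagittarius' real API. */
typedef enum { VM_RUNNING, VM_SLEEPING, VM_WAITING } vm_state_t;

typedef struct kernel kernel_t;

/* One VM per thread, carrying the three things I want to see. */
typedef struct vm {
  const char *name;      /* e.g. "main" or "child" */
  vm_state_t  state;     /* running, sleeping, waiting */
  const char *procedure; /* procedure currently being called */
  const char *caller;    /* caller of that procedure */
  kernel_t   *kernel;    /* back reference to the owning kernel */
  struct vm  *prev;      /* doubly linked list of siblings */
  struct vm  *next;
} vm_t;

struct kernel {
  pthread_mutex_t lock;  /* guards head, tail, count and the links */
  vm_t  *head;
  vm_t  *tail;
  size_t count;
};

static const char *vm_state_name(vm_state_t s) {
  switch (s) {
  case VM_RUNNING:  return "running";
  case VM_SLEEPING: return "sleeping";
  case VM_WAITING:  return "waiting";
  }
  return "unknown";
}

void kernel_init(kernel_t *k) {
  pthread_mutex_init(&k->lock, NULL);
  k->head = k->tail = NULL;
  k->count = 0;
}

/* Register a newly created VM at the tail of the list. */
void kernel_add(kernel_t *k, vm_t *vm) {
  pthread_mutex_lock(&k->lock);
  vm->kernel = k;
  vm->next = NULL;
  vm->prev = k->tail;
  if (k->tail) k->tail->next = vm; else k->head = vm;
  k->tail = vm;
  k->count++;
  pthread_mutex_unlock(&k->lock);
}

/* Unlink a VM when its thread ends; the back reference finds the kernel. */
void kernel_remove(vm_t *vm) {
  kernel_t *k = vm->kernel;
  assert(k != NULL);
  pthread_mutex_lock(&k->lock);
  if (vm->prev) vm->prev->next = vm->next; else k->head = vm->next;
  if (vm->next) vm->next->prev = vm->prev; else k->tail = vm->prev;
  vm->prev = vm->next = NULL;
  vm->kernel = NULL;
  k->count--;
  pthread_mutex_unlock(&k->lock);
}

/* What a monitor thread could do: walk the siblings and report them.
   Returns the number of VMs visited. */
size_t kernel_dump(kernel_t *k, FILE *out) {
  size_t n = 0;
  pthread_mutex_lock(&k->lock);
  for (vm_t *vm = k->head; vm != NULL; vm = vm->next, n++) {
    fprintf(out, "%s: %s in %s (called from %s)\n", vm->name,
            vm_state_name(vm->state), vm->procedure, vm->caller);
  }
  pthread_mutex_unlock(&k->lock);
  return n;
}
```

The doubly linked list lets a finished thread unlink itself in constant time, and the back reference means it doesn't need to be told which kernel it belongs to. A monitoring thread would just be one more entry in the same list that calls something like `kernel_dump` from time to time.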
I feel I'm overlooking something, but it's probably a good starting point.