In the previous post, I mentioned that I needed to collect all threads
to see which one was hanging. After implementing that, I noticed that
all of the threads were hanging, so I decided to implement a better remote
debugger(ish).
CAVEAT
The functions and interface are experimental, so they might change
in the future.
Suppose we have this script:
(import (rnrs)
        (srfi :1)
        (srfi :18)
        (sagittarius)
        (sagittarius debug))
;; Creating a remote debugger. You can specify a specific port number as well
(define remote-debugger (make-remote-debugger "0"))
;; Showing the port.
(print "Debugger port: " (remote-debugger-port remote-debugger))
(define ((sleep-in-deep name time))
  (define (before-sleep)
    (format #t "~a wants to sleep~%" name))
  (define (after-sleep time)
    (format #t "~a slept for ~a but wants to sleep more~%" name time))
  (define (sleep time) (thread-sleep! time))
  (before-sleep)
  (sleep time)
  (after-sleep time))
(define threads
  (map thread-start!
       (map (lambda (i)
              (make-thread (sleep-in-deep i (* i 1800))
                           (string-append "sleep-" (number->string i))))
            (iota 3))))
(for-each thread-join! threads)
The script does nothing but sleep. If you run it, it prints something
like this on the console:
Debugger port: 50368
0 wants to sleep
0 slept for 0 but wants to sleep more
2 wants to sleep
1 wants to sleep
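The script passes "0" as the port, and the debugger ended up on 50368 here, so the actual number apparently changes from run to run. The comment in the script says a specific port can be given instead. A minimal sketch of that, assuming the port is passed as a string in the same way as "0" and that 12345 (an arbitrary number chosen for illustration) is free on the machine:

```scheme
(import (rnrs)
        (sagittarius)
        (sagittarius debug))

;; Assumption: a fixed port is given as a string, just like "0" in the script.
;; 12345 is a placeholder; use any free port.
(define remote-debugger (make-remote-debugger "12345"))

;; Print the port so it can be confirmed before connecting.
(print "Debugger port: " (remote-debugger-port remote-debugger))
```

With a fixed port, the client side would not need to read the port from the console on every run.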
At the moment, the remote debugger is just a kind of remote REPL, so open
Emacs and run Sagittarius (an older version works too). Then do this:
sash> (import (sagittarius remote-repl))
sash> (connect-remote-repl "localhost" "50368")
connect: localhost:50368 (enter ^D to exit)
Now that it's connected, I want to check which threads are hanging.
sash> (for-each print (sleeping-threads))
#<thread root runnable 0x10ac6dc80>
#<thread remote-debugger-thread-1295 runnable 0x10bdb8640>
#<thread sleep-1 runnable 0x10bdb8000>
#<thread sleep-2 runnable 0x10c6e0c80>
#<unspecified>
I have 4 threads hanging. root is the main thread, so I can ignore it.
remote-debugger-thread-1295 (the number varies) is the remote debugger's
server thread, so I can ignore it as well. The rest of the threads are
the targets. I want to filter them for later use, which I can simply do
like this:
sash> (define t* (filter (lambda (t) (and (string? (thread-name t)) (string-prefix? "sleep-" (thread-name t)))) (sleeping-threads)))
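The one-line filter above can also be wrapped in a small helper, which is handy when the same check is needed for several name prefixes. This is only a sketch: threads-with-prefix is a hypothetical name, not something the debugger provides, and it relies on the same procedures the one-liner uses (filter, thread-name, string-prefix?, sleeping-threads).

```scheme
;; Hypothetical helper (not part of Sagittarius): keep only the threads
;; whose name is a string starting with PREFIX.
(define (threads-with-prefix prefix threads)
  (filter (lambda (t)
            (let ((name (thread-name t)))
              ;; thread names are not necessarily strings, so check first
              (and (string? name) (string-prefix? prefix name))))
          threads))

;; Equivalent to the one-liner above.
(define t* (threads-with-prefix "sleep-" (sleeping-threads)))
```

The string? check is kept because a thread name can be any object, and string-prefix? would raise an error on a non-string.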
Now, let's see the backtrace of the threads.
sash> (for-each print (map thread->pretty-backtrace-string t*))
Thread sleep-1
stack trace:
[1] thread-sleep!
[2] sleep-in-deep
src: (thread-sleep! time)
"sleep.scm":15
Thread sleep-2
stack trace:
[1] thread-sleep!
[2] sleep-in-deep
src: (thread-sleep! time)
"sleep.scm":15
#<unspecified>
It seems that sleep-in-deep is calling the thread-sleep! procedure.
Okay, so what's the value of time?
To see those variables, I need to collect the backtrace of the threads.
Like this:
sash> (define bt* (map thread-backtrace t*))
A backtrace contains multiple frames. In this example, each thread's
backtrace contains 2 frames. I want to see the first one, which is the
thread-sleep! frame, so I do this:
sash> (for-each print (map (lambda (bt) (thread-backtrace-arguments bt 1)) bt*))
((local (0 . 1800)))
((local (0 . 3600)))
#<unspecified>
1800 and 3600 seconds! That's why the threads are hanging (obviously...).
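The two steps above (collecting the backtraces, then reading a frame's arguments) can be folded into one helper that also labels each line with the thread name. A rough sketch, where print-frame-arguments is a hypothetical name introduced here and the frame index is assumed to be 1-based, as in the session above:

```scheme
;; Hypothetical helper: for each thread, print its name followed by the
;; arguments of frame N of its backtrace.
(define (print-frame-arguments threads n)
  (for-each (lambda (t)
              (print (thread-name t) ": "
                     (thread-backtrace-arguments (thread-backtrace t) n)))
            threads))

;; Frame 1 is the thread-sleep! frame in this example.
(print-frame-arguments t* 1)
```

Passing 2 instead of 1 would presumably select the sleep-in-deep frame, since each backtrace here has two frames.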
In this example, I only showed how to see the arguments (local variables
and free variables), but the remote debugger has a bit more functionality,
such as inspecting objects and accessing their slots. Using this debugger,
I found the root cause of a bug that had made me suffer for a couple of weeks.