OS X support and Travis CI

Travis CI doesn't support Bitbucket so I didn't use it until now. I knew there's a way to use it, more precisely, there's a service which synchronises Bitbucket's repository to Github (BitSyncHub). But I didn't use it since there are couple of CI service which support Bitbucket (e.g. drone.io). Now, I've heard that Travis CI supports OSX so I wanted to use it. Using BitSyncHub wasn't difficult at all (so if you want to use Travis CI, it might be worth to consider). Enabling multiple OS support requires dropping a line to their support. No problem. Then I've got the environment, yahoo!!

OS X is, I believe, considered POSIX OS in most of the case. So I was hoping that Sagittarius would work without any change. Well, it didn't. When I write OS specific code, I refer POSIX spec to make sure I'm writing portable code. Was I doing right? I think so because the code worked on FreeBSD without any change which is also considered POSIX compliant OS. The pitfalls of OSX were the followings:
  • sem_timedwait is not implemented (linker error)
  • sem_init is not supported (can be compiled but an error)
  • Passing O_TRUNC to shm_open is an error.
Well, not that much things but pretty much annoying. Especially the first one. I've created a patch to make 0.6.7 be compiled so that CI server can build it.

Long time ago, Sagittarius got a pull request which made it work on OSX. The PR also contains a fix for libffi search path. It requires to specify the path explicitly. Which I think fine as long as you have control of the environment. However on CI server, we don't know until when the version is there. So I've written kind of solution to fine libffi in build process. So now, you can simply build like the following:
$ cmake .
$ make
It does kinda ugly thing internally but it's working so I'm happy with it.

Then I've experienced build failure number of times (including stupid mistake), finally got the successfull build.

Now, I can say Sagittarius supports OSX again.











Renaming identifier (solution of pandoric macro)

I've found a solution for the bug described on the article: pandoricマクロ. It was a rather simple solution.

There were 2 issues and one was a bug.
  • Making a global variable strips syntax information of the given identifier (bug)
  • Explicit renaming doesn't rename when identifiers are bound
The first one was a very simple bug, on the compiler if define has an identifier for its binding's name, then it shouldn't strip. That's it. Though, this isn't a good solution because if the identifier is bound in other library then the binding would be created not the one creating a binding but the one created the identifier. I think it's not a big deal but rather weird.

To resolve the weirdness, the identifier needs to be renamed. On Sagittarius, identifiers internally contain its identity which is a sort of history when the identifier is renamed. If the identity is #f, then it's a global identifier (means almost the same as raw symbol). Now, as far as I remember, each identifier, which reached to define and contains identity, are renamed means created in macros and it should be safe to change its name to unique symbol which is done by compiler via rename-pending-identifier. So I've added extra check to change the identifier name.

The benefit of this change is not only making R7RS syntax-rules work the same as R6RS one but also make R6RS datum->syntax works more R6RS compliant. More precisely, it resolved this issue. So now this won't work:
(import (rnrs))

(define-syntax define/datum->syntax
  (lambda (x)
    (define (concat n1 n2)
       (string-append (symbol->string (syntax->datum n1))
              (symbol->string (syntax->datum n2)))))
    (syntax-case x ()
      ((_ name1 name2 var) ;; oops
       (with-syntax ((name (datum->syntax #'k (concat #'name1 #'name2))))
     #'(define name var))))))

(define/datum->syntax n1 n2 'bar)

;; -> error
(though, I've also found a piece of code depending on this bug...)

In my understanding, this type of renaming isn't required on R7RS. So this is a valid R7RS script:
(define-syntax defbar
  (syntax-rules ()
    ((_ name)
     (defbar name t1))
    ((_ name t1)
       (define t1 'bar)
       (define (name) t1)))))

(defbar foo)

;; may or may not an error
For example, Chicken Scheme returns bar. (For my preferecne, if identifiers are bound in macros, then it should be renamed as R6RS requires, but I can't do much about it.)












一応言い訳をしておくと、R6RS版のsyntax-rulesなら動く。問題になるのはR7RS版のsyntax-rules。何度もブログに書いてるから知ってる人は知ってると思うけど、SagittariusではR7RSで追加された拡張を実現するために、R7RSのsyntax-rulesはChibi Schemeから移植したものを使っている。これは大抵の場合で上手く動くんだけど、上記のような、テンプレート変数で生成した一時変数を別ライブラリで大域に束縛すると動かなくなる。

;; foo.scm
(define-library (foo)
  (export defbar)
  (import (scheme base))
    (define-syntax defbar
      (syntax-rules ()
        ((_ name)
         (defbar name t1))
        ((_ name t1)
           (define t1 'bar)
           (define (name) t1)))))))

(import (scheme base) (scheme write) (foo))

(defbar boo)

(display (boo)) (newline)
;; -> error
sash -r7 foo.scm
(scheme base)(rnrs)にすると動く(define-libraryは変更する必要はない)。何が問題かといえば、defbar内で細くされたテンプレート変数は、そのマクロが定義されたライブラリを内部で保持する識別子となるため、展開後に参照しようとするとそんなものないと怒られるのである。これは健全性を保持するためにそうなっているのだが、一時変数を大域で束縛するとそれがあだになるのである。






Safer thread termination (2)

I think I've found a (imperfect) solution to do it.


The goal is pretty simple. No crash, that's it. Optionally, releasing resource as much as possible. But the resource in this case is not about Scheme world or Sagittarius created, it's OS resource such as allocated stack or so.

The problem

Yes, it's the problem. The problem was sometimes it crashed with 0xC0000005 (access violation, SEGV in POSIX way). The termination process was described in previous article (Safer thread termination) . The cause, in short, was interrupting Eip or Rip may break call stack frame.

As my conclusion, if the Scheme process went into deep inside of C function call, then the program counter is overwritten, the call stack may get broke. So when the process tries to return from RaiseException however there is no place to return or returning invalid address. (This may not be true but seems like it.)

Comparing stack pointer was not enough

In previous article, I've written to do it. It does prevent the error when the thread has already returned from entry point however it didn't resolve returning from deep stack address. So I need to find one more thing.

The issue is breaking call stack. Means, if I can restore it until the thread entry function is called, then the call stack is not broken and can return properly.


If I need to restore the stack frame of certain point, then setjmp and longjmp seem the things I need to use, instead of calling RaiseException. So I've put setjmp before calling Scheme world process. Then the thread-terminate! interrupts the process to the C function which calls longjmp.

Not a perfect solution

Unfortunately, this isn't a perfect solution. It works most of the time and, I think, should be good enough (so far I don't get crash in my test case). However, the following code may not work:
(thread-terminate! (thread-start! (make-thread (lambda () #t))))
The problem of this piece of code is that the thread may not be terminated. If the thread couldn't reach the point where setjmp is called, then thread-terminate! would call longjmp without proper jmp_buf. Or the thread wouldn't be terminated if it didn't reach the internal thread entry point. Who writes this crap? Who knows.

This may not be a perfect solution but at least less possibility to get crashed. Thus it's safer than before. So, for now, I'm happy with it.


Safer thread termination

Terminating a thread is a dangerous operation (AFAIK). So this shouldn't be done in general. However I've wrote the library (util concurrent) which uses thread-terminate! to finish tasks forcibly. As my understanding, as long as you free all resources held by the thread, terminating thread isn't that badly wrong. Well, if this was the conclusion, then I wouldn't write this. It actually works fine on POSIX environments I usually test, including Cygwin, however not working properly on Windows.

Windows has the API which terminates a thread, called TerminateThread. I, however, don't use this because of the following reasons:
  • It may not release all resources
  • There is no way to handle after termination
If the resource aren't released, then it may leak eventually. I don't use Sagittarius on 24/7 service so this might be a trivial issue but you'll never know. So it's better to have as safe as possible.

If you don't use TerminateThread, then there's, afaik, not many choice to do it. The way I've chosen is using CONTEXT. The basic process is the followings:
  1. Suspend the target thread
  2. Check if it's active
  3. Get thread context
  4. Set program counter to the function calls RaiseException
  5. Resume the thread
The thread caller wraps the thread function with _try and _except, so the thrown exception will be caught.

Until here, there seems nothing wrong and works fine. Well no. This works most of the case however at some point you may get a context which contains no call frame of thread caller function. I think this means either thread is almost terminating or it's finished but still active. Then you'd get access violation error.

Now, what can I do? I think I'll try to compare stack pointer. Luckly, Boehm GC has base stack pointer on each thread. So if I can get this and compare with the thread context stack pointer, I might be able to detect if the thread is already finished (or at least returned from the thread function of Sagittarius) or not.

If you know better way, please let me know.