Syntax highlighter

2016-07-29

Exception handling in C world

The following piece of code goes into infinite loop on Sagittarius.
(import (rnrs))
(with-exception-handler
 (lambda (k) #t)
 (lambda ()
   (guard (e (#f #f))
     (error 'test "msg"))))
I couldn't figure it out why this never returned at first glance. And it turned out to be very interesting problem.

The problem is related to continuation boundary and exception handlers. If you print the k, then only the very first time you'd see error object with "msg" after that it'd be "attempt to return from C continuation boundary.". But why?

This happens when you capture a continuation and invoke it in the different C level apply. Each time C level apply is called, then VM put a boundary mark on its stack to synchronise the C level stack and VM stack. This is needed because there's no way to restore C level stack when the function is returned. When a continuation is captured on the C apply which is already returned, then invocation of this continuation would cause unexpected result. To avoid this, the VM checks if the continuation is captured on the same C level apply.

Still, why this combination causes this? The answer is raise is implemented in C world and exception handlers are invoked by C level apply. The whole step of the infinite loop is like this:
  1. guard without else clause captures a continuation.
  2. When error is called, then guard re-raise and invoke the captured continuation. (required by R6RS/R7RS)
  3. Exception handler is invoked by C level apply.
  4. VM check the continuation boundary and raises an error on the same dynamic environment as #3
  5. Goes to #3. Hurray!
There are 2 things I can do:
  1. Create a specific condition and when with-exception-handler received it, then it wouldn't restore the exception handler. [AdHoc]
  2. Let raise use the Scheme level apply. (Big task)
#1 probably wouldn't work properly. #2 is really a huge task. I need to think about it.

2016-07-12

リモートREPLとライブラリ依存関係

最近ちょっとしたツールを作るのにPaella(に付属しているPlato)を使っているのだが、リモートREPLの問題とリロードの問題が面倒だなぁと思ってきたのでちょっとメモ。

リモートREPLの問題

これは非常に簡単で#!read-macro=...のようなのが送れない。問題も分かっていて、#!から始まるリーダーマクロは値を返さないで次の値を読みにいく。マクロはリーダー内で解決される。スクリプトや通常のREPLなら特に問題ないんだけど、それ自体を送りたい場合にはあまり嬉しくない。解決方法はぱっと思いつくだけで以下:
  • リモートREPL用のリーダーを作る
    • 面倒
  • リモートREPL用ポートを作って、リードマクロを読ませる
    • アドホックだけど、悪くない気がする
一つ目のは既存のリーダーマクロ機構をどうするんだ?という話になるので、最後の手段にしたい(作れば確実に動くのは分かっているので)。二つ目は何とかいけそうな気がしないでもないんだけど、ソースポート(この場合は標準入力ポート)まで影響が出るかが疑問(出ない気がしてる)。そもそもソースポートにまで影響はでる必要はない気もするなぁ。試してみるか。

ライブラリ依存関係

Platoを使う最大の理由はREPLでリロードができるからなんだけど、ハンドラ(と呼んでいるライブラリ)が依存しているライブラリを変更した際に上手いことその変更を反映する方法がない。ハンドラであればリロードが可能なのだが、依存ライブラリだとロードパスの問題とかも出てきて嬉しくない。手動でloadを呼ぶとか、コード自体をリモートREPLに貼り付けるとか(上記の問題が出ることもあるが)方法がないこともないけど今一面倒。
依存関係の子に当たる部分を親が解決できればなんとかなりそうなんだけど、現状の仕組みでは子が親を探すことができても親は子を知る術がない。突き詰めていくとincludeされたファイルの変更は追跡できないとかあるし。
あるとよさそうなものとして、
  • ライブラリファイルの変更を検知したらリロードする機構
  • 依存関係の親が子を知る手段
かなぁ。一つ目はそういえばファイルシステムの監視機構を入れたからやろうと思えばやれなくもないのか(自動監視だと任意のタイミングでやれないから微妙かな)。二つ目はC側に手を入れてやればやれるけど、必要な場面がかなり限定されているので単なるオーバーヘッドにしかならない気がするなぁ(メモリも喰うだろうし)。こっちはかなり考えないとだめっぽい。いいアイデア募集。

2016-07-08

Syntax parameters (aka SRFI-139)

Marc Nieper-Wißkirchen submitted an interesting SRFI. The SRFI was based on the paper 'Keeping it Clean with Syntax Parameters'. The paper mentioned some corner case of breaking hygiene with datum->syntax, which I faced before (if I knew this paper at that moment!).

In 'Implementation' section, the SRFI mentions that this is implemented on 'Rapid Scheme', Guile and Racket. Unfortunately there's no portable implementation. In my very prejudiced perspective, Guile isn't so fascinated by macro, so the syntax parameters is probably implemented on top of existing macro expander (I haven't check since Guile is released under GPL, and if I see the code it may violates the license). If so, it might be able to be implemented on syntax-case.

Without any deep thinking, I've written the very sloppy implementation like this:
#!r6rs
(library (srfi :139 syntax-parameters)
  (export define-syntax-parameter
          syntax-parameterize)
  (import (rnrs))

(define-syntax define-syntax-parameter
  (syntax-rules ()
    ((_ keyword transformer)
     (define-syntax keyword transformer))))

(define-syntax syntax-parameterize
  (lambda (x)
    (define (rewrite k body keys)
      (syntax-case body ()
        (() '())
        ((a . d)
         #`(#,(rewrite k #'a keys). #,(rewrite k #'d keys)))
        (#(e ...)
         #`#(#,@(rewrite k #'(e ...) keys)))
        (e
         (and (identifier? #'e)
              (exists (lambda (o) (free-identifier=? #'e o)) keys))
         (datum->syntax k (syntax->datum #'e)))
        (e #'e)))
      
    (syntax-case x ()
      ((k ((keyword spec) ...) body1 body* ...)
       (with-syntax (((n* ...)
                      (map (lambda (n) (datum->syntax #'k (syntax->datum n)))
                           #'(keyword ...)))
                     ((nb1 nb* ...)
                      (rewrite #'k #'(body1 body* ...) #'(keyword ...))))
         #'(letrec-syntax ((n* spec) ...) nb1 nb* ...))))))
)
And this can be used like this (taken from example of the SRFI):
#!r6rs
(import (rnrs) (srfi :139 syntax-parameters))

(define-syntax-parameter abort
  (syntax-rules ()
    ((_ . _)
     (syntax-error "abort used outside of a loop"))))

(define-syntax forever
  (syntax-rules ()
    ((forever body1 body2 ...)
     (call-with-current-continuation
      (lambda (escape) 
 (syntax-parameterize
  ((abort
    (syntax-rules ()
      ((abort value (... ...))
       (escape value (... ...))))))
  (let loop ()
    body1 body2 ... (loop))))))))

(define i 0)
(forever
 (display i)
 (newline)
 (set! i (+ 1 i))
 (when (= i 10)
   (abort)))

(define-syntax-parameter return
  (syntax-rules ()
    ((_ . _)
     (syntax-error "return used outside of a lambda^"))))

(define-syntax lambda^
  (syntax-rules ()
    ((lambda^ formals body1 body2 ...)
     (lambda formals
       (call-with-current-continuation
 (lambda (escape)
          (syntax-parameterize
    ((return
      (syntax-rules ()
        ((return value (... ...))
  (escape value (... ...))))))
    body1 body2 ...)))))))

(define product
  (lambda^ (list)
    (fold-left (lambda (n o)
   (if (zero? n)
       (return 0)
       (* n o)))
        1 list)))

(display (product '(1 2 3 4 5))) (newline)
I've tested on Chez, Larceny, Mosh and Sagittarius.

This implementation violates some of 'MUST' specified in the SRFI.
  1. keyword bound on syntax-parameterize doesn't have to be syntax parameter. (on the SRFI it MUST be)
  2. keyword on syntax-parameterize doesn't have to have binding.
And these are the sloppy part:
  1. define-syntax-parameter does nothing
  2. syntax-parameterize traverses the given expression.
If there's concrete test cases and the above implementation passes all, I might send it as a sample implementation for R6RS.

2016-07-07

Weirdness of self evaluating vector

Scheme's vector has a history of being non-self evaluating datum and self evaluating datum. The first one is on R6RS, and the latter one is R7RS (not sure about R5RS). Most of the time, you don't really care about the difference other than it requires ' (quote) or not. However, you may need to think about the difference and maybe also think self evaluating causes more trouble. One of the particular case (and this is the only case I think self evaluating vector is evil) is when vector is used in macro.

Have a look at this case:
(import (rnrs))

(define-syntax foo
  (syntax-case ()
    ((_ e) e)))

(foo #(a b c))
What do you think how it behaves? The answer is depending on the standard. On R6RS, vectors are not self evaluating data so this should be an error. So you can't complain if you'd get a daemon from your nose. On R7RS (of course you should change the importing library name to (scheme base)), on the other hand, vectors are self evaluating data so this should return the input vector.

Now, how about this case?
(import (scheme base) (scheme write))

(define-syntax foo
  (syntax-rules ()
    ((_ "go" (v ...) ())         #(v ...))
    ((_ "go" (v ...) (e e* ...)) (foo "go" (v ... t) (e* ...)))
    ((_ e ...)                   (foo "go" () (e ...)))))

(foo a b c d e)
What would be the expansion result of macro foo? I think this is totally up to implementations (if it's not please let me know). For example, Chibi returns vector of something like {Sc #22 #<Environment 4365836288> () t} (syntactic closure, I think), Sagittarius returns vector of identifier, and Larceny returns vector of symbol t. If you put ' (quote) to the result template then the expansion result should be the same as Larceny returns (though, Chibi still returned a vector of syntactic closure, so this might not be defined, either).

Back to the first case. The first case sometimes bites me when I write/use R6RS macro in R7RS context. For example, SRFI-64 is implemented in R6RS macro and using it like this:
(import (scheme base) (srfi 64))

(test-begin "foo")

(test-equal "boom!" #(a b) (vector 'a 'b))
;; FAIL!!

(test-end)
On Sagittarius, R6RS macro transformer first converts all symbols into identifiers, then syntax information will be stripped only if expressions have quote. Now, SRFI-64 is implemented on the R6RS macro transformer and the vector doesn't have quote. Thus, symbols inside of the vector are converted to identifiers. If it's R6RS, then it's an error. But if it's R7RS, it should be a valid script.

I have sort of solution (not sure if I do it or not): Internally, symbol and the identifiers converted from symbols without any context (c.f. not using datum->syntax) are theoretically the same. So if compiler sees such an identifier, then it should be able to unwrap it safely.

I haven't decided how it should be. So for now, just a memo and let it sleep.