Syntax highlighter

2015-10-24

er-macro-transformer on top of syntax-case

There are numbers of low level hygienic macros in Scheme world. The most famous ones are probably the followings:
  • explicit renaming
  • syntax-case
  • syntactic closure
Of course there are more (e.g. ir, reverse syntactic) but if you discuss low level hygienic macros, then above would usual be the ones.

R6RS has syntax-case and rumour says R7RS would have explicit renaming. According to this article, if implementations have one of them, then the rest can be implemented atop it. I'm wondering is it really true? Very lame conclusion is true. Because Sagittarius uses kind of syntactic closure and implements both explicit renaming and syntax-case. Now, can it be done in a portable way?

So I've wrote this. It seems this can work most of R6RS implementations except Racket. Though I haven't tested on Larceny, Guile and Vicare yet, they are using either Psyntax or van Tonder expander such as Mosh and IronScheme (Psyntax) or NMosh (van Tonder). So should work.

The basic idea of the implementation is very simple. rename procedure is a simple wrapper of datum->syntax. compare is free-identifier=?. Then wrap the returning form with datum->syntax*.

The initial revision of the Gist used mere datum->syntax. This wasn't good enough because the procedure should only accept datum not syntax object. The error was raised by Psyntax implementations and Racket. Then I've been suggested to walk thought the returning form.

I first thought this won't work because traversing and constructing a new form would return a list not a syntax object. However I was just didn't consider thoroughly. If I use syntax-case and quasisyntax (with-syntax could also be), then I can construct syntax object containing syntax object renamed by er-macro-transformer. So I've rewrite the code. Then most of the R6RS implementation seem working. Even Racket seems working if the macro is very simple like the one on the comment.

Now, my question is 'Is this R6RS portable?'. R6RS standard libraries 12.2 says:
The distinction between the terms “syntax object” and “wrapped syntax object” is important. For example, when invoked by the expander, a transformer (section 12.3) must accept a wrapped syntax object but may return any syntax object, including an unwrapped syntax object.
So transformer *MAY* return unwrapped syntax object. Though what I'm doing is returning wrapped syntax object so should be fine. What I couldn't read is whether or not a syntax object can contain syntax object(s) inserted by other transformer. If this is required feature, then this might be a bug of Racket. Otherwise this is not portable.

I hope it's a bug of Racket, then you (not me) can write an explicit renaming SRFI with sample implementation.

For convenience, embedded source.

2015-10-18

カスタムポートとソケットとバッファポート

「と」で繋げるとなんとなくアニメとかのタイトルっぽい感じがするなぁ。

Sagittariusは0.6.9でポート周りにかなりバグを混入した(弄った)のだが、頭を悩ませるバグが顕在化した。それが表題のポートである。どこで使われているかといえばTLSである。

何が問題か?いくつかの問題が合わさってるのではあるんだが、一つはget-bytevector-nとカスタムポートの相性。カスタムポートは要求されたバイト(文字数)を返す必要はないので、例えば1バイトずつ返して呼び出し元に判断させるということができる。例えば以下のようなの
(import (rnrs))

(let* ([pos 0]
       [p (make-custom-binary-input-port
           "custom in"
           (lambda (bv start count)
             (if (= pos 16)
                 0
                 (begin
                   (set! pos (+ 1 pos))
                   (bytevector-u8-set! bv start pos)
                   1)))
           (lambda () pos)
           (lambda (p) (set! pos p))
           (lambda () 'ok))])
  (get-bytevector-n p 3))
;;-> #vu8(1 2 3)
R6RSテストスイーツからの抜粋なのだが、read!手続きは1バイトずつ処理しget-bytevector-nが最終的に何を返すか決定するという形である。カスタムポートが扱うのが有限のデータなら特に問題ないのだが、ソケットのようにいつEOFがくるか分からないものだと割と困るのである。例えばread!の部分がrecvを使っていたとして、あるソケットは要求の半分のデータを受信したする。そうするとget-bytevector-nは残りの半分を取得するためにもう一回read!を呼び、処理はブロックされる。

不定長のデータを扱うのにget-bytevector-nなんて使うなという話なのだが、本当の問題は別のところにある。0.6.9からバッファポートを導入したのだが、こいつとカスタムポートの相性が悪い。何が悪いかと言えば、バッファを埋めるのにget-bytevector-n相当のことがされているのである。カスタムポートを実装したときに、あまり何も考えずにポート側でバイト数を数えるようにしてしまったので複数回の呼び出しが起きるという話なのではある。

解決方法は多分2つくらいあって、
  1. バッファを埋める処理を別枠にする
  2. 現状ポート側でやってることをget-bytevector-nに移す
1.はそこそこadhocな感じがして嫌だなぁ。2.は(99.99%ないけど)別のところに影響が出る可能性がある(もしくは考慮漏れによるバグの混入)。でも2かなぁ、どう考えても。それでとりあえずはバッファの問題は解決するし。

2015-10-15

CPSマクロ

assocをマクロで書いたらどうなるか、ということを考えた。これくらいならそんなに難しい話ではなく、以下のように書けるだろう。
(import (scheme base) (scheme write))

(define-syntax assocm
  (syntax-rules ()
    ((_ key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) (key . e))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))

;; a bit of trick to avoid unbound variable
(define (c d) (list 'c d))
(define d 1)

(assocm c ((a b) (b d) (c d) (d d)))
;; -> (c 1)
取り出せるのであれば、その中身も欲しい。つまり、carcdrだ。これも同じ要領でやればこんな感じで書けるだろう。
(define-syntax cdrm
  (syntax-rules ()
    ((_ (a . d)) d)))

(cdrm (c . d))
;; -> 1
だが、これら二つのマクロはこんな感じでは組み合わせられない。
(cdrm (assocm c ((a b) (b d) (c d) (d d))))
;; -> error
これはcdrmassocmより先に展開されるからである。あるマクロの展開結果を別のマクロで使いたい状況というのはしばしばある。そういうときにはCPSでマクロを書く。最初のassocmをCPSで書いてみよう。
(define-syntax assocm/cps
  (syntax-rules ()
    ((_ k key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) (k (key . e)))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))

(assocm/cps cdrm c ((a . b) (b . d) (c . d) (d . d)))
;; -> 1
このassocm/cpsは最初の引数に次に展開するマクロを受け取ることで、マクロの展開が終わった際に展開結果(この場合は(key . e))を引数kに渡すことを可能としている。要するに単なるCPSであるがこの状態では割と致命的な欠点がある。それは複数のマクロを組み合わせることができないということである。

例えば上記の例でcadrmはどう書くだろうか?普通に考えれば、carmを定義したのち、cdrmを組み合わせて書きたいところであろう。(もちろんゴリゴリ書いてもいいんだけど。) そう(compose f g)みたいなことがしたわけである。
;; I want to write like this!
(assocm/cps (composem cdrm/cps carm) c ((a . b) (b . d) (c . d) (d . d)))
こうなると、単にマクロを受け取って結果を渡すだけではなく、次のマクロが複合マクロかどうかを判別して上手いことやってくれる何かがほしい。こんな感じだろう。
(define-syntax composem (syntax-rules ()))

;; assume k is CPS macro
(define-syntax extract/cps
  ;; it's a bit awkward to have own name in literals
  ;; but this saves me a lot
  (syntax-rules (composem extract/cps)
    ((_ (composem k) args ...) (k args ...))
    ((_ (composem k ...) args ...)
     (extract/cps "flatten" () (k ...) (args ...)))
    ;; flatten nested composem
    ((_ "flatten" (cps ...) ((composem k ...) k* ...) args)
     (extract/cps "flatten" (cps ... k ...) (k* ...) args))
    ((_ "flatten" (cps ...) (k k* ...) args)
     (extract/cps "flatten" (cps ... k) (k* ...) args))
    ((_ "flatten" (cps ...) () (args ...))
     (extract/cps (extract/cps cps ...) args ...))
    ;; extract/cps keyword
    ((_ (extract/cps (composem k)) args ...) (k args ...))
    ((_ (extract/cps (composem k k* ...)) args ...)
     (k (extract/cps (composem k* ...)) args ...))

    ((_ (extract/cps k) args ...) (k args ...))
    ((_ (extract/cps k k* ...) args ...)
     (k (extract/cps (composem k* ...)) args ...))
    ;; short cut
    ((_ k args ...) (k args ...))))
多少無理やり感があるので(extract/cpsリテラルとか)もう少し綺麗にならないかと思ったりはするのだが、まぁとりあえずこんな感じでいいだろう。これを使って上記のassocm/cpsは以下のように書き換える。
(define-syntax assocm/cps
  (syntax-rules ()
    ((_ k key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) 
                             (extract/cps k (key . e)))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))
さらにcarm/cpscdrm/cpsをこんな感じで定義して、最後のkvaluesmとして定義しよう。CPSマクロの最後に展開されるマクロはCPSではないことに注意しよう。
(define-syntax cdrm/cps
  (syntax-rules ()
    ((_ k (a . d)) (extract/cps k d))))

(define-syntax carm/cps
  (syntax-rules ()
    ((_ k (a . d)) (extract/cps k a))))

(define-syntax valuesm
  (syntax-rules ()
    ((_ args) args)
    ;; this isn't really values...
    ((_ args ...) (args ...))))
こうするとassocmから見つかった値のcadr部分をとるマクロは以下のように書ける。
(assocm/cps (composem cdrm/cps carm/cps valuesm) c ((a b) (b d) (c d) (d d)))
;; -> 1
ちなみに、このコードGaucheでは(まだ)動かないので(すぐ直されると思うけど)、実際に動かしてみたい場合はSagittarius、Chibi、Larcenyのいずれかを使うといいだろう。

ちなみに、この手のコードは3日もすると自分でも理解不能になる危険を孕んでいるので使用する際は十分に留意した方がいいだろう。こういったマクロはだいたOleg氏がまとめているので、メタプログラミングの深淵を覗きたい方はそちらを参照されたい。この辺とか。

2015-10-12

Small Scheme or large Scheme?

2 topics about the size of Scheme were posted on c.l.s. One was rather branch of other topic:How many R6RS users and how much code out there?. And the other is indirectly suggesting it: Question about vote of RnRS (the poster mentioned about the change between R5RS and R6RS as drastic change so seems it's about the size.) Even though all what I wanted to say is already said by Taylan on the first topic I've mentioned, I want to write something about the size so bare with me :)

The point of this topic for me is the definition of small or large. The poster said it is useful enough for educational purpose. I agree with it. If I need to add a bit of my opinion about this, I would say the languages which has proper design for the topic of the class are useful enough for educational purpose as long as students don't use convenient libraries. Java is fine for OOP, C++ is fine for OOP and meta programming, Mathematica is excellent for math, etc. So they can also be programming languages for educational purpose, right? Now are they small?

The answer for me is no. Especially if you see C++'s specification, it's more than gigantic even human beings couldn't understand, IMHO (are there people who understand all the spec?). I wouldn't say it's one of beautifully designed language, but it's powerful enough to cover other purposes including professional use . Then, should Scheme be this much huge to make all programmer happy?

My answer is again no. It's too complicated, it's piling up features on top of other features on top of other, lemme quit. One of the reason why C++ is piling up those features, I believe, is that it doesn't have enough abstraction to make language self growing. Most of programming language don't have this type of feature such as defining new syntax. Or if they have it, it'd be rather complicated to use, either deliberately or accidentally. If it's deliberately, then the designer of the language doesn't want users to use it casually. If it's accidentally, then it's not considered well but made rather adhoc. The first decision is understandable, sometimes those things are not really needed and don't want users to summon daemons from their nose.

Now what's the definition of small in Scheme specification? In my opinion, there is no need to pile up features but it can grow by itself. Let me elaborate what it means.  Currently there're on going discussion about hashtable on SRFI. This might be a good example to do it. Is hashtable required by small language as Scheme? Well, my answer from bottom of my heart is yes but rational answer is no. Why no? It can be implemented by vector and record. Now one of the SRFIs also mentioning weak hashtable. This is rather interesting. Implementing this data structure can not be done neither in range of R6RS nor R7RS. So this seems required, right? Wait a sec, there's a draft SRFI about ephemeron. If you use this, then you can implement weak hashtable using vector, record and ephemeron. So the absolute requirement to have is this one. Hurrah, the language spec is kept small enough!

IMO, this is a bit too extreme. Each time users need to make own utility libraries for those common data structure is ridiculous. Then here comes R7RS-large process. The purpose, in my understanding, of this process is that keeping core language absolutely minimum and put a collection of those commonly used things in its specification. So implementations may or may not support all of them. Oops, again, what's absolutely minimum?

Unfortunately, I don't have generic answer for this and, I believe, neither most of Schemers do. The only thing I do have now is that it's not enought to be perfect language which can solve all problems in this world. Concurrency, networking, weak data, etc. These are not in the specification (maybe yet) but absolutely needed. If there's something lacking, then language specification should grow even if it's got bigger.

The world of computer is growing like speed of light. Problems to be solved are zillion. Too small wouldn't solve. We need small enough.

2015-10-09

bound or free identifier

I've just fixed the bug of incorrect usage of free-identifier=? during macro expansion. As my memo and maybe good to share what I've learnt.

Firstly, have a look at this piece of code:
(import (rnrs))

(define-syntax foo
  (lambda (x)
    (syntax-case x ()
      ((_ (t1 t2))
       ;; what should be print?
       (begin (display (free-identifier=? #'t1 #'t2)) (newline)
              (display (bound-identifier=? #'t1 #'t2)) (newline))
       #''ok)
      ((_ (t* ...) a b ...)
       #'(foo (t* ... t) b ...))
      ((_ a ...)
       #'(foo () a ...)))))
(foo 1 2)
If you can immediately see what's printed, then you can skip the next paragraph :)

The difference between bound-identifier=? and free-identifier=? is that the first one only sees where the given 2 identifiers were  created and the second one sees the actual bindings are the same or not. In the above example, identifier ts are created in different places, more precisely different macro expansion process. If you expand the macro manually, then it would look like this:
(foo 1 2)
;; -> (foo () 1 2))
;; -> (foo (t) 2)) ;; <- where the first t is created 
;; -> (foo (t t))  ;; <- the second one is created here in the different macro expansion
;; 'ok
If 2 identifiers whose names are same are created in different macro expansion, then comparing bound-identifier=? should return #f.free-identifier=?, on the other hand, should return #t because those identifiers are not bound to anything thus they have the same binding (unbound variable) and also have the same name (t). (This is what I understood the behaviour of those 2 procedures. Correct me, if I'm wrong.)

Now the bug was related to this difference. The detail is here. In the description, it says free-identifier=? returns #t against pattern variables t but this is correct behaviour. What I did wrong was using free-identifier=? to compare pattern variables during expanding template variables. Moreover, compiler for syntax-case renamed pattern variables which must be preserved so that expander can use bound-identifier=? to compare pattern variables.

If I know what's wrong, then fixing it is not a big problem. Just adding extra check for pattern variable and use bound-identifier=?. Done! I hope this would be the last article related to macro bug... (feeling won't though)

2015-10-08

マクロバグ

何回目だろうこのネタ。もういい加減尽きたと信じたかったが、そうもいかないらしい。

バグの報告は以下のツイート
突き詰めていくと、テンプレート変数を割り当てていく部分で使用しているfree-identifier=?がこの条件だと全ての一時パターン変数に対して#tを返すということが分かった。adhocに直すならこういう場合の比較に対しては#fを返すようにすればいいのだが、コードを眺めていてふと思った、「こいつらbound-identifier=?で比較しないとまずくね?」と。

単純にこの二つの手続きを置き換えるだけなら何の問題もなく、こんな記事も書いてはいないのだが、そうは問屋が卸してくれなかった。問題になるのは、with-syntax等でラップされた式の中で束縛されたテンプレートを返すようにした場合である。例えばこんなの:
(syntax-case x ()
  ((_ (k ...))
   (with-syntax (((e ...) (gen)))
     #'(lambda (x) (syntax-case x (k...) e ...)))))
具体的にはsyntax-rulesがこんな感じの定義なんだけど、パターン変数がそのままマクロによって生成されるsyntax-caseで使われているのがまずい。現状の実装ではマクロの展開が走ると問答無用で識別子がリネームされてしまうので、パターンのkとテンプレートのkが別の識別子として束縛されてしまう(つまりbound-identifier=?#fを返す)。

あれ、待てよ。パターン変数として束縛された識別子はマクロ環境で束縛されているので、リネームの際にそれを避けるようにすれば無用なリネームが発生しないことになるな。その方向でいくか。やはり問題は一度書き出すとなんとなく解決策がでてくるものだな。まだ解決してないけど。

2015-10-07

Timezone related bugs

Since Sagittarius 0.6.7, it has timezone object. The rationale behind this is pretty simple. Before it used localtime and some other C APIs to get proper local timezone offset. I wasn't unhappy until I needed to handle other timezone offsets. I don't remember exact situation but related to time conversion. Unfortunately, on POSIX there is no way to treat timezone as objects (as far as I could research), so I've decided to implement own timezone handling.

Timezone is related to physical location. Of course, you can change your computer's setting to deceive yourself. I wasn't one of those people and that bit me. I've faced 2 timezone related bugs. Both could only happen in particular places, Japan and country where no summer time. Let me share the bug story. Starting from the first one.

There is IANA TZ database and I'm using it to find proper timezone offset. If you build Sagittarius from repository, then the pre-build process, not cmake but dist.sh, downloads TZ database and compiles it to S-expression. It's not too bad. The only bad thing is that it can't handle any rules not written in the TZ database, such as Japan.

Asia/Tokyo timezone has rather weird rules. According to the TZ database, Japan had summer time between 1948-1951. On the comment, it clearly says this is used only on US military base, for your information, so officially Japan didn't have it, though. Funny thing is that this rule, named Japan, is associated to Asia/Tokyo timezone and it's active now (2015 current). Additionally, the rule doesn't have definition of after 1951. So what should happen? Maybe I didn't read all the comment or how the rules defined properly so I missed something. Anyway, the previous behaviour of handling this situation was signaling an error. Which, of course, caused problem. The biggest one was Sagittarius couldn't be built in Japan.

Thanks to the comment on this blog, I could notice there was such a problem. Other than that, this would stay until I'd decide to go back to Japan. The fix was rather simple, instead of signaling an error, it simply returns default timezone offset without considering old timezone offset. I think that's good enough. (NB: the timezone object would consider the when, means if you ask timezone offset of before 1835 in the Netherlands, then it would return GMT+0:19:32. No need for this? just for fun.)

The second one was rather stupid mistake. Again it was in Japan. I've found a tweet that said time-utc->date didn't consider timezone offset but just return UTC. I saw this when I fixed the first bug so I thought it might be related. Well, kinda but not quite. This bug happened only on Windows (including Cygwin) running on a timezone without summer time (Japan for example).

The reproducing was one easy step. Change system timezone to 'Osaka, Sapporo, Tokyo' then get local timezone. Aha, it returned UTC. But why? Well, very simple. To get local timezone name on Windows (and Cygwin), Sagittarius uses GetTimeZoneInformation and check the return code. If the return code was not TIME_ZONE_ID_UNKNOWN then it returns standard timezone name, otherwise UTC. Hey, MSDN says if the timezone doesn't have summer time then this return code would be returned! That's why if it's in Japan, it returned always UTC. Careless (or stupid) mistake.

Even though I'm using CI services for Linux, Windows and OS X, this type of physical location related bugs are really hard to find. (Not even sure if I can change system timezone per build on the services) I've just seen importance of feedback in a fresh light.

08 Oct 2015
typo fix. the light wasn't from jewel meat...