Let's start Scheme

2015-03-29

Emulate modular programming

It'll be long to put on reddit (mostly piece of code) and kinda pain in the ass to format code for reddit. So I've decided to write an article as an answer for this: R7RS modular programming struggle..

I'll answer the easier one first. The different behaviour is because the define-library is wrapped by begin. Larceny is using SRFI-78 (a.k.a van Tonder expander) and this expander creates a library at runtime. Thus, when the cond-expand is expanded, the library (dummy) is not created yet. Sagittarius, on the other hand, it creates a library at compile time. Thus, whenever define-library or library is found on the code, then the library is created at that time. So cond-expand can find the (dummy) during expansion. I guess, if you want to make the code work as expected, then removing begin should be enough.

NOTE: On R7RS (or even R6RS), the library definition can not be anywhere in program so the example code itself is not portable. For example, Chibi would raise an error if you write this on its REPL (which I think pretty much inconvenient).

NOTE: Removing begin works only on REPL because van Tonder's expander reads all expression first then expands it. Thus, at compile time there is no library created yet.

It is basically impossible to do the ML like modular programming on R7RS Scheme, if I understood the question correctly. But there is a technique that emulate it. The key is eval. It is easy to see on the code so code first.
;; without SRFI-39
(define-library (peano)
  (import (scheme base) (scheme eval))
  (export one add
          load-peano
          make-peano-impl)
  (begin
    (define-record-type peano-impl (make-peano-impl one add) 
      peano-impl?
      (one peano-impl-one)
      (add peano-impl-add))
    (define global-impl #f)
    (define (load-peano env)
      (let ((impl (eval '(load-peano-impl) env)))
        (set! global-impl impl)))
    (define (one) (peano-impl-one global-impl))
    (define (add a b) ((peano-impl-add global-impl) a b)))
  )


;; peano-numeral.sld
(define-library (peano-numeral)
  (import (scheme base) (peano))
  (export one add load-peano-impl)
  (begin 
    (define (load-peano-impl)
      (make-peano-impl 1 +))))

;; peano-symbol.sld
(define-library (peano-symbol)
  (import (scheme base) (peano))
  (export one add load-peano-impl)
  (begin 
    (define (load-peano-impl)
      (make-peano-impl
       '(s z)
       (lambda (x y)
         (if (eq? 'z x) y
             `(s ,(add (cadr x) y))))))))

(define-library (four)
  (import (scheme base) (peano))
  (export four)
  (begin
    (define (four)
      (let ((two (add (one) (one))))
        (add two two)))))

(import (scheme write) (scheme eval) (four) (peano))
(load-peano (environment '(peano-numeral)))
(four)
;; -> 4

(load-peano (environment '(peano-symbol)))
(four)
;; -> (s (s (s (s z))))
eval can take an environment so we can use that. Specifying which implementation should be used via load-peano by passing library name, and the (peano) sets the actual implementation to the global context. If the implementation supports SRFI-39, then using parameter would make it thread safe.

NOTE: If the implementation supports CL like method dispatch, then this can be much simpler. (Well, using TinyCLOS provides it and it wouldn't be so difficult to make some syntax sugar for that and provide it with R7RS library system. I just don't have motivation since Sagittarius already has it...)

NOTE: Parameter in R7RS doesn't specify when parameter object (or procedure) received one argument. As far as I know, all R7RS implementations behave the same as SRFI-39 but there might be an implementation that doesn't accept it in future.

NOTE: If the implementation supports identifier-syntax which is added on R6RS, then one can be defined like this:
(define-syntax one (identifier-syntax ((peano-impl-one global-impl))))
Then four can be like this:
(define-syntax four
  (identifier-syntax
   ((lambda ()
      (let ((two (add one one)))
 (add two two))))))
So you don't have to write four with parenthesis.

2015-03-27

R6RS protobuf

仕事でprotobufを使っている部分があることに気づき、そういえばR6RS用のライブラリがあったということを思い出す。これ:r6rs-protobuf

っで、早速使ってみたのだが、まともに動かない。プロジェクトも1年近く更新がないし、バグ報告するよりは自分で直した方が早いという結論にいたりforkする。これ:ktakashi/r6rs-protobuf

とりあえず直したバグと機能追加
  • /* */形式のコメントのサポート
  • トップレベルのoptionで例外を投げない(無視する)
  • ネストしたmessage/enumのサポート
  • レコードコンストラクタの引数違いの修正
  • generate-temporariesで作られた識別子からのシンボル生成
    • 処理系によってはポータブルじゃないシンボルが作られるので
  • (srfi :13)と(rnrs)でぶつかる束縛の修正
  • レコード型名の使用
    • Psyntax系の処理系だとマクロとして定義されている
    • record-type-descriptorマクロを使用するようにする
  • テストケースの修正
    • 不正なimport文
    • evalにdefine等を渡す
  • テストケースの追加
  • contrib/sagittarius/protoc-scmの追加
  • 処理系毎のpretty-printの追加(Mosh、Sagittarius、Vicareのみ)
プロジェクトページを見るとGuileでは動いているみたいなのだが、にわかには信じられない話であるというレベルのバグり具合だった。ものすごく単純なデータ構造のみなのかもしれない。

とりあえず、本家Googleのexampleにあるaddressbook.protoは動くようになった。一応開発者用MLに投げたので取り込まれるかもしれないし、僕のリポジトリが本流になるかもしれない(流行のゾンビOSS化してたらの話)予定。MLで即日返信が来てたので、近いうちに取り込まれると思われる。

個人的には(今のところ)関係ないのだが、(protobuf private)ライブラリだけはLGPLにしておいてほしかったなぁと思う。 Sagittarius用のライブラリを生成する際に、スタンドアローンでいけるオプションをつけたのだが、このライブラリがGPLv3なライセンスなので生成されるライブラリも(全部ひっくるめて一個のファイルにするので)必然的にGPLv3になってしまうという。個人的にProtobufを仕事用のスクリプト以外につかうことはないとは思うので(自分で何か作るならmsgpackとかあるし、自作のr6rs-msgpackはMIT互換の2項BSDライセンスなのでかなり自由だし)、どうでもいい話なのかもしれないが。

2015-03-26

Comparison of er-macro-transformer

In future, I don't know if it would be near or far yet, R7RS will have explicit renaming macro as its low level hygienic macro. So I thought it's nice to have some comparison. (though, I've just made one for now.)

There are some R7RS implementations which support er-macro-transformer. For now, I've used Sagittarius, Chibi, Gauche and Chicken (with R7RS egg). The main part that I was wondering for now is comparison of renamed symbols. So I made the following script to compare:
(import (scheme base) (scheme write) (scheme cxr))

(cond-expand
 (chibi (import (chibi)))
 (gauche (import (gauche base)))
 (sagittarius (import (sagittarius)))
 (chicken #t)
 (else (error "not supported")))

(define-syntax comp
  (er-macro-transformer
   (lambda (form rename compare)
     (define (print . args) (for-each display args) (newline))
     (let ((a1 (cadr form))
           (a2 (caddr form)))
       (print "eq?:               " (eq? a1 a2))
       (print "simple compare(0): " (compare a1 a2))
       (print "simple compare(1): " (compare a1 'a))
       (print "rename (eq?):      " (eq? (rename a1) a1))
       (print "rename (compare):  " (compare (rename a1) (rename a1)))
       ;; is a bound?
       a1))))

(define-syntax rename-it
  (er-macro-transformer
   (lambda (f r c) 
     (let ((cont (cadr f))
           (target (caddr f)))
       `(,@cont ,(r target))))))
(define-syntax rename-it2
  (er-macro-transformer
   (lambda (f r c) 
     (let ((cont (cadr f))
           (target (caddr f))
           (rest (cdddr f)))
       `(,cont ,(r target) ,@rest)))))

(guard (e (else (display "a is not bound") (newline)))
  (let ((a 1))
    (rename-it (rename-it2 comp a) a)
    (display "a is bound") (newline)))
It basically renames a symbol a and compare it in the macro.If you expose the renamed symbol, then implementations may raise an unbound error.
The result of above code is the following:
Chicken:
eq?:               #f
simple compare(0): #t
simple compare(1): #f
rename (eq?):      #f
rename (compare):  #t
a is not bound

Chibi:
eq?:               #f
simple compare(0): #t
simple compare(1): #f
rename (eq?):      #f
rename (compare):  #t
a is bound

Gauche:
eq?:               #f
simple compare(0): #t
simple compare(1): #f
rename (eq?):      #f
rename (compare):  #t
a is not bound

Sagittarius:
eq?:               #f
simple compare(0): #t
simple compare(1): #f
rename (eq?):      #t
rename (compare):  #t
a is not bound
In this case, Gauche and Chicken have the same behaviour. On Chibi, a is bound (which I think wrong behaviour since the macro rename-it is bound on global environment and referring the local environment. But there is no proper definition for explicit renaming yet, so who knows). Sagittarius doesn't rename if the given argument is not either a symbol or global defined identifier. I'm not sure if this is actually correct (feels not), but again there is no definition and R7RS mode is working properly with this behaviour. So I don't care much for now.
Updated on 29 March 2015: I've changed Sagittarius behaviour to rename (most of) identifiers as well so for this case it's similar to Chicken or Gauche.

During testing, I've got couple of bugs on Gauche and Chibi. Gauche's one was C level assertion error and Chibi's one was SEGV.

So far, there are not many difference among the implementations. Maybe I need to investigate more to find out (if I have time and motivation).

2015-03-25

R6RS compliant

Recently, I've improved R6RS compliance on Sagittarius. To check how much improved, I've run the test cases of nausicaa-oopp. Unfortunately, original Github repository seems gone somewhere but if you have it on your local machine, then checkout the revision '26c6994' which it still supported other R6RS implementations.

The library is R6RS portable yet so hard to run. As far as I know, there are only a few implementations which can run the test without raising an unexpected error (Racket and Vicare, I've tested). Now, it's time to add Sagittarius to the list.

Conclusion first, this is the result: https://gist.github.com/ktakashi/d6232d5ac7ae6a5d6fc1

To run the test, I need to patch some libraries. One is getenv and the other one is adding required argument otherwise Sagittarius compiler would raise an error during the compilation. There are couple of failures but these are either too specific test or enhancement of Sagittarius. The error of catch-syntax-violation can be consider test case's bug and some are Sagittarius enhancement. Sagittarius actually doesn't check duplicated bindings on let or letrec. Some are because of eval doesn't inherit current environment. (So the class <n> and <t> can't be seen). The test case itself expects specific value of &syntax's subform slot. It might be better to have some convension but R6RS doesn't specify this. The same goes catch-assertion test cases. The fixnum issue is that on 64 bit environment, Sagittarius uses 62 bits as fixnum (this seems the same on Vicare, so it fails as well).

I've also made an execution option that given script runs strict R6RS mode. Which basically reads all expression ahead and wrap with special syntax so that compiler can expand macro first and compile the expressions.
# Run the `script.scm` on strict R6RS mode
$ sagittarius -r6 script.scm
This mode is similar with the behaviour of Andre van Tonder's expander. Thus the script can contain library in it. Be careful, define-library can't be in it.

Now, I believe I can say, Sagittarius is also the R6RS implementation :) (unless otherwise there's a bug which is always the case.)

2015-03-23

Dynamic compilation (failure)

I don't have much problem with current performance of Sagittarius but it's always better to be faster. And I thought (yes, just thought), a good idea has come up. Well, conclusion first, it wasn't at all so if you want to read successful story, don't waste your time.

The idea was like this: if I can compile procedures to C, then it would be faster. Sounds fine, right? Of course there is always caveats:
  1. Calling Scheme procedure from C is expensive
    • I've learnt this from JIT experiment
  2. Macro can not be simply mapped to C procedure
  3. Which compiler should use for this?
2 and 3 are later stage issue, so I've checked item nr 1 first.

To check it, I've created a Gist: https://gist.github.com/ktakashi/8d9271600348db136877 There are 3 revisions but the most important ones are first and second one. The last one is just squeezing a bit of performance. The first revision is simply mapped procedure call to internal apply function. This is super slow as I expected. So I've switched to the second revision which uses CPS to do things. Well, as you can see it, it's not bad but not faster than pure Scheme. (NOTE: hashtable-map is implemented in Scheme world.)

Now, I didn't get satisfied result so I've also created very simple tail recursive fact in C. It's like this:
static SgObject fact(SgObject *SG_FP, int SG_ARGC, void *data_)
{
  SgObject m = SG_MAKE_INT(1), r = SG_MAKE_INT(1);
  SgObject n = SG_FP[0];
  
  while (TRUE) {
    if (Sg_NumEq(m, SG_FP[0])) {
      return Sg_Mul(m, r);
    } else {
      SgObject t1 = Sg_Add(m, SG_MAKE_INT(1));
      SgObject t2 = Sg_Mul(m, r);
      m = t1;
      r = t2;
    }
  }
  return SG_UNDEF;  /* dummy */
}
 
static SG_DEFINE_SUBR(fact__STUB, 1, 0, fact, SG_FALSE, NULL);
Tail recursive call can be a mere loop, so this must be much much much (times 100) faster than pure Scheme (this was my hope). So I've compared.
;; Scheme implementation
(define (fact n)
  (let loop ((m 1) (r 1))
    (if (= m n)
        (* m r)
        (loop (+ m 1) (* m r)))))

;; Expression to compare
(time (dotimes (i 10000) (fact 1000)))
Then I've got the incredible result!!
Scheme implementation

;;  (dotimes (i 10000) (fact 1000))
;;  0.000000 real    0.000000 user    0.000000 sys
C implementation

;;  (dotimes (i 10000) (fact 1000))
;;  4.692019 real    2.483000 user    1.622000 sys
Hooray!!! It's incomparable! WHAAAATTT???!!!

Well, if I just close my computer without evaluating this result, I would do the same thing in future. It's painful but I need to digest it... I guess there are couple of reasons but the biggest thing is that the VM is extremely turned especially those basic arithmetic operations. For example, addition of fixnum doesn't call C function but just do it on VM. If I disassemble the fact, then I can only see 21 VM instructions and only MUL uses C function call.
(disasm fact)

;; size: 21
;;    0: CONSTI_PUSH(1)
;;    1: CONSTI_PUSH(1)
;;    2: LREF_PUSH(1)
;;    3: LREF(0)
;;    4: BNNUME 5                  ; (if (= m n) (* m r) (loop (+ m ...
;;    6: LREF_PUSH(1)
;;    7: LREF(2)
;;    8: MUL
;;    9: RET
;;   10: LREF(1)
;;   11: ADDI(1)
;;   12: PUSH
;;   13: LREF_PUSH(1)
;;   14: LREF(2)
;;   15: MUL
;;   16: PUSH
;;   17: SHIFTJ(2 1)
;;   18: JUMP -17
;;   20: RET
The rest is simply done on VM. Thus, there is no overhead of calling C function.The C code version, on the other hand, it requires 3 C function calls for each iteration, plus inside of the C function. 3 times difference can be a huge difference (well, it actually is).

If I couldn't get any performance improved in this micro benchmark, then it wouldn't be any improvement in practice either. So, I just record this result and move forward...

2015-03-18

R6RS非準拠部分

Vicareの作者のブログを眺めていたらSagittariusでは(最初のリリースから)放置していた非互換を指摘されていた。こんなの。
;; from: http://marcomaggi.github.io/weblog/2015-February-16.html#fn-2
#!r6rs
(import (rnrs))

(define (fun)
  (mac))
(fun)
(define-syntax mac
  (syntax-rules ()
    ((_)
     (begin
       (display 123)
       (newline)
       (flush-output-port (current-output-port))))))
原因は分かっている。Sagittariusではスクリプトとして読まれたファイルは全て読まれた順に評価されるが、R6RSは「全て読み込んでマクロを展開してから実行しろ」という風に定めている。ちなみに0.6.2(か0.6.1からだっか)からはbeginで囲むと動く。ついでに、libraryフォームの中でも動く。このケースであればオプションを追加してそのオプションが指定された場合は一度ファイルを全部読み取って何かしらで包んでやればよいということになる。beginではまずいのでそれように何か足してやる必要はある。考えているのはAndre van Tonderの展開器で使われているprogramとか。

じゃあ、それを入れればいいじゃん?という話になるのだが、話はそんなに簡単でもない。現状ではトップレベルで定義されたdefine-syntaxのコンパイル及びトップレベルのマクロの展開しかしない。つまり、let(rec)-syntaxを用いてトップレベルに定義されるマクロは展開されないのである。つまり、これはbeginで包んでやっても動かない。
#!r6rs
(import (rnrs))
(define (fun)
  (mac))
(fun)
(let-syntax ()
  (define-syntax mac
    (syntax-rules ()
      ((_)
       (begin
         (display 123)
         (newline)
         (flush-output-port (current-output-port)))))))
これもlet(rec)-syntaxで定義されているマクロを評価してやればいいじゃん?と思うかもしれないが、話はそんなに簡単でもない。この例であれば動くのだが、例えば内側にdefineがあって、その中で局所マクロが使われていた場合に困る。

と、ここまで書いて案が浮かんできた。問題になるのは、局所マクロの評価をした後にdefineがコンパイルされない(すると問題になる)という部分だったのだが、ライブラリ及びトップレベルのbegin展開時に対応するコンパイル時環境を式に保持させればちょっとした手間でいけるような気がしてきた(現状よりさらにメモリを喰うようになるのが気になるが・・・)。ちょっと頑張ってみようかな。

New SRFI process proposal

Recently, one of the SRFI editors declared that he'll retire SRFI. When I read the post on c.l.s., it was shocking to me and on the same time I saw how much burden it was even he wasn't really a member of Scheme community for many years. Thank you so much for your service.

I think SRFI is one of the most important things for Scheme community. It can be pre-standard (see R6RS and R7RS-large). As a Scheme user, it is useful and convenient to write portable libraries. I can use it and just put a note that says this library requries SRFI-n. (well, if you aren't interested in writing portable libraries, then not that much. But I am.) So I hope it remains.

Needless to say, there are bunch of problems on the current SRFI process (not really the process but I use the word). Most of them are listed here. In short, it is heavily relied on only one person. If someone is willing to take over, then history might repeat again. So, in my opinion, we would need a new process.

How about using GitHub? GitHub has the ability to generate a web site. So finalised SRFIs can be located there. For new drafts, SRFI editors can create a new repositories for them and give the authors to commit right so that they can update their SRFI by themselves. The discussion can be done by issue based. All issues need to be closed before finalised. If mailing list is wanted, then  we can use Google group.

What can be the pros and cons for this? I can think of some.

Pros:
  • No maintenance needed for server machine
  • Reducing the task of SRFI editors
    • making repositories
    • finalising SRFIs
  • More open
  • Easy to hand over
Cons:
  • Too open?
  • Less control
  • Trusting SRFI authors too much?

In my opinion, the process should be outsourced as much as possible and amount of task for SRFI editors should be as less as possible.

Who should do it? Well, I would like to say 'me' but due to the personal reasons and luck of knowledge (especially domain transfer), it's kinda hard. (if I can, then earliest would be after October this year.) Plus, I'm not sure if this approach is acceptable for other Schemers. So for now, just thinking about it.

2015-03-05

Amazon面接記

1月の終わりから3月の初めまでの面接記。誰かの参考になるといいなぁ。

1月中旬、AmazonリクルータからLinkedIn経由でAmazon欧州採用イベントのメールを受け取る。このときは「まぁ、どうせ1000通以上送ったメールの一つだろう」くらいな気持ちで、興味があるという旨だけ3行くらいで返す。

同1月。そのリクルータの上司(?)からコーディングテストがあるから受けろといわれる。内容は、ストリームから文字列トークンの切り出し+αの問題と、配列操作+αの問題の2問。46時中プログラムを書いていればそんなに難しい問題ではないと思う。時間制限1時間(最長90分)の制限があった。2問目が難しいという脅しを受けていたのだが、40分で完了。(今イベント最速だったらしい) 今回は欧州採用イベントということで、コーディングテストが先だったらしい。よく知らん。

しばらく間があいて、すっかり忘れていたころにMixerイベントと面接の3月。メールの内容に寄れば2月中にアメリカから電話がかかってきて、面接でどんなことが聞かれるとかのヒントがあるとかないとかだったのだが、それはなかった。(これがなかったので、2月の間はマジで頭から抜けてたという話もある)

っで、面接。ここからは記憶がまだ新しいので多少詳しく書く。

面接は1対1、45分を4回。これを半日で行う。正直かなりの苦行だった。面接官は1日8回x3日なので頭が融けるんじゃないかな?ちなみに、面接官はTPM(Technical Programme Manager)かSDE(Software Development Engineer)のどちらか。内容はテクニカルなこと一辺倒。

最初の面接。データベースモデルの話。既存のテーブルモデルを拡張して新機能を入れたい。どんな風に設計する?という感じのもの。最初に大まかなアイデアを出して、面接官が突っ込んで、修正を入れてという形式だった。正直あれでよかったのかは自信ない。その後に今までのプロジェクトでイニシアチブを取って何かをしたことがあれば教えてくれみたいな質問。普通に答える。

次の面接。Amazonのシステムの一部がクラックされた上にチームの開発者も消された、しかもサーバのバックアップも燃えた(どんな状況だ?) 。そこで新たに雇われた君にそのシステムを設計してほしい。どのような手順でやるかを列挙した後に、細かい部分の説明をしてもらう。割と意味不明な状況だなぁと思いながら、要は上から下まで全体像を描けるかというものだと理解してそんな感じで答える。最後に本番用サーバ台数が不明だけど、どれくらい要る?という質問があったが、
毎秒のリクエストと1リクエストにかかる時間、さらに要求レスポンスから適当に割り出す。1リクエスト200msと適当に仮定してしまったのでサーバが1000台いるとかになって、流石にこれはありえんなぁと思いつつ。残りの時間は、締め切りとプロジェクト要求のプライオリティについての質問。まぁ、これも普通に答える。

3つ目の面接。コーディングテストその1。プロダクト、顧客及びその画面が見られた回数を保持するクラスのリストを、k個の顧客毎に最も多く参照されたプロダクトを保持するクラスのリストに変換するというお題(日本語むずい)。最初は優先度付きキューを使ってO(n*nlogn)で解こうとしたのだが、kはランダムなあんまり大きくない数字といわれたので、普通にバッファ持たせてO(nk)で解いた。リスト処理のお題をSchemerに出しては駄目だなという感じで、ささっと解法説明。んで、紙の上にコーディング(これに30分くらいかかった)。 終わった後に最初の面接と同じ質問。まぁ、定型文ですよね。

4つ目の面接。コーディングテストその2+システムデザイン。コーディングテストは配列の要素間の最大差を求める問題。愚直にやるとO(n^2)かかるけど、O(n)の解法を見つけられるかというもの。多少頭をひねってなんとかひねり出す(この段階では既に頭がまともに働かなかった)。システムデザインはある機能を実装するのにどんなコンポーネントを使ってどのようなシステムを構築するかというもの。正直こんなハイレベル(上流?)の設計はしたことがないのでしどろもどろ。正直にやったことねぇっすとゲロる。っで、2つ目の面接と同じ質問と今までで最も誇れる成果は何?という質問。仕事の成果なんて覚えていないので、Sagittariusのことをここぞとばかりに出してみた。興味を引いたらしいw

んで、結果。受かったw
それなりに経験積んでて、46時中プログラミングしてればいけるっぽい。正直なんで受かったのかは謎だが、こんな僕でもそこそこいけてるプログラマを自負してもいいだろうか?