時の羅針盤＠blog

2013-11-30

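I implemented an SSH client

I had been thinking that I would need one (for my own use) before long, so I took the plunge and wrote it. Putting it that way makes it sound as if I did something impressive, but all I really did was steadily implement what the RFCs say. Note that it does not yet satisfy every requirement.

It can be used as follows.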

(import (rfc ssh))

(define transport (make-client-ssh-transport "localhost" "22"))
(define user "guest1")
(define pass "pass1")

(ssh-authenticate transport +ssh-auth-method-password+ user pass)
(let-values (((status response) (ssh-execute-command transport "ls -l")))
  (print (utf8->string response)))

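ssh-execute-command returns the command's exit status and its output. The output is binary, so it has to be converted as needed. There are also APIs for starting a shell and for opening a channel yourself.

It can still only handle toy-level work (so it won't appear in the documentation for a while, lol). What I really want is SFTP, so the next steps are probably establishing subsystem sessions and implementing SFTP.

Some miscellaneous information follows.
The source is under the sitelib/rfc/ssh directory. I take no responsibility if its messiness damages your sanity (this is probably a warning to my future self...). The following features and libraries were added in order to implement this.

(binary data) library (I say so myself, but it is really convenient, lol)
DSA key signing and verification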

2013-11-20

Binary data structure read/write library

I'm currently implementing SSH (client only for now) on Sagittarius and noticed that it would be convenient to have a library that handles reading and writing binary data structures. So I've written the (binary data) library (I'm not sure whether the name should be '(binary structure)', '(binary io)' or something else).

Here is a simple example:

;; The definition is from RFC 4250-4254
;; atom datum
(define-simple-datum-define define-ssh-type read-message write-message)
(define-ssh-type <name-list> (<ssh-type>)
  names '()
  (lambda (in)
    (let* ((len (get-unpack in "!L"))
           (names (get-bytevector-n in len)))
      (string-split (utf8->string names) #/,/)))
  (lambda (out names)
    (let ((names (string->utf8 (string-join names ","))))
      (put-bytevector out (pack "!L" (bytevector-length names)))
      (put-bytevector out names)))
  :parent-metaclass <ssh-type-meta>)

;; composite data
(define-composite-data-define define-ssh-message read-message write-message)
(define-ssh-message <ssh-msg-keyinit> (<ssh-message>)
  ((type   :byte +ssh-msg-kexinit+)
   (cookie (:byte 16)) ;; array of byte
   (kex-algorithms <name-list>)
   (server-host-key-algorithms <name-list>)
   (encryption-algorithms-client-to-server <name-list>)
   (encryption-algorithms-server-to-client <name-list>)
   (mac-algorithms-client-to-server <name-list>)
   (mac-algorithms-server-to-client <name-list>)
   (compression-algorithms-client-to-server <name-list> (name-list "none"))
   (compression-algorithms-server-to-client <name-list> (name-list "none"))
   (language-client-to-server <name-list> (name-list))
   (language-server-to-client <name-list> (name-list))
   (first-kex-packat-follows :boolean #f)
   (reserved :uint32 0)))

The idea of the library is that structured data is either a simple datum or a composite of simple data. Thus, once we define how to read and write each simple datum, how to read and write the composite data is already determined. This might not always be true, but as far as I know it holds in most cases.

BTW, I think the names of the macros are ugly, so if you have a better suggestion, it's very welcome :)
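The same idea can be sketched outside Scheme. The following Python is only an illustration, not the (binary data) library: the table SIMPLE, the field list EXAMPLE_FIELDS and all function names are made up here. Each simple datum gets a reader and a writer, and a composite is nothing more than an ordered list of (field, type) pairs.

```python
import io
import struct

def read_uint32(port):
    # Network byte order, like (get-unpack in "!L") in the Scheme example.
    return struct.unpack("!L", port.read(4))[0]

def write_uint32(port, value):
    port.write(struct.pack("!L", value))

def read_name_list(port):
    raw = port.read(read_uint32(port)).decode("utf-8")
    return raw.split(",") if raw else []

def write_name_list(port, names):
    raw = ",".join(names).encode("utf-8")
    write_uint32(port, len(raw))
    port.write(raw)

def read_boolean(port):
    return port.read(1) != b"\x00"

def write_boolean(port, value):
    port.write(b"\x01" if value else b"\x00")

# Hypothetical table of simple data: type name -> (reader, writer).
SIMPLE = {
    "uint32": (read_uint32, write_uint32),
    "name-list": (read_name_list, write_name_list),
    "boolean": (read_boolean, write_boolean),
}

# A composite is just an ordered list of fields (names chosen for this sketch).
EXAMPLE_FIELDS = [
    ("kex_algorithms", "name-list"),
    ("first_kex_packet_follows", "boolean"),
    ("reserved", "uint32"),
]

def write_composite(port, fields, values):
    for name, kind in fields:
        SIMPLE[kind][1](port, values[name])

def read_composite(port, fields):
    return {name: SIMPLE[kind][0](port) for name, kind in fields}
```

Once the three simple readers and writers exist, no per-message code is needed: write_composite and read_composite work for any field list built from them.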

2013-11-19

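Macro bugs return

It feels like a very long time since I last found one. There are two of them; I have dealt with one (with a quick fix for now), but I'm struggling with the other.

The problematic code looks like this.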

(import (rnrs))
(define-syntax renaming-test
  (syntax-rules ()
    ((_ var val)
     (begin
       (define dummy val)
       (define (var) dummy)))))
(define dummy #f)
(renaming-test a 'a)
(print (a))
(print dummy)

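As you can see, the final dummy should return #f, but the bug is that it returns a. In short, renaming is not working properly.

Currently, renaming is done only at expansion time, but I have a feeling that identifiers which are not bound anywhere could be renamed when the pattern is compiled. In the example above, renaming the pattern variables var and val, or the bound identifiers _, begin and define, would be a problem, but the remaining one (dummy) can be renamed because it never leaks out of the macro (in fact, it would be a problem if it did leak). Maybe I'll try it that way. Ah, no, that won't work. It breaks on a pattern like the following.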

(let ((dummy #f)
      (hoge #t))
  (define (print . args) (for-each display args) (newline))
  (let-syntax
      ((renaming-test (lambda (x)
                        (syntax-case x ()
                          ((_ var val)
                           #'(begin
                               (define dummy val)
                               (define (var) dummy)
                               (display hoge) (newline)))))))
    (renaming-test a 'a))
  (print (a))
  (print dummy))

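Here I want dummy to be renamed, but hoge must not be changed. In this pattern, though, there seems to be nothing the macro can do unless it knows the syntax. Or am I wrong? How is it supposed to know that dummy and hoge mean different things without any syntactic information?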

2013-11-15

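Self-hosting

Sagittarius is already almost self-hosting, but I've started wanting to take it a bit further, and I also noticed a subtle problem, so here is a memo.

Up to 0.4.11 (0.4.11 itself is actually experimentally different), the compiler was compiled on a VM written in Scheme to generate C code, but this meant that every change to the VM code had to be made on both the C side and the Scheme side, which was frankly tedious. So, as groundwork for now, 0.4.11 introduced a library that converts compiled code into C.

If Sagittarius were an implementation that does nothing odd, that would be the end of the story, but it does do odd things, and I realised that the story doesn't end here. The Sagittarius compiler knows the number of words in a frame used by the VM (it gets it from the VM) and avoids creating unnecessary environment bindings at compile time (I probably wrote a post about doing this before). This becomes a problem. I couldn't think of a good example, so this one is a bit contrived, but here it is.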

(disasm (lambda (x)
          (let ((y (get x z))) 
            (print (let ((w (get y z)))
                     (get w (let ((e (get x)))
                              (get e x))))))))
;; size: 40
;;    0: FRAME 6
;;    2: LREF_PUSH(0)
;;    3: GREF_PUSH #<identifier z#user (0x80501990)>; z
;;    5: GREF_CALL(2) #<identifier get#user (x805019d8)>; (get x z)
;;    7: PUSH
;;    8: FRAME 6
;;   10: LREF_PUSH(1)
;;   11: GREF_PUSH #<identifier z#user (0x805018b8)>; z
;;   13: GREF_CALL(2) #<identifier get#user (0x80501900)>; (get y z)
;;   15: PUSH
;;   16: FRAME 18
;;   18: LREF_PUSH(2)
;;   19: FRAME 4
;;   21: LREF_PUSH(0)
;;   22: GREF_CALL(1) #<identifier get#user (0x80501810)>; (get x)
;;   24: PUSH
;;   25: FRAME 5
;;   27: LREF_PUSH(10) <-- !!! this !!!
;;   28: LREF_PUSH(0)
;;   29: GREF_CALL(2) #<identifier get#user (0x805017b0)>; (get e x)
;;   31: LEAVE(1)
;;   32: PUSH
;;   33: GREF_CALL(2) #<identifier get#user (0x80501870)>; (get w (let ((e (get x))) (get ...
;;   35: LEAVE(1)
;;   36: PUSH
;;   37: GREF_TAIL_CALL(1) #<identifier print#user (0x80501948)>; (print (let ((w (get y z))) (g ...
;;   39: RET

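Normally LREF_PUSH(10) would mean pushing the 10th variable on the stack onto the stack, but here it refers to a variable whose position became 10 once the frames in between are taken into account. The reason it works this way is largely historical, but it comes down to the fact that Sagittarius has no notion of an enclosing (one level outer) environment (that was better for performance). The VM stack is basically never modified other than by push and pop, so this is a trick that (I personally think) makes good use of that.

So what is undesirable about ordinary self-hosting? We can't be sure that such cases don't exist in the compiler or the builtin libraries, so the precomputed offsets may shift. In that case there is only one solution: the host first compiles the target compiler A, and then compiles everything once more with that compiler. A is made up of the previous VM's instructions, but the instructions it emits are the ones the target needs. It feels somewhat roundabout, but it seems unavoidable given the mechanism, so I'll have to live with it.

Recorded here as a memo for now.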
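To make the offset concrete, here is a small Python model of the stack at the moment LREF_PUSH(10) runs. It is only a sketch under one assumption: that a pending call frame occupies six words. That number is inferred from the listing above (three locals, one frame, one pushed argument, then e), not taken from the VM source, and the names FRAME_WORDS and stack_at_inner_let are made up.

```python
FRAME_WORDS = 6  # assumption inferred from the listing, not from the VM source

def stack_at_inner_let():
    """Model the stack contents just before (get e x) is called."""
    stack = ["x", "y", "w"]             # locals at offsets 0, 1 and 2
    stack += ["<frame>"] * FRAME_WORDS  # frame of the pending (get w ...) call
    stack.append("w")                   # LREF_PUSH(2): its first argument
    stack.append("e")                   # result of (get x), bound to e
    return stack

def lref(stack, offset):
    # A local reference is a plain index from the bottom of the stack.
    return stack[offset]
```

Under that assumption e sits at index 10, which is why the compiler has to know the frame size: if the frame layout of the host and the target differ, every such precomputed offset is wrong.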

2013-11-07

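Compiler macros

The ingredients were actually there from the beginning, but this is one of the things I had left alone because I wasn't in the mood and didn't need performance tight enough to force the issue. But I got in the mood, so I went ahead and built it. Well, the reason I got in the mood is that I saw a comment on 2ch saying that Racket and Chicken have them...

For now, it can be used as follows.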

(import (rnrs) (core inline))
;; map is defined in (core base)
(define-inliner map (core base)
  ((_ p arg)
   (let ((proc p))
     (let loop ((l arg) (r '()))
       (if (null? l)
           (reverse! r)
           (loop (cdr l) (cons (proc (car l)) r)))))))

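You specify the procedure name and the library in which it is defined, and write the actual expansion with syntax-rules-like pattern matching. For comparison, here are the compilation results with and without the inliner.

;; with the inliner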
(disasm (lambda (x) (map values '(1 2 3 4 5))))
;; size: 26
;;    0: GREF_PUSH #<identifier user#values x80414678>; values
;;    2: CONST_PUSH (1 2 3 4 5)
;;    4: CONST_PUSH ()
;;    6: LREF(2)
;;    7: BNNULL 5                  ; (if (null? l) (reverse! r) (lo ...
;;    9: LREF_PUSH(3)
;;   10: GREF_TAIL_CALL(1) #<identifier reverse!#user x804146d8>; (reverse! r)
;;   12: RET
;;   13: LREF_CDR_PUSH(2)
;;   14: FRAME 4
;;   16: LREF_CAR_PUSH(2)
;;   17: LREF(1)
;;   18: CALL(1)
;;   19: PUSH
;;   20: LREF(3)
;;   21: CONS_PUSH
;;   22: SHIFTJ(2 2)
;;   23: JUMP -18
;;   25: RET

;; without the inliner
;; size: 7
;;    0: GREF_PUSH #<identifier values#user x802ba300>; values
;;    2: CONST_PUSH (1 2 3 4 5)
;;    4: GREF_TAIL_CALL(2) #<identifier map#user x802ba330>; (map values '(1 2 3 4 5))
;;    6: RET

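You can see that it has been expanded inline. As for whether it has any real effect: it is not zero, but it's at a level you won't see unless the procedure is used heavily, as in a benchmark.

As it stands, this can only be used to inline procedures that use higher-order functions; it can't do constant folding. There is actually a lower-level macro, and define-inliner is a wrapper around it, but I haven't exposed it. The reason is that I don't really like its API. I can't help feeling that lower-level APIs are the harder ones to design...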

2013-10-28

How to write portable code on R7RS

There was a discussion (or rather a question) about 'cond-expand'. I was also wondering why 'cond-expand' has the 'library' form even though it doesn't help you write a portable script (as opposed to a library).

Since draft 8 (or 9?), R7RS has dropped the 'import' syntax from (scheme base), which means users can't write code like the following:

(import (scheme base) (scheme write))

(cond-expand
 ((library (srfi :1))
  (import (srfi :1)))
 ((library (srfi 1))
  (import (srfi 1))))

(define (print . args) (for-each display args) (newline))
(print (iota 10 1))

Interestingly, this works on most R7RS implementations (well, I only know Chibi and Sagittarius :-P) and on Gauche (the next release will probably support R7RS). However, it's still not portable, since the *correct* behaviour should be an error.

Then what is the proper way to make this portable? The answer is simple: just write a stub library like this:

;; somewhere on the load path where your favourite implementation can search.
(define-library (srfi-1)
  (export iota)
  (cond-expand
   ((library (srfi :1))
    (import (srfi :1)))
   ((library (srfi 1))
    (import (srfi 1)))
   (else
    ;; To make code absolutely portable
    ;; you need 'begin' :)
    (begin (define iota ...))
    ))
 )

For me this is inconvenient, so I will probably not write strictly portable code on R7RS. Even a WG1 member is asking implementers to do it the way Chibi does now (I'm not sure whether this is about the same thing I'm talking about, though):

> As far as I can tell, there is no way in a program to use cond-expand to
> control what libraries get imported.

That appears to be correct. I consider that an oversight on the WG's part.
Chibi actually supports this, and I would urge you to support it too.
(from http://lists.scheme-reports.org/pipermail/scheme-reports/2013-October/003802.html)

Even though it's inconvenient, R7RS at least provides a way to write a portable library, which R6RS doesn't. From this perspective it's not so bad (well, I still don't understand why it dropped 'import' and is asking people to go a non-standard way, even if that would become the de facto standard. If they think it's an oversight, they should put it in the errata).

2013-10-24

Port position for transcoded port

I'm planning to support port-position and set-port-position! for transcoded textual ports, so I checked how some major R6RS implementations behave. The result was rather interesting.

First of all, here are the implementations I checked and their results. (I haven't included Sagittarius because it obviously doesn't support this yet :-P)

Petite Chez 8.4

#t
#t
3
�

Larceny 0.9.7

#t
#t
1

Error: no handler for exception #<record &compound-condition>
Compound condition has these components:
#<record &assertion>
#<record &who>
    who : set-port-position!
#<record &message>
    message : "position not obtained from port-position"
#<record &irritants>
    irritants : (#<INPUT PORT test.txt> 2)

Terminating program execution.

Mosh 0.2.7

#f
#f

Ypsilon 0.9.6-update3

#t
#t
3
�

Racket 5.2.1

#t
#t
3
�

The test code:

(import (rnrs))

(call-with-input-file "test.txt"
  (lambda (p)
    (display (port-has-port-position? p)) (newline)
    (display (port-has-set-port-position!? p)) (newline)
    (when (port-has-port-position? p)
      (get-char p)
      (display (port-position p)) (newline)
      (set-port-position! p 2)
      (display (get-char p)) (newline))))
#|
test.txt (UTF-8)
あいうえおかきくけこ
|#

Mosh doesn't support port-position, so the test was skipped. Except for Larceny, the implementations simply set the position of the underlying binary port, so they returned an invalid character. Larceny, on the other hand, checks the position of the textual port and raises an error if it doesn't match. (However, if I change the position being set to 0, it reads an invalid character, so it doesn't seem to be really working.)

I'm not sure which is the expected behaviour, but at least what Chez, Ypsilon and Racket do is easy enough to implement.
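The 3 and the invalid character in those results follow directly from the UTF-8 encoding. The Python sketch below is not any of the Scheme implementations; it only mimics "set the position of the underlying binary port" with an in-memory byte stream, and the variable names are made up.

```python
import io

text = "あいうえおかきくけこ"
raw = io.BytesIO(text.encode("utf-8"))  # stands in for the underlying binary port

# Each of these characters takes three bytes in UTF-8,
# so reading one character advances the byte position to 3.
first = raw.read(3).decode("utf-8")
position_after_first = raw.tell()

# Position 2 is the last byte of the first character. Decoding from
# there starts on a continuation byte, which is not a valid character.
raw.seek(2)
broken = raw.read(1).decode("utf-8", errors="replace")
```

Here broken is the replacement character U+FFFD, matching what Chez, Ypsilon and Racket printed. A position expressed in characters, as Larceny appears to use, would need a translation back to a byte offset before seeking.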

2013-10-23

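An interface for DBM

I received the following comment.

Personally, the kind of library I'd like Sagittarius to have is something like gdbm. A relational database is too luxurious for a small script. There is sqlite, so I suppose it can be used for casual purposes too, but I don't really understand SQL.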
— (32) 齊藤敦志 (@SaitoAtsushi) October 18, 2013

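GDBM requires an external library, which makes it hard to support on Windows, so direct support will have to take some other route; for now I've made an interface. The API is Gauche-compatible (or rather, almost entirely borrowed), and it can be used like this.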

(import (rnrs) (dbm) (clos user))

(define-constant +dumb-db-file+ "dumb.db")

(define dumb-class (dbm-type->class 'dumb))

(let ((dumb-dbm (dbm-open dumb-class :path +dumb-db-file+
                          :key-convert #t :value-convert #t)))
  (dbm-put! dumb-dbm 'key1 #t)
  (dbm-get dumb-dbm 'key1)
  (dbm-close dumb-dbm))

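What Sagittarius itself supports is (dbm dumb), which was influenced by Python's dbm.dumb. I might also add something like Gauche's fsdbm, but there are no plans for now (translation: I wouldn't use it myself).

As a bit of trivia, checks such as whether the DBM is open are done with CLOS :before methods. It's convenient because you don't have to bother calling call-next-method.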
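For comparison, this is roughly what the same put/get round trip looks like with the Python module that inspired (dbm dumb). It is a sketch only: pickle stands in for the key and value conversion, and treating that as the equivalent of :key-convert and :value-convert is my assumption. The helper name roundtrip is made up.

```python
import dbm.dumb
import os
import pickle
import tempfile

def roundtrip(key, value):
    """Store one pickled key/value pair in a dumb DBM and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        db = dbm.dumb.open(os.path.join(tmp, "dumb"), "c")
        try:
            # dbm.dumb stores bytes, so arbitrary objects must be serialised.
            db[pickle.dumps(key)] = pickle.dumps(value)
            return pickle.loads(db[pickle.dumps(key)])
        finally:
            db.close()
```

Calling roundtrip("key1", True) gives back True, which corresponds to the dbm-put! and dbm-get pair in the Scheme example.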

2013-10-18

Enbug

Even though 0.4.10 was released just today, I've found a critical bug (it causes a SEGV).... ;-(

The code is like this:

(import (rnrs))
(define save #f)
(let* ([p (make-custom-binary-input/output-port
    "custom in"
    (lambda (bv start end)
      (bytevector-u8-set! bv start 7)
      (set! save bv)
      1)
    (lambda (bv start end)
      1)
    #f #f #f)])
  (put-u8 p 10)
  (flush-output-port p)
  (get-u8 p)
  (close-port p))
(print "SEGV!!")
(print save)

I've never seen such a use case, but a SEGV is worse than an unexpected result (it is unexpected, but you know...). I know exactly why this happens and how to resolve it. I'm writing this down as an admonition to myself.

This happens because a stack-allocated bytevector is used as an *invalid* performance optimisation. I was so eager to make Sagittarius use less memory that I did this. However, once C code calls Scheme code, there is always a possibility that the passed value will be saved outside of the scope. This is the typical case.

I just need to say this to myself: DON'T BE SO THOUGHTLESS!!!
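Python cannot reproduce the SEGV, but the underlying mistake (handing a short-lived buffer to user code that may keep it) can be modelled with a reused scratch buffer. Everything below is a hypothetical analogy, not the Sagittarius code; the function names are made up.

```python
saved = []

def reader(buf, start, end):
    # User callback, like the one in the Scheme example: it writes one
    # byte and also keeps a reference to the buffer it was given.
    buf[start] = 7
    saved.append(buf)
    return 1

def call_reader_unsafe(read_callback, scratch):
    # Passes the short-lived scratch buffer itself to user code.
    return read_callback(scratch, 0, len(scratch))

def call_reader_safe(read_callback, scratch):
    # Gives user code its own buffer and copies the result back,
    # so nothing the callback saves aliases the scratch storage.
    fresh = bytearray(len(scratch))
    count = read_callback(fresh, 0, len(fresh))
    scratch[:count] = fresh[:count]
    return count
```

With the unsafe variant, whatever later overwrites the scratch buffer also changes the object the callback saved. In C with a stack-allocated buffer the saved object points at dead stack memory, which is the crash described above.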

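Sagittarius Scheme 0.4.10 released

Sagittarius Scheme 0.4.10 has been released. This release is a maintenance release.

Fixed bugs

Fixed a bug where set-port-position! did not work correctly on file ports
Fixed a bug where (- 0 ) always returned a negative integer
Fixed a bug where the reader read the value returned by (least-fixnum) as a bignum
Fixed a bug where -8388609 was read as 8388607
Fixed a bug where bitwise-xor returned an incorrect result when given negative numbers
Fixed a bug where (bitwise-arithmetic-shift 0 65) returned a bignum 0
Fixed a bug where bitwise-arithmetic-shift-right returned an incorrect value on 64-bit environments when given a large value that does not fit in 32 bits
Fixed a bug where fxdiv0-and-mod0 returned incorrect values for certain inputs
fxbit-set? now behaves as specified in the R6RS errata
Fixed a bug where open-inflating-input-port raised an error when inflating with a small buffer
Fixed a bug where file-executable? caused a SEGV on Windows
Fixed a bug where file-stat-atime, file-stat-ctime and file-stat-mtime did not return POSIX time in nanoseconds on Windows
Fixed a bug where copy-directory* did not handle files in the top directory correctly
Fixed a bug where the path returned by url-server&path was prefixed with //

Improvements

Memory usage has been reduced
The build process now uses pkg-config, where available, to locate Boehm GC and libffi
RSA keys can now be compared
import-public-key and import-private-key for RSA keys now accept bytevectors
parse-pem now has a :builder keyword that lets the user specify how objects are constructed
When an optional variable is specified in the with-args macro, arguments not in the list are now packed into that variable instead of raising an error

New features

file->bytevector has been added to (util file)
A generic archive library (archive) has been added
A library for handling Zip files (archive core zip) has been added
A library for handling TAR files (archive core tar) has been added
A GZIP library (rfc gzip) has been added

New documentation

Documentation for (getopt) has been added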

2013-10-11

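A generic archive library

This is the thing from the previous post, and I've already built it. I figured I would end up building it anyway, so it might as well be sooner rather than later; that's the only reason.

For now it can be used like this.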

(import (rnrs)
        (srfi :26)
        (archive))

;; for this example it's tar
(define-constant file "test.tar")
(when (file-exists? file)
  (delete-file file))

;; use tar. for zip then 'zip.
(define type 'tar)

(call-with-output-file file
  (lambda (out)
    (call-with-archive-output type out
      (lambda (zip-out)
        (append-entry! zip-out (create-entry zip-out "test.scm"))
        (append-entry! zip-out (create-entry zip-out "test-lib/bar.scm")))))
  :transcoder #f)
        

(call-with-input-file file
  (lambda (in)
    (call-with-archive-input type in
      (lambda (zip-in)
        (do ((e (next-entry! zip-in) (next-entry! zip-in)))
            ((not e) #t)
          (print (archive-entry-name e))
          (unless (string=? "test.scm" (archive-entry-name e))
            (print (utf8->string 
                    (call-with-bytevector-output-port
                     (cut extract-entry e <>)))))))))
  :transcoder #f)

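When creating an archive it currently accepts file names, but when extracting it writes to a port. This is because the information required differs depending on the archive format. If it turns out to be needed once I start actually using it, I'll probably make it specifiable via keyword arguments or the like. What I need for the time being is the extraction part, so this will do for now. Whether next-entry! should return #f or EOF when it reaches the end is a hard call, but I think #f will make life easier later. Well, it's a matter of taste.

The mechanism is similar to DBI: an (archive $type) library is defined for each archive type. Currently there are only tar and zip (I wish someone would write RAR or LZH...). What remains is tests and documentation. Depending on testing and how it feels in use, it might not be documented in the next version... the release is next week, after all...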

2013-10-10

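Archive and compression libraries

(The title looks a bit cool when written in Japanese, lol)

As I wrote in the previous post, I've started to think it would be nice to have a package system, so as preparation I decided to beef up the archive and compression/decompression libraries. For now I've added zip, tar and gzip. They are (archive core zip), (archive core tar) and (rfc gzip) respectively. The archive libraries have 'core' in their names because I'm scheming to build a generic interface on top of them. But while writing them I also started to think it might not be needed, so whether it gets implemented is uncertain at the moment (it probably will).

Here is some simple usage for now.
First, tar and gzip.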

(import (rnrs)
        (srfi :26)
        (archive core tar)
        (rfc gzip))

(define-constant gzip-file "test.tar.gz")
(when (file-exists? gzip-file)
  (delete-file gzip-file))

;; archive and compress
(call-with-output-file gzip-file
  (lambda (out)
    (let ((h (make-gzip-header-from-file gzip-file :comment "comment")))
      (call-with-port (open-gzip-output-port out :header h)
        (lambda (gout)
          (append-file gout "test.scm")
          (append-file gout "dir/test.zip")))))
  :transcoder #f)

;; expand
(call-with-input-file gzip-file
  (lambda (in)
    (call-with-port (open-gzip-input-port in)
      (cut extract-all-files <> :overwrite #t)))
  :transcoder #f)

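A tar.gz is produced by opening a gzip output port and appending to it. tar is sequential access, so the operations are fairly intuitive and it goes quite easily. You don't have to call make-gzip-header-from-file; in that case just remove the header keyword. If it isn't specified, an empty GZIP header is attached by using zlib's window-bits option of 16 or more (31 is used).

tar currently supports only the USTAR format, so file names can be at most 255 bytes. (It uses the prefix field. The command-line tool can extract the result, so it's correct, probably.)

Next, zip.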
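Both points can be checked with Python's standard library, which exposes the same zlib option. This is a sketch of the technique, not of the Sagittarius API; make_tar_gz and list_tar_gz are made-up names. The 255-byte limit comes from USTAR's 100-byte name field plus its 155-byte prefix field.

```python
import io
import tarfile
import zlib

def make_tar_gz(files):
    """files maps member names to bytes; returns the tar.gz as bytes."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # wbits = 31 (16 + 15) makes zlib wrap the deflate stream in a
    # minimal gzip header and trailer instead of the zlib format.
    compressor = zlib.compressobj(9, zlib.DEFLATED, 31)
    return compressor.compress(tar_buf.getvalue()) + compressor.flush()

def list_tar_gz(blob):
    """Return the member names of a tar.gz held in memory."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()
```

The result starts with the gzip magic bytes 1f 8b even though no header was built by hand, and an ordinary gzip-aware tar reader can list it.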

(import (rnrs)
        (srfi :26)
        (archive core zip))

(define-constant zip-file "test.zip")
(when (file-exists? zip-file)
  (delete-file zip-file))

;; archive
(call-with-output-file zip-file
  (lambda (out)
    (let ((centrals (map (cut append-file out <>)
                         '("test.scm" "dir/test.zip"))))
      (append-central-directory out centrals)))
  :transcoder #f)

;; expand
(call-with-input-file zip-file
  (cut extract-all-files <> :overwrite #t)
  :transcoder #f)

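zip has the property of allowing random access, but what makes that possible is the information attached at the end, so the information generated each time a file is added has to be kept and appended at the end. I want to make a generic interface that unifies the operations so that the difference from tar doesn't have to be kept in mind, so it will probably get built.

If I also did RAR or so, it would feel like I could handle anything. Ah, there was LZH too. It probably won't be supported, though... (^^;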
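The "information attached at the end" is the central directory followed by the end-of-central-directory record. The Python sketch below builds a small archive with the standard zipfile module and locates both; it is an illustration of the file layout, not of (archive core zip), and the helper names are made up. It assumes a small archive with no archive comment and no ZIP64 records.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"     # end of central directory record
CENTRAL_SIGNATURE = b"PK\x01\x02"  # central directory file header
LOCAL_SIGNATURE = b"PK\x03\x04"    # local file header

def build_zip(files):
    """files maps member names to bytes; returns the zip as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def central_directory_info(blob):
    """Return (eocd_offset, entry_count, central_directory_offset)."""
    eocd = blob.rfind(EOCD_SIGNATURE)
    # Bytes 10..19 of the record: total entries (2 bytes),
    # central directory size (4), central directory offset (4).
    entries, _size, cd_offset = struct.unpack("<HII", blob[eocd + 10:eocd + 20])
    return eocd, entries, cd_offset
```

A reader jumps to the end, finds the record, and from there reaches any member without scanning the file. A writer therefore has to remember one central entry per appended file and emit them all last, which is the role of centrals and append-central-directory in the Scheme example.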

2013-10-06

Archive APIs

I'm thinking of making a package system for Sagittarius. For this, I first need archive APIs such as zip and tar. To make things easier later, I want the interface and the implementation separated, like DBI.

I'm more of a go-getter type, so designing a flexible interface from the beginning is not my strong point. So I've been checking other archive library implementations. So far, Java's ZipInputStream/ZipOutputStream seems fine. The current idea would be like this:

;; Interface
(define-class <archive-entry> ()
  ;; might need more
  ((name)
   (size)
   (type)))
;; might not be needed
(define-class <archive-input> ()
   ((port)))

(define-class <archive-output> ()
   ((port)))
;; for input
(define-generic next-entry) ;; get entry
;; this should be default method implementation
;; so that subclass can extend.
(define (read-contents input entry)
  ;; returns bytevector read from port.
  ...)

;; for output
(define-generic add-entry)

If possible, it would be better to dispatch on the port type; however, right now the only way to make a custom port is the R6RS way, and it always creates an object of the same class, which can't be used by the CLOS method dispatcher.

Right now I only need the expander, so I guess I'll first try to implement expanders for both tar and zip.

2013-09-30

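Memory-saving plan (stack edition)

Even I didn't expect there would be a next instalment, lol

It suddenly occurred to me in the shower that in Sagittarius almost all Scheme objects are created on the heap. However, many objects become unnecessary right after use and just wait for the GC. This is undesirable both for performance and for memory usage.

The reason it's built this way is simply that I cut corners. But I've started to think it would be enough to provide C APIs for initialisation. Providing them for every object would be a lot of work, but if I limit it for now to objects that are used frequently and discarded right after use, I expect a reasonable payoff.

So which objects satisfy those conditions? I actually already have some candidates: transcoders, ports and hashtables look like the ones worth trying first. Why? Ports are created unconditionally when a cache is read, so loading (rnrs), for example, creates more than 20 objects. Transcoders are created when converting from UTF-8 to UTF-32, and on top of that this conversion always happens in read. Hashtables are used all over the place to hold shared objects and so on; for example, two are used to read a single cache file, so they are used quite heavily.

For now, I'll implement it and, if it is effective, add more step by step.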

2013-09-28

Why Sagittarius?

A while ago, I saw a comment on Reddit that said 'why Sagittarius?'. At the time, I just thought 'why not?' and 'then why Racket or other implementations?'. I still don't have that many reasons, but there should at least be a good reason to choose it! So I'm trying to convince people who are looking for a good reason to switch to, or start trying, a new Scheme implementation.

N reasons why you should use Sagittarius.

1. Multiplatform
Sagittarius is supported on a number of platforms, such as Windows, Cygwin, Linux, QNX, FreeBSD*1 and Mac OS X. A lot of implementations don't have native Windows binary support. So if you need to run Scheme on Windows, this is one of your choices!

2. Architecture independent FFI support
Sagittarius uses libffi for its FFI, so the FFI can be used on most platforms.

3. MIT license
GPL or LGPL is not suitable for a lot of situations (Racket, say, is LGPL).

4. Short term release cycle
Currently Sagittarius is released once a month, so if you find a bug, you will get the fixed version next month.

5. Both R6RS and R7RS are supported
As far as I know, Sagittarius is the only implementation which supports both R6RS and R7RS (currently).

6. Reader macro and replaceable reader
If you don't like S-expressions but like Scheme, or want to use some of the libraries written in Scheme (or Sagittarius' builtin libraries), this would be your choice!

You will probably find more reasons once you start using it. So why don't you give it a shot?

*1: The current FreeBSD ports tree has Boehm GC 7.1; however, Sagittarius doesn't work with that version. 7.2 or later needs to be installed manually.

2013-09-24

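Memory-saving plan (port edition)

I don't think there will be anything other than ports... if there is a next one, it would be codecs at most...

Sagittarius is written in C, but the port code is written in an object-oriented style because I wanted it to be as flexible as possible for things like custom ports. It's just that the struct holds function pointers for operating on itself.

When I fixed the port position bug the other day I added about four function pointers, and the size of a port has now become worrying. In the current HEAD a port is 72 bytes (on a 32-bit environment). That feels a bit big for a mere object. Moreover, that is only the common interface excluding the implementation part, so in practice it exceeds 100 bytes. (The binary port is the biggest at 88 bytes...)

If this were C++ I wouldn't have to care about the size of a class's member functions, but that's crying for the moon. So I wondered whether there was a good idea, and how about giving the port struct a pointer to something like a virtual table and placing the actual operation functions in static storage? The issue is how the functions are accessed, and I'm torn between making it a macro and making it a function call. With a macro the virtual table has to be exposed, but with a function call I worry about performance.

I'll write it first and worry afterwards.

Packages and such

Sagittarius is an implementation I started writing to use at my own job, but I'm grateful to have received bug reports and patches from various people. I'm not good at advertising or making a name for it; apart from posting to comp.lang.scheme a few times I've done almost nothing, and I thought it was fine if "I am the biggest user", so I didn't particularly care.

However, in the last few days (really two or three days) it has been listed in Arch Linux's AUR (Arch User Repository) and looks like it will be listed in Homebrew for Mac OS X, things I couldn't have imagined four years ago (it was published two and a half years ago). I don't have either environment (I gave up on installing Arch Linux...), so I'm grateful in that sense as well.

Watching it spread in earnest through other people's hands (or is it not spreading?) feels like watching a chick leave the nest, happy and a little lonely at the same time. Of course, I'll probably remain the biggest user (and I have no intention of giving that up, lol), and I'm not thinking of handing development over to anyone either...

By the way, I was asked whether there is a developers' mailing list; would it be better to have one? It very much feels like I'd be the only subscriber, and I've never even created one, so I don't know how to go about it, lol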

2013-09-13

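Accents

(Inspired by Island Life - 訛りとか)
I think I've somewhat figured out something that had been vaguely bothering me, in a part unrelated to the main topic, so I decided to write it down as it comes.

In the Netherlands, English can be said to be a second foreign language for almost everyone. Of course there are British people, and Americans, Canadians and Australians live here too, so there are plenty of exceptions. So, well, the English that most people speak has an accent.

Of course it varies from person to person: some speak with a very clean (typical) British accent, while others have such a heavy Indian accent that it's a struggle to understand what they're saying. (There's no particular reason I picked Indians as the example. It could just as well be Spain or France. Accents from Romance languages are terribly hard to follow, too.)

My workplace is quite multinational, ranging from Japan in the east (that's me!) to Canada and South America in the west (though lately it feels like colleagues from the Middle East have been increasing). The company itself isn't that big, so there aren't many people, but with this much variety you get used to it. There's really no way of knowing how accented my own English pronunciation is (nobody points it out), but I think it's at least at a level where communication works without problems. What I can say for certain is that, good or bad, my pronunciation has changed a lot compared with before. Though it's on the level of "tomato" going from "tomeito" to "tomahto".

To force this back towards the main topic, what I've learned from experience is that people from North America (especially the US) are particularly intolerant of accents and alternative names. For example, aubergine and courgette are almost never understood. And even for the same word water, "wōta" isn't understood (I've had it fail on a plane). Personally, I suspect what PG has in mind probably applies only within the US. The only other country I know that is as linguistically intolerant is France.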

2013-09-08

SRFI 110

Today SRFI 110 will reach final status (according to the latest ML topic). To celebrate, I have added support for it to Sagittarius.

Basically, I have only added a compatibility layer to the reference implementation and run some scripts. There is one exception where the requirements are not satisfied. It is, again the same as with the previous curly-infix SRFI, the hash-bang directive. I don't have any intention of supporting these SRFIs in core Sagittarius, so they always use a reader macro or the replaceable reader. So to use this SRFI on Sagittarius, the code must look like this:

;; #!sweet for compatibility
#!sweet
#!reader=sweet

define factorial(n)
  if {n <= 1}
     1
     {n * factorial{n - 1}}

print(factorial(10))

define a 4

define g(x y)
  {x * y}

let <* x sqrt(a) *>
! g {x + 1} {x - 1}

As usual, it might not be a big issue.

The good thing about this SRFI is that it can be used in the real world. If you are familiar with Python or Ruby, which I don't know much about, this might be a good alternative. So the next step would be some Emacs Lisp to support it :)

Well, again (the same as SRFI 105) I don't think I will use this SRFI, but it's always good for users to have choices.

2013-08-28

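A bug in the TLS library

I found a bug starting from a discussion that SNI must be supported in order to use WebSocket over TLS. Originally the report was that accessing via wss freezes, but it turned out that SNI wasn't the problem; I had simply hit a bug.

The problem stems from the fact that a TLS record can carry multiple handshake messages. The current implementation expects one message per record, and when a record carries multiple messages, it discards the second and subsequent ones. (It also simply means my reading of the RFC was sloppy.)

This is a fairly big problem, and I was thinking I might have to redo the design, but after about 15 minutes of thought a fix that threads through the gaps came to mind, so here's a memo. Currently, the messages carried in a record are fetched all at once, and then dispatched by looking at the first byte. Here, for messages other than application messages, the fetched content is converted into a binary port to make it easier to handle.

The problem is that the converted port is not read until it is empty. After reading one message, peek one byte ahead, and if it isn't EOF, save the port in, say, the session object; then we know that the next thing to read is a message, so we no longer go to the socket and wait forever. That's the idea. I think it can be done by changing only the record-reading code and the session object, with the rest working as is.

I'll try it tomorrow. Still, I reread the RFC and captured packet logs and so on to track down the cause, and nearly all of that was wasted effort, which is just like me...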
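The "read until the port is empty" idea can be sketched in Python as follows. This is not the Sagittarius TLS code; read_handshake_messages is a made-up name, and the sketch assumes every message is complete within the record (a message fragmented across records is not handled). The only protocol fact used is the handshake header layout from the TLS specification: a 1-byte message type followed by a 3-byte big-endian length.

```python
import io

def read_handshake_messages(record_payload):
    """Yield (msg_type, body) for every handshake message in one record."""
    port = io.BytesIO(record_payload)
    while True:
        header = port.read(4)
        if not header:
            break  # port drained: nothing left that would otherwise be lost
        if len(header) < 4:
            raise ValueError("truncated handshake header")
        msg_type = header[0]
        length = int.from_bytes(header[1:4], "big")
        body = port.read(length)
        if len(body) != length:
            raise ValueError("truncated handshake body")
        yield msg_type, body
```

The buggy behaviour corresponds to taking only the first item from this generator and going back to the socket. Keeping the partially read port in the session object, as the memo proposes, is the lazy equivalent of continuing the loop.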
