時の羅針盤＠blog

2016-12-27

Anxiety

I thought I've written exactly the same blog post last year, but I guess I didn't. Maybe I was smart enough not to show this negative feeling.

It's almost the end of year. Every year, I've been thinking the same thing in this season and anxious. That is my skill, in one word. It might also be called my career path, knowledge or experience. I just don't know how I can describe what I'm worrying for.

Since 2004, I've been working as a software developer. There are couple of years of blank periods but most of my career is being Java developer, good or bad. When I started my career, I was just an ordinary newbie who had a bit of programming skill and assigned to the *deadline is already decided* project. Everything was new including somewhat deadline was there but no specification situation, and there were lots of things to learn, like how to capture screen shot and put it into excel sheet... At that moment, struts 1.0 was there and the concept of DI has just come. After 2 years, I quit the first job and became server administrator... sort of.

In year 2009, I've moved to The Netherlands. At that time, I had some blank as a Java developer. So I didn't know when CI is appeared. DI was still there but a bit brushed up. I could use Java annotation instead of writing huge application-context.xml. I still had lots of things to learn, however I just noticed that I was just learning the usage of libraries. If those libraries disappeared in 5 years, then this knowledge is just useless. Luckily, I've met Scheme and started writing own implementation to build some tools I can use on daily routine.

Writing a Scheme implementation gave me lots of challenges. I wasn't a good C programmer, well am still not, though. How function pointer works, how x86 works, how to retrieve stack area information, difference between POSIX and Windows, those were all new for me. I've never read C89 or C99 specification, so I will never be a C expert, but at least now I know where to look at if I need to find something.

If you write an implementation, then you also need to write libraries. Fortunately, or unfortunately, there's not much Scheme code. So I needed to write loads of code. If I see the code written in very first period of my Scheme life, those look ugly. I can't say I understand how Scheme works perfectly and I can write universal beautiful code. All what I know is how to write workable code and at least that works for me. Maybe I can say I also now the basic concept of paradigms which are used in Scheme programming, such as first class object and CPS.

7 years later, I'm still anxious. Maybe I feel like I know nothing essential. I know numbers of libraries and how those works. I know how to implement TLS or other network protocols. But are these essential? I only have vague idea what's essential knowledge is. It should be knowledge that I can use; even if I need to make software with a language I've never used; in the different IT field; after 20 years so on. And maybe I've already known that there's no such thing. And maybe the fact that I knew this reality makes me more anxious.

If I become a high skilled programmer, would this be gone? It might be better not to think about this and just study hard since I know I need to know more.

2016-12-25

OAuth

ちょっとしたメモ。

SagittariusはCommon Lispから移植したOAuth1.0を扱うライブラリを持っている。っが、このライブラリ将来OAuth2に対応させれるようにしたのかキーワード引数が大量にあり今一使い勝手が悪いい。っで、最近職場でOAuth2上に構築されたプロトコルの一つであるOpenID Connectを使うということが持ち上がっていることもあり、OAuth2のライブラリがいるかなぁという機運が高まってきている。

割と周知の事実だと思うが、OAuth1.0とOAuth2は互換性がない。RFC5849で定義されているOAuth1.0はRFC6749で定義されているOAuth2によって廃止されているのではあるが、名前の似た二つのプロトコルとしてみた方がよい。事実、RFC6749には以下の段落がある：

The OAuth 1.0 protocol ([RFC5849]), published as an informational document, was the result of a small ad hoc community effort. This Standards Track specification builds on the OAuth 1.0 deployment experience, as well as additional use cases and extensibility requirements gathered from the wider IETF community. The OAuth 2.0 protocol is not backward compatible with OAuth 1.0. The two versions may co-exist on the network, and implementations may choose to support both. However, it is the intention of this specification that new implementations support OAuth 2.0 as specified in this document and that OAuth 1.0 is used only to support existing deployments. The OAuth 2.0 protocol shares very few implementation details with the OAuth 1.0 protocol. Implementers familiar with OAuth 1.0 should approach this document without any assumptions as to its structure and details.

太字にした一文があるため、廃止されているとはいえ両方サポートした方がよさげな感じがしている。例えばWikipediaのList of OAuth providersのページによれば、OAuth1しかサポートしていないプロバイダーも結構ある(Bitbucket、Flickr、Tumbler辺りは個人的に使うかもしれない)。

どちらのプロトコルもクライアントを実装するだけならそんなに難しくないと思っていて、問題になるのはAPIの設計だと思っている。中身を見れば全くの別物なのだが、名前や歴史的経緯を見ると両社は密接なつながりがあるように見える。そうすると、ユーザー(クライアント)としてはシームレスに使えた方がいいかなぁとか考えてしまうわけだ。最終的にアクセストークンを使って保護されたリソースにアクセスするわけだし。そうするとこんな感じにライブラリを構成するのがいいだろうか？

(rfc oauth)

(rfc oauth v1.0)

client and so?

(rfc oauth v2.0)

client and so?

もしくは、無難に分けるか

(rfc oauth1)
(rfc oauth2)

無難に分けた方がいい気がするなぁ、変に統合すると混乱しそうだし。現在ある(net oauth)ライブラリはそのままにするが、deprecatedという形にすればいいかな。うまいことやれるなら新しいライブラリを下請けにする形にしたいが、そこまで手が回るかは疑問。

2016-12-19

トップレベルの継続

この記事はLisp Advent Calendar 2016の19日目の記事です。

最近c.l.s.に面白い投稿があった。これである。要約をすると、R7RSに以下の一文を追加しようという話になる。

It is an error to invoke the continuation of a top-level expression or the expression of a top-level definition more than once.
[訳]トップレベルの式もしくは定義の継続を二度以上起動することはエラーである。

この一文を入れる根拠が以下のコードになる（行の関係上短縮記述を使う)。

(define-library (foo)
  (import (scheme base))
  (export reload count)
  (begin
    (define counter (vector 0))              ;; (1)
    (define (count) (vector-ref counter 0))  ;; (2)
    (define reload (call/cc (lambda (k) k))) ;; (3)
    (vector-set! counter 0 (+ 1 (count)))))  ;; (4)

(import (scheme base) (scheme write) (foo))  ;; (5)

(unless (>= (count) 10)
  (display (count))
  (newline)
  (reload reload))
  
;; end of the file

トップレベルの継続はいろいろ落とし穴があるが、確かにこれはという感じものである。では、何が問題になるのか見ていこう。

継続とは

継続はSchemer以外のLisperには馴染みのない概念かもしれないので簡単に説明しておく。Schemeの仕様書を読むとなんとも面倒な言葉で書かれているが、厳密には多少違うが言ってしまえばコールグラフのことである(Clojure/Conjではスタックだと断言していた。まぁその通りである。)。上記のコードでは以下のようにプログラムが実行される：

  (??) --> (1) --> (2) --> (3) --> (4) --> (??)
                          ^^^^^
                           捕捉

call/ccは上記のコールグラフで捕捉と書かれた部分の継続をreloadという大域変数に保存している。reloadを起動することによって(4)からの処理を再開することができる。

問題

さて、上記のコールグラフで何が問題になるのだろう？(5)があるのにわざわざ(??)書いたのには理由がある。この部分はとても曖昧なのだ。

例えばこのプログラムが１ファイルに書かれていたとしよう。そして、処理系は一つの式を読み込み、実行という順序でファイルを処理したとする。この場合に(??)にくるのは次の式を読み込むという処理になるはずだ。そうすると、保存したreloadを起動した際に起きると予測されるのは、コメント;; end of the fileが読み込まれEOFが返されることであると言える(注：unlessから始まる式は既に読まれている)。ファイルの読み込みがEOFを返した場合、通常そのファイルは全て実行されたと解釈されるので、処理は終了するのが筋である。とすれば、この継続をこの位置で起動すると処理が終了すると言える。(必ずしも処理が終了するわけではないことに注意)

さて、c.l.s.の投稿ではLarcenyで上記のプログラムを走らせた場合想定通りに動いたとされている。これはどういうことか？簡単に言えばLarcenyはvan Tonderの展開器を使っているからとなる。もう少し突っ込んで解説をすると、van Tonderの展開器はR5RS上でポータブルにライブラリ機能を追加している。R5RSにはライブラリ機能は存在しないので実行時になんとかやっているわけだ。例えば上記のプログラムは概ね以下のように解釈される。

(begin
  (let ()
    (define counter ...)
    (define (count) ...)
    (define reload ...)
    (vector-set! ...)
    (register-library '(foo) ...)
  )
  (resolve-import ...)
  (unless (>= (count) 10) ...))

このようにプログラムが展開された場合、(??)に当たる部分は何になるだろうか？そう、(register-library ...)の部分である。そして、プログラム全体が一つの式として解釈されているので、次の式を読み込む必要がない。これによりLarcenyでは見かけ上期待した結果が帰ってくるということになる。この問題はこの２つだけが予測される結末ではないのが面白いところである。

インポート

ライブラリはインポートされることで使用可能になる。では、ライブラリの定義が別ファイルにあった場合はどうなるのだろう？ここからは完全に処理系依存の挙動になるので、拙作Sagittariusの挙動はこうなるであろうというのを例に上げる。Sagittariusではライブラリが別ファイルにあった場合、import句がライブラリのコンパイル等を解決する。その際、現状では継続の境界を作る。

継続の境界とは、C側でSchemeのプログラムをCのスタックをまたぐように呼び出す際に作られるある種の境界線のことである。これが発生しかつ、継続の起動がこの境界をまたぐとSagittariusではエラーを投げる。つまり、ファイルがわかれている場合のケースではSagittariusではエラーになる。他の処理系ではエラーにならないかもしれない。

REPL

REPL上ではどうだろうか？REPLでは(vector-set! ...)が実行された後に次の式を読み込むために一旦制御が入力待ち状態になる。つまり、reloadの起動は(vector-set! ..)を実行し入力待ちの状態になるはずである(少なくともSagittarius上ではそうなる)。これももちろんREPLの実装依存になるだろう。

まとめ

トップレベルの継続を捕捉するのはいろいろ落とし穴がある。R7RSに上記の文言が追加された方が幸せになれるかもしれない。

2016-12-02

R7RS-largeについて

この記事はLisp Advent Calendarの2日目の記事です。

Schemeの最新規格であるR7RSは2つのパートに分かれている。R7RS-smallとR7RS-largeである。通常の文脈でR7RSといった場合はR7RS-smallを指すことが多い。ではR7RS-largeとはなんなのか？

目的

R7RS-largeの目的は以下である：

Working group 2 will develop specifications, documents, and proofs of practical implementability for a language that embodies the essential character of Scheme, that is large enough to address the practical needs of mainstream software development, and that can be extended and integrated with other systems.
The purpose of this work is to facilitate sharing of Scheme code. One goal is to be able to reuse code written in one conforming implementation in another conforming implementation with as little change as possible. Another goal is for users of this work to be able to understand each other's code based on a shared and unambiguous interpretation of its meaning.
The language is not necessarily intended for educational, research, or embedded use, though such uses are not prohibited. Therefore, it may be a "heavyweight" language compared to the language designed by working group 1.
From Charter for working group 2

とんでもなく要約すると、R7RS-smallだけでは実用的なプログラムを書けないので実用的なライブラリ等を整備するよ、という感じである。

どんな感じで進んでるのか

基本的にはWG2Docketsにある項目をSRFIに書き起こして議論し、最終的に投票をするという感じで進んでいる。現状ではRedDocket(データ構造)の投票までが終了している。それらのライブラリ名はWG2のMLに投げられただけで、今のところ明文化はされていない。コミュニティが小さい＋(残念ながら)あまり興味のある人がいないというのが大きな問題だろう。最新のSRFIではOrangeDocket(数値)に関するものが出てきている。

ちなみに、風の噂で流れている(た？)ERマクロがR7RS-largeに入るという話は、YellowDocket(構文)に入っているので、Orangeが終わったら着手される可能性がある。RedがR7RS-small制定から3年かかっているので、決まるのは単純計算で6年後ということにはなるが…

決定されたライブラリ

上記のMLを見ればわかるんだけど、文量を増やすために現在までにR7RS-largeに入ることが決定したライブラリとその概要を書いてみたりする。上記のリストからコピーしている(タイポ含む）

SRFI 1 (scheme list)
古くからあるリストSRFI。ほぼ全ての処理系で使えるといってもまぁ過言ではないくらい有名なやつ。
SRFI 133 (scheme vector)
SRFI-43だとR7RS的に互換性がないからという理由で割と最近作られたベクタSRFI。
SRFI 132 (scheme sorting)
ソートSRFI-32が棄却されて12年経った後に出てきたソートSRFI。大体一緒なんだけど、細かいところでAPIが違う。
SRFI 113 (scheme set)
リスト、ベクタがあるのにセットがないのはどうよっていうので出てきたSRFI。
SRFI 114 (scheme set char)
ほぼ間違いないSRFI-14のタイポ。これまた古くからある文字セット用SRFI。
SRFI 125 (scheme hash-table)
R7RSのハッシュテーブルはSRFI-69の後継にあたるSRFIになった。R6RSのハッシュテーブルがあるのにという気持ちでいっぱいの僕としては気に入らない決定の一つ。
SRFI 116 (scheme list immutable)
不変リストなSRFI。中身はほぼSRFI-1と一緒なんだけど、受け取るものが不変リストであるというところが違う。一応通常のcarとかが不変リストを受け取ってもよしなに計らってくれることを推奨しているけど、サポートしてる処理系あるのかな？
SRFI 101 (scheme list random-access)
ランダムアクセス可能なリストSRFI。正直なんでこれがR7RS-largeに入ったのかはよく分からない。
SRFI 134 (scheme deque immutable)
不変な双方向キューSRFI。
SRFI 135 (scheme textual)
O(1)アクセスを保証するテキスト処理SRFI。R7RS-small的には文字列のO(1)アクセスは明示的に要求していないとしているので、パフォーマンスが気になる場合はこっちを使うという感じ。
SRFI 121 (scheme generator)
Gaucheのgeneratorが元のSRFI。
SRFI 128 (scheme lazy-seq)
ほぼ間違いなくSRFI-127のタイポ。GaucheのlseqにインスパイアされたSRFI。SRFI-41のストリームとは多少違うらしいが、ほぼ気にしなくてもいいらしい。
SRFI 41 (scheme stream)
遅延ストリームSRFI。このSRFIはSRFI史上初のR6RSライブラリを使って参照実装が実装されている。
SRFI 111 (scheme box)
ボックスSRFI。ボックスは名前が示す通り単なる箱で、Schemeのオブジェクトをなんでも格納しておくことができるもの。これを使うと明示的に参照渡しが可能になるはず。まぁ、あると便利だけど、なければ(vector obj)で代用できてしまうようなもの。コンパイラが頑張ると最適化してくれるかもしれない。
SRFI 117 (scheme list queue)
単方向キューSRFI。元々はキューという名前だけだったんだけど、SRFI自体がリストを基に実装されていることを前提にしていたので改名されたという経緯がある。SLIBにあった奴とほぼ同じ。
SRFI 124 (scheme ephemeron)
実装者泣かせSRFIの一つ。キーと値を持つデータ構造で、基本的な部分では弱い参照と一緒なんだけど、値がキーを参照していた際でもGCされるという特徴を持つもの。噂によるとまともに実装されているのはRacketとMIT Schemeくらいらしい。
SRFI 128 (scheme comparator)
SRFI-114はあまりにも醜いという理由から作られたSRFI。ちなみに片方あればもう片方は実装できる。

個人的に番号が新しい（100番以降くらい）のは時期尚早感のあるものが多いと思っている。というか、使える処理系が少ない。Sagittariusくらいじゃないかなぁ、一応全部サポートしてるの（手前味噌）。

控えているライブラリ

以下には入るかもしれないし、入らないかもしれない議論すらされてないライブラリを載せる。既にSRFIになってるものはそのSRFI＋その他の提案。まだ提案なだけなものリストだけで提案は省略。

OrangeDocket: 数値関係

Numeric types and operations:

Integer division: SRFI-141
Bitwise integer operations: SRFI-142, SRFI-60 or R6RS
Fixnums: SRFI-143 or R6RS
Flonums: SRFI-144 or R6RS
Compnums: John's proposal
Random numbers: SRFI-27 plus Gauche's generator
Prime numbers: Gauche's prime number library

Numeric and semi-numeric data structures
Enumerations (なんでこれここにあるんだろう？)
Formatting (同上)

YellowDocket: 構文関係

Low-level macros （有名どころの低レベルマクロがリストされてる）
Syntax parameters: SRFI-139
Matching (パターンマッチ）
Combinators
Conditional procedures （if等をdefineで定義した感じのなにか？）
cond guards: SRFI-61
Lambda* (Scheme Workshop 2013に出た論文がもとっぽい）
Void (eq?で比較可能なNULLオブジェクト）
Named parameters: John's proposal or SRFI-89
Generalized set!: SRFI-17
and-let*: SRFI2
receive: SRFI-8
cut/cute: SRFI 26
Loops: SRFI-42, foof-loop or Chibh loop
Generic accessors/mutators: SRFI-123
Conditions （例外、なぜにR6RSではないのか…）

GreenDocket: ポータブルに書けない何か

File I/O
Threads: SRFI-18
Real-time threads: SRFI-21 (分ける必要あるのか？)
Socket: SRFI-106
Datagram channels (UDP sockets)
Timers: SRFI-120 (なんでこれがあるんだろう？)
Mutable environments
Simple POSIX （Windowsは完全無視ですね、わかります）
Access to the REPL (これ完全にR6RS殺しに来てるな…）
Library declarations
Symbols
Interfaces
Process ports （紛らわしいけど、プロセスI/Oをポートにリダイレクトする話）
Directory ports （ディレクトリをポートみたいに読む、正直紛らわしい）
Directory creation/deletion
Port operations
System commands （Process portsと被り気味な気がする）
Pure delay/force (スレッドを使ったdelay/forceっぽい）

BlueDocket：ポータブルだけど高度なものたち(portable but advanced things)

Time types: SRFI-19
Binary I/O
Character conversion （要するにR6RSのコーデック的なもの）
Parallel promises (pure delay/forceじゃだめなん？)
Pathname objects
URI objects
Unicode character database
Environment （紛らわしいけどusernameとかメモリとかを引くためのもの）
Trees (リストにあるけど提案なし）

他にもあるけど、多分色付きの名前の奴が優先だとは思うので省略。

2016-11-29

Syntax parameter 2

Couple of months ago, I've written about syntax parameter. Now, I noticed that this implementation isn't really useful in certain (probably most of the) cases. For example, consider like this situation: You want to parameterise user inputs and your macro need to cooperate with auxiliary macros something like this:

;; default input is '()
(define-syntax-parameter *input* (syntax-rules () ((_) '())))
(define-syntax aux
  (syntax-rules ()
    ((_ body ...)
     (let ((user-input (*input*)))
       ;; do what you need to do
       body ...))))
(define-syntax foo
  (syntax-rules (*input*)
    ((_ (*input* alist) body ...)
     (syntax-parameterize ((*input* (syntax-rules () ((_) 'alist))))
       (foo body ...)))
    ((_ body ...) (aux body ...))))

(foo (*input* ((a b) (c d))) 'ok)
;; *input* is '()

I would expect that in this case *input* should return ((a b) (c d)). And, indeed, Racket returns ((a b) (c d)) as I expected.

The problem is simple. My implementation of syntax-parameterize is expanded to letrec-syntax with bound keyword. However since the above case crosses macro boundaries, *input* always refers to globally bound identifier. Now, my question is; can it be implemented portable way? If I read "Keeping it Clean with Syntax Parameters", then it says it's implemented in a Racket specific way. (If I read it properly, then syntax parameter is really a parameter object which can work on macro.)

Hmmm, I need to think how I can implement it.

2016-11-21

ニュースを読もう

ここ数週間の空き時間を使ってちょっとしたサイトを作っていた。news-reader.nl
ちなみに、ニュースを比較しながら読むとか、ダイジェストだけ知りたいとかいうモノグサな思いから生まれたものだったりする。今のところシンプルな機能しかないけど、そのうち自動でカテゴリわけしたりとか、似ている記事を列挙したりとかできたらいいなぁとは思っている。

適当によく見るニュースサイトを載せてるだけなので、これがあるといいというのがあればリクエストください。

これだけだとなんなので、ちょっと内側も覗いてみたりする。

構成としてはリバースプロキシとしてNGINX、フロントエンドはAngularJS、バックエンドには当然Sagittariusが使われている。NGINXは静的ファイルを返すようにして、Angujarが適当にAJAXでゴニョゴニョする。っで裏で動いているPaellaがPostgreSQLにアクセスしてデータを返すと。

ニュース自体はRSSを取ってきて、タイトルと説明を表示している。裏でCronが一時間に一回走るようになっているので、適当に更新される。(今のところ削除されないから、記事が貯まるとサーバーがパンクするような気もするが、まぁこれは後で考えよう…)

デプロイはAnsibleである程度自動化されている。テスト中はフルオートだったんだけど、本番にデプロイする際に手作業が発生したので。まぁ、そのうち気が向いたら直すことにしよう。(二度目があるとも思えんしなぁ)

広告載せてるけど、これかなりうざいのでサーバー代をお布施するつもりがないのであればアドブロック推奨で。

2016-10-19

Connection and session model

CAUTION: It's rubbish. Just my brain storming.

I'm not even sure if this is a proper model name or not. But I often see in low level communication specification, there is a connection and sessions (or channels) belong to the connection. Sometimes (or most of the time?) channels can be recovered on different connections, but here I discuss non recoverable channel for my ease.

I've written couple of scripts which handles this type of things already. For example AMQP. I think one of the pros for this model is preventing actual connection count. Recently, I've faced a concurrency problem caused by reading data from one single connection. At that moment, I didn't want to create much of connections (but in the end realised that there's no other way, though). Then I've started thinking about it.

Now I'm thinking can this be done in generic way or at least something which can be reused instead of writing the same mode everywhere? Suppose,a connection have a port. If a session is derived from the connection, then connection gives the session a session id or so. Whenever data are read from sessions, then actual connection dispatches read data according to the required session id. Something like this:

(import (rnrs))

(define-record-type connection
  (fields port ;; suppose this is input/output port
          sessions)
  (protocol (lambda (p)
              (lambda (port)
                (p port (make-eqv-hashtable))))))
(define-record-type session
  (fields id connection))

(define (open-connection port) (make-connection port))
(define (open-session connection)
  (let* ((sessions (connection-sessions connection))
         ;; FIXME
         (id (+ (hashtable-size sessions) 1))
         (session (make-session id connection)))
    (hashtable-set! sessions id session)
    session))

(define (read-from-session session)
  ;; IMPLEMENT ME
  (read-for-session
   (session-connection session)
   (session-id session)))

To keep it clear, here connection is the physical connection and sessions are logical layers derived from a connection. A channel is a communication pipe between connection and session.

Obviously, this doesn't work as it is even if I implement reading part, because:

There's no way to know which message should go which session
There's no negotiation how to open a session, this type of things are usually client-server model.

To know how to dispatch messages, connections need to know how message frames look like. So connections should have something like receive-frame field which contains a procedure receiving a message frame?

(define-record-type connection
  (fields port
          ;; suppose this returns 2 values, message id and payload
          receive-frame
          negotiate-session
          sessions)
  (protocol (lambda (p)
              (lambda (port receive-frame negotiate-session)
                (p port receive-frame  negotiate-session (make-eqv-hashtable))))))

Is it good enough? But what if there are like these 2 process is simultaneously running; opening a session and reading payload from a session. So it might be a good idea to have channel like this?

(define-record-type connection
  (fields port
          receive-frame
          ;; takes session-id, channel-id and connection.
          ;; returns physical-session-id
          negotiate-session
          sessions
          channels)
  (protocol (lambda (p)
              (lambda (port receive-frame negotiate-session)
                (p port receive-frame
                   negotiate-session
                   (make-eqv-hashtable)
                   (make-eqv-hashtable))))))
(define-record-type session
  (fields id channel-id connection
          (mutable physical-session-id)))
(define-record-type channel
  (fields id buffer))

(define (open-connection port) (make-connection port))
(define (open-session connection)
  (let* ((sessions (connection-sessions connection))
         (channels (connection-channels connection))
         ;; FIXME
         (session-id (+ (hashtable-size sessions) 1))
         (channel-id (+ (hashtable-size channels) 1))
         (channel (make-channel channel-id
                                ;; IMPLEMENT ME
                                (make-buffer)))
         (session (make-session session-id channel-id connection #f)))
    (hashtable-set! sessions session-id session)
    (hashtable-set! channels channel-id channel)
    session))

Hmmm, would it be enough? I'm not sure yet. And I'm not even sure if this is useful. Maybe I need to implement a prototype.

2016-10-10

SXMLオブジェクトビルダー

なんとなくSXMLをレコード変換するフレームワーク的なものがあると便利かなぁと思って作ってみた。まだ作りこみが甘い部分もあるが、必要そうな部分は動いているのでちょっと紹介。

ライブラリは(text sxml object-builder)としてみた。基本的な使い方はこんな感じ。

(import (rnrs)
        (text sxml ssax)
        (text sxml object-builder)
        (srfi :26))

(define-record-type family
  (parent xml-object))
(define-record-type parents
  (parent xml-object))
(define-record-type child
  (parent xml-object))
(define-record-type grand-child
  (parent xml-object))

(define family-builder
  (sxml-object-builder
   (family make-family
    (parents make-parents
     (? child make-child
        (? grand-child make-grand-child))))))

(define sxml (call-with-input-file "family.xml" (cut ssax:xml->sxml <> '())))

(sxml->object sxml family-builder)
;; -> family object

family.xmlはこんな感じ。

<?xml version="1.0" ?>
<family>
  <parents father="Daddy" mother="Mommy">
    <child name="Big bro">
      <grand-child name="Bekkie">Boo!</grand-child>
      <grand-child name="Kees">Bah!</grand-child>
    </child>
    <child name="Sis"/>
  </parents>
</family>

sxml-object-builderマクロはなんとなくいい感じにオブジェクトビルダーを構築してくれる。基本的には(tagname constructor . next-builder)みたいな構文。next-builderが空リストだと渡ってきたSXMLそのまま使う感じ。後は量指定子だったり、複数タグにしたり。最終的にはXSDと同等の表現力を持てたらなぁとは思っている(ほぼカバーしてるつもりだけど、仕様書を真面目に読んだことがないので何が足りてないのかが分かっていないという…)。

ここではレコードを定義してるけど、constructorは３引数受け取る手続きなら何でもいい。便利かもしれないということで、xml-objectなるものをデフォルトで用意してるけど、特に使う必要はない。

これだけだとイマイチ嬉しくないんだけど、もうひとつSchemeオブジェクトからSXML変換するライブラリを考えていて、それがあれば多少嬉しくなる予定ではいる。

2016-10-06

スペシャリテ

料理の話ではない。

Scheme処理系は世の中に山ほどある。RnRS準拠（n≧5)に絞ったとしてもかなりある（というかR5RSが多い）。そこで、有名処理系が選ばれる理由を考えてみた。

有名所の処理系

R5RS

Chicken

Eggが便利
Cに変換すれば高速に動く
コミュニティが大きい

R6RS

Chez

現状最高速の処理系

Guile

GNU公式拡張言語
コミュニティが大きい

IronScheme

.Netで動く

Larceny

高速
R7RSハイブリッド

Racket

高速
コミュニティが大きい
Planetが便利
リッチな環境がついてくる(DrRacket)

Mosh
Ypsilon
Vicare

ファンが多い？

R7RS

Chibi

コンパクト
C拡張が書きやすい

Gauche

日本語ドキュメント
手軽？

Kawa

JVMで動く

あんまりよく考えてないけど考察

Chicken、Guile、Racketはコミュニティの多きさ≒問題解決のしやすさとライブラリの豊富さが大きいかなぁと。Chezは元商用の処理系だけあって速い。Gaucheは日本語ドキュメントが大きいかなぁ（日本限定で考えれば）

KawaとIronSchemeは環境固定されるのでその環境で使いたいならほぼ一択状態。流石にJVMだと他にもあるけど、知名度的にはKawa一択だろう。

Larcenyがどれくらい使われているかはちと分からんけど、超有名な処理系なのでそこそこユーザーはいるんじゃないかなぁ。

Mosh、Ypsilon、Vicareはリリースが年単位でされてないから(MoshとVicareはリポジトリの更新があるけど、Ypsilonに関しては多分放棄されてる）、正直これらを選ぶ理由が見つからない。バグリポートしても直らんし。強いて言えばファンが多いくらい？

スペシャリテ？

じゃあどうするか？そもそも知名度が低いという点でどうしようもないんだけど、宣伝するにしても「それなら〇〇でいいじゃん」ってことにならないようにしたい。っで、現状ある武器で考えると以下かなぁ：

R6RS/R7RSハイブリッド
リーダーマクロ
そこそこ手軽
Cトランスレータ有り（バイナリ走らせるのにランタイムがいるけど）

挙げてみて思ったけど、ユーザー視点でいいことないなぁ。これだと「〇〇でいいじゃん」にしかならない。将来的に入れようかなぁと思っているのは日本語ドキュメントなんだけど、入れたところでGaucheの牙城を崩せる気はしないしなぁ。

地道にライブラリを増やすとか、環境を整えるとかなぁ。

2016-09-28

C translator (2)

Starting 0.7.8, Sagittarius provides experimental C translator. Currently it's more or less toy quality (though, just using is not a problem). There are 2 way of translations:

Dump all required library cache
Doing the same as precompiled files

Both solutions have problems. The first one requires huge amount of static area and generated files are huge. The second one can't translate macros.

Dumping cache file is not a good solution so I want to discard it. To do so, second solution must support macro translation. The challenge of doing it is that macro contains shared structures and may contain subrs (Scheme procedures defined in C). Translating shared structure to C may not be so difficult but subrs. Subrs themselves don't have any information where it's defined, so they only have their name and C pointer. As far as I know, there's no subr in macro so this may not be a problem as long as users (me mostly) don't do some black magic (and I know I can...).

One of the benefits (or maybe this is the only) of translating C is that considerably short amount of starting time. Sagittarius is, unfortunately, not so fast implementation, and its starting time is not so fast even when it loads only cached libraries. For example, one of my daily routine scripts takes 500ms to just showing help message (8000ms if it's not cached) and translated version takes 130ms. It's not that much difference (yet) but still it's almost 5 times faster than raw Scheme.

However, I think it's still slow or doing some unnecessary things. Consider the following script:

(import (rnrs))

(define (main args)
  (display args) (newline))

If I run this script, then it loads 32 libraries currently. But the only bindings required in this script are display and newline. So if the C translator is smart enough to detect them, then the result script should look like this:

;; imaginary script... 
(define (main args)
  (#{(core)#display} args) (#{(core)#newline}))

Then it doesn't require any unnecessary library loading.

Hmm, I think I have a lot to do.

2016-09-19

Scheme Workshop 2016 奈良

行ってきた。朝一で奈良入りでもいけなくはなかった気がしないでもないが、せっかくだしということで奈良に２泊３日してきた。奈良の観光の話でもいいけど、流石にどこも回れなかったのでワークショップの話。

前日の夜にAlexからスケジュールの変更のメールが届く。発表者二人が飛行機に乗り遅れたので、その二人を最後に回すため。結局間に合わずWillとKathyによりカラオケスライドメソッド(後述)が発動した。

ホテルから徒歩１５分という距離なのに９時１５分のオープニングに間に合わず。油断しすぎでした。

【招待講演１】
プログラムの正当性を検証するプログラムにLispを使ったという話(だと思う)。スケジュールが移動して自分の発表が一発目になったのでそれに意識が行き過ぎてたから話が頭に入らなかった…これ聞きながら、今年のはこんなにアカデミックなのかぁとものすごく緊張した記憶しかない。

【自分の】
そのうちスライド挙げます。単にライブラリの紹介。

【Nash: a tracing JIT for Extension Language】
GuileにトレーシングJITを実装したという話。プログラムで使用される時間の大部分はタイトなループによって発生しているということに注目したJITらしい。本家にマージされるかは不明らしい。されてほしいような、されてほしくないような(Sagittariusが選ばれる可能性が下がるという意味で)。

【Ghosts in the machine】
現在のプログラミング環境はコンピュータ黎明期と比較して劣っている、ということをClojureのREPLを例にして示す話。時間内に終わらなかった発表一つ目。休憩中に発表者に質問してどこを目指しているのか質問して聞いたりしていた。視点としては面白いし、一意見として終わらせるには惜しい気がするけど、何から始めるといいんだろう？となるくらいには壮大なビジョンだった。

【R7RS update】
AlexによるR7RSの近況。まだ見ぬSRFI-142とSRFI-143が並んでいたのでArthurで止まっているのか、Johnがまだ提出してないのかは謎。Red Docketは終わったらしいのだが、正式な発表はまだされてない気がするけど、c.l.sで見逃したかな？

【招待講演2】
Guixという比較的新しいパッケージマネージャの話。インストール履歴をリビジョンで管理しているのでうっかり何かを壊しても動くリビジョンに戻せるというのが他のパッケージマネージャとの大きな違いかな。後はGuileで書かれているのでパッケージの定義もS式(Guileで動くプログラム)というのも大きな特徴かな。

【Function compose, Type cut, And the Algebra of logic】
発表者のコミュニケーション能力があまり高くなかったので何が何だかさっぱり分からなかった。ただ、発表資料自体は面白いことが書いてあったので、論文を後で読むことにする。

【Multi-purpose web framework design based on websocket over HTTP Gateway】
SagittariusにWebsocketを入れたこともあり一番気になってた発表なんだけど、今一発表が雲を掴む様な感じでよく分からなかった。また、コミュニケーションの問題も多少あり、納得のいく答えも得られず。後で論文を読む。

ここからカラオケスライドメソッドが発動する。ちなみに、カラオケスライドとは自分が作ったスライドではないものを発表するもと思えばよい。命名はWill。日本のカラオケが流れてくる文字を歌うことに引っ掛けた上手い名前だと個人的には思った。

【miniAdapton: A Minimal Implementation of Incremental Computation in Scheme】
AdaptonというMemoriseのテクニックに一つをSchemeで実装した話。入力ツリーの一部が変更されても共通部分は再利用しているような感じだった。リポジトリも公開されてるし、ソース見た方が早いかな。しかし、代理の発表(論文の共著者であるが) なのにえらくうまいことやるなぁと前回見た時も思ったなぁ。あれくらいうまくしゃべれるようになりたい(なんか違う)。

【Deriving Pure, Functional One-Pass Operations for Processing Tail-Aligned Lists】
リストの共通サフィックス部分を調べるスクリプトをエレガントに書いたという話。ベンチマークがないのでこれが実際にどこまで効くのかは微妙。ナイーブな実装、エレガントな実装ともにO(N)なので、コンスタントの部分だけなのだが、正直論文の実装だとヘルパー手続きの作成にかかるコストの方が高い気がしないでもない。まぁ、コードは論文に載っているので手元でベンチマークとってJasonにメール投げてみればいい話な気がする。

どうでもいいのだが、僕が喋る英語はオランダ語訛りらしい(Arthur談)。どうも子音の発音がネイティブに比べて強いのだそうだ。普通母語に引っ張られるが、第三言語(第二が英語)に引っ張られるのは珍しいんじゃないの？という話をしていた。確かに寡聞にして聞かない話ではある。

2016-08-30

ログ

そういえばSagittariusにはずっとログを吐き出すためのライブラリがないということをふと思い出した。附属させてるライブラリがログを吐き出すのはさすがにどうかと思うのであんまり考えてなかったのだが、Paellaみたいなのが何のログも吐かないというのはいろいろ面倒だなぁと思ってはいた。

あまりいいデザインというのも思い浮かばないんだけど、なんとなくこんな感じのロガーがあればいいかなぁと思い30分くらいで作ってみた。こんな感じで使える。

(import (srfi :18) (util logging) (util file))

(define (print-log logger)
  (trace-log logger "trace")
  (debug-log logger "debug")
  (info-log logger "info")
  (warn-log logger "warn")
  (error-log logger "error")
  (fatal-log logger "fatal")
  (terminate-logger! logger))

(print-log (make-logger +info-level+ (make-appender "[~l] ~w4 ~m")))
(print)
(print-log (make-async-logger +debug-level+ 
         (make-appender "[~l] ~w4 ~m")
         (make-file-appender "[~l] ~w4 ~m" "log.log")))

(print (file->string "log.log"))
#|
[info] 2016-08-30T16:02:29+0200 info
[warn] 2016-08-30T16:02:29+0200 warn
[error] 2016-08-30T16:02:29+0200 error
[fatal] 2016-08-30T16:02:29+0200 fatal

[debug] 2016-08-30T16:02:29+0200 debug
[info] 2016-08-30T16:02:29+0200 info
[warn] 2016-08-30T16:02:29+0200 warn
[error] 2016-08-30T16:02:29+0200 error
[fatal] 2016-08-30T16:02:29+0200 fatal
[debug] 2016-08-30T16:02:29+0200 debug
[info] 2016-08-30T16:02:29+0200 info
[warn] 2016-08-30T16:02:29+0200 warn
[error] 2016-08-30T16:02:29+0200 error
[fatal] 2016-08-30T16:02:29+0200 fatal
|#

ロガーがログレベルと同期をコントロールして、アペンダーが実際にログを吐き出す。どこかでみたことあるようなモデルではある。まぁ、本職はその言語を使うのでなんとなく似通ったのだろう。make-appenderでつくられるアペンダーは標準出力（正確にはcurrent-output-portが返す値）に吐き出す。アペンダーの第一引数はログのフォーマット。現状手の込んだ出力はできないが、当面はこれでも問題ないだろう。ちなみに、~wの後ろに続く4はSRFI−19で定義されているフォーマットの一つ。現在は一文字しかみないが、そのうちなんとかするかもしれない。

make-async-loggerはスレッドを一つ消費する代わりにログの書き出しをバックグラウンドでやってくれる。アペンダーが複数あったり、処理が重い（メール送るとか）等あるときに役に立つと思う。

とりあえずこれを適当に使ってるスクリプト等に組み込んで使用感を確かめていくことにしよう。

2016-08-29

Ephemeron and reference barrier

Recently, there's the post on SRFI-124 ML about reference barrier. The SRFI doesn't require reference-barrier procedure due to the non-trivial work to implement. Now, there's a post which shows how to implement portably, like this:

(define last-reference 0)
(define (reference-barrier x)
  (let ((y last-reference))
    (set! last-reference x)
    y))

It seems ok, but is it? The SRFI says like this:

This procedure ensures that the garbage collector does not break an ephemeron containing an unreferenced key before a certain point in a program. The program can invoke a reference barrier on the key by calling this procedure, which guarantees that even if the program does not use the key, it will be considered strongly reachable until after reference-barrier returns.

Due to the lack of knowledge and imagination, I can't imagine how it should work precisely on like this situation:

(let loop ()
  (when (some-condition)
    (reference-barrier key1)
    (reference-barrier key2)
    (do-something-with-ephemerons)
    (loop)))

Should key1 and key2 be guaranteed not to be garbage collected or only the last one (key2 in this case)? If the first case should be applied, then the proposed implementation doesn't work since it can only hold one reference. To me, it's rather rational to hold multiple reference since it's not always only one key needs to be preserved. However if you extend to hold multiple values, then how do you release the references without calling explicit release procedure? If the references would not be released, then ephemerons would also not release its keys. Thus calling reference-barrier causes eternal preservation.

Suppose this assumption is correct, then reference-barrier should work like alloca with automated push so that the pushed reference is on the location where garbage collector can see them as live objects. In this sense, allocated references are released automatically once caller of reference-barrier is returned and multiple references can be pushed. Though, it's indeed non-trivial task to implement. I hope there's no errata which says reference-barrier is *not* optional, otherwise it would be either a huge challenge or dropping the support...

2016-08-24

暗黙の総称関数 (部分解決編)

昨日の続き

総称関数が暗黙的に大域定義されるという話なのだが、昨日のアイデアを元に直してみた。現在のHEADでは以下のようにしても大域には定義されない。

(import (rnrs) (clos user))

(let ()
  (define-generic foo)
  foo)
foo ;; -> &undefined

define-genericは今まで無条件に大域に束縛を挿入していたが、現在はトップレベルで定義された際のみとしている。これは別に難しいことではなかったので割愛(実現するのに他の不具合を直す必要があったが)。

問題はdefine-methodである。現状ではほぼうまく動くのだが、以下のような使い方をすると動かない。

(import (rnrs) (clos user))

(let ((foo 'foo))
  (define-method foo (o) o)
  (print (foo 1)))
;; -> error

define-methodが局所定義された際には局所定義された総称関数をまず探し、なければ大域定義されたものを探すという手順を取っている。そして両方とも見つからなかったらdefine-genericを挿入する。ちなみにここで挿入されたものは局所定義になるので、大域には定義されない。上記の例がうまく動かないのは、メソッド名と局所変数名が同一だから。マクロ展開は当然コンパイル時に行われるので、同名の変数を探すことはできるが、その変数が何を指しているかというのは、指しているものがリテラルである場合を除き、実行時まで分からない。この使い方をすることはそうないだろうと踏んで、現状ではこれで妥協することにした。何か妙案が浮かんだ時にまた直すかもしれない。(どうでもいいが、投げられるエラーが&assertionなのはおかしい気がしてきた。少なくとも&errorじゃないと…)

これとは直接は関係ないのだが、変数名の重複をチェックするようにした。以下のようなものが動なる。

(let ((a 1) (a 2)) a) ;; -> error
(let (("a" 1)) 1)     ;; -> error

もともと単にサボっていただけなのだが、総称関数が以下のように定義された際に混乱を招きそうだったので：

(let ()
  (define foo)
  (define-generic foo)
  (define-method foo (o) o)
  (foo 1))
;; ???

そもそもエラーなので何が起きても問題ないんだけど、変な混乱を招くよりはコンパイラに怒られた方がいろいろ楽な気がしたというのが本音。

完璧な解決ではないが、メソッドの局所定義とか(個人的には)やらないだろうし、9割くらいの混乱を未然に防げるんじゃないかなぁ。

2016-08-23

Implicit generic function creation

define-method would create a generic function implicitly if there's none defined yet. It's convenient and I've written libraries (probably only one) depending on this behaviour. (c.f. (binary data))

Convenience would usually be a trade-off of consistency, at least in my case. For example, this is a long standing bug (though, I've just issued):

(import (clos user))

(let ()
  (define-generic foo))
foo
;; -> should be &undefined

This is because define-method would create a generic function during macro expansion, and it would be an unexpected result in this case if it didn't:

(begin
  (define-generic foo)
  (define-method foo (o) #t))

In this case, define-method should not make implicit generic function, but macro expansions are done in the same compilation time as define-generic. Thus, it's impossible to know if it should create or not. To let define-method know it shouldn't, define-generic also inserts binding into current environment (a library) during macro expansion.

Can't I do better? Creating global binding during macro expansion is rather ugly, but I don't I can get rid of it (or maybe the future?). Yet, I think I can avoid to create unwanted one like above example. It's still just an idea but if define-generic and define-method can see if they are used in a scope, then it seems there's a way. Since Sagittarius has current-usage-env and current-macro-env procedures, it is possible to access compile time environment during macro expansion. Thus, it should even be able to detect whether or not define-method should create an implicit generic function or not.

This should work, let's see.

2016-08-22

Cトランスレータ

ふとスタンドアロンのバイナリができるといいかなぁと思ってこんなものを書いてみた(使うには0.7.8のHEADが必要)。以下のように使う：

$ ./scheme2c -o out.c foo.scm
$ gcc `sagittarius-config -L -I -l` -O2 out.c

中身はほぼなんでもいいんだけど、引数を受け取るにはSRFI-22のmainがないとたぶんうまいこと使えないはず(command-line手続きで引数が取れない)。ちなみに出来上がりCファイルは超巨大になる可能性がある。参考までに、このスクリプト自信をCに変換したら45MBのファイルになった(64bit環境)。ファイルが巨大になるのはこのスクリプトが何をやっているかを見ればすぐにわかるのだが、端的に言うとキャッシュファイルを引っ張ってきてCのバイト列にしてるから。

なんでこんなものを書いたかというと、特に理由はないんだけど、バイナリ一個で動くといいかなぁと思ったから。今のところランタイムとして最低でも DLL か .so (OSXなら .dylib) がいるがこの辺はそのうち気が向いたら何とかする予定。

キャッシュをそのままダンプしているので、性能的なメリットは一切ない。強いて言えば多少ファイルアクセスが減るくらい。今のところsagittarius-configは Windows 版にはつけてないので VC でバイナリを作りたい場合は多少の工夫が必要。ちなみに、インストール時にパスが決まるからインストーラでやらないといけないというのがついてない理由。

作っておけば気が向いたときに改善されていくのではないかというメソッドなので、しばらく実用にはならない可能性が高い。

2016-08-11

不要レジスタの除去とSchemeコード

VMにレジスタを追加するとスレッドセーフなパラメタと同様な感じで使えるということがあり、SagittariusではVMにいくつかマクロ展開器用のレジスタを持たせていた。あまりいい手ではないし、無駄にVMのサイズを増やすことになるのでいつかは何とかせねばと思いつつもかなりの期間放置していた。っが、最近いい方法を思いついたのでえいや！と除去。VMのサイズが５ワード減って752バイト(64ビット環境)になった。焼け石に水もいいところである。

【やったこと】
基本的には非常に簡単で、VMのレジスタをScheme側でパラメータ化してそれらを使っていたCの実装を全てScheme側に移動させただけ。Scheme側といってもプレビルドされているものなので、本体サイズが小さくなるということは残念ながらない。単にVMのサイズが多少減るのと、自己満足度が多少上がっただけである。

言うは易し行うは、キャッシュのせいで、多少難かった。除去したVMレジスタはマクロ展開器ようにしぶしぶ追加したもの。こいつのせいでマクロのキャッシュが無駄に複雑になっていた(今でも不必要に複雑だが)。SchemeのパラメータをCから呼び出すということは可能な限り避けたいと思ったので、マクロ展開器に関するコードをごそっとScheme側に移動させる必要があった。そうすると、Cで作られた展開器と密接につながっていたキャッシュの読み出しと書き出しが壊れる。キャッシュ機構自体が無駄に複雑なので面倒なデバッグに突入。後は気合でって感じだった。

これによるパフォーマンスの低下があるかなぁと思ったけど、顕著に表れるようなものはなかったのでよし。

【Schemeコードの混在】
VMのレジスタにはリーダーマクロやR7RSのincludeのためにあるものもあった(取っ払ってやった、更に2ワード減ったぜ)。こいつらはコンパイラが依存していたり、load手続きが使用していたりするので、単純にプリコンパイルされたSchemeに放り込むことができなかった。気合でパラメータ関連をCで書いて何とかするという手もなくはなかったんだけど、年を取ると楽な方に逃げたくなる。ということで、スタブファイル内にSchemeコードが書けるようにしてみた。

Schemeコードのプリコンパイル自体はすでにあるのだから、ちょっと手を入れればいけるだろうと思って手を入れてみたら動いたという感じ。残念ながらCで書かれた手続きからSchemeの手続きを呼ぶことはできないし、SchemeからCの関数を直接呼び出すこともできない。それでも、混在可能というのは非常に楽である。依存関係があるので純粋にSchemeで書くより多少気は使うものの(ちなみにマクロは使えない)、Cでごにょごにょやるより遥かに楽である。これを使って(sagittarius)ライブラリ内に簡易パラメータを作り、load周りのVMレジスタをそいつで実装。それに伴って関連するCのコードをごそっと消してやった。すっきり。

流石にこれはパフォーマンスに影響がでて、例えば(rfc http)のように大量に他のライブラリ(数えたら131個あった)に依存するものをキャッシュなしで読み込むと大体10～15％程度パフォーマンスが落ちた。Cでは手続き内でやっていた処理がScheme側に移動したことで手続き呼び出しに変わったこと等によるオーバーヘッドだろうなぁとは思っているものの、まぁしょうがないか。キャッシュになってさえいればほぼ変わらないので(キャッシュはCなので当たり前だが)、初回起動のみが遅いと思えばそこまで気にすることでもないかなぁ。そもそもコンパイルとマクロ展開が遅いでそっちを先に何とかしないとという話である。(text sql)のコンパイルとか環境によっては10秒かかるし…2000行以上にわたるマクロを10秒で展開すると思えばそこまで悪くないともいえるのか？っがコンパイル待ってる間にコーヒー飲み終わる勢いなのは流石になぁ…

この調子でVMの不要もしくはあってほしくないレジスタを削除していきたいところである。

2016-08-03

WebSocketクライアント

連投の二つ目

なんとなくやる気とか刺激とか取り戻すためにWebsocketのライブラリを書いてみた。WebSocketはRFC6455で定義されているプロトコル(詳細はRFC読むべし)で、実装もそんな大変そうでもないので書いてみた。

書いてる途中でSaitoAtsushiさんの実装の存在を思い出してライブラリの名前を確認したら、(rfc websocket)とまるかぶりだった。Pegasus用のformulaまで書いていただいているのにこのままぶつけるのもなぁと思い確認してみたところ

@tk_riple 実際のところ、すでに入れている人がどれだけいるのかなぁという気もするんですよね。本家でやるなら私自身も本家を使うようになる (私の実装はもうメンテされず、相対的に価値が暴落する) ので、そういう「普通の名前」は本家の方で取って欲しいという気持ちがあります。
— 齊藤敦志 (@SaitoAtsushi) 2 August 2016

とのことだったので、遠慮なくぶつけさせていただいた。他の名前の候補としては、(rfc :6455 websocket)とか(rfc websockets)（Cのlibwebsocketsに倣って)とかあったけど、「普通の名前」（RFCの名前）を使うことにした。

簡単な使い方。

(import (rnrs) (rfc websocket))

;; Creates WebSocket object
(define websocket (make-websocket "wss://echo.websocket.org"))

;; Sets event handlers
(websocket-on-open websocket
  (lambda (ws) (display 'CONNECTED) (newline)))
(websocket-on-text-message websocket
  (lambda (ws text) (display text) (newline)))

;; Connects to the server
(websocket-open websocket)

;; Sends a message
(websocket-send websocket "Hello")

;; Close it
(websocket-close websocket)

ユーザーレベルAPIは基本的にWebSocketオブジェクトを返すので、以下のようにも書ける。

(websocket-close 
 (websocket-send 
  (websocket-open 
   (websocket-on-text-message
    (websocket-on-open (make-websocket "wss://echo.websocket.org")
     (lambda (ws) (display 'CONNECTED) (newline)))
    (lambda (ws text) (display text) (newline))))
  "Hello"))

どっちがいいかは好みだろうけど（流石に下のはあまり使わないか？）。ドラフトのまま絶賛放置中（期限切れてるから破棄されてるの？）のWebSocket over HTTP/2にも頑張れば対応できるようにはしてある（プラグイン書くだけ）。

すでにあるライブラリにぶつけにいったということもあり、例外とかかなり頑張って作っている。こんなに例外階層作って、しかも例外ハンドリングをまじめにやったのってたぶん初めてじゃないかなぁ。ライブラリ自体の構成は割とスタンダードで、ユーザーレベルAPI、中間レベルAPI、低レベルAPIという感じになっている。低レベルAPIはサーバー書くときに便利に使え（現状ではサーバーをどうするかあんまり考えていない）、中間レベルは何かしらプログラム的にやるのに、ユーザーレベルはJavaScriptのWebsocketみたいな感じで使える。

作って２日なので、作りこみが足りない部分はあるかもしれないが、簡単なチャットサーバーとクライアントみたいなのは作れたので紹介してみた。

例外ハンドリング

連投の一つ目。

一つ前の記事でguardの例外の再送出をwith-exception-handlerで受けると無限ループに陥る問題を解決した話。結論を先に書くと、raise、raise-continuable及びwith-exception-handlerをSchemeで実装して、継続の境界を作らないようにした。

ぼ～っと考えて実装したら動いちゃった系の解決方法で、特に苦労とかなかったんだけど、実装前に気になっていた点が以下：

Cでwith-exception-handlerを使っているか

使ってなかった。なんでこれC側にあるんだろう状態だった。

C側で例外投げたらどうなるの

この記事の肝、気になったら続きを読んで。

一つ目は特筆することもなく。Cで書くと複雑怪奇になるんだけど（なってた）、使ってないしSchemeに移動させてコードがかなりすっきりした。VMのサイズも１ワード減っていい感じだと思われる。(core)ライブラリだけでは使えなくなったけど、これに依存するコードはないと思うのでまぁ、問題ないだろう。

二つ目はC側のコールスタック。継続の境界はSg_Apply系の関数で作られるんだけど、コールスタックの深いところで例外が投げられたらどうする？ってのをぼ～っと考えていた。はい、longjmpを使うだけでした。

具体的にはこんな感じで例外が投げられたとする。

Call stack
  +----------+  +---------+  +---------+  +---------+
--| Sg_Apply |--| C func1 |--| C func2 |--| C func3 |
  +----------+  +---------+  +---------+  +---------+
                                            ^^^^^^^
                                             C error

こんな感じでCのraiseは呼ばれた時点でVMのスタックにSchemeのraiseを呼び出す継続フレームを入れる。

Before
   PC = CALL
   Stack
   +--------------------------+
   | Return to previous frame |
   +--------------------------+
                :
After
   PC = RET
   Stack
   +--------------------------+
   | Return to calling raise  | --> PC to return = CALL raise
   +--------------------------+
   | Return To previous frame |
   +--------------------------+
                :

with-exception-handlerとraise-continuableの組み合わせだと、呼び出し元に戻る必要があるので一つ前のフレームを飛ばすわけにはいかない。この状態にしておいて、longjmpを呼び出し、VMのループを再起動する。もともとVM自体はVMループへのjmpbufを持っているのでそれを使うだけ。割とお手軽に解決できてしまった。

このバグのおかげで別のバグをつぶすこともできたし、YpsilonとMoshのバグを発見することもできたのでいいバグ（？）だった。

2016-07-29

Exception handling in C world

The following piece of code goes into infinite loop on Sagittarius.

(import (rnrs))
(with-exception-handler
 (lambda (k) #t)
 (lambda ()
   (guard (e (#f #f))
     (error 'test "msg"))))

I couldn't figure it out why this never returned at first glance. And it turned out to be very interesting problem.

The problem is related to continuation boundary and exception handlers. If you print the k, then only the very first time you'd see error object with "msg" after that it'd be "attempt to return from C continuation boundary.". But why?

This happens when you capture a continuation and invoke it in the different C level apply. Each time C level apply is called, then VM put a boundary mark on its stack to synchronise the C level stack and VM stack. This is needed because there's no way to restore C level stack when the function is returned. When a continuation is captured on the C apply which is already returned, then invocation of this continuation would cause unexpected result. To avoid this, the VM checks if the continuation is captured on the same C level apply.

Still, why this combination causes this? The answer is raise is implemented in C world and exception handlers are invoked by C level apply. The whole step of the infinite loop is like this:

guard without else clause captures a continuation.
When error is called, then guard re-raise and invoke the captured continuation. (required by R6RS/R7RS)
Exception handler is invoked by C level apply.
VM check the continuation boundary and raises an error on the same dynamic environment as #3
Goes to #3. Hurray!

There are 2 things I can do:

Create a specific condition and when with-exception-handler received it, then it wouldn't restore the exception handler. [AdHoc]
Let raise use the Scheme level apply. (Big task)

#1 probably wouldn't work properly. #2 is really a huge task. I need to think about it.

Syntax highlighter