時の羅針盤＠blog

2021-06-25

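HTTP client library

Almost two months have passed since the last post. Adding the things I needed took time, and, well, there were family matters, so it couldn't be helped (or so I tell myself).

For the last few years I've been poking REST APIs with tools like PostMan, and I had the feeling that, although I used to have the energy to do that in Scheme, I'd suddenly stopped. Thinking about why, I concluded that the HTTP libraries bundled with Sagittarius are probably too poor and not very pleasant to use. The usability part is entirely subjective, so there may be cases where the current naive implementation is better, but at my day job I use Apache Http Client and Spring Framework's WebClient, so when I write a serious application or script I want that kind of feel.

So I made one. The library name is (net http-client). It's used like this.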


(import (rnrs)
	(srfi :13)
	(rfc pem)
	(rsa pkcs :8)
	(rsa pkcs :12)
	(rfc x.509)
	(crypto)
	(net http-client)
	(util concurrent)
	(security keystore)
	(control threading))

(define pooling-config
  (http-connection-pooling-config-builder
   (connection-request-timeout 100)
   (time-to-live 3)))

(define keys&certs
  '(("eckey.pem" "eccert.pem")
    ("key.pem" "certificate.pem")))

(define (->private-key file)
  (let-values (((param content) (parse-pem-file file)))
    (pki->private-key (import-private-key PKCS8 content))))

(define (->certificate file)
  (let-values (((param content) (parse-pem-file file)))
    (make-x509-certificate content)))

(define (make-key-manager)
  (define (idrix-eu p)
    (define node (socket-parameter-socket-node p))
    (cond ((string=? node "prod.idrix.eu") "eckey.pem")
	  (else #f)))
  (define (badssl-com p)
    (define node (socket-parameter-socket-node p))
    (and (string-suffix? ".badssl.com" node)
	 "1"))
  (let ((ks (load-keystore-file 'pkcs12 "badssl.com-client.p12" "badssl.com")))
    (for-each (lambda (key&certs)
		(keystore-set-key! ks (car key&certs)
				   (->private-key (car key&certs))
				   "password"
				   (map ->certificate (cdr key&certs))))
	      keys&certs)
    (key-manager
     (keystore-key-provider-add-key-retriever!
      (make-keystore-key-provider ks "badssl.com" badssl-com)
      "password" idrix-eu))))

(define client (http:client-builder
		(cookie-handler (http:make-default-cookie-handler))
		(key-manager (make-key-manager))
		(connection-manager
		 (build-http-pooling-connection-manager pooling-config))
		(follow-redirects (http:redirect normal))))

(define url0 "https://prod.idrix.eu/secure/")
(define url1 "https://client.badssl.com/")

(define (run url)
  (define request (http:request-builder (uri url) (method 'GET)))
  (http:client-send-async client request))

(define (print-it f)
  (print (future-get (future-map http:response-status f)))
  (let ((headers (future-get (future-map http:response-headers f))))
    (for-each (lambda (k)
		(for-each (lambda (v) (print k ": " v))
			  (http:headers-ref* headers k)))
	      (http:headers-names headers))
    (newline))
  (~> (future-map http:response-body f)
      (lambda (f) (future-map utf8->string f))
      future-get
      print))

(print-it (run url0))
(print-it (run url1))

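I worked hard on features such as connection pooling and client certificates, so it ended up a month behind schedule (especially the certificate part; all sorts of things didn't support ECDSA...).

Like recent HTTP libraries, it does things asynchronously. To benefit from that, use (util concurrent). Requests can also be sent synchronously, so it's fine when you need synchronous processing.

The API is still alpha, so it may change by the time it's released, but it shouldn't change drastically. I need to write the documentation...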

2021-05-06

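Socket library

For a long time Sagittarius has had two socket libraries (three if you count the SRFI): (sagittarius socket) and (rfc tls). I think both are reasonably well made, but I felt that adding timeout settings, connection timeout in particular, was painful. (All the flags are optional arguments, so to add a connection timeout while keeping backward compatibility it has to go at the end of the argument list.) I was also starting to find it a hassle these days to create plain sockets and TLS sockets separately, so I made a library that unifies things.

The (net socket) library creates sockets in a somewhat modern style. It's used like this.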


(import (rnrs)
        (net socket))

(define option (tls-socket-options
		(sni* '("google.com"))   ;; SNI
		(read-timeout 1000000))) ;; 1s (the unit is microseconds)

(define socket (socket-options->client-socket option "google.com" "443"))

(socket-send socket (string->utf8 "GET / HTTP/1.1\r\n\r\n"))
(utf8->string (socket-recv socket 500)) ;; -> some HTML

(socket-shutdown socket SHUT_RDWR)
(socket-close socket)

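All socket-related options are pushed into an option builder and specified only when needed. Libraries in recent years (the last few, or maybe even ten?) often build their configuration with a builder, so I went with that style. It feels reasonably comfortable to use, so I guess this style suits me personally.

It's no exaggeration to say that the record library I wrote about last time was made mainly for building this library. (Well, it's already used by several libraries.)

The reason I made this library is that I wanted a modern HTTP client. I plan to write about that in another post.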

2021-04-22

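Record builder

R6RS records are handy. They are also the only option if you want portable inheritance. The inconvenient part is that as fields increase, the number of arguments to the default constructor grows. For things like clients or connections, you can escape by using protocol, but for plain data types it gets tedious, all the more so once the number of arguments no longer fits on one hand.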
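
As a rough illustration of the protocol escape hatch, here is a minimal sketch using only (rnrs). The record type connection-config and its fields are hypothetical names made up for this example, not part of any library.

```scheme
(import (rnrs))

;; Hypothetical record type: three fields, but callers may omit
;; trailing ones thanks to the custom protocol.
(define-record-type connection-config
  (fields host port timeout)
  (protocol
   (lambda (p)
     (case-lambda
      ((host) (p host 80 #f))
      ((host port) (p host port #f))
      ((host port timeout) (p host port timeout))))))

(define c (make-connection-config "example.com"))
(connection-config-port c)    ;; -> 80
(connection-config-timeout c) ;; -> #f
```

This handles positional defaults, but it still can't set only the last field while leaving the middle ones alone, and that is the sort of gap a builder is meant to fill.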

So I made a library that loosely solves this problem. Here it is. Usage is very simple, like this.

(import (rnrs)
        (record builder))

(define-record-type base
  (fields a b))
(define-syntax base-builder
  (make-record-builder base ((a 'ok))))
  
(base-a (base-builder)) ;; -> ok

(define-record-type child
  (parent base)
  (fields c d))
(define-syntax child-builder
  (make-record-builder child ((a 'nok) (c 'ok))))
(base-a (child-builder)) ;; -> nok
(child-c (child-builder)) ;; -> ok
(base-a (base-builder (from (child-builder)))) ;; -> nok

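A field with no default value that isn't specified at creation time becomes #f.

I tried to write the library itself portably, so copying it should make it work on other R6RS implementations as is (I have a feeling that it needs to use something like record-predicate instead of comparing Record Type Descriptors directly with eq?, but, well, I haven't checked).

The reason I made this library is that I needed it for the post I'll write next. I hope I can write it around next week.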

2020-11-02

XML signature

It's been a while since I last wrote a blog post. I wasn't really actively working on Sagittarius or Scheme, so there was nothing I could write about (and I was super busy with my life, incl. work). That's the excuse. I hope I will keep posting something, but I can't promise.

Even though I wasn't actively writing Scheme, I've been trying to enrich the XML related libraries of Sagittarius. As we, probably, all know, Scheme has a great de-facto standard, SXML, and its libraries such as SSAX or SXPath. However, if you really need to handle serious XML which requires a lot of namespace handling, these libraries are not enough. So, I started working on a DOM library 2 years ago (though it's not even an introduction article :p). At that time, the final goal of the XML library was handling SAML, hence XML signature.

After those 2 years, I've finally implemented a very simple XML signature library. On the way here, I thought I needed to implement XPath, which is incomplete by the way, and spent more than a year on that. But at least, there's finally something I can show. Let's see what it looks like.


(import (rnrs)
        (text xml dom)
        (text xml dom writer)
        (text xml dsig)
        (rfc base64)
        (crypto)
        (rsa pkcs :10)
        (math))
;; RSA private key
(define private-key
  "MIIBPAIBAAJBAM7xaDmTsYZj1ZxJOVpAkCXKp/2SmprG1IA90cGs4wr1fiCRWHQ+\
   sdJwiX2j932CW7DpjOg4GEn2CrPwWIQLfdkCAwEAAQJBAInnc5YS5xVwiBPq8+5B
   4g1dHE+tl5uW7ls7VwGijXZp6Mi7D+GJJ57w6wo1vzjGNIFUAs07+17XBRpPeqaW
   MVECIQDz2t+jH7zB/wSbf3titZtyRIaYGCiV20sb9Xc/56QWHQIhANk/6Ncem83E
   wJpJTS3r+QFgkPVhQF0VEZJ0bI7fDAntAiEAuStZqH/AELu6Xu2V3uWyjTl1zuaB
   YxHrXeauT8tw8Q0CIQDVjbMuM1JodO33O/L4HywIpIoaC10fouRBGNzVnH/TCQIg
   ZoOzTnUmv2X4DaxbH4kfBg5/9e/mwK8wLZy2gn+a2A0=")
;; PKCS 10 format RSA public key
(define public-key
  "MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAM7xaDmTsYZj1ZxJOVpAkCXKp/2SmprG\
   1IA90cGs4wr1fiCRWHQ+sdJwiX2j932CW7DpjOg4GEn2CrPwWIQLfdkCAwEAAQ==")
(define signing-key
  (import-private-key RSA
   (base64-decode-string private-key :transcoder #f)))
(define verify-key
  (subject-public-key-info->public-key
   (import-public-key PKCS10
    (base64-decode-string public-key :transcoder #f))))

(define keypair (make-keypair signing-key verify-key))

(define dom (xml-file->dom-tree "test1.xml"))

;; signing context, using exclusive c14n canonicalisation, SHA256
;; and RSAWithSHA256 signature
(define sc (ds:make-signing-context ""
                                    *xmldsig:canonicalization-exc-c14n*
                                    *xmldsig:digest-sha256*
                                    *xmldsig:rsa-sha256*))
;; writing the signed DOM
(xmldsig:sign! dom sc keypair) ;; also return the given DOM
((make-dom-writer) dom)
#|
The above would write something like this (may change in the future 
due to the fact that the writing option is not decided yet)
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org" xml:base="something/else">
    <e1>
        <e2 xmlns="" xml:base="bar/" xml:id="abc">
            <e3 id="E3" xml:base="foo"/>
        </e2>
    </e1>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#"><SignedInfo><CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><Reference URI=""><DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><DigestValue>d1AgwW6w5CuCx4jqTM3zZBheHCg0AdEt93OiG599yHQ=</DigestValue></Reference></SignedInfo><SignatureValue>KrNNBtxw4ppGVOCWWndW6INDexdXs5Ei1/GqiUFwofjwrGmKmEw4hrCLG7p86StJ5kfGHYncezvr
exggfQSVZw==</SignatureValue><KeyInfo><KeyValue><RSAKeyValue><Modulus>zvFoOZOxhmPVnEk5WkCQJcqn/ZKamsbUgD3RwazjCvV+IJFYdD6x0nCJfaP3fYJbsOmM6DgYSfYK
s/BYhAt92Q==</Modulus><Exponent>AQAB</Exponent></RSAKeyValue></KeyValue></KeyInfo></Signature></doc>
|#

The test1.xml looks like this

<?xml version="1.0" encoding="utf-8" ?>
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org" xml:base="something/else">
    <e1>
        <e2 xmlns="" xml:id="abc" xml:base="bar/">
            <e3 id="E3" xml:base="foo"/>
        </e2>
    </e1>
</doc>

I'm trying to keep the user level API as simple as possible. So the only thing users need to do is prepare a key pair and choose the signing algorithms. At this moment, it doesn't handle references and transformers. But this is good enough as the first step. (And implementing transformers requires XPath, so that may come veeeeery much later.) Now, I can sign an XML document, so the next step would be verifying the signature. I hope I can write a post soon enough before I forget what I've done.

It took me more than 2 years to get here. During that period, I was more or less demotivated to write Scheme for some reason, and recently I decided to write something every day (only weekdays, weekends are too busy with family life...). I've been making very small steps, but at some point they got me somewhere. I'm a bit touched by this :D

2019-09-10

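Saven: a build tool that might save you from your troubles

Scheme has repository-like things such as SNOW! and Akku. Personally I think it's easier to build programs on top of them, but manually installing dependencies on libraries that aren't registered in those repositories is a pain. r7rs-postgresql and r6rs-redis, which I use a lot, aren't in any repository. The code just sits on GitHub with no thought given to packaging whatsoever (I'd love to see the author's face). Repositories would make life easy, but they can't be used here. In that case, isn't what I want right now a build tool rather than a repository?

Library dependencies are a fairly painful problem. For example, r6rs-redis depends on r6rs-usocket, and r6rs-usocket depends on r6rs-pffi and r6rs-psystem. My own Pegasus is designed to take such dependencies into account, but alas these libraries aren't registered in Pegasus (the author's, and so on). That means resolving dependencies by hand. If you're only building one thing that may be fine, but with several the effort jumps. Having to redo it whenever the environment changes also slowly wears on you.

Saven was made as a build tool that resolves dependencies. Currently it can resolve libraries hosted on GitHub. For example, a module foo depending on r6rs-mongodb and r6rs-pffi can be defined like this: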

---
# sav.yaml
name: foo
dependencies:
  - type: github
    name: ktakashi/r6rs-mongodb
    paths:
      - src
  - type: github
    name: ktakashi/r6rs-pffi
    paths:
      - src

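After that, you can use commands such as sav build and sav test. It also works when you want to build multiple libraries. For example, suppose the module foo above has a parent module bar. bar looks like this.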

---
# sav.yaml
name: bar
modules:
  - foo
  - baz

Modules can also reference each other, like this:

---
# sav.yaml
name: baz
dependencies:
  - type: module
    name: foo
    scope: test # Only used by tests

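With Saven, the dependencies you care about at build time, mainly at test time, get resolved for you. There are still more missing features than existing ones, but it already saves a lot of work. Not needing the submodule feature to resolve dependencies on GitHub-only libraries is a big deal. As usual, features will be implemented in the order I want them.

What follows doesn't really matter
Saven is heavily inspired by Java's Maven.
Formats other than YAML can be used too, but YAML feels like the easiest (for now).
Saven is the Dutch-ified form of the English word save, and it's an actual word.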

2019-07-08

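Running a benchmark

I came across this tweet.

Which is faster: comparing lists of chars with equal? or some ad hoc cdr recursion, or comparing with eq? the symbols denoted by the same strings as those lists...
- Aaron先生(でがらしのすがた) (@takum97) 1 July 2019

Personally I think converting to symbols is out of the question, but I can't tell whether equal? or recursion is faster (especially since in Sagittarius equal? is implemented on the C side). So I took a benchmark. A simple comparison seemed good enough, so the script looks like this.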

#!r6rs
(import (rnrs)
        (bench time))

(define string (let-values (((o e) (open-string-output-port)))
                 (do ((i 0 (+ i 1)))
                     ((= i 1000) (e))
                   (put-char o (integer->char i)))))
(define char-list0 (string->list string))
(define char-list1 (string->list string))

(define (->symbol cl)
  (let-values (((o e) (open-string-output-port)))
    (put-datum o cl)
    (string->symbol (e))))

(benchmark 1000 #t (lambda () (equal? char-list0 char-list1)))
(benchmark 1000 #t (lambda ()
                     (let loop ((cl0 char-list0) (cl1 char-list1))
                       (cond ((and (null? cl0) (null? cl1)))
                             ((or (null? cl0) (null? cl1)) #f)
                             ((char=? (car cl0) (car cl1))
                              (loop (cdr cl0) (cdr cl1)))
                             (else #f)))))
(benchmark 1000 #t
           (lambda () (eq? (->symbol char-list0) (->symbol char-list1))))

(bench time) looks like this (for Sagittarius)

#!r6rs
(library (bench time)
    (export benchmark)
    (import (rnrs)
            (time))
(define (benchmark count expected thunk)
  (define (do-benchmark count expected thunk)
    (do ((i 0 (+ i 1))) ((= i count))
      (unless (equal? expected (thunk)) (error 'benchmark "invalid result"))))
  (time (do-benchmark count expected thunk)))
)

The Chez version is much the same. Here are the results.

$ scheme-env run chez@v9.5 --loadpath=. --program bench.scm
(time (do-benchmark count ...))
    3 collections
    0.024401110s elapsed cpu time, including 0.000328248s collecting
    0.024408000s elapsed real time, including 0.000335000s collecting
    25477792 bytes allocated, including 25192304 bytes reclaimed
(time (do-benchmark count ...))
    no collections
    0.002436587s elapsed cpu time
    0.002442000s elapsed real time
    0 bytes allocated
(time (do-benchmark count ...))
    29 collections
    0.144383753s elapsed cpu time, including 0.000803779s collecting
    0.144402000s elapsed real time, including 0.000838000s collecting
    249044288 bytes allocated, including 244363280 bytes reclaimed

$ sash -L. bench.scm

;;  (do-benchmark count expected thunk)
;;  0.111818 real    0.213437 user    0.025128 sys

;;  (do-benchmark count expected thunk)
;;  0.037333 real    0.037329 user    4.0e-600 sys

;;  (do-benchmark count expected thunk)
;;  0.191468 real    0.268644 user    0.019184 sys

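Surprisingly, recursion seems to be the fastest. If that's also the case on Chez, then that's probably just how it is, though that's a sloppy opinion. As expected, converting to symbols is slow. As you can see from ->symbol, the overhead is simply large. Memoizing might help, but building a hash table with equal? would defeat the purpose, and I couldn't think of a good implementation, so I left it out.

No particular conclusion.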

2019-05-24

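A walkthrough of R6RS libraries

I saw a tweet mentioning the behaviour of R6RS library resolution, so I decided to write a bit about it. Note that this is only how Sagittarius does it internally; it's not the only answer.

Where library files stand in R6RS

As far as R6RS is concerned, a library doesn't need to be a file. The report only states that a library contains definitions and expressions, and says nothing about where it lives. Interpreted conveniently, this means an implementation may keep libraries in memory, or as a lump of machine code, at its own discretion. Also, a library is not a top-level program, so placing its definition in a script file isn't allowed (though Sagittarius allows it for convenience).

From this, putting libraries in external files is in fact implementation-dependent behaviour. In other words, programs that use libraries other than the standard ones R6RS provides are implementation dependent, and writing a library itself is implementation dependent too.

In short, R6RS provides the framework called libraries but leaves how to handle them to whatever interpretation suits you. Very Scheme-like, in a way. With this in mind, let's describe how library resolution works in Sagittarius.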

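The import clause

A top-level program must have exactly one import clause at its beginning. The import clause has to find the specified libraries, which live somewhere in this universe. Sagittarius limits the search to the load path or memory, and converts the library name into a file name when needed. The process is:

1. Look up the library name among the already loaded libraries
2. If found, return it
3. If not found, convert the library name into a file name, search the load path, and load the file
4. Try step 1 again, and raise an error if it still isn't found

That's roughly what it does.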
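
The lookup steps can be sketched in plain (rnrs) roughly as follows. This is only an illustration, not Sagittarius's actual code: loaded-libraries, library-name->path, find-library and the load-file! argument are all hypothetical names, and the escaping of special characters in library names is left out.

```scheme
(import (rnrs))

;; Hypothetical table of already loaded libraries, keyed by library name.
(define loaded-libraries (make-hashtable equal-hash equal?))

;; (rnrs base) -> "rnrs/base.sls"
(define (library-name->path name)
  (let loop ((parts (cdr name)) (path (symbol->string (car name))))
    (if (null? parts)
        (string-append path ".sls")
        (loop (cdr parts)
              (string-append path "/" (symbol->string (car parts)))))))

;; load-file! is a stand-in for the real loader. It is expected to
;; register the library into loaded-libraries as a side effect.
(define (find-library name load-path load-file!)
  (define (lookup) (hashtable-ref loaded-libraries name #f))
  (define (try-load dirs)
    (cond ((null? dirs) #f)
          (else
           (let ((file (string-append (car dirs) "/"
                                      (library-name->path name))))
             (if (file-exists? file)
                 (begin (load-file! file) #t)
                 (try-load (cdr dirs)))))))
  (or (lookup)                  ;; already loaded: return it
      (and (try-load load-path) ;; otherwise search the load path
           (lookup))            ;; and look it up once more
      (error 'find-library "library not found" name)))
```

Passing the loader in as an argument is just a way to keep the sketch self-contained; a real implementation would call its own load here.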

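Converting library names is subject to file system restrictions, so symbols get escaped and so on.

As an aside, I have a feeling it would be better to read and eval instead of load. With load, you get the bonus that expressions other than the library are evaluated too, but I can't remember whether I've written code that depends on this...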

library

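I've already mentioned above that everything about libraries except the syntax is implementation dependent, so here I'll list the de facto behaviour and the completely implementation-dependent behaviour, to show how to write libraries reasonably portably.

De facto behaviour

R6RS libraries use the extension .sls
If a file should only be loaded by a specific implementation, use the .implementation-name.sls form, such as .sagittarius.sls
The library file name is the library name with spaces replaced by the file separator. E.g. (rnrs base) becomes rnrs/base.sls.

Implementation-dependent behaviour

Specifying the load path. Each implementation has its own way, and the default locations differ too.
Putting multiple libraries into one file

Some implementations allow it (e.g. Sagittarius)
If you basically want portability, don't use it, or apply it only to files that collect implementation-dependent code

The meta keyword
It's defined by R6RS, but behaviour differs between implementations. It's one of the following:
1. Completely ignored
2. Phases are respected
So, for better portability, it's better to write with phases in mind.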

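The meta keyword

Since I touched on the meta keyword above, let me dig a little deeper while I'm at it.

Some implementations manage macro expansion phases strictly; as far as I know these are only plt-r6rs and implementations using André van Tonder's expander. Simply put, bindings imported for macro expansion time and bindings imported for run time are treated as different things. Take the following code, for example.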

(import (rnrs)
 (only (srfi :1) make-list))

(define-syntax foo
  (lambda (x)
    (define (ntimes i n)  ;; <- expand (meta 1) phase
      (make-list (syntax->datum n) (syntax->datum i)))
    (syntax-case x ()
     ((_ y n)
      (with-syntax (((y* ...) (ntimes #'y #'n)))
        #'(display '(y* ...)))))))
(foo x 5)

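On phase-strict implementations the above is an error, while on phase-lenient implementations (or ones that detect phases automatically) it displays (x x x x x). To write this portably, the part importing SRFI-1 needs to be rewritten as follows.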

(for (only (srfi :1) make-list) expand)

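Sagittarius is lenient about phases, so the environment in which macros are expanded ≒ the run-time environment (strictly speaking they differ, but you shouldn't need to care unless you hack on the implementation itself).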
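
Putting the import form into a whole program, a phase-aware version might look like the following sketch. Two things differ from the earlier code and are my own adjustments for illustration: the macro expands into a quoted list instead of calling display, and ntimes repeats the identifier itself rather than its datum, so the macro output consists of syntax objects only. It assumes the implementation provides (srfi :1).

```scheme
(import (rnrs)
        ;; make-list is only needed while expanding foo
        (for (only (srfi :1) make-list) expand))

(define-syntax foo
  (lambda (x)
    ;; runs at expansion time: a list of n copies of the identifier i
    (define (ntimes i n)
      (make-list (syntax->datum n) i))
    (syntax-case x ()
      ((_ y n)
       (with-syntax (((y* ...) (ntimes #'y #'n)))
         #'(quote (y* ...)))))))

(display (foo x 5))
(newline)
```

Since make-list is imported only for expand, referring to it from run-time code in the same program would be an error on a phase-strict implementation.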

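You need to care about phases when you use low-level macros and use libraries other than (rnrs) inside them, or when you write code at (meta 2) or above.

(meta 2) requires using an internal macro inside a macro and having that internal macro refer to bindings from another library, so you'll hardly ever see it when writing ordinary code. By the way, referring to a (meta 0) binding inside a macro is an error, so internal definitions inside macros tend to breed copy and paste.

It doesn't matter much, but (rnrs) and the libraries hanging under it (e.g. (rnrs base)) are exported at both (meta 0) and (meta 1), and yet R6RS doesn't define any way to do this for other libraries.

I'm losing the thread of my argument, so I'll stop here.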

2019-04-12

R6RS MongoDB with transaction

I made a portable R6RS MongoDB library, I think, last year. Now I needed transaction support, so I decided to implement it. MongoDB has supported transactions since 4.0.0. However, their documentation doesn't say which commands to use or how. The only thing it mentions is that I need to set up replica sets.

If there's no documentation, then what you can do is reverse engineering. So, I've written a couple of scripts to set up MongoDB with transactions and a proxy server which observes the commands.

The following are the scripts I made to investigate:

The first one sets up the server which receives client commands and sends them to the MongoDB server. It also dumps both requests and responses. The second one sets up the docker network and instances of MongoDB with replica sets and executes the script files with mongo shell. Then the third one prints wire protocol commands in, sort of, a human-readable format.

With this investigation, I've figured out that I need to add lsid, txnNumber, autocommit and startTransaction. Okay, I've never seen them in the documentation, so I have no idea how these options work; I just followed the example. Then, here comes the transaction support.

How to use
This is an example of a transaction:

#!r6rs
(import (rnrs)
 (mongodb))

(define conn (make-mongodb-connection "localhost" 27017))
(define collection "txn")

(open-mongodb-connection! conn)

;; create the collection ahead of time
(mongodb-database-run-command (make-mongodb-database conn "test")
                              `(("create" ,collection)))

(let* ((session (mongodb-session-start conn)) ;; start session
       (db (mongodb-session-database session "test"))) ;; create a database with session
  (guard (e (else (mongodb-session-abort-transaction session)))
    (mongodb-session-start-transaction! session) ;; start transaction
    ;; okay insert documents using usual database procedure
    ;; NB: has to be command, not other procedures...
    (mongodb-database-insert-command db collection
         '#((("id" 1) ("foo" "bar"))))
    (mongodb-database-insert-command db collection
         '#((("id" 2) ("foo" "bar"))))
    ;; and commit
    (mongodb-session-commit-transaction! session))
  ;; the session must be ended
  (mongodb-session-end! session))

(let* ((db (make-mongodb-database conn "test"))
       (r (mongodb-database-query db collection '())))
  (mongodb-query-result-documents r))

I haven't implemented any utilities for the transaction related procedures. So at this moment, you need to bear with the low-level APIs.

How it works
Maybe you don't want to know, but it's nice to mention. When a session is created, it also creates a session id. Then the database retrieved from the session adds the session id to query messages (OP_QUERY, not OP_MSG). Once the mongodb-session-start-transaction! procedure is called, it allocates a transaction number, and after this the database also adds the transaction information.

If the MongoDB server doesn't support the transaction, then the session automatically detects it and doesn't send any session or transaction related command.

And again, I'm not sure whether I implemented it correctly or not.

Once the official document of the transaction commands is written, I'll come back and review.

2019-04-01

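Libraries around JSON

I hit the jackpot in the lottery.

End of April Fools.

Sagittarius has a surprisingly rich set of JSON libraries. I think one factor is that my day job as its developer is web development. The fact that JSON now sits where XML did a while back is a big part of it. In an application I've been writing recently I used these libraries lavishly and was impressed at what handy things I'd written, so I'll brag about them as a form of advertisement (otherwise I'd have nothing to blog about).

Simple queries
(text json pointer) implements RFC6901, which provides simple JSON queries. It's usable when the structure of the target JSON and the array indices are known in advance. Like this: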

(import (rnrs) (text json pointer) (text json))

(define id-pointer (json-pointer "/id"))
(id-pointer (call-with-input-file "a.json" json-read))

This pulls out the id field if the JSON object has one. id-pointer is a closure, so it's reusable.
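
To give a feel for what such a lookup does, here is a minimal sketch in plain (rnrs); it is not the actual (text json pointer) implementation. It assumes JSON objects are vectors of (key . value) pairs, as in the examples on this blog, and additionally assumes that JSON arrays are lists. The names object-ref and json-ref are made up, and the path is given as an already split list instead of an RFC6901 string.

```scheme
(import (rnrs))

;; Look up key in an object represented as #((key . value) ...).
(define (object-ref obj key)
  (let loop ((i 0))
    (cond ((= i (vector-length obj)) #f)
          ((string=? (car (vector-ref obj i)) key)
           (cdr (vector-ref obj i)))
          (else (loop (+ i 1))))))

;; Walk the path: strings select object fields, integers select
;; array elements. Returns #f when a step doesn't match.
(define (json-ref json path)
  (fold-left
   (lambda (node key)
     (cond ((and (vector? node) (string? key)) (object-ref node key))
           ((and (list? node) (integer? key) (< -1 key (length node)))
            (list-ref node key))
           (else #f)))
   json path))

(json-ref '#(("id" . 1234) ("tags" . ("a" "b"))) '("id"))     ;; -> 1234
(json-ref '#(("id" . 1234) ("tags" . ("a" "b"))) '("tags" 1)) ;; -> "b"
```

Returning #f for a missing step keeps the sketch short, though it can't tell a missing field from a JSON false; a real library would need to signal that differently.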

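Complex queries
(text json jmespath) provides a fairly complex query language. I've written an introductory post before, so see that for basic usage. Use it for queries that can't be written with JSON Pointer. For example, a query returning only the name and description fields from an array whose elements are JSON objects can be written like this: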

(import (rnrs) (text json jmespath) (text json))

(define name&desc (jmespath "[].[name, description]"))
(name&desc (call-with-input-file "b.json" json-read))
;; -> (("name of a object" "description of a object") ...)

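Besides this, there are other handy uses and built-in query functions.

Modification
(text json patch) provides RFC6902 JSON Patch. What differs from implementations in other languages is that it doesn't modify the input. That's not because of functional style or anything; the truth is that implementing it with side effects was brutally hard (or rather impossible). It's used like this: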

(import (rnrs) (text json patch) (text json))

(define id-patcher (json-patcher '(#(("op" . "add") ("path" . "/id") ("value" . 1234)))))
(id-patcher (call-with-input-file "c.json" json-read))
;; -> #(("id" . 1234) ...)

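id-patcher is a closure, so it's reusable.

Ramblings
You could say these libraries were developed because I got fed up with how poor vector operations are in Scheme. The S-expression representation of JSON that Sagittarius uses as its de facto standard was originally brought over from an implementation in Chicken Scheme. I don't know why, but that implementation represented JSON objects as vectors, and that's where it all started. These libraries would probably never have been born if the original representation had simply used alists, so it feels like a blessing in disguise (I can't count how many times I cursed it and thought about changing it...).
Looking at the result, I ended up with extremely handy libraries, so the pitch-black feelings of that time must have been nicely purified. Let's leave it at that.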

2019-03-15

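R7RS-large Tangerine

So a tangerine isn't a mikan...

The Tangerine Edition of R7RS-large came out in February. I'd been meaning to write a blog post, but I was in a so-called "life gets in" state, couldn't find the time, and more than a month slipped by (excuses). The previous R7RS-large was the Red Edition, and although orange comes after red, it suddenly became Tangerine. The reason is around here (in short, they apparently weren't ready).

Long preambles are tedious, so first the result. The SRFIs that became satsumas are: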

SRFI 115 (combinator-based regular expressions) - (scheme regex)
SRFI 141 (comprehensive integer division operators) - (scheme division)
SRFI 143 (fixnum operators) - (scheme fixnum)
SRFI 144 (flonum operators, R6RS plus) - (scheme flonum)
SRFI 146 (persistent tree and hash mappings) - (scheme mapping) (scheme mapping hash)
SRFI 151 (comprehensive bitwise operations on integers) - (scheme bitwise)
SRFI 158 (backward-compatible additions to SRFI 127 on generators) - (scheme generator) : replaces the existing one
SRFI 159 (combinator formatting) - (scheme format)
SRFI 160 (comprehensive homogeneous vector library, including inexact-complex vectors) - (scheme vector @)

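The @ in (scheme vector @) is one of u8, s8, u16, s16, u32, s32, u64, s64, f32, f64, c64, c128. (I feel that calling these vectors will cause confusion, but is that OK?) As of the current HEAD, Sagittarius supports all of them except SRFI 159 and SRFI 160. For the excluded SRFIs, I'll write the reasons (grumbles?) at the end of the post.

I wonder if, from around this edition, relatively new SRFIs will start being taken into R7RS-large? I don't really know what Orange is going to include, so this is purely a personal impression. That said, SRFI 143, 144, and 151 are more or less extensions of R6RS, so maybe they were simply needed in R7RS-large too. SRFI 158 is a bit different in flavour; you could say it showed that an already-decided library can be extended this way. That's more flexible and better than being told that once decided, it stays that way forever.

SRFI 115 is an older SRFI, but I don't really know how many implementations support it. Chibi (the source of the reference implementation) and Sagittarius do, but are there others? (Well, if you start asking that, who knows how many implementations support the SRFIs that got in this time...)

I personally didn't like SRFI 146, but I took this opportunity to support it. The main reason was that its library naming rules are hard to follow, and it still feels odd.

R7RS-large has gained eight new libraries, but I don't really know which implementations will follow. Chibi will apparently support Tangerine in its next release. As said above, Sagittarius supports R7RS-large except for SRFI 159 and SRFI 160. (I might support SRFI 160 too, but there's no plan for now. Guess the reason from the grumbling.)

Grumbling starts here.
Frankly, I find SRFI 159 and SRFI 160 dubious. SRFI 159 provides a more Scheme-like format, but honestly I prefer format. I'm used to it. On top of that, its inclusion in R7RS-large has reopened the SRFI discussion, and it's starting to feel like a quagmire.
As for SRFI 160, it hasn't even been finalised. In other words, you could say it was put on the list by the chair's authority while still under discussion. Maybe one could call it the voice of authority or arriving with great fanfare, but I can't help letting out a groan.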

2019-01-02

(My) best practice of conditions

I've been writing portable/non-portable Scheme libraries for a rather long time and have kind of arrived at my best practice for defining conditions for libraries. So let me share my opinion.

Disclaimer

This is not a community-approved best practice, nor can it be applied to the latest standard (R7RS). So I take no responsibility if your implementation can't handle this, or for claims that nobody writes this type of code :p

Basics

Condition system

R6RS has a very nice (yet very controversial) feature called the condition system. It is built on top of the, again very controversial, record type system. The basic ideas are:

Conditions are inheritable (the absolute base is &condition)
Conditions are compoundable. (using condition procedure)

These 2 concepts make the condition system very beautiful compared with, say, Java's exceptions.

How to use

The very basic usage is like this:

(define-condition-type &foo &error
  make-foo-error foo-error?)

This defines a condition type &foo. Then you can signal the condition with the raise or raise-continuable procedure.
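
As a small sketch of signalling and handling such a condition with only (rnrs): the procedure risky and the message text are made up for illustration.

```scheme
(import (rnrs))

(define-condition-type &foo &error
  make-foo-error foo-error?)

;; Hypothetical procedure that always fails with &foo plus a message.
(define (risky)
  (raise (condition (make-foo-error)
                    (make-message-condition "something went wrong"))))

;; guard catches the compound condition via the &foo predicate and
;; reads the message component out of it.
(define result
  (guard (e ((foo-error? e) (condition-message e)))
    (risky)))

result ;; -> "something went wrong"
```

Because &foo inherits &error, a more general handler using error? would catch it as well.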

My practices

Currently, I'm using conditions with the following rules, which I think are the best at this moment.

Should create at least one library specific condition
If you are creating a library named foo, then the library should have a specific condition such as &foo unless the library provides only utilities. (e.g. (srfi :1) provides only list utilities).
Must not use the error procedure
If you see the standard error, then it's a bad sign. The standard conditions are good for general purposes but not good for a library specific error signalling.
Should split conditions per phases
If a library has several phases, then the conditions should be split. For example, (text json jmespath) has compilation and evaluation phases. So the condition should be split into 2, one for compilation time, the other one for evaluation time.
Must not put too many fields
Conditions are records, thus users can put as many fields as they like onto them. A condition must contain the minimum information or resources. If you need more information, then use &irritants.
Should prefer composition over inheritance
Conditions are records (I'm saying it twice because it's important), which means you can only inherit from one base condition. However, sometimes you want to attach meta information such as &i/o. In this case use the condition procedure instead of creating a new condition type which inherits &i/o.
For example, suppose you have a &foo condition, which inherits &error. Now your library should also signal &i/o when an I/O operation fails. Theoretically, you have the following 2 options:
1. Composite &i/o instance
2. Create a new condition type which inherits &i/o
As long as your base condition is not a subtype of &i/o, then you should use option number 1. In this manner, library users can handle the error situation easier by just adding a guard clause with foo-error? (suppose your condition predicate of &foo is foo-error?). And users can still check I/O error with i/o-error?
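
Option number 1 can be sketched like this, using only (rnrs). The procedure fail-with-i/o and its arguments are hypothetical; the point is only that one compound condition answers to both predicates.

```scheme
(import (rnrs))

(define-condition-type &foo &error
  make-foo-error foo-error?)

;; Hypothetical helper: signals &foo composed with &i/o, plus the
;; usual who/message/irritants components.
(define (fail-with-i/o who file)
  (raise (condition (make-foo-error)
                    (make-i/o-error)
                    (make-who-condition who)
                    (make-message-condition "I/O operation failed")
                    (make-irritants-condition (list file)))))

;; Library users guard on foo-error? and can still ask i/o-error?
(guard (e ((foo-error? e)
           (list (i/o-error? e) (condition-irritants e))))
  (fail-with-i/o 'load-foo "missing.txt"))
;; -> (#t ("missing.txt"))
```

Extra details go into &irritants here rather than into new fields, in line with the rule about keeping conditions small.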

Conclusion

I'm quite happy with the above rules whenever I use the libraries constructed with it (sort of dogfooding).

2018-12-20

(usocket): R6RS portable socket library

Motivation

When I wrote the MongoDB client library in an R6RS portable manner, I included socket procedures inside the library. Now that I want to write a Redis client, I've noticed that it's very inconvenient not to have an R6RS portable socket library. We have SRFI 106 for sockets; however, it's not widely implemented, especially on R6RS implementations. So I've decided to make one, and it's here.

Portable R6RS socket library

Example

The library supports TCP and UDP client and server sockets. Most of the time, I only need TCP client, but it's nice to have in case it's needed. The very basic HTTP request call using this library would look like this:

(import (rnrs)
        (usocket))

(define client (make-tcp-client-usocket "google.com" "80"))
(put-bytevector 
 (client-usocket-output-port client)
  (string->utf8 "GET / HTTP/1.1\r\n\r\n"))

(utf8->string (get-bytevector-n (client-usocket-input-port client) 1024))
;; -> response from the server

(usocket-shutdown! client *usocket:shutdown-read&write*)
(usocket-close! client)

For portability, we don't provide any socket specific operation other than shutdown and close. Everything else needs to be done via input or output port.

Supporting implementations

The library supports the following implementations:

Sagittarius (0.9.4 or later)
Chez Scheme (v9.5)
Larceny (1.3)

Chez and Larceny require PFFI and psystem as their dependencies and they can only run in POSIX environments (not on Windows, PR is always welcome :) ).

Who is using this?

As I mentioned above, I'm using this library to create R6RS portable Redis client. It's at least good enough to implement the client.

2018-12-02

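Introducing SRFI-123

This post was written for day two of the Lisp SETF Advent Calendar 2018.

This is the first post I'm writing at age 0x25 (actually I'm using the scheduled posting feature, so I'm a few days younger...). I do wonder whether this is a so-so topic for the first post of a nicely round age...

SRFI 123: Generic accessor and modifier operators is an SRFI proposed by Taylan Ulrich Bayırlı/Kammer that provides Gauche-style generic accessors. Let's start with a simple example to see what's nice about it.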

(import (rnrs) (srfi :123))

(define l (list (list 1) 2 3)) ;; Must not be a literal list :)

(ref l 0) ;; -> (1)
(ref* l 0 0) ;; -> 1
;; ~ is an alias of ref*
(~ l 0 0) ;; -> 1

(set! (ref l 1) 4)
(ref l 1) ;; -> 4

(set! (ref* l 0 0) 5)
(ref* l 0 0) ;; -> 5

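That's how it looks. The example above uses lists, but it works similarly for other data types and record types.

Basically you can think of ref or ref* as converting to the appropriate accessor for the data. In the example above, ref converts to list-ref.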
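
The dispatch idea can be sketched in plain (rnrs) as below. This is not the SRFI's reference implementation: my-ref and my-ref* are made-up names, only a handful of built-in types are covered, and there is no setter support.

```scheme
(import (rnrs))

;; Pick the accessor that matches the type of obj.
(define (my-ref obj key)
  (cond ((pair? obj) (list-ref obj key))
        ((vector? obj) (vector-ref obj key))
        ((string? obj) (string-ref obj key))
        ((hashtable? obj) (hashtable-ref obj key #f))
        (else (assertion-violation 'my-ref "unsupported type" obj))))

;; Chained access: apply my-ref once per key, left to right.
(define (my-ref* obj key . keys)
  (fold-left my-ref (my-ref obj key) keys))

(define l (list (list 1) 2 3))
(my-ref l 0)    ;; -> (1)
(my-ref* l 0 0) ;; -> 1
```

Every new data type means another cond clause, which hints at why a portable implementation has to work so hard without generic functions.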

set! is realised using SRFI-17. For example, the following code

(set! (ref l 1) 4)
(set! (ref* l 0 0) 5)

is converted, respectively, into

((setter ref) l 1 4)
((setter ref*) l 0 0 5)

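By the way, for SRFI-17 you may want to see the introduction to SRFI-17 I wrote for LISP Library 365.

Aside
If you peek at the portable implementation of this SRFI, you can see it works awfully hard to support various data types. I feel this effort would be unnecessary if generic functions were available, so I also think it would have been good to have a generic function SRFI before this one (or implement it with Tiny CLOS?).

Addendum
Fixed (ref* 1 0 0) (1: the digit one -> l: lowercase L)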

2018-10-08

JMESPath

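I wanted a query language for JSON and looked around, but few have a decent specification. JSONPath only has a super famous blog post, which leaves me uneasy; JSONiq feels too huge for writing small queries; and googling JSON Query turns up a mountain of homegrown implementations, which put me off. (In hindsight, JSONiq might have been fine.)

Among them I learned about JMESPath, which is a manageable size, has a reasonably settled specification, and is apparently used by the AWS CLI. The name is so-so (it contains the proposer's name), but being used by the AWS CLI means it presumably has a track record, and there's a compliance test suite, so I decided this would do and implemented it. (End of preamble.)

So, it can be used like this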

(import (rnrs) (text json jmespath))

((jmespath "a") '#(("a" . "foo") ("b" . "bar") ("c" . "baz")))
;; -> "foo"

I made it feel similar to (text json pointer), which I wrote earlier (in the sense of being reusable). Also, I added a few extra procedures because I needed them myself. Their names are:

parent
unique
remove
remove_entry
is_odd
is_even

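You can mostly tell what they do from the names, so I'll skip the explanation. Read the documentation for details.

What was painful about implementing it
JMESPath does have a BNF definition, but it's written with left recursion all over. On top of that, the implementations in other languages aren't the type that generate a parser from the BNF (the Java implementation uses ANTLR, so it was the BNF as is), so it can't be ported to PEG directly. I had no choice but to remove all the left recursion while making sure the meaning of the AST didn't change significantly, which was a real struggle. It made me realise I still lack experience in this area.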
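
To illustrate the kind of rewrite involved, here is a toy sketch; it is neither the actual JMESPath grammar nor the Sagittarius PEG library. A left-recursive rule such as expr = expr "." id | id can be rewritten as expr = id ("." id)*, and folding the repetition to the left keeps the AST shape the left-recursive rule would have produced. The token format and the node tag sub are invented for the example, and the token list is assumed to be non-empty.

```scheme
(import (rnrs))

;; tokens: a list such as (a dot b dot c), where dot stands for ".".
;; Returns two values: the AST and the remaining tokens.
(define (parse-expr tokens)
  (let loop ((ast (car tokens)) (rest (cdr tokens)))
    (if (and (pair? rest) (eq? (car rest) 'dot) (pair? (cdr rest)))
        ;; consume ". id" and nest the previous AST on the left
        (loop (list 'sub ast (cadr rest)) (cddr rest))
        (values ast rest))))

(let-values (((ast rest) (parse-expr '(a dot b dot c))))
  ast) ;; -> (sub (sub a b) c)
```

A naive right-recursive rewrite would instead give (sub a (sub b c)), which is the sort of change in AST meaning that has to be avoided.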

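Also, there were quite a few pitfalls: the behaviour of examples in the specification differs from the compliance tests, and the specification is oddly vague in places. (Personally, I couldn't work out the behaviour around projections from the specification and had to check an implementation.)

A fact discovered after implementing it
Right, it's done, let's use it to the fullest! Or so I thought, but it turns out I don't really need it that much. In the end you can't use it without knowing the JSON structure (there's no nested fuzzy matching like glob), so I'm starting to feel that JSON Pointer and a vector library would do. It might be useful if the goal is simple transformation of vector-based JSON.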

2018-09-16

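JSON parser

While implementing JMESPath I realised it would be handy to have the JSON parser written in PEG, so I implemented one. For the specification I decided to refer to the latest JSON spec (RFC8259) (json.org would have been fine too, no particular reason. They're the same anyway, right?).

It doesn't matter much, but string escapes using "\" seem quite restricted. Things like "\a" or "\c (a meaningless escape)" appear to be illegal if you follow the spec. What should I do?

When rewriting, what I care about is of course performance. The Packrat implementation I used before is about 30 times slower than the current implementation (see: JSON parser performance improvements). So after writing it I took a benchmark, measured with code like this. To state the conclusion first: the current implementation is quite fast, so I measured speed from various angles, with I/O, without I/O, and so on.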

(import (rnrs)
        (text json parser)
        (text json)
        (sagittarius generators)
        (util file)
        (srfi :127)
        (time))

(define (parse parser) (call-with-input-file "large.json" parser))
(define-syntax time-parse
  (syntax-rules ()
    ((_ parser)
     (begin
       (newline) (display 'parser) (display " from file")
       (time (parse parser))
       (let ((in (open-string-input-port (file->string "large.json"))))
         (newline) (display 'parser) (display " in memory")
         (time (parser in)))))))

(time-parse json-read)
(time-parse parse-json)

(call-with-input-file "large.json"
  (lambda (in)
    (let ((lseq (lseq-realize (generator->lseq (port->char-generator in)))))
      (time (json:parser lseq)))))

The results:

% sash -Lsitelib bench.scm
json-read from file
;;  (parse json-read)
;;  0.313142 real    0.313000 user    0.000000 sys

json-read in memory
;;  (json-read in)
;;  0.097882 real    0.109000 user    0.000000 sys

parse-json from file
;;  (parse parse-json)
;;  0.484309 real    0.531000 user    0.000000 sys

parse-json in memory
;;  (parse-json in)
;;  0.337894 real    0.344000 user    0.000000 sys

;;  (json:parser lseq)
;;  0.269573 real    0.266000 user    0.000000 sys

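The first two are the current implementation with and without I/O. They are followed by the PEG version with I/O, without I/O, and with strict evaluation. The conclusion is that, whichever way I turn it, it can't beat the hand-crafted implementation with its human warmth. There is also the question of whether a performance degradation of about 60% with I/O and about 2.5 times without I/O actually matters. I expected some loss given the nature of PEG (or rather, of parser combinators), but this much is a bit disappointing. Perhaps I should optimise PEG first…

As an aside, I found that with PEG a JSON parser can be written in about 30 minutes. I'm getting to the point where I can't do without it, so I really ought to write the documentation.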
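The slowdown can be recomputed from the "real" times printed above. This is just arithmetic on those five numbers; the variable names are made up for this sketch.

```scheme
(import (rnrs))

;; "real" times in seconds, copied from the benchmark output above.
(define read-file    0.313142)  ; json-read from file
(define read-memory  0.097882)  ; json-read in memory
(define peg-file     0.484309)  ; parse-json from file
(define peg-memory   0.337894)  ; parse-json in memory
(define peg-strict   0.269573)  ; json:parser on a realized lseq

;; How many times longer the PEG version takes than the baseline.
(define (slowdown peg baseline) (/ peg baseline))

(define with-io    (slowdown peg-file read-file))
(define without-io (slowdown peg-memory read-memory))
(define strict     (slowdown peg-strict read-memory))

(for-each (lambda (label ratio)
            (display label) (display ": ") (display ratio) (newline))
          '("with I/O" "without I/O" "strict")
          (list with-io without-io strict))
```

The three ratios come out at roughly 1.5, 3.5 and 2.8. Read as extra time on top of the baseline, that is about 55% with I/O and about 245% without, which is in line with the 60% and 2.5 times quoted above.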

2018-08-28

JSON Schema

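There is a thing called JSON Schema. It aims to put types on everyone's beloved JSON, much as XSD does for XML. It looked simpler than XSD, so I went ahead and implemented it. It can be used like this.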

// product.schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/product.schema.json",
  "title": "Product",
  "description": "A product in the catalog",
  "type": "object",
  "properties": {
    "productId": {
      "description": "The unique identifier for a product",
      "type": "integer"
    }
  },
  "required": [ "productId" ]
}
// input.json
{
  "productId": 1
}

(import (rnrs)
        (text json)
        (text json validator)
        (text json schema))

(define product-validator 
  (json-schema->json-validator 
    (call-with-input-file "product.schema.json" json-read)))
(validate-json product-validator
  (call-with-input-file "input.json" json-read))
;; -> #t

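Let's look at a slightly more elaborate example. I would have liked to use the ones at https://json-schema.org/learn/ directly, but they apparently aren't well maintained and contain inconsistencies, so I couldn't. So, although it's a bit long, I'll write the schemas out in full.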

// address.schema.json
{
  "$id": "http://example.com/address.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "An address similar to http://microformats.org/wiki/h-card",
  "type": "object",
  "properties": {
    "post-office-box": {
      "type": "string"
    },
    "extended-address": {
      "type": "string"
    },
    "street-address": {
      "type": "string"
    },
    "locality": {
      "type": "string"
    },
    "region": {
      "type": "string"
    },
    "postal-code": {
      "type": "string"
    },
    "country-name": {
      "type": "string"
    }
  },
  "required": [ "locality", "region", "country-name" ],
  "dependencies": {
    "post-office-box": [ "street-address" ],
    "extended-address": [ "street-address" ]
  }
}
// geographical-location.schema.json
{
  "$id": "http://example.com/geographical-location.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Longitude and Latitude Values",
  "description": "A geographical coordinate.",
  "required": [ "latitude", "longitude" ],
  "type": "object",
  "properties": {
    "latitude": {
      "type": "number",
      "minimum": -90,
      "maximum": 90
    },
    "longitude": {
      "type": "number",
      "minimum": -180,
      "maximum": 180
    }
  }
}
// card.schema.json
{
  "$id": "http://example.com/card.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "A representation of a person, company, organization, or place",
  "type": "object",
  "required": [ "familyName", "givenName" ],
  "properties": {
    "fn": {
      "description": "Formatted Name",
      "type": "string"
    },
    "familyName": {
      "type": "string"
    },
    "givenName": {
      "type": "string"
    },
    "additionalName": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "honorificPrefix": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "honorificSuffix": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "nickname": {
      "type": "string"
    },
    "url": {
      "type": "string"
    },
    "email": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string"
        },
        "value": {
          "type": "string"
        }
      }
    },
    "tel": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string"
        },
        "value": {
          "type": "string"
        }
      }
    },
    "adr": { "$ref": "http://example.com/address.schema.json" },
    "geo": { "$ref": "http://example.com/geographical-location.schema.json" },
    "tz": {
      "type": "string"
    },
    "photo": {
      "type": "string"
    },
    "logo": {
      "type": "string"
    },
    "sound": {
      "type": "string"
    },
    "bday": {
      "type": "string"
    },
    "title": {
      "type": "string"
    },
    "role": {
      "type": "string"
    },
    "org": {
      "type": "object",
      "properties": {
        "organizationName": {
          "type": "string"
        },
        "organizationUnit": {
          "type": "string"
        }
      }
    }
  }
}
// card.json
{
    "familyName": "Kato",
    "givenName": "Takashi",
    "adr": {
        "locality": "locality",
        "region": "region",
        "country-name": "The Netherlands"
    },
    "geo": {
        "latitude": 10,
        "longitude": 90
    }
}

The validation code is as follows:

(import (rnrs)
        (text json)
        (text json schema)
        (text json validator)
        (srfi :26 cut))

(let* ((validators (map json-schema->json-validator
                        (map (cut call-with-input-file <> json-read)
                             '("address.schema.json"
                               "geographical-location.schema.json"))))
       (validator (apply json-schema->json-validator
                         (call-with-input-file "card.schema.json" json-read)
                         validators)))
  (validate-json validator (call-with-input-file "card.json" json-read)))
;; -> #t

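For what it's worth, it passes all the tests in the official GitHub repository (draft 7, excluding the optional ones).

As an aside, it can also be combined with the YAML parser I wrote recently. Here is the card as YAML: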

---
familyName: Kato
givenName: Takashi
adr:
  locality: locality
  region: region
  country-name: The Netherlands
geo:
  latitude: 10
  longitude: 90

And the code:

(import (rnrs)
        (text json)
        (text json schema)
        (text json validator)
        (text yaml)
        (srfi :26 cut))

(let* ((validators (map json-schema->json-validator
                        (map (cut call-with-input-file <> json-read)
                             '("address.schema.json"
                               "geographical-location.schema.json"))))
       (validator (apply json-schema->json-validator
                         (call-with-input-file "card.schema.json" json-read)
                         validators)))
  (map (cut validate-json validator <>)
       (call-with-input-file "card.yaml" yaml-read)))
;; -> (#t)

The YAML reader returns a list of documents, so you need to walk the list with something like map. It looks like it could be quite handy.

2018-08-22

A cache bug and procedure identity

I ran into this bug.

ASSERT failure /home/takashi/projects/sagittarius/src/closure.c:50: SG_CODE_BUILDERP(code)

It's a C assert, so there isn't much one can do about it.

The reproduction code looks like this.

;; lib1.scm
(library (lib1)
  (export +closures+)
  (import (rnrs))

(define (foo e)
  (unless (string? e) (assert-violation 'foo "string" e))
  (lambda (v) (string=? v e)))

(define +closures+ `(,foo))
)

;; lib2.scm
(library (lib2)
  (export bar buz)
  (import (rnrs)
          (lib1))

(define (bar s) ((buz) s))
(define (buz) (car +closures+))
)

;; test.scm
(import (rnrs) (lib2))

((bar "s") "s")

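The problem lies in caching and optimisation. Sagittarius exploits the fact that exported variables are immutable: when the value is a cacheable object (a string, a list, and so on), what would normally be a global variable reference is replaced with the actual value. A procedure defined in Scheme (a closure) is cacheable, and what is bound to +closures+ is a list, so the compiler replaces the reference with the value.

The fix is to check the object passed to the CONST instruction (like this), but it apparently still isn't completely fixed (damn it!).

This bug made me notice that Sagittarius doesn't guarantee procedure identity. The following code gives different results on the first and second run.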

(import (rnrs) (lib1) (lib2))

(eq? (car +closures+) (buz))

Personally I think that, in R6RS terms, this falls under the following examples from 11.5 Equivalence predicates, but as far as I recall R7RS requires it to always return #t.

(let ((p (lambda (x) x)))
  (eq? p p)) ;; unspecified

(let ((p (lambda (x) x)))
  (eqv? p p)) ;; unspecified

Since Sagittarius claims conformance to both, I suppose I'd better fix it…

2018-08-15

YAML parser

Personally I don't like YAML, but I think it's obvious that the world is moving towards it. So I decided to add YAML support to Sagittarius. It can be used like this.

# test.yaml
%YAML 1.2
---
receipt:     Oz-Ware Purchase Invoice
date:        2012-08-06
customer:
    first_name:   Dorothy
    family_name:  Gale

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)
      price:     1.47
      quantity:  4

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8
      price:     133.7
      quantity:  1

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
    Pay no attention to the
    man behind the curtain.

(import (rnrs)
        (text yaml))

(call-with-input-file "test.yaml" yaml-read)

#|
(#(("receipt" . "Oz-Ware Purchase Invoice")
   ("date" . "2012-08-06T00:00:00Z")
   ("customer"
    .
    #(("first_name" . "Dorothy")
      ("family_name" . "Gale")))
   ("items"
    #(("part_no" . "A4786")
      ("descrip" . "Water Bucket (Filled)")
      ("price" . 1.47)
      ("quantity" . 4))
    #(("part_no" . "E1628")
      ("descrip" . "High Heeled \"Ruby\" Slippers")
      ("size" . 8)
      ("price" . 133.7)
      ("quantity" . 1)))
   ("bill-to"
    .
    #(("street" . "123 Tornado Alley
Suite 16
")
      ("city" . "East Centerville")
      ("state" . "KS")))
   ("ship-to"
    .
    #(("street" . "123 Tornado Alley
Suite 16
")
      ("city" . "East Centerville")
      ("state" . "KS")))
   ("specialDelivery"
    .
    "Follow the Yellow Brick Road to the Emerald City. Pay no attention to the man behind the curtain.
")))
|#

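YAML allows multiple documents in a single file, so I made the reader return a list. By default it returns the same format as (text json), but that can be controlled with optional arguments. Writing is done as follows.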

;; suppose variable yaml is bound to a YAML document
(yaml-write yaml)

;; if it's read by yaml-read, then it should be like this
(for-each yaml-write yaml)

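The writer doesn't do anything elaborate (multi-line literals, labels, and so on), so it may not reproduce the original document exactly. (While writing this post I did think I might at least implement labels.)

What struck me while writing this is that YAML's grammar is far less machine-friendly than I expected. I won't argue about whether it is human-readable (personally I find it hard to read), but implementing it with PEG, which reads one character at a time, was demotivating (and the BNF on the official site seems hard on both humans and machines).

Now I'll just use it for a while and squash the bugs as they come up.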

2018-07-19

Pipeline: ideas (2)

I took the pipeline idea that I loosely wrote about last time a little further.

For example, would it be nice to be able to use it like this?

(import (rnrs)
        (util concurrent))

(define-pipeline-catalogue pipeline-catalogue-a
  (* => pipe1)
  ((sync pipe1) (=> pipe2)
                (-> epipe1))
  ((async pipe2 5) (symbol? pipe3sym)
                   (string? pipe3str)
                   (=> pipe3gen))
  (pipe3sym => *)
  (pipe3str => *)
  (pipe3gen => pipe3sym)
  ;; = (async epipe1 1)
  (epipe1 (=> pipe1)
          (-> !)))

;; do something
(define-pipe (pipe1 input) 'output)
;; do correction
(define-pipe (epipe1 error) 'output)

(define-pipe (pipe2 input) "string")
(define-pipe (pipe3sym input) input)
(define-pipe (pipe3str input) (string->symbol input))
(define-pipe (pipe3gen input) 'symbol)
(define-pipe (epipe1 e) 'recover)

(define pipeline-a (instantiate-pipeline-catalogue pipeline-catalogue-a))

;; async call
(pipeline-send-message! pipeline-a 'message)
;; waits
(pipeline-receive-message! pipeline-a)

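The pipeline catalogue defines how pipes are connected, and define-pipe defines the actual pipes. (The name "pipe" isn't great, so maybe I'll call it define-pipe-unit?) Nothing happens when the catalogue is defined; it only becomes usable once an instance is created. Well, that's for the sake of reusability. The sync/async distinction feels a bit iffy, though.

The problem is that creating a pipeline might spawn something like ten threads each time. I only want that for heavy work such as network communication, so maybe synchronous should be the default? It might be better to let this idea sit a bit longer…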
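A purely synchronous reading of the happy path in the catalogue above can be sketched with plain procedures in portable R6RS. This is not the proposed API: chain, route and anything? are names invented for this sketch, the pipes are ordinary procedures rather than define-pipe forms, and the error pipe, the threads and the message queue are left out entirely.

```scheme
(import (rnrs))

;; Pipes from the post, written as ordinary one-argument procedures.
(define (pipe1 input) 'output)
(define (pipe2 input) "string")
(define (pipe3sym input) input)
(define (pipe3str input) (string->symbol input))
(define (pipe3gen input) 'symbol)

;; Run the pipes left to right, feeding each result to the next pipe.
(define (chain . pipes)
  (lambda (input)
    (fold-left (lambda (value pipe) (pipe value)) input pipes)))

;; Pick the first pipe whose predicate accepts the input.
;; Each clause is a (predicate . pipe) pair.
(define (route . clauses)
  (lambda (input)
    (let loop ((cs clauses))
      (cond ((null? cs)
             (assertion-violation 'route "no matching pipe" input))
            (((caar cs) input) ((cdar cs) input))
            (else (loop (cdr cs)))))))

;; Catch-all predicate, standing in for the (=> pipe3gen) clause.
(define (anything? x) #t)

(define pipeline-a
  (chain pipe1
         pipe2
         (route (cons symbol? pipe3sym)
                (cons string? pipe3str)
                (cons anything? (chain pipe3gen pipe3sym)))))

(pipeline-a 'message)
;; => string   (pipe1 -> pipe2 -> pipe3str)
```

With everything synchronous no threads are needed at all, which is one way to look at the question of whether synchronous should be the default.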

2018-07-16

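【Memo】Building a reasonably decent development environment on Windows 10

I started at a new workplace this month and, unlike my previous job, the development environment is Windows. Rumour has it that we were going to be given Macs, but that was shot down by a single word from someone high up… well, I decided to stay positive and file it under "things that happen at big companies". So, since my machine was upgraded from Windows 7 to Windows 10 today, I decided to make an effort to put together a reasonably decent development environment using WSL.

【Install Ubuntu on WSL】
If the Microsoft Store is available, just use it. In my case, however, the Store is blocked, so I had to download the zip file directly. The following Stack Overflow question covers the details: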
Is there a way of installing Windows Subsystem for Linux on Win10 (v1709) without using the Store?

The Ubuntu version was 16.04, so I used do-release-upgrade to bring it up to 18.04.

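【Install VcXsrv】
You can't build a decent development environment without a decent terminal emulator. "Decent" here means, at the very least, that tmux works properly (with the standard Windows terminal the screen flickers). There are various options, but running an X11 server on the Windows side seemed the easiest, so I went with that.

VcXsrv has a 64-bit binary on SourceForge, so download that. For the launch settings, as long as you choose multiple windows, the rest can be set however you like.

【Install xfce4-terminal】
GNOME would do too, but I figured lighter is better.

【Write a launch script】
The default ubuntu.exe opens the standard Windows terminal, so write a launch script. Something like this: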

Set objShell = WScript.CreateObject("WScript.Shell")
objShell.Run "%LocalAppData%\Microsoft\WindowsApps\ubuntu1804.exe run DISPLAY=localhost:0.0 xfce4-terminal --working-directory=/home/takashi -x /bin/zsh -i", 0
Set objShell = Nothing

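If you didn't use the Store, replace the path with wherever you extracted the files. VBScript is used because I don't want an extra console window to open.

【Personal settings】
Make zsh the default shell for tmux. Add the following to .tmux.conf.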
set -g default-shell /bin/zsh

The following are settings I needed at work.

【Adding a CA certificate】
The network at work has its own root CA certificate, which needs to go into the trusted store. Do it like this:

$ sudo mv certificate.crt /usr/local/share/ca-certificates/
$ sudo update-ca-certificates

The file won't be recognised unless its extension is .crt.
