er-macro-transformer on top of syntax-case

There are a number of low-level hygienic macro systems in the Scheme world. The most famous ones are probably the following:
  • explicit renaming
  • syntax-case
  • syntactic closure
Of course there are more (e.g. ir (implicit renaming), reverse syntactic closures), but when people discuss low-level hygienic macros, the above are usually the ones they mean.

R6RS has syntax-case, and rumour says R7RS will have explicit renaming. According to this article, if an implementation has one of them, then the rest can be implemented on top of it. I wonder, is that really true? The very lame answer is yes, because Sagittarius uses a kind of syntactic closure and implements both explicit renaming and syntax-case. Now, can it be done in a portable way?

So I wrote this. It seems to work on most R6RS implementations except Racket. I haven't tested it on Larceny, Guile and Vicare yet, but they use either Psyntax or the van Tonder expander, like Mosh and IronScheme (Psyntax) or NMosh (van Tonder), so it should work.

The basic idea of the implementation is very simple. The rename procedure is a simple wrapper around datum->syntax, compare is free-identifier=?, and the returned form is wrapped with datum->syntax*.

The initial revision of the Gist used plain datum->syntax. This wasn't good enough, because that procedure should only accept a datum, not a syntax object. Psyntax-based implementations and Racket raised an error. Then it was suggested to me that I walk through the returned form.

At first I thought this wouldn't work, because traversing the form and constructing a new one would return a list, not a syntax object. However, I just hadn't thought it through. If I use syntax-case and quasisyntax (with-syntax would also do), then I can construct a syntax object containing the syntax objects renamed by er-macro-transformer. So I rewrote the code, and now most R6RS implementations seem to work. Even Racket seems to work if the macro is very simple, like the one in the comment.
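The idea described above can be sketched roughly as follows. This is not the code from the Gist: the names er-sketch, er-wrap and er-compare are made up for illustration, and instead of quasisyntax the sketch rebuilds the result from plain pairs, relying on the expander accepting an unwrapped syntax object. It also assumes a Psyntax-style expander; in implementations that want one library per file, the library and the program would have to be split.

```scheme
;; Library part: a hypothetical (er-sketch) library.
(library (er-sketch)
  (export er-macro-transformer)
  (import (rnrs))

  ;; Rebuild the form returned by the ER procedure. Identifiers made by
  ;; `rename` are kept as they are; raw symbols get the context of `ctx`.
  (define (er-wrap ctx form)
    (let loop ((form form))
      (cond ((identifier? form) form)
            ((pair? form) (cons (loop (car form)) (loop (cdr form))))
            ((vector? form) (vector-map loop form))
            ((symbol? form) (datum->syntax ctx form))
            (else form))))

  ;; compare accepts raw symbols as well as renamed identifiers.
  (define (er-compare ctx a b)
    (let ((a (if (symbol? a) (datum->syntax ctx a) a))
          (b (if (symbol? b) (datum->syntax ctx b) b)))
      (and (identifier? a) (identifier? b) (free-identifier=? a b))))

  ;; `k` is the er-macro-transformer keyword as written where the macro is
  ;; defined, so renamed symbols are resolved in that environment.
  ;; `use` is the macro keyword at the use site.
  (define-syntax er-macro-transformer
    (lambda (stx)
      (syntax-case stx ()
        ((k proc)
         #'(lambda (form)
             (syntax-case form ()
               ((use . rest)
                (er-wrap #'use
                         (proc (syntax->datum form)
                               (lambda (sym) (datum->syntax #'k sym))
                               (lambda (a b) (er-compare #'use a b))))))))))))

;; Program part.
(import (for (rnrs) run expand) (for (er-sketch) expand))

(define-syntax swap!
  (er-macro-transformer
   (lambda (form rename compare)
     (let ((a (cadr form)) (b (caddr form)))
       `(,(rename 'let) ((,(rename 'tmp) ,a))
          (,(rename 'set!) ,a ,b)
          (,(rename 'set!) ,b ,(rename 'tmp)))))))

;; The user's variable is also called tmp; the renamed tmp must not capture it.
(display (let ((tmp 1) (other 2))
           (swap! tmp other)
           (list tmp other)))
(newline)
```

Making er-macro-transformer itself a macro is a deliberate choice in this sketch: it is the only way to get hold of an identifier from the definition site to hand to datum->syntax.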

Now, my question is: is this R6RS portable? Section 12.2 of the R6RS standard libraries says:
The distinction between the terms “syntax object” and “wrapped syntax object” is important. For example, when invoked by the expander, a transformer (section 12.3) must accept a wrapped syntax object but may return any syntax object, including an unwrapped syntax object.
So a transformer *MAY* return an unwrapped syntax object. What I'm doing is returning a wrapped syntax object, so that should be fine. What I couldn't work out is whether or not a syntax object can contain syntax objects inserted by another transformer. If this is a required feature, then this might be a bug in Racket. Otherwise this is not portable.

I hope it's a bug in Racket; then you (not me) can write an explicit renaming SRFI with a sample implementation.

For convenience, embedded source.





(import (rnrs))

(let* ([pos 0]
       [p (make-custom-binary-input-port
           "custom in"
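           ;; read!: store one byte per call and return the number of
           ;; bytes read; 0 signals end of file
           (lambda (bv start count)
             (if (= pos 16)
                 0
                 (begin
                   (set! pos (+ 1 pos))
                   (bytevector-u8-set! bv start pos)
                   1)))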
           (lambda () pos)
           (lambda (p) (set! pos p))
           (lambda () 'ok))])
  (get-bytevector-n p 3))
;;-> #vu8(1 2 3)


(import (scheme base) (scheme write))

(define-syntax assocm
  (syntax-rules ()
    ((_ key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) (key . e))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))

;; a bit of trick to avoid unbound variable
(define (c d) (list 'c d))
(define d 1)

(assocm c ((a b) (b d) (c d) (d d)))
;; -> (c 1)
(define-syntax cdrm
  (syntax-rules ()
    ((_ (a . d)) d)))

(cdrm (c . d))
;; -> 1
(cdrm (assocm c ((a b) (b d) (c d) (d d))))
;; -> error
(define-syntax assocm/cps
  (syntax-rules ()
    ((_ k key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) (k (key . e)))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))

(assocm/cps cdrm c ((a . b) (b . d) (c . d) (d . d)))
;; -> 1
;; I want to write like this!
(assocm/cps (composem cdrm/cps carm) c ((a . b) (b . d) (c . d) (d . d)))
(define-syntax composem (syntax-rules ()))

;; assume k is CPS macro
(define-syntax extract/cps
  ;; it's a bit awkward to have own name in literals
  ;; but this saves me a lot
  (syntax-rules (composem extract/cps)
    ((_ (composem k) args ...) (k args ...))
    ((_ (composem k ...) args ...)
     (extract/cps "flatten" () (k ...) (args ...)))
    ;; flatten nested composem
    ((_ "flatten" (cps ...) ((composem k ...) k* ...) args)
     (extract/cps "flatten" (cps ... k ...) (k* ...) args))
    ((_ "flatten" (cps ...) (k k* ...) args)
     (extract/cps "flatten" (cps ... k) (k* ...) args))
    ((_ "flatten" (cps ...) () (args ...))
     (extract/cps (extract/cps cps ...) args ...))
    ;; extract/cps keyword
    ((_ (extract/cps (composem k)) args ...) (k args ...))
    ((_ (extract/cps (composem k k* ...)) args ...)
     (k (extract/cps (composem k* ...)) args ...))

    ((_ (extract/cps k) args ...) (k args ...))
    ((_ (extract/cps k k* ...) args ...)
     (k (extract/cps (composem k* ...)) args ...))
    ;; short cut
    ((_ k args ...) (k args ...))))
(define-syntax assocm/cps
  (syntax-rules ()
    ((_ k key (alist ...))
     (letrec-syntax ((foo (syntax-rules (key)
                            ((_ (key . e) res (... ...)) 
                             (extract/cps k (key . e)))
                            ((_ (a . d) res (... ...)) (foo res (... ...))))))
       (foo alist ...)))))
(define-syntax cdrm/cps
  (syntax-rules ()
    ((_ k (a . d)) (extract/cps k d))))

(define-syntax carm/cps
  (syntax-rules ()
    ((_ k (a . d)) (extract/cps k a))))

(define-syntax valuesm
  (syntax-rules ()
    ((_ args) args)
    ;; this isn't really values...
    ((_ args ...) (args ...))))
(assocm/cps (composem cdrm/cps carm/cps valuesm) c ((a b) (b d) (c d) (d d)))
;; -> 1



Small Scheme or large Scheme?

Two topics about the size of Scheme were posted on c.l.s. One was rather a branch of another topic: How many R6RS users and how much code out there?. The other suggests it indirectly: Question about vote of RnRS (the poster described the change between R5RS and R6RS as drastic, so it seems to be about size). Even though everything I wanted to say has already been said by Taylan in the first topic, I want to write something about size, so bear with me :)

The point of this topic for me is the definition of small or large. The poster said it is useful enough for educational purposes. I agree. If I may add a bit of my own opinion, I would say any language with a proper design for the topic of the class is useful enough for educational purposes, as long as students don't use convenience libraries. Java is fine for OOP, C++ is fine for OOP and metaprogramming, Mathematica is excellent for math, etc. So they can also be programming languages for educational purposes, right? Now, are they small?

For me the answer is no. C++'s specification in particular is so gigantic that, IMHO, human beings can't understand it (are there people who understand the whole spec?). I wouldn't say it's a beautifully designed language, but it's powerful enough to cover other purposes, including professional use. Then, should Scheme be this huge to make all programmers happy?

My answer is again no. It's too complicated; it's piling up features on top of other features on top of others, lemme quit. One of the reasons why C++ keeps piling up features, I believe, is that it doesn't have enough abstraction to let the language grow by itself. Most programming languages don't have this kind of feature, such as defining new syntax. Or if they have it, it's rather complicated to use, either deliberately or accidentally. If it's deliberate, then the designer of the language doesn't want users to use it casually. If it's accidental, then it wasn't well thought out but made rather ad hoc. The first decision is understandable: sometimes those things are not really needed, and the designer doesn't want users to summon demons from their noses.

Now, what's the definition of small for a Scheme specification? In my opinion, it means there is no need to pile up features because the language can grow by itself. Let me elaborate on what this means. Currently there's an ongoing discussion about hashtables on SRFI. This might be a good example. Are hashtables required by a language as small as Scheme? Well, the answer from the bottom of my heart is yes, but the rational answer is no. Why no? They can be implemented with vectors and records. Now, one of the SRFIs also mentions weak hashtables. This is rather interesting. This data structure cannot be implemented within either R6RS or R7RS. So it seems to be required, right? Wait a sec, there's a draft SRFI about ephemerons. If you use this, then you can implement weak hashtables using vectors, records and ephemerons. So the absolute requirement is this one. Hurrah, the language spec is kept small enough!
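To make the "vector and record" claim concrete, here is a minimal sketch of a chained hashtable in plain R6RS. All names (toy-table, toy-hash and so on) are invented for this example, keys are assumed to be strings, and there is no resizing, so it is nowhere near a replacement for a real hashtable library.

```scheme
(import (rnrs))

;; A toy chained hash table: one record holding a vector of association lists.
(define-record-type toy-table
  (fields buckets (mutable count)))

(define (new-toy-table size)
  (make-toy-table (make-vector size '()) 0))

;; Simple polynomial hash; keys are assumed to be strings.
(define (toy-hash key)
  (let loop ((i 0) (h 0))
    (if (= i (string-length key))
        h
        (loop (+ i 1)
              (mod (+ (* h 31) (char->integer (string-ref key i)))
                   1000003)))))

(define (bucket-index table key)
  (mod (toy-hash key) (vector-length (toy-table-buckets table))))

(define (toy-table-ref table key default)
  (let ((entry (assoc key (vector-ref (toy-table-buckets table)
                                      (bucket-index table key)))))
    (if entry (cdr entry) default)))

(define (toy-table-set! table key value)
  (let* ((buckets (toy-table-buckets table))
         (i (bucket-index table key))
         (bucket (vector-ref buckets i)))
    ;; Count only keys that were not present before.
    (unless (assoc key bucket)
      (toy-table-count-set! table (+ 1 (toy-table-count table))))
    ;; Drop any old entry for the key, then push the new one.
    (vector-set! buckets i
                 (cons (cons key value)
                       (remp (lambda (e) (string=? (car e) key)) bucket)))))

(define t (new-toy-table 8))
(toy-table-set! t "a" 1)
(toy-table-set! t "b" 2)
(toy-table-set! t "a" 3)
(display (list (toy-table-ref t "a" #f)
               (toy-table-ref t "b" #f)
               (toy-table-ref t "c" 'none)
               (toy-table-count t)))
(newline)
;; prints (3 2 none 2)
```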

IMO, this is a bit too extreme. It's ridiculous if users have to write their own utility libraries for such common data structures every time. Here comes the R7RS-large process. The purpose of this process, in my understanding, is to keep the core language absolutely minimal and to put a collection of commonly used things in its specification. So implementations may or may not support all of them. Oops, again, what's absolutely minimal?

Unfortunately, I don't have a generic answer for this and, I believe, neither do most Schemers. The only thing I can say now is that it's not enough to be a perfect language which can solve all the problems in this world. Concurrency, networking, weak data, etc. These are not in the specification (maybe yet) but are absolutely needed. If something is lacking, then the language specification should grow, even if it gets bigger.

The world of computing is growing at the speed of light. There are zillions of problems to be solved. Too small a language wouldn't solve them. We need one that is small enough.


bound or free identifier

I've just fixed a bug caused by incorrect usage of free-identifier=? during macro expansion. This is a memo for myself, and what I've learnt may be good to share.

Firstly, have a look at this piece of code:
(import (rnrs))

(define-syntax foo
  (lambda (x)
    (syntax-case x ()
      ((_ (t1 t2))
       ;; what should be printed?
       (begin (display (free-identifier=? #'t1 #'t2)) (newline)
              (display (bound-identifier=? #'t1 #'t2)) (newline)
              #''ok))
      ((_ (t* ...) a b ...)
       #'(foo (t* ... t) b ...))
      ((_ a ...)
       #'(foo () a ...)))))
(foo 1 2)
If you can immediately see what's printed, then you can skip the next paragraph :)

The difference between bound-identifier=? and free-identifier=? is that the former only looks at where the 2 given identifiers were created, while the latter checks whether their actual bindings are the same. In the above example, the identifiers t are created in different places, more precisely in different macro expansion steps. If you expand the macro manually, it looks like this:
(foo 1 2)
;; -> (foo () 1 2)
;; -> (foo (t) 2)  ;; <- where the first t is created
;; -> (foo (t t))  ;; <- the second one is created here, in a different macro expansion
;; 'ok
If 2 identifiers with the same name are created in different macro expansions, then bound-identifier=? should return #f. free-identifier=?, on the other hand, should return #t, because those identifiers are not bound to anything, so they have the same binding (unbound variable) and also the same name (t). (This is how I understand the behaviour of those 2 procedures. Correct me if I'm wrong.)
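The same distinction can be seen in a smaller, self-contained experiment. The macro names below (same-binding?, same-binder?, versus-template-car) are invented for this illustration; the sketch compares a car written by the user with a car introduced by a template.

```scheme
(import (rnrs))

;; Expands to #t when both identifiers refer to the same binding.
(define-syntax same-binding?
  (lambda (x)
    (syntax-case x ()
      ((_ a b) (free-identifier=? #'a #'b)))))

;; Expands to #t when binding one identifier would also bind the other.
(define-syntax same-binder?
  (lambda (x)
    (syntax-case x ()
      ((_ a b) (bound-identifier=? #'a #'b)))))

;; The car in this template is introduced by the macro expansion, so it is
;; a different identifier from any car the user writes, even though both
;; may refer to the same binding.
(define-syntax versus-template-car
  (syntax-rules ()
    ((_ id) (list (same-binding? id car) (same-binder? id car)))))

;; Both refer to the car imported from (rnrs), but they were created
;; in different places.
(display (versus-template-car car)) (newline)
;; prints (#t #f)

;; Here the user's car is a local variable, so even the bindings differ.
(display (let ((car 1)) (versus-template-car car))) (newline)
;; prints (#f #f)
```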

Now, the bug was related to this difference. The detail is here. The description says free-identifier=? returns #t for the pattern variables t, but this is the correct behaviour. What I did wrong was using free-identifier=? to compare pattern variables while expanding template variables. Moreover, the compiler for syntax-case renamed pattern variables, which must be preserved so that the expander can use bound-identifier=? to compare them.

Once I knew what was wrong, fixing it was not a big problem: just add an extra check for pattern variables and use bound-identifier=?. Done! I hope this will be the last article related to macro bugs... (I have a feeling it won't be, though.)





(syntax-case x ()
  ((_ (k ...))
   (with-syntax (((e ...) (gen)))
     #'(lambda (x) (syntax-case x (k ...) e ...)))))



Timezone related bugs

Since version 0.6.7, Sagittarius has had a timezone object. The rationale behind this is pretty simple. Before that, it used localtime and some other C APIs to get the proper local timezone offset. I wasn't unhappy with that until I needed to handle other timezone offsets. I don't remember the exact situation, but it was related to time conversion. Unfortunately, POSIX has no way to treat timezones as objects (as far as I could find), so I decided to implement my own timezone handling.

A timezone is tied to a physical location. Of course, you can change your computer's settings to deceive yourself. I wasn't one of those people, and that bit me. I've faced 2 timezone related bugs. Both could only happen in particular places: Japan, and countries with no summer time. Let me share the bug stories, starting with the first one.

There is the IANA TZ database, and I'm using it to find the proper timezone offset. If you build Sagittarius from the repository, then the pre-build process (not cmake but dist.sh) downloads the TZ database and compiles it to S-expressions. It's not too bad. The only bad thing was that it couldn't handle cases that the rules in the TZ database don't define, such as Japan.

The Asia/Tokyo timezone has rather weird rules. According to the TZ database, Japan had summer time between 1948 and 1951. For your information, the comment clearly says this was used only on US military bases, so officially Japan didn't have it. The funny thing is that this rule, named Japan, is associated with the Asia/Tokyo timezone and is still active now (as of 2015). Additionally, the rule has no definition for the period after 1951. So what should happen? Maybe I didn't read all the comments or didn't properly understand how the rules are defined, so I may have missed something. Anyway, the previous behaviour in this situation was signaling an error, which of course caused problems. The biggest one was that Sagittarius couldn't be built in Japan.

Thanks to a comment on this blog, I noticed there was such a problem. Otherwise, it would have stayed until I decided to go back to Japan. The fix was rather simple: instead of signaling an error, it simply returns the default timezone offset without considering old timezone offsets. I think that's good enough. (NB: the timezone object does consider the 'when', meaning that if you ask for the timezone offset before 1835 in the Netherlands, it returns GMT+0:19:32. No need for this? It's just for fun.)

The second one was a rather stupid mistake. Again, it was in Japan. I found a tweet saying that time-utc->date didn't consider the timezone offset but just returned UTC. I saw this when I fixed the first bug, so I thought it might be related. Well, kinda, but not quite. This bug happened only on Windows (including Cygwin) running in a timezone without summer time (Japan, for example).

Reproducing it took one easy step: change the system timezone to 'Osaka, Sapporo, Tokyo', then get the local timezone. Aha, it returned UTC. But why? Well, it's very simple. To get the local timezone name on Windows (and Cygwin), Sagittarius uses GetTimeZoneInformation and checks the return code. If the return code was not TIME_ZONE_ID_UNKNOWN, then it returned the standard timezone name, otherwise UTC. Hey, MSDN says that if the timezone doesn't have summer time, then this is exactly the return code you get! That's why it always returned UTC in Japan. A careless (or stupid) mistake.
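The mistake can be modelled in a few lines. This is only a sketch of the decision logic, not the actual Sagittarius source (which is C): the symbols stand in for the return codes of GetTimeZoneInformation, the procedure names are invented, "Tokyo Standard Time" is just an illustrative standard name, and the "fixed" variant is one possible correction assumed for this example.

```scheme
(import (rnrs))

;; Symbols standing in for the return codes of GetTimeZoneInformation:
;;   'unknown  - TIME_ZONE_ID_UNKNOWN, the zone does not use summer time
;;   'standard - TIME_ZONE_ID_STANDARD
;;   'daylight - TIME_ZONE_ID_DAYLIGHT
;;   'invalid  - TIME_ZONE_ID_INVALID, the call failed

;; The mistaken check: a zone without summer time falls back to UTC.
(define (buggy-zone-name code standard-name)
  (if (eq? code 'unknown) "UTC" standard-name))

;; One possible correction (an assumption): fall back only on failure.
(define (fixed-zone-name code standard-name)
  (if (eq? code 'invalid) "UTC" standard-name))

(display (buggy-zone-name 'unknown "Tokyo Standard Time")) (newline)
;; prints UTC
(display (fixed-zone-name 'unknown "Tokyo Standard Time")) (newline)
;; prints Tokyo Standard Time
```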

Even though I'm using CI services for Linux, Windows and OS X, this type of bug, tied to a physical location, is really hard to find. (I'm not even sure whether I can change the system timezone per build on those services.) I've just seen the importance of feedback in a fresh light.

