2016-01-22

Dynamic compilation 2

Almost a year ago, I've wrote a post about compiling Scheme code to C. (See: Dynamic compilation (failure)) In the article, I've conclude that Sagittarius' VM is turned like crazy so ordinal C code wasn't match at all or something like that. After a year, I've noticed that there's a possibility that the compiler eliminated the expression and the VM just did some loop.

I'm not totally sure since when I've add code elimination and right now I don't have the version 0.6.2 (I think that's the version I've used) in my environment. So this might be totally bogus. Anyway, back then I used the following code:
(define (fact n)
  (let loop ((m 1) (r 1))
    (if (= m n)
        (* m r)
        (loop (+ m 1) (* m r)))))

(time (dotimes (i 10000) (fact 1000)))
Now, the fact can be marked as transparent or no side effect because it seems it doesn't have any side effect nor consicing. Let me check.
(procedure-transparent? fact)
;; -> #t
The procedure-transparent? is an internal procedure which is used by the compiler to eliminate dead code. So possibility is very high now. OK, let's check the VM instructions of the expression.
(disasm (lambda () (dotimes (i 10000) (fact 1000))))
;; size: 15
;;    0: CONSTI_PUSH(10000)
;;    1: CONSTI_PUSH(0)
;;    2: LREF_PUSH(1)
;;    3: LREF(0)
;;    4: BNGE 2
;;    6: RET
;;    7: LREF_PUSH(0)
;;    8: LREF(1)
;;    9: ADDI(1)
;;   10: PUSH
;;   11: SHIFTJ(2 0)
;;   12: JUMP -11
;;   14: RET
Bingo! There is no procedure call!

Now, I might have some hope to turn up the VM. So prepare the shared object which provides fact. The C code is the following:
#include <sagittarius.h>
#define LIBSAGITTARIUS_BODY
#include <sagittarius/extend.h>

static SgObject fact(SgObject *SG_FP, int SG_ARGC, void *data_)
{
  SgObject m = SG_MAKE_INT(1), r = SG_MAKE_INT(1);
  SgObject n = SG_FP[0];

  while (TRUE) {
    if (Sg_NumEq(m, SG_FP[0])) {
      return Sg_Mul(m, r);
    } else {
      SgObject t1 = Sg_Add(m, SG_MAKE_INT(1));
      SgObject t2 = Sg_Mul(m, r);
      m = t1;
      r = t2;
    }
  }
  return SG_UNDEF;  /* dummy */
}

static SG_DEFINE_SUBR(fact__STUB, 1, 0, fact, SG_FALSE, NULL);

SG_EXTENSION_ENTRY void CDECL Sg_Init_fact()
{
  SgLibrary *lib = Sg_FindLibrary(SG_INTERN("(fact)"), TRUE);
  SG_PROCEDURE_NAME(&fact__STUB) = SG_INTERN("fact");
  Sg_InsertBinding(lib, SG_INTERN("fact"), &fact__STUB);
}
/*
gcc -lsagittarius fact.c -shared -o fact.so -fPIC -O3
 */
Now, benchmark. I've added set! to prevent the compiler optimisation.
(load-dynamic-library "fact")
(import (time) (sagittarius control) (fact))

(define dummy)

;; Load C implementation first
(print fact)
(time (dotimes (i 10000) (set! dummy (fact 1000))))

;; Scheme implementation
(define (fact n)
  (let loop ((m 1) (r 1))
    (if (= m n)
        (* m r)
        (loop (+ m 1) (* m r)))))

(print fact)
(time (dotimes (i 10000) (set! dummy (fact 1000))))
#|
#<subr fact 1:0>

;;  (dotimes (i 10000) (set! dummy (fact 1000)))
;;  6.604181 real    14.084392 user    0.174243 sys
#<closure fact 1:0>

;;  (dotimes (i 10000) (set! dummy (fact 1000)))
;;  6.698120 real    14.32880 user    0.132117 sys
|#
Well, almost the same. C version is slightly faster. This is, I believe, because it doesn't have VM dispatch. But this is more or less error range.

Even if this had siginificant improvement, I still need to resolve loads of things to native shared object from Scheme code. Such as:
  • Mapping of Scheme procedure and C function
  • Calling Scheme procedure from C
  • Error handling especially unbound variables
  • C compiler detection
  • Etc. (macro, location so on)
In my experience, calling Scheme procedure from C is really slow and it's not call/cc friendly. There's a way to avoid some overhead, such as using CPS, however it'd be still the same or even slower then just running on VM (this is really proven by previous experience, unfortunately). On possible good future would be less consicing similar with the one mentioned unboxing in guile -- wingolog. Though, as long as I need to use CPS, it would most likely no more than trivial improvement.

It seems there's no easy way out for performance improvement. Maybe it's time to give up looking for this path.

No comments:

Post a Comment