I'm not totally sure since when I've add code elimination and right now I don't have the version 0.6.2 (I think that's the version I've used) in my environment. So this might be totally bogus. Anyway, back then I used the following code:
(define (fact n) (let loop ((m 1) (r 1)) (if (= m n) (* m r) (loop (+ m 1) (* m r))))) (time (dotimes (i 10000) (fact 1000)))Now, the
fact
can be marked as transparent or no side effect because it seems it doesn't have any side effect nor consicing. Let me check.
(procedure-transparent? fact) ;; -> #tThe
procedure-transparent?
is an internal procedure which is used by the compiler to eliminate dead code. So possibility is very high now. OK, let's check the VM instructions of the expression.
(disasm (lambda () (dotimes (i 10000) (fact 1000)))) ;; size: 15 ;; 0: CONSTI_PUSH(10000) ;; 1: CONSTI_PUSH(0) ;; 2: LREF_PUSH(1) ;; 3: LREF(0) ;; 4: BNGE 2 ;; 6: RET ;; 7: LREF_PUSH(0) ;; 8: LREF(1) ;; 9: ADDI(1) ;; 10: PUSH ;; 11: SHIFTJ(2 0) ;; 12: JUMP -11 ;; 14: RETBingo! There is no procedure call!
Now, I might have some hope to turn up the VM. So prepare the shared object which provides
fact
. The C code is the following:#include <sagittarius.h> #define LIBSAGITTARIUS_BODY #include <sagittarius/extend.h> static SgObject fact(SgObject *SG_FP, int SG_ARGC, void *data_) { SgObject m = SG_MAKE_INT(1), r = SG_MAKE_INT(1); SgObject n = SG_FP[0]; while (TRUE) { if (Sg_NumEq(m, SG_FP[0])) { return Sg_Mul(m, r); } else { SgObject t1 = Sg_Add(m, SG_MAKE_INT(1)); SgObject t2 = Sg_Mul(m, r); m = t1; r = t2; } } return SG_UNDEF; /* dummy */ } static SG_DEFINE_SUBR(fact__STUB, 1, 0, fact, SG_FALSE, NULL); SG_EXTENSION_ENTRY void CDECL Sg_Init_fact() { SgLibrary *lib = Sg_FindLibrary(SG_INTERN("(fact)"), TRUE); SG_PROCEDURE_NAME(&fact__STUB) = SG_INTERN("fact"); Sg_InsertBinding(lib, SG_INTERN("fact"), &fact__STUB); } /* gcc -lsagittarius fact.c -shared -o fact.so -fPIC -O3 */Now, benchmark. I've added
set!
to prevent the compiler optimisation.(load-dynamic-library "fact") (import (time) (sagittarius control) (fact)) (define dummy) ;; Load C implementation first (print fact) (time (dotimes (i 10000) (set! dummy (fact 1000)))) ;; Scheme implementation (define (fact n) (let loop ((m 1) (r 1)) (if (= m n) (* m r) (loop (+ m 1) (* m r))))) (print fact) (time (dotimes (i 10000) (set! dummy (fact 1000)))) #| #<subr fact 1:0> ;; (dotimes (i 10000) (set! dummy (fact 1000))) ;; 6.604181 real 14.084392 user 0.174243 sys #<closure fact 1:0> ;; (dotimes (i 10000) (set! dummy (fact 1000))) ;; 6.698120 real 14.32880 user 0.132117 sys |#Well, almost the same. C version is slightly faster. This is, I believe, because it doesn't have VM dispatch. But this is more or less error range.
Even if this had siginificant improvement, I still need to resolve loads of things to native shared object from Scheme code. Such as:
- Mapping of Scheme procedure and C function
- Calling Scheme procedure from C
- Error handling especially unbound variables
- C compiler detection
- Etc. (macro, location so on)
call/cc
friendly. There's a way to avoid some overhead, such as using CPS, however it'd be still the same or even slower then just running on VM (this is really proven by previous experience, unfortunately). On possible good future would be less consicing similar with the one mentioned unboxing in guile -- wingolog. Though, as long as I need to use CPS, it would most likely no more than trivial improvement.It seems there's no easy way out for performance improvement. Maybe it's time to give up looking for this path.
No comments:
Post a Comment