2014-08-19

Performance turning - Windows (MSVC)

It turned out that Windows version of Sagittarius is extremely slow. If I try ack on Cygwin version then it can run about 60 sec (Core i7 3.40 GHz). Following is the result of three implementations that previous article mentioned.
% time gosh -r7 ack.scm
gosh -r7 test.scm  130.14s user 4.26s system 120% cpu 1:51.70 total

% time sagittarius ack.scm
sagittarius test.scm  54.48s user 3.01s system 124% cpu 46.022 total

% time chibi-scheme ack.scm
chibi-scheme test.scm  49.84s user 0.00s system 99% cpu 50.170 total

And this is the inside of "acm.scm".
(import (scheme base))
(define (ack m n)
  (cond ((= m 0) (+ n 1))
        ((= n 0) (ack (- m 1) 1))
        (else (ack (- m 1) (ack m (- n 1))))))

(ack 3 12)
Apparently Chibi is the fastest. And Sagittarius is not so slow. Well, I don't know why Gauche is slow in Cygwin environment. I think this is simply the performance of VM and its stack overflow handling. (at least on Sagittarius, not sure for others.)

Now let's see how is the Windows version's performance. Following is the result of ack using Cygwin's time command to see how slow it is.
D:>\cygwin\bin\time sash.exe ack.scm
0.00user 0.00system 1:12.13elapsed 0%CPU (0avgtext+0avgdata 220928maxresident)k
0inputs+0outputs (901major+0minor)pagefaults 0swaps
I don't know why it has some garbage but 30% slower. So indeed it is Windows version's issue. But why?

I think the reason why is it does not use direct threading due to the limitation of MSVC (if we use it should improve approx 30% of performance, so about 21 sec in this case?). However I've found this article: How not to make a virtual machine (label-based threading). Even though this article concluded not to emulate direct threading on MSVC however it wouldn't hurt to give it a shot.So I've modified VM a bit to emulate it (sources are in msvc32-emulate-direct-threading branch). Then run the above script.

Here is the result;
D:>\cygwin\bin\time.exe build\sash.exe ack.scm
0.00user 0.00system 1:40.56elapsed 0%CPU (0avgtext+0avgdata 220928maxresident)k
0inputs+0outputs (901major+0minor)pagefaults 0swaps
Well, it's always better to believe what people have done, I guess...So this really doesn't work. Now we only have 2 options, giving up or adopt to MinGW. Hmmmm.

No comments:

Post a Comment