2016-02-15

Server performance 2

So (net server) itself didn't perform too badly, which means there must be another culprit. To find it, I would usually use the profiler; however, it only works in a single-threaded environment. That means it can't be used on a server program written on top of the (net server) library.

Just giving up would be the easy way out, but my conscience doesn't allow it (please let me go...). The current HTTP server implementation has 2 layers, Paella and Plato. The first is the basic HTTP server and the second is the web framework built on top of it. At least I can check which of the two is slow. So I tried a bare Paella server, copy&pasting the example and modifying it a bit like this:
(import (rnrs)
        (net server)
        (paella))

(define config (make-http-server-config :max-thread 10))

(define http-dispatcher
  (make-http-server-dispatcher
    (GET "/benchmark" (http-file-handler "index.html" "text/html"))))

(define server 
  (make-simple-server "8500" (http-server-handler http-dispatcher)
                      :config config))

(server-start! server)
Then I ran the same script as before. The result is this:
$ time ./benchmark.sh
./benchmark.sh  4.66s user 3.76s system 335% cpu 2.507 total
Hmmm, the bare server is already slow. So I can assume most of the time is consumed by the server, not the framework.

Listing what the server actually does would help:
  1. Converting the socket to a buffered port
  2. Parsing the HTTP header
  3. Parsing the request path
  4. Parsing the query string (if there is one)
  5. Parsing MIME (if there is any)
  6. Parsing cookies (if there are any)
  7. Calling the handler
  8. Writing the response
  9. Cleaning up
So I started with the second item (port conversion actually improves performance so it can't be removed, unless I write everything from scratch using raw sockets, but that sounds like too much of a pain in the ass). Conclusion first: I made header parsing almost twice as fast (mostly by reducing memory allocation), but it didn't affect the performance of the server at all.

Header parsing occurs once per request, so I dumped the headers cURL sends and carefully diagnosed which procedures take time. As a result, I found that the SRFI-13 related procedures consume a lot of time, because they have a rich interface that requires packing rest arguments. So I replaced them with versions that take no rest arguments. Then, in the same library, the procedure rfc5322-header-ref, which looks up a header value, called string-ci=?, which in turn calls string-foldcase internally. So I changed it to do the case folding only once. And there were a couple more improvements. All of them did indeed improve performance; however, calling the header parser 1000 times took only 30ms to begin with, so bringing it down to 15ms doesn't change much.
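The string-ci=? point can be illustrated with a small sketch. This is not the actual rfc5322-header-ref code; the alist representation of headers and the procedure names header-ref/naive and header-ref/fold-once are hypothetical, and only standard R6RS procedures are used. It just shows the idea of folding the requested name once instead of on every comparison.

```scheme
(import (rnrs))

;; Hypothetical header representation: an alist of (name . value) pairs.
(define headers
  '(("User-Agent" . "curl/7.35.0")
    ("Host" . "localhost:8500")
    ("Accept" . "*/*")))

;; Naive lookup: string-ci=? is called for every header, and each call
;; may case-fold both of its arguments again.
(define (header-ref/naive headers name)
  (cond ((find (lambda (h) (string-ci=? (car h) name)) headers) => cdr)
        (else #f)))

;; Fold the requested name once up front, then compare with string=?.
(define (header-ref/fold-once headers name)
  (let ((folded (string-foldcase name)))
    (cond ((find (lambda (h) (string=? (string-foldcase (car h)) folded))
                 headers)
           => cdr)
          (else #f))))

(display (header-ref/fold-once headers "host")) ; prints localhost:8500
(newline)
```

In this sketch the stored header names are still folded on each comparison; folding them once at parse time would remove that cost as well, at the price of an extra allocation per header.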

Then I started suspecting that the benchmark script itself is slow. I'm not sure how fast cURL itself is, but forking it 1000 times and waiting for all of them didn't sound fast. So I wrote the following script:
#!read-macro=sagittarius/bv-string
(import (rnrs)
        (sagittarius socket)
        (sagittarius control)
        (time)
        (util concurrent)
        (getopt))

(define header
  #*"GET /benchmark HTTP/1.1\r\n\
     User-Agent: curl/7.35.0\r\n\
     Host: localhost:8500\r\n\
     Accept: */*\r\n\r\n")

(define (poke)
  (define sock (make-client-socket "localhost" "8500"))
  (socket-send sock header)
  ;; just poking
  (socket-recv sock 256)
  (socket-close sock))

(define (main args)
  (with-args (cdr args)
      ((threads (#\t "threads") #t "10")
       (unit    (#\u "unit")    #t "1000"))
    (let* ((c (string->number unit))
           (t (string->number threads))
           (thread-pool (make-thread-pool t raise)))
      (time (thread-pool-wait-all!
             (dotimes (i (* c t) thread-pool)
               (thread-pool-push-task! thread-pool poke))))
      (thread-pool-release! thread-pool))))
It sends a fixed HTTP request and receives the response (possibly only part of it). The -t option specifies how many threads to use and the -u option specifies how many requests to make per thread. If this also takes a long time, then my assumption is wrong. Lemme try it against the bare HTTP server:
$ sash bench.scm -t 100 -u 100

;;  (thread-pool-wait-all! (dotimes (i (* c t) thread-pool) (thread-pool-push-task! thread-pool poke)))
;;  4.052414 real    0.670089 user    1.255910 sys
100 threads and 100 requests per thread, so 10000 requests were sent in total. It took about 4 seconds, so roughly 2500 req/s. That's faster than the cURL version.
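For comparison, the two throughput figures can be worked out from the timings above. This is only a back-of-the-envelope sketch; it assumes the cURL script made 1000 requests in total (the "forking it 1000 times" mentioned earlier), and requests-per-second is just a helper name made up for this example.

```scheme
(import (rnrs))

;; Throughput of a run: number of requests divided by wall-clock seconds.
(define (requests-per-second requests seconds)
  (/ requests seconds))

;; cURL-based script: assumed 1000 requests, 2.507 s total.
(define curl-rate (requests-per-second 1000 2.507))

;; Scheme client: 100 threads x 100 requests, 4.052414 s real time.
(define scheme-rate (requests-per-second (* 100 100) 4.052414))

(display curl-rate) (newline)
(display scheme-rate) (newline)
;; How many times faster the Scheme client drives the server.
(display (/ scheme-rate curl-rate)) (newline)
```

Under that assumption the cURL script comes out at roughly 400 req/s against roughly 2470 req/s for the Scheme client, about six times higher, which suggests a large part of the earlier result was the cost of the benchmark script itself.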

2500 req/s isn't fast, but it's good enough for my purpose. So I'll put this aside for now.
