Let's start Scheme

2016-02-12

Server performance

Sagittarius has server framework library (net server) and on top of this library I've written simple HTTP server and web framework, Paella. I don't use it in tight situation so performance isn't really matter for now. However if you write something you want to check how good or bad it is, don't you? And yes I've done simple benchmark and figured out it's horrible.

I've created a very simple static page with Plato which is a web application framework bundled to Paella. It just return a HTML file. (although it does have some overhead...) It looks like this:
(library (plato webapp benchmark)
    (export entry-point support-methods)
    (import (rnrs) (paella) (plato) (util file))

  (define (support-methods) '(GET))
  (define (entry-point req)
    (values 200 'file (build-path (plato-current-path (*plato-current-context*))
                                  "index.html")
            '("content-type" "text/html")))
)
The index.html has 200B data.
I don't have modern nice HTTP benchmark software like ApatchBench (because I'm lazy) so just used cURL and shell. The script looks like this:
#!/bin/sh

invoke () {
    curl http://localhost:8500/benchmark > /dev/null 2>&1
}

call () {
    for i in `seq 1 1000`;
    do
        invoke &
    done
}

call
wait
It's just create 1000 processes background and wait them.

The benchmark is done on default starting script which Plato generates. So number of threads are 10. Then this is the result:
$ time ./benchmark.sh
./benchmark.sh  4.89s user 3.77s system 313% cpu 2.764 total
So, I've done couple of times and average is approx 3 seconds per 1000 requests. So 300 Req/S. It's slow.

If I run the above benchmark with 10 requests, then the result was like this:
$ time ./benchmark.sh
./benchmark.sh  0.05s user 0.05s system 249% cpu 0.040 total
And 1 request is like this:
$ time ./benchmark.sh
./benchmark.sh  0.01s user 0.01s system 77% cpu 0.025 total
So up to thread number, I can assume it does better, at least it's not increased 10 times. But if it's 100, then it's about 7 times more.
$ time ./benchmark.sh
./benchmark.sh  0.49s user 0.35s system 285% cpu 0.293 total
1 to 10 is twice, but 10 to 100 is 7 times. Then 100 to 1000 is 10 times. Something isn't right to me.

Why it's so slow and gets slow when number of requests is increased? I think there are couple of reasons. The (net server) uses combination of select (2) and multithreading. When the server accepts the connection, then it tries to find least used thread. After that it pushes the socket to the found thread. The thread calls select if there's something to read. Then invokes user defined procedure. After the invocation, it checks if there's closed socket or not and waits input by select again. So flow is like this (n = number of thread, m = number of socket per thread):
  1. Find least used thread. O(nm) (best case O(1) if none of the threads are used)
  2. Push socket to the thread. O(1)
  3. Handling request. O(m)
  4. Cleaning up sockets. O(m)
I think dispatching socket smells slow. So I've made some changes like the followings:
  • Adding load balancing thread which simply manage priority queue
  • Just asking the queue which thread is least loaded
  • Code cleaning up
    • Using (util concurrent shared-queue) instead of manually managing sockets and locks
    • Don't assume write side shutdowned socket is not used.
    • more...
With these changes, the first step only takes O(1).  Now benchmark! This is the result:
$ time ./benchmark.sh
./benchmark.sh  4.61s user 3.76s system 317% cpu 2.633 total
YAHOOOOO!!!! 100ms faster!!! ... WHAAAATTTT!???

Well in average it's 2.6sec per 1000 request so it is a bit faster like 300ms - 400ms. And using (util concurrent) made the server itself more robust (it sometimes hanged before). I think the server framework itself is not too bad but HTTP server. So that'd be the next step.

No comments:

Post a Comment