donderdag 24 januari 2013

DataSnap, RO, RTC, mORMot, WCF, Node speed test

After reading the "DataSnap test" blog article, I wanted to do some extra tests: RemObjects SDK (RO) and the effect of ScaleMM2. I also got a server build from RealThinClient (RTC) for testing.
To get some reference I downloaded the test servers and JMeter 2.8 and ran them on my pc. After that I got a RemObjects server working with JMeter so I could compare the results with the other solutions. I also made a "plain indy" server to mimic the plain Node.js test (to see what indy 10 is capable of).

Test setup

I tested on single PC: a quad core, Windows 7, 8gb computer; I used JMeter 2.8 with 50 threads and 1000 request per thread. I configured all test servers to use a threadpool of 50 (whenever this was possible).
It is a rather rough and quick test, no 3 times average etc: it is only used to get a quick indication of the performance. I also did not monitor the memory usage because most test servers had a low usage of about 6 or 9 mb (and memory is cheap :) ). Only ScaleMM2 versions use about 50Mb: each thread starts with its own 1mb memory block (this needs to be further optimized but it works for now).


See below for the crowded results. Tip: the Google Chart is interactive, this will help to dig through the many bars...
Loading Google Chart...


My test has some big limitations you have to be aware of:
  • I tested on "localhost" so no real network test with reliability, error rate, etc.
  • Because all software is ran on 1 pc, most results are cpu bound. This means the results have some kind of automatic "correction" for servers with a high cpu usage: these are "less efficient" and get "punished" for this: they will get a lower request rate.
    Again: it is not a real network test, it does not show the maximum possible speed. A real network test  should be done before conclusions or decisions can be made!
  • I did only a short test (50.000 request, so a couple of seconds), so no long term performance.

Observations (not conclusions :) )

So, how useful is this test then? Well, despite the rough chunks, you can however make some interesting observations:
  • RTC and mORMot both performs very good! 
  • however, mORMot has big differences between the results: it runs fast in admin mode (run as Administrator), but slower in normal user mode (higher kernel(red) cpu). Also the XE3 build acts weird: I could not make connections in user mode, only in admin mode and is much slower than the D2010 build! (same source, same release compiler options). Are there known slow downs in the XE3 compiler?
  • Node.js and "plain indy" have similar results, so Indy itself is quite fast (but slower than RTC and mORMot)
  • RemObjects SDK (RO) comes close to RTC and mORMot (and a little bit faster than WCF) but only(!) if ScaleMM2 and not Indy but Synapse or DX is used
  • RO, DataSnap and Indy benefit the most when using ScaleMM2. RTC and mORMot are very fast on themselves (less MM bound, so more optimized and more efficient?). But a multi threaded memory manager gets very useful when you start writing user code (creating string, objects, records etc) anyway.
  • DataSnap XE3 is disappointing, even with ScaleMM2 it performs much slower than the others. Also the performance drops very quickly! The first second(nr1) it does about 3500 request per second, but after a couple of seconds(!)(nr2) it drops to an average of 1700/s! See below for a ProcessExplorer screenshot, notice the steep drop of the IO chart at the bottom of the screen! Maybe some kind of increasing array or list that makes this logarithmic decline? Due to automatic session management? (but I don't want sessions, I want a fast stateless server!). Or maybe a lot of "interthread memory" is used, so a multithreaded MM needs to lock too? (cpu also gets lower which can indicate higher locking times?).
    Using Google's TCmalloc Memory Manager (Delphi unit, dll) instead of ScaleMM2 shows similar behavior however it starts lower and drops less steep.

Other remarks:

Some other minor remarks about the test:
  • Delphi software is compiled with D2010, only DataSnap is build with XE3 trial.
  • RTC server is compiled by, also with D2010.
  • The fastest RTC settings are used, non blocking and not multithread are also very fast, but blocking and multithreaded gave me the best results
  • DataSnap with keep-alive is used, this gives me the 3500request/s but without it it gave about 3100/s
    (but keep-alive on a real network gives a high error rate?)


Some todo's:
  • I would like to test RemObjects SDK for .Net too: I saw they have a http.sys server (like mORMot) too!
  • I should redo the tests with multiple physical pc's but unfortunately I don't have that much good ones...
  • I will also build the "plain indy" server in Delphi XE3 to see if the XE3 compiler is really broken (maybe that's why DataSnap performs bad?)

Used server software

Some download links to the used servers: