Sunday, July 3, 2011

EhCache replication: RMI vs JGroups.

Recently I was working on a product that required replicated caching. The caching provider had already been decided - EhCache - and what remained was the question of transport. Which one is the best option? By "best" here I simply mean the one with better performance. The measurement covered only two of the available transports - JGroups and RMI; others were not considered, sorry.
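For reference, choosing a transport in EhCache comes down to wiring a different peer provider and cache replicator into ehcache.xml. Below is a minimal sketch of both variants; the multicast addresses, ports, cache name, and the JGroups protocol stack are illustrative placeholders, not the exact configuration used in the test:

```xml
<!-- RMI transport: multicast peer discovery plus a per-cache RMI replicator -->
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
    properties="peerDiscovery=automatic, multicastGroupAddress=230.0.0.1,
                multicastGroupPort=4446, timeToLive=1"/>

<cache name="testCache" maxElementsInMemory="100000" eternal="true" overflowToDisk="false">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
                    replicateUpdates=true, replicateUpdatesViaCopy=true,
                    replicateRemovals=true"/>
</cache>

<!-- JGroups transport: UDP multicast stack plus a per-cache JGroups replicator -->
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
    properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566;):PING:
                MERGE2:FD_SOCK:VERIFY_SUSPECT:pbcast.NAKACK:UNICAST:
                pbcast.STABLE:FRAG:pbcast.GMS"/>

<cache name="testCache" maxElementsInMemory="100000" eternal="true" overflowToDisk="false">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
                    replicateUpdates=true, replicateUpdatesViaCopy=false,
                    replicateRemovals=true"/>
</cache>
```

The only structural difference between the two setups is the pair of factory classes; the replicator properties are largely the same in both dialects.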

Replication was tested between two nodes. The main goal was to understand how increasing the message size and the total number of messages affects performance. Another goal was to find the point where replication performance starts to degrade badly. The latter is not that easy, because the test used a limited amount of memory, and non-linear performance deterioration could be caused by exhaustion of free heap space.
Below are memory size and software versions used to run the test:
  • All executions used a 6 GB heap.
  • Tests were run on EhCache v2.3.2.
  • The JVM was Sun Java 1.6.0_21.
The test itself is very simple: one node puts some number of elements of some size into the cache, and the other node reads all of them. The test output is the time required to read all elements; the timer starts just after the first element is read.

The first test creates 10,000 elements per iteration. The variable is the element size, which doubles on each iteration: 1,280 bytes on the first iteration, 327,680 bytes (320 KB) on the last. This means the final iteration, with 10,000 elements of 320 KB each, transfers approximately 3 GB of data. The tests showed that EhCache copes very well with increasing element size, and the slowdown was roughly proportional to the amount of data transferred, as can be seen on the graph:
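The doubling schedule is easy to reconstruct from the quoted start and end sizes (the iteration count below is derived from that arithmetic, not taken from the test sources):

```java
// Reproduces the element-size schedule of the first test:
// 10,000 elements per iteration, element size doubling from 1,280 bytes.
public class Schedule {
    public static void main(String[] args) {
        final int elements = 10_000;
        long size = 1_280;                       // bytes, first iteration
        int iteration = 1;
        while (size <= 327_680) {                // last iteration: 320 KB
            long totalBytes = (long) elements * size;
            System.out.printf("iteration %d: %d bytes/element, %.2f GB total%n",
                    iteration, size, totalBytes / (1024.0 * 1024 * 1024));
            size *= 2;                           // element size doubles each iteration
            iteration++;
        }
        // 1280 * 2^8 = 327680, so the schedule has 9 iterations, and the
        // final one moves 10,000 * 320 KB, i.e. about 3.05 GB of payload.
    }
}
```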

Here the y-axis is the transfer time in milliseconds and the x-axis is the element size. Not much comment is needed: RMI definitely looks better than JGroups.

In the second test the variable was the number of elements, while the element size stayed constant at 1,280 bytes. As in the previous test, the number of messages doubled in each iteration, and the amount of data transferred in the final iteration was the same 3 GB. The graph below shows how it went:
As in the previous graph, the y-axis is the time required to transfer all elements in one iteration; the x-axis is the number of elements. Again, RMI is the leader. I believe JGroups hit the heap limit in the last iteration, which is why it performed so badly. This suggests JGroups has more memory overhead per element.
For those who do not trust my results (I wouldn't ;) ) and want to try it themselves, here are the sources and configuration.

And, as a conclusion... Well, RMI and JGroups are both acceptably fast. JGroups definitely consumes more memory, which means one can hit problems using it with big amounts of data. RMI, on the other hand, uses TCP instead of UDP, which, with a big number of nodes, may cause higher network load. The latter, unfortunately, is not covered by this test in any way, so the real impact is not clear.


Jk said...

Some remarks:

1. RMI is a TCP point-to-point protocol, and your tests were done with only two servers: as you add servers, the amount of point-to-point communication grows.
JGroups uses UDP multicast by default, so increasing the number of servers has no impact on the network load.
So if you redo your study with a larger number of servers, you probably won't get the same results. See
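Jk's fan-out point can be quantified with back-of-the-envelope arithmetic (this is a counting sketch, not a network measurement): with point-to-point replication each node ships every update to every other peer, while multicast always costs one packet per update.

```java
// Messages put on the wire per update originated at one node.
public class FanOut {
    static long unicastMessages(int nodes)   { return nodes - 1; } // one copy per peer
    static long multicastMessages(int nodes) { return 1; }         // one packet, any size

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8, 16}) {
            System.out.printf("%2d nodes: unicast sends %d copies, multicast sends %d%n",
                    n, unicastMessages(n), multicastMessages(n));
        }
        // With 2 nodes the transports are equivalent (one copy each), which is
        // exactly this test's setup; the gap only opens up as the cluster grows.
    }
}
```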

2. Your comparison is not fair, since you used "replicateUpdatesViaCopy=false" for JGroups and "replicateUpdatesViaCopy=true" for RMI. I don't know if this makes any difference...

3. It would be nice to have a conclusion. From your results, it seems that RMI manages large amounts of memory more efficiently than JGroups (a few large messages take more time in JGroups, while sending a lot of small messages takes almost the same time in RMI and JGroups). Can you confirm this analysis?

4. The first graph legend describes "y-axis" twice instead of once "x-axis" and once "y-axis". The second graph has the same issue, but this time "x-axis" is described twice.

stanislavk said...

1. You may be right. Unfortunately, I can't run that test on more machines - I don't even have that lab at hand any more :( As a workaround, each node can be configured to talk to only a limited number (2-3) of other nodes, although that makes automatic discovery impossible... Also, the guys in the link you've provided say that "ehcache creates a replication thread for each cache entity". That sounds strange. I have had a look at the source (RMIAsynchronousCacheReplicator) and there is just one thread, which pulls messages for replication from a queue. Where did they find that "replication thread for each cache entity"? Also, it seems they do not have any results, just a theoretical study :)
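The single-replicator-thread design described above - one background thread draining pending replication events from a queue, as stanislavk observed in RMIAsynchronousCacheReplicator - can be sketched in plain Java. The class and method names below are illustrative, not EhCache's actual internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncReplicatorSketch {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> replicated = new ArrayList<>();

    public AsyncReplicatorSketch() {
        // One worker thread per replicator, not one per cache entry: it drains
        // whatever backlog has accumulated and ships it as a batch.
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    List<String> batch = new ArrayList<>();
                    batch.add(pending.take());      // block until something arrives
                    pending.drainTo(batch);         // then grab the rest of the backlog
                    synchronized (replicated) {
                        replicated.addAll(batch);   // stand-in for "send to peers"
                    }
                }
            } catch (InterruptedException ignored) { }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public void put(String key) { pending.add(key); }   // producer side: cache puts

    public int replicatedCount() {
        synchronized (replicated) { return replicated.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncReplicatorSketch r = new AsyncReplicatorSketch();
        for (int i = 0; i < 1000; i++) r.put("key-" + i);
        Thread.sleep(200);                              // let the worker catch up
        System.out.println("replicated " + r.replicatedCount() + " events");
    }
}
```

Note how many puts share one consumer thread, which is the opposite of "a replication thread for each cache entity".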

2. I don't think it makes a difference to this particular test, since I'm only creating entities. Thanks for noticing, though - it could be important if the test were slightly different.

3. The conclusion is that both are good, but JGroups definitely seems to consume more memory, which makes it less attractive when dealing with big amounts of data.

4. Thanks, will fix it.

Jk said...

JOMB submitted a paper for publication on JGroups on 2 to 16 nodes using UDP unicast (n-to-n communication) and multicast (1-to-n communication). They show that performance is stable when using multicast but drops when using unicast. They also have experiments on packet size and TCP vs UDP. See

stanislavk said...

JK, thanks a lot! Very good paper, it gives a clear overview. The guys did a good job.

Venkat Nitw05 said...


Nice blog, btw. Did you get a chance to compare JMS replication vs RMI replication?

stanislavk said...

No, I haven't. But I reckon JMS is not going to be faster. On the other hand, implementations like HornetQ are showing very good results. If you really need good performance and scalability, I would recommend going for UDP multicast; on an appropriate network it should be the fastest option.

Venkat said...

In my scenario, the cache cluster is going to be just 2-3 servers, but the amount of data is going to be huge... And reads will be relatively more frequent than writes...

Stas said...

Multicast UDP (e.g. via JGroups) will not cause as heavy a load on the network as any TCP option. But in the case of 2-3 servers I do not think there will be any real difference.
I would go for UDP as the first option and use TCP only when it's really required, e.g. when a firewall does not allow UDP traffic through.

Ivan Astafyev said...

Hi. I have one remark. In the case of many cache regions, RMI creates many threads and consumes more memory than JGroups. You