rotten February 2016

Neo4j and Hugepages

Since Neo4j works primarily in memory, I was wondering if it would be advantageous to enable hugepages (https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt) in my Linux kernel, and then set -XX:+UseLargePages or maybe -XX:+UseHugeTLBFS in the (OpenJDK 8) JVM?

If so, what rule of thumb should I use to decide how many hugepages to configure?

The Neo4j Performance Guide (http://neo4j.com/docs/stable/performance-guide.html) does not mention this, and Google didn't turn up anyone else discussing it (in the first couple of search pages anyway), so I thought I'd ask.

I'm wrestling to get acceptable performance from my new Neo4j instance (2.3.2-community). Any little bit will help. I want to know if this is worth trying before I bring down the database to change JVM flags... I'm hoping someone else has done some experiments along these lines already.

Thanks!

Answers


Chris Vest February 2016

Since Neo4j does its own file paging and doesn't rely on the OS to do this, enabling huge pages should be advantageous or at least not hurt. Huge pages reduce the probability of TLB misses when you use a large amount of memory, which Neo4j often would like to do when there's a lot of data stored in it.
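To put rough numbers on the TLB argument, here is a minimal arithmetic sketch in Python. The 16 GiB working set and the 4 KiB / 2 MiB page sizes (typical for x86-64) are assumptions chosen for illustration; they are not figures from this thread.

```python
# Rough arithmetic only: how many page mappings are needed to cover a
# given amount of memory at different page sizes. Fewer mappings means
# each TLB entry covers more memory, so misses become less likely.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def pages_needed(memory_bytes, page_size_bytes):
    """Number of pages required to map memory_bytes, rounded up."""
    return -(-memory_bytes // page_size_bytes)

if __name__ == "__main__":
    memory = 16 * GIB                       # hypothetical working set
    small = pages_needed(memory, 4 * KIB)   # assumed base page size
    huge = pages_needed(memory, 2 * MIB)    # assumed huge page size
    print("4 KiB pages:", small)
    print("2 MiB pages:", huge)
    print("reduction factor:", small // huge)
```

The same function gives a first guess at how many explicit hugepages a given heap or cache size would occupy, though anything the kernel or JVM reserves on top of that is not modelled here.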

However, Neo4j does not use hugepages directly, even though it could and it would be a nice addition. This means you have to rely on transparent huge pages and whatever features the JVM provides. Transparent huge pages can cause stalls, usually short, when the kernel merges smaller pages into huge ones.
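Before changing anything, it may help to see which transparent huge page mode the kernel is currently in. A minimal sketch, assuming a Linux system that exposes /sys/kernel/mm/transparent_hugepage/enabled (the active mode is the one shown in square brackets); the function names are made up for this example.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def parse_thp_mode(text):
    """Return the active mode from text such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def current_thp_mode(path=THP_ENABLED):
    """Read the kernel's THP setting, or None if it is not available."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        # File missing or unreadable: kernel built without THP, or not Linux.
        return None

if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```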

If you have a representative staging environment then I advise you to make the changes there first, and measure their effect.
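One part of measuring is confirming that huge pages are actually being used after the change. A minimal sketch, assuming a Linux /proc/meminfo with the usual HugePages_Total, HugePages_Free, Hugepagesize and AnonHugePages fields; the helper names are hypothetical, and this only reports counters, it does not measure Neo4j performance.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; the kB suffix is dropped."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def hugepage_summary(fields):
    """Pick out the huge-page related counters (None if a field is absent)."""
    keys = ("HugePages_Total", "HugePages_Free", "Hugepagesize", "AnonHugePages")
    return {key: fields.get(key) for key in keys}

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            print(hugepage_summary(parse_meminfo(handle.read())))
    except OSError:
        print("/proc/meminfo not available on this system")
```

Comparing the output before and after starting the database with the new flags shows whether explicit hugepages were taken from the pool (HugePages_Free drops) or whether transparent huge pages are in play (AnonHugePages grows).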

Transparent huge pages are mostly a problem for programs that use mmap because I think it can lead to changing the size of the unit of IO, which will make the hard-pagefault latency much higher. I'm not entirely sure about this, though, so please correct me if I'm wrong.

The JVM actually does use mmap for telemetry and tooling, through a file in /tmp, so make sure this directory is mounted on tmpfs to avoid gnarly IO stalls, for instance during safepoints (!!!). Always do this, even if you don't use huge pages.
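A quick way to check the /tmp advice is to look up which filesystem backs that directory. A minimal sketch, assuming a Linux /proc/mounts in the usual "device mountpoint fstype options ..." layout; the function name is made up for this example, and it resolves neither symlinks nor escaped characters in mount points.

```python
import os

def filesystem_type(target, mounts_text):
    """Return the fs type of the mount containing target.

    Uses the longest matching mount point; on a tie the later line wins,
    since later mounts shadow earlier ones.
    """
    target = os.path.normpath(target)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        mount_point, fs_type = parts[1], parts[2]
        prefix = mount_point.rstrip("/") + "/"
        if (target + "/").startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type

if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            print("/tmp is on:", filesystem_type("/tmp", handle.read()))
    except OSError:
        print("/proc/mounts not available on this system")
```

If this prints anything other than tmpfs, the memory-mapped file the JVM keeps in /tmp is backed by a real disk.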

Also make sure you are using the latest Linux kernel and the latest Java version.

You may be able to squeeze a few more percentage points out of it by tuning the G1 garbage collector, but this is a bit of a black art.

