
Developers Planet


Complexity February 2016

MongoDB sharding does not improve performance in a lab setup

I'm currently developing a mobile application backed by a MongoDB database. Although everything is working fine right now, we want to add sharding to be prepared for the future.

In order to test this, we've created a lab environment (running in Hyper-V) to test the various scenarios.

The following servers have been created:

  • Ubuntu Server 14.04.3 Non-Sharding (Database Server) (256 MB RAM / limited to 10% CPU).
  • Ubuntu Server 14.04.3 Sharding (Configuration Server) (256 MB RAM / limited to 10% CPU).
  • Ubuntu Server 14.04.3 Sharding (Query Router Server) (256 MB RAM / limited to 10% CPU).
  • Ubuntu Server 14.04.3 Sharding (Database Server 01) (256 MB RAM / limited to 10% CPU).
  • Ubuntu Server 14.04.3 Sharding (Database Server 02) (256 MB RAM / limited to 10% CPU).

A small console application has been created in C# to measure the time it takes to perform the inserts.

This console application imports 10,000 person records with the following properties:

  • Name
  • Firstname
  • Full Name
  • Date Of Birth
  • Id

All 10,000 records differ only in '_id'; all the other fields are identical across records.

It's important to note that every test is run exactly 3 times. After every test, the database is dropped so the system is clean again.
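As a rough illustration, a mongo shell equivalent of the insert loop might look like the fragment below (hypothetical sketch; the actual benchmark is a C# application, and the field names are assumptions based on the property list above). It only runs against a live mongod/mongos connection:

```javascript
// Hypothetical mongo shell sketch of the benchmark: insert 10,000
// documents that differ only in their auto-generated _id, and time it.
var start = new Date();
for (var i = 0; i < 10000; i++) {
    db.persons.insert({
        name: "Doe",
        firstname: "John",
        fullName: "John Doe",
        dateOfBirth: new Date(1980, 0, 1)
        // _id is generated automatically, so it is the only field that varies
    });
}
print("Elapsed: " + (new Date() - start) + " ms");
```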

Find the results of the test below:

Insert 10,000 records without sharding

Writing 10,000 records | Non-Sharding environment - Full Disk IO #1: 14 seconds.
Writing 10,000 records | Non-Sharding environment - Full Disk IO #2: 14 seconds.
Writing 10,000 records | Non-Sharding environment - Full Disk IO #3: 12 seconds.

Insert 10,000 records with a single database shard

Note: the shard key has been set to the hashed _id field. See the (partial) JSON below for the sharding information:

shards:
  {  "_id"         
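For context, output like the (partial) fragment above is typically produced by sh.status(). A hashed shard key is usually declared as follows (illustrative fragment, run against a mongos; "test.persons" is a placeholder namespace, not taken from the question):

```javascript
// Enable sharding for the database, then shard the collection
// on a hashed _id key (placeholder names).
sh.enableSharding("test");
sh.shardCollection("test.persons", { _id: "hashed" });
sh.status();   // prints the shards/chunks summary quoted above
```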

Answers


kliew February 2016

The layers between the routing server and the config server, and between the routing server and the data nodes, add latency. At 1 ms of ping per insert × 10,000 inserts, you have 10 seconds of latency that does not appear in the unsharded setup.
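The arithmetic behind that estimate can be checked directly (assumed values: a 1 ms round trip per insert, with each insert sent and acknowledged one at a time):

```javascript
// Per-insert network round trips add up linearly when each write
// is acknowledged before the next one is sent.
const pingMs = 1;        // assumed mongos <-> shard round trip
const inserts = 10000;   // size of the benchmark
const overheadSeconds = (pingMs * inserts) / 1000;
console.log(overheadSeconds + " seconds of added network latency");
```

This is why batching inserts (one round trip for many documents) usually shrinks the gap between the sharded and unsharded numbers.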

Depending on your configured write concern (if you configured any level of write acknowledgement), you could add another 10 seconds to your benchmarks in the sharded environment, because each insert blocks until an acknowledgement is received from the data node.

If your write concern is set to acknowledged and you have replica nodes, then you also have to wait for the write to propagate to those replicas, adding further network latency. (You don't appear to have replica nodes, though.) Depending on your network topology, write concern can add multiple layers of network latency if you use the default setting that allows chained replication (secondaries syncing from other secondaries): https://docs.mongodb.org/manual/tutorial/manage-chained-replication/. If you have additional indexes and write concern, each replica node has to write those indexes before returning a write acknowledgement (though it is possible to disable indexes on replica nodes).

Without sharding and without replication (but with write acknowledgement), your inserts still block on the insert itself, but there is no additional latency from the network layer.
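The acknowledgement levels described above map to the writeConcern option on each write (illustrative mongo shell fragment against a placeholder collection; lowering the level trades durability guarantees for benchmark speed):

```javascript
// Default: acknowledged by the primary before the call returns.
db.persons.insert({ name: "Doe" }, { writeConcern: { w: 1 } });

// Wait for a replica as well: blocks until 2 members have the write.
db.persons.insert({ name: "Doe" }, { writeConcern: { w: 2 } });

// Fire-and-forget: skips the acknowledgement wait entirely,
// at the cost of not detecting write errors.
db.persons.insert({ name: "Doe" }, { writeConcern: { w: 0 } });
```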

Hashing the _id field also has a cost, which can accumulate to maybe a few seconds total for 10k inserts. You can use an _id field with a high degree of randomness to avoid hashing, but I don't think this affects performance much.

Post Status

Asked in February 2016
Viewed 2,332 times
Voted 7
Answered 1 time
