Arun February 2016

Can triggers be used in Cassandra in production in a multi-datacenter environment?

I have a multi-datacenter environment (DC1, DC2) with 3 nodes in each datacenter and RF=3 per datacenter.

  1. I would like to know whether triggers can be used in production in a multi-datacenter environment. If so, how can this be achieved?
  2. Case A: If I start inserting data into DC1, it will have 3 replicas within DC1 and will be responsible for replicating the data to the other datacenter, DC2. Every time an insert into DC2 takes place, I would like a trigger event to fire and notify the application of the latest inserted value. Is this possible?

  3. Case B: If point 2 is not possible, is it a good idea to insert the data simultaneously into both datacenters, DC1 and DC2 (targeting a single table), and avoid triggers altogether? Will this have any impact on network traffic? Because the write with the latest timestamp wins, the table would hold the last insert, which serves the purpose when queried from either region.
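Case B relies on Cassandra resolving conflicting writes to the same cell by timestamp (last write wins). The following is a minimal Python sketch of that rule, assuming each write carries a microsecond timestamp. The `Cell` type and `reconcile` function are hypothetical illustrations, not Cassandra's API, and the tie-break on equal timestamps (larger value wins) is an assumption made here to keep the result deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A single column value plus its write timestamp in microseconds (hypothetical type)."""
    value: str
    timestamp: int


def reconcile(*versions: Cell) -> Cell:
    """Return the version that wins under last-write-wins.

    The highest timestamp wins; equal timestamps are broken by the larger
    value (an assumption in this sketch so the outcome is deterministic).
    """
    return max(versions, key=lambda c: (c.timestamp, c.value))


# The same row written independently into DC1 and DC2, as in Case B.
from_dc1 = Cell("price=10", 1_455_000_000_000_000)
from_dc2 = Cell("price=12", 1_455_000_000_000_500)

winner = reconcile(from_dc1, from_dc2)
print(winner.value)  # price=12
```

Whichever DC a query hits, once both writes have replicated, every replica converges on the same winner; until then a region may briefly return the older value.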

Read consistency level: LOCAL_QUORUM
Write consistency level: ONE
Version: DSE 4.8.2

With these consistency levels, I expect to achieve good consistency while lowering write latency across the datacenters.
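Whether these levels guarantee that a read sees the latest acknowledged write can be checked with the usual overlap rule R + W > RF, applied within one datacenter. A small Python sketch of that arithmetic follows; the function names are hypothetical, and it treats ONE as one acknowledging replica in the local DC, which is the best case, since ONE may also be satisfied by a replica in the other DC. Writes still reach all replicas eventually; this only concerns guaranteed immediate visibility.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True if every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > rf


RF_PER_DC = 3
read_cl = quorum(RF_PER_DC)  # LOCAL_QUORUM with RF=3 -> 2 replicas
write_cl = 1                 # ONE -> 1 acknowledgement (best case: a local replica)

print(read_cl)                                                       # 2
print(reads_see_latest_write(read_cl, write_cl, RF_PER_DC))          # False
print(reads_see_latest_write(read_cl, quorum(RF_PER_DC), RF_PER_DC)) # True
```

In other words, LOCAL_QUORUM reads paired with LOCAL_QUORUM writes overlap inside a DC, while LOCAL_QUORUM reads paired with ONE writes do not.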

Use case:

We have an application with two domains for two different regions (DC1 and DC2). Users in the DC1 region use domain 1 to access the application, and users in the DC2 region use domain 2. Data for the DC1 region is ingested into DC1; as it is replicated within DC1, the DC1 coordinator also replicates it to the other DC (DC2). The moment DC2 receives the data from DC1, we want to let the application know about the latest information (polling) using some trigger event mechanism. I would like to know whether this can be implemented with Cassandra triggers.

Can someone give feedback on Case A and Case B, and on which would be more efficient in production? Thanks.

Answers


Mayank Raghav February 2016

Generally, the multiple data center concept is used for workload separation (say, different DCs for real-time queries, analytics, and search). Cassandra itself takes care of replicating data across multiple DCs. So, coming to your question, Case B doesn't seem the right option because:

  1. Cassandra automatically replicates data across multiple DCs.
  2. Case A is feasible (see alerts/notifications using triggers).

Hope this helps.


bechbd February 2016

In either case stated above, I am not sure why you want to use a trigger to notify your application that a value was inserted. In the scenario as I understand it, your application already knows the newest value: once the write has succeeded, you can notify your application of the newest value.
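The application-side approach can be sketched in Python: persist the value, and only after the write call returns successfully, publish it to in-process listeners. `NotifyingStore`, `put`, and `subscribe` are hypothetical names, and the write function here is an in-memory stand-in for whatever driver call performs the INSERT at the chosen consistency level.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[str, str], None]


class NotifyingStore:
    """Hypothetical wrapper: persist first, then notify subscribers."""

    def __init__(self, write_fn: Callable[[str, str], None]) -> None:
        self._write_fn = write_fn  # stand-in for the real INSERT call
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: str, value: str) -> None:
        # If the write raises, the exception propagates and no listener runs.
        self._write_fn(key, value)
        for listener in self._listeners:
            listener(key, value)


# In-memory stand-in for the database table.
table: Dict[str, str] = {}
received: List[Tuple[str, str]] = []

store = NotifyingStore(lambda k, v: table.__setitem__(k, v))
store.subscribe(lambda k, v: received.append((k, v)))
store.put("sensor-1", "42")
print(table, received)  # {'sensor-1': '42'} [('sensor-1', '42')]
```

Because notification happens only after a successful write, subscribers never see a value that failed to persist.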

In both Case A and Case B you are working against some of the basic principles of how Cassandra functions. At the application level you should not need to worry about ensuring replication or eventual consistency of your data across multiple nodes and data centers. That is a large part of what Cassandra brings to the table.

In both Case A and Case B you are going to get multiple inserts of the same data for each write, one on each node it is replicated to in both data centers. As you write to DC1, the data will also be written to DC2. If you then write it to DC2, it will be written back to DC1. This results in many copies of the same data, which increases disk requirements and compaction frequency. It will also increase network traffic as the two DCs talk back and forth to reach eventual consistency.
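The extra work from writing the same row into both DCs can be counted directly. A rough Python sketch, assuming RF=3 in each of DC1 and DC2 as in the question; it counts replica writes (not bytes on the wire) and ignores hints, read repair, and Cassandra's inter-DC forwarding optimisations. The `replica_writes` function is a hypothetical illustration.

```python
from typing import Dict

# RF per datacenter, as described in the question.
REPLICATION: Dict[str, int] = {"DC1": 3, "DC2": 3}


def replica_writes(inserts_per_dc: Dict[str, int]) -> Dict[str, int]:
    """Count replica writes caused by client inserts.

    Each client insert, whichever DC receives it, is stored on RF replicas
    in every DC. "remote" counts the replica writes that land outside the
    DC the client wrote to, i.e. traffic that crosses the inter-DC link.
    """
    total_rf = sum(REPLICATION.values())
    total = 0
    remote = 0
    for dc, inserts in inserts_per_dc.items():
        total += inserts * total_rf
        remote += inserts * (total_rf - REPLICATION[dc])
    return {"total": total, "remote": remote}


case_a = replica_writes({"DC1": 1})            # write once, let Cassandra replicate
case_b = replica_writes({"DC1": 1, "DC2": 1})  # write the same row into both DCs
print(case_a)  # {'total': 6, 'remote': 3}
print(case_b)  # {'total': 12, 'remote': 6}
```

Writing the same row into both DCs doubles both the replica writes and the cross-DC traffic, while every replica ends up holding the same data it would have held anyway.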

From what I can see here, I also have to ask why you are using RF=3 on a 3-node cluster. This means that each node in each data center holds all the data, essentially making each server a complete replica of the others. This may be overkill (depending on the data, of course), because you will not get many of the scalability benefits that Cassandra offers.
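The ratio of RF to node count determines how much of the data each node stores. A back-of-the-envelope Python sketch, assuming token ranges are spread evenly across the nodes of one DC, so that each node holds roughly RF / N of that DC's data; `data_fraction_per_node` is a hypothetical helper.

```python
def data_fraction_per_node(rf: int, nodes_in_dc: int) -> float:
    """Approximate share of one DC's data held by each node in that DC.

    Assumes an even token distribution, so each node owns about RF / N
    of the data. This sketch rejects RF values the DC cannot satisfy.
    """
    if not 0 < rf <= nodes_in_dc:
        raise ValueError("RF must be between 1 and the number of nodes in the DC")
    return rf / nodes_in_dc


for nodes in (3, 6, 12):
    print(nodes, data_fraction_per_node(3, nodes))
# 3 1.0    -> every node holds a full copy
# 6 0.5
# 12 0.25
```

Keeping RF=3 and adding nodes is what spreads the data out; with 3 nodes per DC, every node stores everything.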

Cassandra will handle syncing data between the data centers and across nodes, so your application does not need to worry about this.

One other quick note: your writes currently use CL=ONE. This means you may end up with cross-DC latency on a write request. If you change this to LOCAL_ONE, the write is acknowledged only once one of the nodes in the local DC has written the value, instead of possibly a node in the other DC.
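The difference between ONE and LOCAL_ONE is which replica acknowledgements count toward completing the request. Below is a simplified Python model of how a coordinator in DC1 decides it can answer the client; the `completing_ack` function and the latency figures are hypothetical, and real coordinators also involve timeouts and hinted handoff that this sketch ignores.

```python
from typing import List, Optional, Tuple

# (datacenter, acknowledgement latency in ms) for replicas that acked a write.
Ack = Tuple[str, float]


def completing_ack(acks: List[Ack], local_dc: str, local_only: bool) -> Optional[Ack]:
    """Return the acknowledgement that lets the coordinator answer the client.

    ONE (local_only=False): the fastest ack from a replica in any DC.
    LOCAL_ONE (local_only=True): the fastest ack from the coordinator's own DC.
    Returns None if no eligible replica acknowledged.
    """
    eligible = [a for a in acks if not local_only or a[0] == local_dc]
    return min(eligible, key=lambda a: a[1], default=None)


# Normal case: a local replica answers first under both levels.
healthy = [("DC1", 2.0), ("DC2", 80.0)]
print(completing_ack(healthy, "DC1", local_only=False))  # ('DC1', 2.0)
print(completing_ack(healthy, "DC1", local_only=True))   # ('DC1', 2.0)

# Local replicas slow or down: only a remote replica answers.
degraded = [("DC2", 80.0)]
print(completing_ack(degraded, "DC1", local_only=False))  # ('DC2', 80.0)
print(completing_ack(degraded, "DC1", local_only=True))   # None
```

With ONE, a remote acknowledgement can complete the write, which is where cross-DC latency can creep in; LOCAL_ONE keeps the acknowledgement inside the local DC, at the cost of the write not being acknowledged when no local replica responds.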

Post Status

Asked in February 2016
Viewed 1,277 times
Voted 11
Answered 2 times
