How Apache Cassandra executes CQL queries?
Apache Cassandra is a distributed, highly scalable database which organizes data into rows and columns (Somehow like traditional DBs). Each row must have a unique primary key. This key is used to distribute the data to the DB nodes. If you are not familiar with Apache Cassandra please refer to cassandra.apache.org.
I have been a member of Cassandra developer community and I would like to shed some lights on the query execution process from a developer’s perspective.
The query language of Cassandra is called CQL (Cassandra Query Language). Although it is much similar to SQL language, it has a lot of underlying differences. You can use `cqlsh` tool to run CQL commands against a Cassandra system or you can run queries from source code (using one of Cassandra drivers to talk to a Cassandra instance).
When you submit a command from `cqlsh` tool, it is sent to Cassandra over a network connection. On the other side of the connection, a daemon is waiting for commands. It will produce a response for each received commands and send it to the originator. Cassandra uses Netty for its network protocol handling. Netty allows Cassandra to define complex message encoding, decoding and dispatching to handle the communication protocol with its clients.
Here is the general overview of the flow of method calls during processing of a CQL query:
- The starting point, on the server side, is CassandraDaemon which initializes the service and Server which will be network endpoint to receive requests from clients. When you run ‘./bin/cassandra‘ it will invoke JVM to run ‘main‘ method of this class. This will start the initialization process for Cassandra service and then set up NativeTransportService. This service will configure and start the network service (Defined in Server class).
- org.apache.cassandra.transport.Server: This class is responsible for setting up a socket server. According to Netty terminology, it defines a ‘childHandler’ which will be responsible for setting up and initialize each newly established Channel (network connection). This is done by an internal static class, named Initializer (If Cassandra is set up to use SSL mode, this will be SecureInitializer). This is the component which receives a CQL query request, delegates the processing and sends the response.
- org.apache.cassandra.transport.Server.Initializer: Upon establishment of a new connection with a client, this class will configure the pipeline for processing incoming and outgoing messages through this channel. This includes frame encoding/decoding, compression/decompression and message encoding/decoding. Additionally, it will add a message dispatcher to the pipeline (An instance of Message.Dispatcher) which is responsible for handling incoming messages, delegate execution to corresponding modules and produce an appropriate response.
- org.apache.cassandra.transport.Message.Dispatcher: All network related classes act on Request and Response classes. Requests in Cassandra are all based on org.apache.cassandra.transport.Request abstract class. For each different type of request, there is a separate concrete implementation of this class. The most important of which for this post is org.apache.cassandra.transport.messages.QueryMessage. Message dispatcher class will extend SimpleChannelInboundHandler from Netty library. The main implemented method here is ‘channelRead0‘. This method is called when a new message is received (and processed through the configured pipeline). This method calls ‘execute’ method of the received request. The output of this method will be a Response which will be sent back to the client.
- org.apache.cassandra.transport.Message.Request$execute: This is an abstract method which is implemented for different types of messages. The message type in which we are interested is QueryMessage.
- org.apache.cassandra.transport.messages.QueryMessage.execute: This method delegates the processing to QueryProcessor.process method. The only logic here is for tracing and exception handling.
- org.apache.cassandra.cql3.QueryProcessor$process: This method first calls ‘getStatement‘ to parse the query string (using ANTLR parser) and create an instance of ‘ParsedStatement.Prepared‘. This will include a ‘statement‘ attribute which is of type CQLStatement. CQLStatement is an interface which has implementations for each query type. For example org.apache.cassandra.cql3.statements.UpdateStatement handles INSERT/UPDATE statements. After getting an instance of CQLStatement, its ‘execute‘ method will be invoked to do the processing and return the response.