Cassandra provides the global persistence layer for the New York Times nyt⨍aбrik project.
nyt⨍aбrik (in production as of January 2014) is reliable, low-latency messaging middleware connecting internal clients at the New York Times (breaking news, user-generated content, etc.) with millions of external clients around the world. The primary technologies employed are RabbitMQ (AMQP), Cassandra, and websockets/sockjs. Components developed by the New York Times will be made open source beginning in 2014.
This presentation will focus on the use of Cassandra as the high performance distributed data store supporting the nyt⨍aбrik.
Two years ago, the New York Times launched a speculative project to radically simplify application connections among internal and external clients. The project was based upon an observation, a prediction, and a fact:
Observation: World-class messaging middleware is available as open source, e.g. RabbitMQ.
Prediction: Websocket messaging to almost all client devices of interest will be practical by 2014.
Fact: Messaging architectures are proven to be very resilient, highly scalable, fast, and efficient.
The project was named nyt⨍aбrik because: 1) my boss is Russian, 2) the result is a fabric topology, 3) I liked it.
We use Amazon Web Services. A complementary idea for this project was to be completely independent of the "old" infrastructure and its management. We managed everything ourselves, in addition to developing the software, so we could optimize both the application AND its supporting infrastructure.
As you can see, the architecture implied by the above requires a gateway between websockets and "normal" messaging, e.g. AMQP (RabbitMQ). We put these gateways in the "retail" layer; they autoscale based upon client load.
The "retail" instances each connect to one or more "wholesale" instances, organized into pipeline clusters. The "retail" and "wholesale" layers are spread horizontally across datacenters within regions around the world. Suffice it to say that they are "meshed" such that there are no single points of failure. And the application structure is stateless, except for transitory queues.
But we need state. When a client connects, we need to get subscription preferences and deliver breaking news, latest videos, and any personal messages for that client. And it needs to be fast. And it has to be headless and resilient like the rest of the nyt⨍aбrik.
So we "outsourced" state information to Cassandra. My talk will address in detail how we chose and architected Cassandra to support the nyt⨍aбrik.