Jasper22.NET: How Twitter Uses NoSQL

Posted by jasper22 at 13:58 | Tuesday, January 4, 2011

InfoQ has released a video of Twitter's Kevin Weil speaking at Strange Loop earlier this year on how the company uses NoSQL. Weil is quick to point out that Twitter is heavily dependent on MySQL. However, Twitter does employ NoSQL solutions for many purposes for which MySQL isn't ideal. According to Weil, Twitter users generate 12 terrabytes of data a day - about four petabytes per year. And that amount is multiplying every year. Read on for our notes on Weil's talk.

Scribe
Syslog stopped scaling for Twitter after a while, so instead it uses Scribe, a log collection framework created and open-sourced by Facebook. Twitter has contributed some of its own patches to Scribe.

Twitter uses Scribe to write logs to Hadoop. Scribe made it so easy for Twitter to log data, it started to log much more data. It now logs 80 different categories of data.

Hadoop
Twitter needs to store more data per day than it can reliably write to a single hard drive, so it needs to store data on clusters. Twitter uses Cloudera's Hadoop distribution to power its clusters.

Weil says MySQL isn't efficient at doing analytics at the scale Twitter needs. Instead, Twitter uses Hadoop and its own open source project called FlockDB. Hadoop can run analytics and hit FlockDB in parallel to assemble social graph aggregates.

Pig

Jasper22.NET

How Twitter Uses NoSQL

Archive

Random sites

Followers

Search This Blog