Jasper22.NET: Holy Large Hadron Collider, Batman!

Posted by jasper22 at 15:38 | Sunday, June 6, 2010

Valentin Kuznetsov just presented a paper at the International Conference on Computational Science on CERN’s use of MongoDB for Large Hadron Collider data. The paper, “The CMS Data Aggregation System,” is available as a PDF at ScienceDirect.

A summary

“CMS” stands for Compact Muon Solenoid, a general-purpose particle physics detector built on the Large Hadron Collider. The CMS project posted a few comics which provide a nice, simple (if somewhat cheesy) explanation of what the CMS/LHC does.

The LHC generates massive amounts of data of all different varieties, which is distributed across a worldwide grid. It sends status messages to some of the computers, job monitoring info to other computers, bookkeeping info still elsewhere, and so on.

This means that each location can run specialized queries on the data it holds, but up until now it’s been very difficult to query across the whole grid. Enter the Data Aggregation System, designed to let users query any of that data across all of the machines.

How it works

The aggregation system uses MongoDB as a cache. It checks whether Mongo already has the aggregation the user is asking for. If it does, the system returns it; otherwise, the system does the aggregation and saves the result to Mongo.
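The check-then-compute flow described above is a cache-aside pattern, and a minimal Python sketch may make it concrete. This is not the CMS code: the class name `AggregationCache`, the `aggregate` callable, and the `misses` counter are all hypothetical, and a plain dict stands in for the MongoDB collection so the example runs on its own.

```python
import json


class AggregationCache:
    """Cache-aside lookup. A dict stands in for the MongoDB collection."""

    def __init__(self, aggregate):
        # Hypothetical callable that performs the expensive aggregation
        # across the grid; it is an assumption, not part of the real system.
        self._aggregate = aggregate
        self._store = {}   # stand-in for a MongoDB collection
        self.misses = 0    # how many times the aggregation actually ran

    def get(self, query):
        # Serialize with sorted keys so equivalent queries share one entry.
        key = json.dumps(query, sort_keys=True)
        if key in self._store:
            return self._store[key]        # cache hit: return the saved result
        self.misses += 1
        result = self._aggregate(query)    # cache miss: do the aggregation
        self._store[key] = result          # save it for the next caller
        return result
```

Calling `get` twice with the same query runs the aggregation once; the second call is answered from the store. A real deployment would also need some expiry policy for cached results, which this sketch leaves out.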

They query the system using a simple, SQL-like language which they transform into a MongoDB query. So, something like file="abc", run>10 becomes {"file" : "abc", "run" : {"$gt" : 10}}. (It’s not the same as SQL, but the code for this might be interesting to people who want to use SQL queries with MongoDB.)
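To illustrate the kind of translation involved, here is a rough Python sketch that turns comma-separated conditions into a MongoDB-style query document. It is not the parser from the paper: the function `to_mongo_query`, the set of supported operators, and the value rules (double-quoted strings or numbers) are assumptions made for this example.

```python
import re

# Comparison operators mapped to their MongoDB query operators.
_OPERATORS = {">=": "$gte", "<=": "$lte", "!=": "$ne", ">": "$gt", "<": "$lt"}

# field, operator, value. Two-character operators are listed first so that
# ">=" is not read as ">" followed by "=10".
_CONDITION = re.compile(r'^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$')


def _parse_value(text):
    """Double-quoted text becomes a string; anything else must be a number."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    try:
        return int(text)
    except ValueError:
        return float(text)


def to_mongo_query(expression):
    """Translate e.g. 'file="abc", run>10' into a MongoDB query dict.

    Limitations of this sketch: quoted values may not contain commas, and
    a field may not mix "=" with other operators.
    """
    query = {}
    for condition in expression.split(","):
        match = _CONDITION.match(condition)
        if match is None:
            raise ValueError("cannot parse condition: %r" % condition)
        field, op, raw = match.groups()
        value = _parse_value(raw)
        if op == "=":
            query[field] = value  # plain equality needs no operator
        else:
            query.setdefault(field, {})[_OPERATORS[op]] = value
    return query


print(to_mongo_query('file="abc", run>10'))
# prints {'file': 'abc', 'run': {'$gt': 10}}
```

The resulting dict has the same shape as the query in the example above, so it could be passed to a MongoDB driver's find call as-is.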
