This is a mirror of official site: http://jasper-net.blogspot.com/

GFS the Google File System in 199 Lines of Python

| Thursday, November 18, 2010
GFS, the Google File System, sits as the backbone of the entire Google infrastructure. However, for many it is a mystery, especially for those lucky enough to be more acquainted with high-level python code than low-level C operating system sources. But have no fear, we shall break through the veil and describe an implementation of GFS in 199 lines of python. Naturally, you may want to read about the theory and design of GFS in the original Google Research GFS paper. But we will aim to give the core concepts, with real working python code, in this article. Complete runnable python 2.6 source code, if you wish it, is available at the end of this post.

A brief summary of GFS is as follows. GFS consists of three components: a client, a master, and one or more chunkservers. The client is the only user-visible, that is programmer-accessible, part of the system. It functions similarly to a standard POSIX file library. The master is a single server that holds all metadata for the filesystem. By metadata we mean the information about each file, its constituent components called chunks, and the location of these chunks on various chunkservers. The chunkservers are where the actual data is stored, and the vast majority of network traffic takes place between the client and the chunkservers, to avoid the master as a bottleneck. We will give more detailed descriptions below by going through the GFS client, master, and chunkserver as implemented in python classes, and close with a test script and its output.

Read more: Python Cloud DB

Posted via email from .NET Info

0 comments: