This is a mirror of the official site: http://jasper-net.blogspot.com/

CloudFFS: Large-scale, high-performance file storage system

Sunday, January 30, 2011
CloudFFS

So we built and deployed this new service at work not long ago, and @stelabouras suggested we document some parts of it for internal consumption. Given that I haven't blogged for months, I thought I'd just pour those words here instead.

CloudFFS (yes, it is a funny name) is a file system (though not in the traditional sense; it doesn't hook into the kernel VFS layer or anything) that provides storage for an unbounded number of files and very fast access to them over HTTP. It can manage PB-scale volumes and up to 2^64 files per namespace (see below).

We have hundreds of millions of static files (images, video, text files, you name it) stored across our storage devices; having to deal with that many files is not a pleasant task for our system operators and developers alike. We wanted a solution that frees our developers from having to worry about storage and provides a very simple way to store and retrieve files, while at the same time helping our systems guys deal with backups and management of those files efficiently.

There are many problems associated with keeping that many separate files around: wasted inodes and disk blocks, slower access times (iterating over path components is not free, and neither is looking up a directory entry within a directory), difficulty in making backups, the need for elaborate directory naming schemes to cope with large directories, and more. In addition to that, accessing those files over a network filesystem (e.g. NFS) is not efficient by any means. Developers need to be aware of those limitations and of the rules that are in place to deal with them, which places an unnecessary burden on them.
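To make the "elaborate directory naming schemes" point concrete, here is a minimal sketch of one common workaround. It is assumed purely for illustration and is not taken from our actual setup: hash the file name and use the leading hex digits as nested directory names, so that no single directory grows too large. The function name `sharded_path` and the two-level layout are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, levels: int = 2) -> PurePosixPath:
    """Map a file name to a nested path such as root/ab/cd/filename.

    Hypothetical scheme: each directory level is named after two hex
    digits of the MD5 digest of the file name.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take two hex characters per level: digest[0:2], digest[2:4], ...
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


if __name__ == "__main__":
    print(sharded_path("/data/images", "cat.jpg"))
```

Every piece of code that reads or writes files has to agree on a scheme like this one, which is exactly the kind of rule developers end up having to know about.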

None of the solutions we looked into really seemed all that great for us, so we went ahead and built our own. Though, to be fair, we almost always end up building our own anyway. This practice has worked great for us all these years, and given that we are a technology company, it makes sense for us to disregard the 'not invented here' approach.

Data Model

Files are uniquely identified by a 64-bit number. They belong to namespaces, for example 'blogs', 'images', or 'mails'. A file can hold up to 1GB of data. Files can also be either public or private. Public files can be accessed directly (e.g ), whereas private files require HTTP authentication. This makes it possible to, say, make everything accessible over the public Web except files that should not be accessible in that fashion (e.g. log files, archived content, emails, etc.).
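As a rough illustration of the data model described above, the sketch below models a file reference in Python. The names `FileRef`, `requires_auth` and `check_size` are hypothetical and are not the CloudFFS API, which isn't shown here. Only the limits (64-bit ids, namespaces, 1GB per file, public vs. private) come from the description; reading "1GB" as 2^30 bytes and treating ids as unique within a namespace are assumptions.

```python
from dataclasses import dataclass

MAX_FILE_ID = 2**64 - 1   # ids are 64-bit numbers (assumed unsigned)
MAX_FILE_SIZE = 1 << 30   # "1GB", assumed here to mean 2^30 bytes


@dataclass(frozen=True)
class FileRef:
    """Hypothetical handle for one stored file; not the real CloudFFS API."""
    namespace: str        # e.g. 'blogs', 'images', 'mails'
    file_id: int          # assumed unique within its namespace
    public: bool = True   # private files require HTTP authentication

    def __post_init__(self):
        if not self.namespace:
            raise ValueError("namespace must be non-empty")
        if not 0 <= self.file_id <= MAX_FILE_ID:
            raise ValueError("file id must fit in 64 bits")

    def requires_auth(self) -> bool:
        # Public files are served directly; private ones need credentials.
        return not self.public


def check_size(data: bytes) -> None:
    """Reject payloads larger than the per-file limit."""
    if len(data) > MAX_FILE_SIZE:
        raise ValueError("a file can hold at most 1GB of data")


if __name__ == "__main__":
    log = FileRef("mails", 12345, public=False)
    print(log, log.requires_auth())
```

A flat (namespace, id) pair like this is all a developer would need to keep track of; there are no paths or directory layouts to remember.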

