In prior blogs, we’ve discussed both our vision for the hybrid cloud and the data-centric approach required to achieve that vision. In our view (and we’re not alone in this view), it’s clear that high-performance, distributed file systems are needed to deliver the data mobility and manageability required for effective hybrid cloud solutions. And while there’s very little debate regarding the potential of a true hybrid cloud, a common question we hear is “But why file?”
“Why not block? Why not object?”
To explain why file is the most effective, most efficient, most logical interface, it’s useful to first take a look back…
Block vs. File vs. Object: A Brief History
For many years, IT administrators waged a philosophical war over storage architecture superiority. On one side were the supporters of block-based architectures, delivered as a SAN. On the other side were the supporters of file-based architectures, delivered as NAS. Ultimately, the conflict ended when most admins decided that they actually needed both. Once co-existence became the norm, things remained unchanged for quite a while, with block and file sharing the spotlight fairly equally. Then, a few years ago, a new storage architecture was introduced to the market: the object store (for example, Amazon’s S3 buckets). Object stores quickly gained a following of their own and, today, most enterprises use all three storage architectures.
Here’s a quick breakdown of the three architectures:
● Block-based (SAN) - Provides multiple entities (volumes or virtual disks), each comprised of arrays of blocks. These entities look like separate disks to the server. Block-based architectures provide both READ and WRITE block semantics and volume provisioning services.
● File-based (NAS) - Provides single or multiple hierarchical namespaces (directories) of entities (files), each comprised of arrays of bytes. The directories and files are very dynamic in nature. They can be easily created and/or deleted and can vary in number, in size, and in the I/O patterns associated with them (e.g. number of writers/readers, streaming, etc.). File systems provide semantics for hierarchy manipulation, access control, file-level and ‘byte range’-level locking, and control of other file/directory attributes.
● Object-based - Provides storage services for arbitrary storage blobs (objects) within groups (buckets). Objects are immutable, support a single writer, multiple readers, are named using arbitrary user-defined keys, and have no explicit hierarchy. (Note that a “weak” hierarchy can be implemented using key patterns.) S3-compatible object stores support versions (i.e. multiple instances of objects under the same name), atomic HEAD manipulations (e.g. “Copy Object”), access control, and security features. In addition, multi-site replication (with “eventual consistency”) and tiering are supported by most modern object-based systems.
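To make the difference in semantics concrete, here is a minimal in-memory sketch of the three interfaces. The class and method names (`BlockVolume`, `FileNamespace`, `ObjectBucket`, `list_keys`) are hypothetical and chosen for illustration only. They do not correspond to any real product API, and the models ignore locking, access control, and versioning.

```python
class BlockVolume:
    """Block semantics: a fixed array of equally sized blocks, READ and WRITE only."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks

    def read(self, lba):
        return self.blocks[lba]

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("a write must cover exactly one block")
        self.blocks[lba] = bytes(data)


class FileNamespace:
    """File semantics: a directory hierarchy of mutable byte arrays."""

    def __init__(self):
        self.dirs = {"/"}
        self.files = {}

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def mkdir(self, path):
        if self._parent(path) not in self.dirs:
            raise FileNotFoundError(self._parent(path))
        self.dirs.add(path)

    def pwrite(self, path, offset, data):
        if self._parent(path) not in self.dirs:
            raise FileNotFoundError(self._parent(path))
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(bytes(offset - len(buf)))  # zero-fill the gap
        buf[offset:offset + len(data)] = data     # byte-range update in place

    def pread(self, path, offset, length):
        return bytes(self.files[path][offset:offset + length])


class ObjectBucket:
    """Object semantics: flat keys mapped to immutable blobs."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = bytes(data)  # always replaces the whole object

    def get(self, key):
        return self.objects[key]

    def list_keys(self, prefix=""):
        # The "weak" hierarchy: nothing more than a key-prefix filter.
        return sorted(k for k in self.objects if k.startswith(prefix))


vol = BlockVolume(num_blocks=8)
vol.write(3, b"x" * 512)

fs = FileNamespace()
fs.mkdir("/data")
fs.pwrite("/data/log.txt", 0, b"hello world")
fs.pwrite("/data/log.txt", 6, b"WORLD")       # touches only five bytes

bucket = ObjectBucket()
bucket.put("data/log.txt", b"hello world")
bucket.put("data/log.txt", b"hello WORLD")    # rewrites the entire object
bucket.put("data/2016/a.bin", b"\x00")
print(bucket.list_keys(prefix="data/2016/"))
```

Note that only the file model has real directories and byte-range updates. The object model gets its apparent hierarchy purely from the key naming convention.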
In addition, the common view of the pros and cons of each can be summarized as follows:
● Block
○ Pros: Robust, stable, predictable, high performance, platform-independent
○ Cons: Difficult to use at the application level, sharing is difficult, scalability is very limited, effective management requires deep knowledge of storage (hence, not conducive to easy self-service).
● File
○ Pros: De-facto standard application-level storage interface, supports all I/O patterns, easy user-level provisioning, efficient data management (hierarchies, quota management, etc.). High granularity (at the application level) enables precise and efficient data services (backup, replication, tiering, indexing, etc.).
○ Cons: Less performant and predictable than block services, limited in scale (in terms of number of entities), no standard for copy services, protocol-level complexities.
● Object
○ Pros: Integration with cloud-centric applications, easy to use, highly scalable, minimal administration required, very cost-efficient, integrated copy services (versions, multi-site, tiering).
○ Cons: Only covers a subset of storage use cases (in particular, object cannot support transactional I/O patterns), weak semantics shift complexity to the application and application developer, incompatible with most existing applications.
At a very high level, the typical enterprise usage can be described as follows:
● Block: High performance applications, backend for virtual disks
● File: General purpose, file/data sharing, capacity pools
● Object: Nearline access, archiving, backup, interface to cloud storage, cloud-centric apps
To gain insight into the relative value associated with these architectures, we can review the pricing offered by the leading expert in value pricing for storage (IMHO) - Amazon:
*Publicly posted pricing as of 10/28/2016
As evidenced by its popularity and by the pricing above, the capability delivered by file systems is valued very highly…and for good reason. Relative to object and block, file is a much more flexible, broadly applicable interface. As a result, file-based data management aligns very well with the requirements of a scale-out, heterogeneous environment…e.g. a hybrid cloud.
A few important things to note:
1. Block-based entities (volumes, virtual disks) are very rarely used directly by applications. In fact, almost all volumes are used to support a file system on top of them. Even databases that used to consume raw volumes are now commonly using (dedicated) file systems to simplify management.
2. Objects cannot fully replace files. In particular, they do not effectively support use cases that require frequent creation/removal of small files and/or random access within files.
3. File-level semantics provide the best general purpose interface. If only the scalability (number of entities) and performance issues could be solved, file would be the perfect solution. This makes one wonder...
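The second point above can be illustrated with a small sketch that compares how many bytes move when a 4 KiB range is changed inside a 1 MiB entity. The helper names `update_in_file` and `update_in_object` are hypothetical, and the "object store" is just a Python dict standing in for a whole-object PUT/GET interface. This is a simplification under stated assumptions: it ignores ranged reads and any server-side copy features that a real object store may offer.

```python
import os
import tempfile


def update_in_file(path, offset, data):
    """Overwrite a byte range in place. Returns the bytes transferred."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return len(data)


def update_in_object(store, key, offset, data):
    """Immutable objects: fetch the whole blob, patch it, store it again."""
    blob = bytearray(store[key])             # full GET
    blob[offset:offset + len(data)] = data
    store[key] = bytes(blob)                 # full PUT
    return 2 * len(blob)


SIZE = 1 << 20            # 1 MiB entity
PATCH = b"\xff" * 4096    # 4 KiB random-access update
OFFSET = 8192

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(SIZE))
    file_bytes = update_in_file(path, OFFSET, PATCH)
    with open(path, "rb") as f:
        file_content = f.read()

store = {"data.bin": bytes(SIZE)}
object_bytes = update_in_object(store, "data.bin", OFFSET, PATCH)

print(f"file:   {file_bytes} bytes moved")
print(f"object: {object_bytes} bytes moved")
print(f"ratio:  {object_bytes // file_bytes}x")
```

Both paths end with identical content, but in this model the object path moves the whole entity twice for every small update, which is why random access within large objects is impractical.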
The Emergence of File-over-“Object” Architectures
As I mentioned at the start, we’re not the only ones who recognize the importance and transformative potential of file. In an attempt to leverage the power of file-level semantics, a new category of file system has emerged over the past decade. These file systems expose a file interface, but internally manage entities with reduced semantics. These entities are often referred to as “objects”, but they are not the same as the S3-style objects described above. In most cases, such file-over-object systems are limited in both their scalability and in their ability to support sharing. They typically support reduced hierarchical capabilities (e.g. limited numbers of directories and files, limited numbers of files per directory, limited directory nesting, etc.) and reduced semantics (in most cases, a single writer with multiple readers…though sometimes only supporting a single reader).
In addition, these architectures are tuned to a specific “object” pattern and typically expect a small number of large files, where each file will sustain a significant number of I/Os. Patterns that utilize the file system hierarchies and/or elasticity (e.g. patterns that create and remove lots of files) cannot be practically supported. The main use case for these limited file-over-object systems is as a virtualization backend (i.e. implementing virtual disks). These systems provide varying tradeoffs between ease of use and read/write efficiency. HOWEVER, the pattern that they serve is a block-centric pattern (perhaps with some extensions) and not a full-fledged file pattern. Therefore, these solutions are used as SAN replacements, and not as true filers or NAS. In other words, the applications are using them as block devices, and not to provide file services.
In summary, these solutions enable a file interface, but offer only a subset of file functionality. Worse, they are still hampered by the same scalability and performance limitations associated with traditional file systems, so they are not well aligned with the needs of a hybrid cloud.
Note: I’ve sometimes seen these file-over-object file systems referred to as “Infrastructure File Systems”, with the more generic, full-fledged file systems referred to as “User-Level File Systems”. I love this distinction, but I think a more accurate name for the full-fledged file systems would be “Application-Level File Systems”. Note that there are also attempts to provide full (or nearly full) file system semantics over “objects” (e.g. Ceph’s file mode, Swift).
Application-Level File Systems (True NAS): The Right Features…But Can They Scale?
As described above, the common perception is that file systems are limited in scalability (number of entities) and inferior in performance to block-based architectures. Some even go so far as to state this as a “well-known fact”. In reality, the only accurate part of this claim was that the basic block data path was simpler (i.e. shorter) than the basic file system data path. This was true because file system entities are much more dynamic and, therefore, need more complex mapping structures and more complex metadata management.
However, during the last decade, the data path discrepancy has been considerably reduced because most block-based enterprise solutions have also embedded storage services (e.g. snapshots, clones, sync/async replication). These storage services make the block data path much longer and more complicated.
In addition, as enterprise flash becomes more and more mainstream, the more complex mapping structures required by file (e.g. those that cause many small, random accesses) have become feasible to support with acceptable cost.
Finally, most modern deployments achieve scalability by scaling out (as opposed to scaling up). Recent technology advances such as faster interconnects (10/25/40/100+ GbE) and more efficient, industry-proven scaling techniques (heavily influenced by the innovations of large web-scale companies) make it much easier (but certainly still not easy!) to build a full-featured scale-out system with nearly linear performance scaling. It’s just a matter of time before such systems become publicly available. When they do, our vision for the hybrid cloud will be one (massive) step closer to reality.
[Note: We aren’t the only ones who feel this way. Check out Steve Duplessie’s recent video blog “Back to the Future – The Case For The Universal Distributed File System”]
A Final Note Regarding “File vs. Block” Performance Comparisons
Comparing file and block performance is very tricky. First of all, file semantics are much richer, so there are many things that you just can’t do (directly) on block devices (e.g. “create a directory”). But even if you limit yourself to block patterns (i.e. simple reading and writing of blocks), you should always measure the performance from the application point of view.
For example, let's say you want to test a storage system as a candidate storage backend for MongoDB. Since MongoDB is deployed over a file system, you should always measure the performance over the file system. In other words:
● If you are using block services (or an “infrastructure file system”) and deploy a local file system on top of it, be sure to measure the performance over the local file system and not directly over the block device. The local file system will likely introduce significant I/O overhead (as a result of I/O amplification) and increase the latency. Note that most file systems create significant overhead when handling random I/O, especially for random writes with small blocks…this overhead can easily double both the total amount of I/O and the latency!
● If you use direct file services (i.e. via an application-level file system), then you would (and should) naturally test your performance over the file system without any other additions.
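As a rough illustration of measuring "from the application point of view", the sketch below times small random synchronous writes against a file on whatever file system backs a given directory. The function name `random_write_latency` and all of its parameters are hypothetical. This is not a substitute for a proper benchmarking tool or for testing with the real application. It only shows that the measurement goes through the file system rather than the raw device.

```python
import os
import random
import statistics
import tempfile
import time


def random_write_latency(path, file_size=8 << 20, block_size=4096,
                         num_ops=100, seed=0):
    """Time small random writes, each followed by fsync, through the file system."""
    rng = random.Random(seed)
    payload = os.urandom(block_size)

    # Pre-size the test file so every write lands inside it.
    with open(path, "wb") as f:
        f.truncate(file_size)

    latencies = []
    with open(path, "r+b", buffering=0) as f:
        for _ in range(num_ops):
            offset = rng.randrange(file_size // block_size) * block_size
            start = time.perf_counter()
            f.seek(offset)
            f.write(payload)
            os.fsync(f.fileno())   # force the write down to stable storage
            latencies.append(time.perf_counter() - start)

    latencies.sort()
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "ops": num_ops,
        "median_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[p99_index] * 1000,
    }


if __name__ == "__main__":
    # Point the directory at the mount you want to evaluate; a temporary
    # directory is used here only so that the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as target_dir:
        result = random_write_latency(os.path.join(target_dir, "bench.dat"))
        print(result)
```

Running the same function against a local file system on a block volume and against a direct file service gives numbers that are comparable, because both include the file system overhead the application will actually see.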