
The OrangeFS Project, an Open Source Product with a Research Outreach


(reprint)…

More and more businesses rely on data-driven decisions, simulations, models, and customer interaction. More of these data-driven requirements need faster and faster responses, even in real time, and this demand is stressing traditional storage architectures. Historically, storage systems and protocols have had single points of ingress and egress for data. These single pinch points are increasingly becoming bottlenecks, limiting the ability to process data in a timely fashion.

The High Performance Computing community has faced these problems since the inception of the Beowulf cluster, the original concept of tying several commodity computers together to distribute computational workloads. As these initial systems grew, protocols such as NFS (Network File System) could not sustain the demand. So, just as the computational workloads were distributed, distributing the demand on storage created the original parallel file systems, targeted at the larger IO workloads found on Beowulf clusters.

These concepts found their way into more of what we now consider mainstream computational infrastructure, including database sharding and storage systems used by large internet service companies. We even see some of the more popular parallel file systems attempting to make inroads into mainstream computing.

At the other end of the spectrum, a large portion of mainstream computing looks like it may head in the direction where high performance computing has historically been, but with two twists: legacy and infancy. Legacy, because in many disciplines the codes that fulfill these data-driven requirements historically ran on desktop computers. Infancy, because many of these codes are new to the parallel world; they are being updated to meet parallel computation needs, but the legacy of how their IO works is often overlooked.

Reading and writing small amounts of data over large data sets puts extreme loads on storage architectures, traditional and parallel. These unoptimized IO patterns are especially hard on parallel architectures, because they shift the load from file IO work to metadata work. The traditional models of storage, with a single point of ingress/egress, can't keep up if the IO, large or small, exceeds the capability of that single point. Sounds like a rock and a hard place to me.
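To make the metadata pressure concrete, here is a back-of-the-envelope sketch in Java. The file sizes and the one-lookup-per-file assumption are illustrative numbers, not measurements of any particular system.

```java
// Illustrative sketch (assumed numbers, not measurements): compare how many
// metadata lookups it takes to touch the same total volume of data when it is
// spread across many small files versus a few large files.
public class SmallIoCost {
    public static void main(String[] args) {
        long smallFileBytes = 4L * 1024;                   // 4 KiB per file
        long largeFileBytes = 4L * 1024 * 1024 * 1024;     // 4 GiB per file
        long totalBytes = 1024L * 1024 * 1024 * 1024;      // 1 TiB data set

        // Assume each file open costs at least one metadata lookup.
        long smallFileLookups = totalBytes / smallFileBytes;
        long largeFileLookups = totalBytes / largeFileBytes;

        System.out.printf("Small files: %,d metadata lookups%n", smallFileLookups);
        System.out.printf("Large files: %,d metadata lookups%n", largeFileLookups);
    }
}
```

The same terabyte costs hundreds of millions of metadata operations in the small-file case and only a few hundred in the large-file case, which is why unoptimized small IO hits the metadata path so hard.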

Things can be improved by throwing hardware at the problem. SSDs, or future storage with memory-like speeds and persistent characteristics, and low-latency interconnects such as InfiniBand provide a boost at both the metadata and file IO layers. But the biggest gains will come from the storage software itself. System-level software such as file systems evolves over years, not months as in the web-driven world, which compounds the problem.

I have heard people, including those from research funding areas, say that storage is solved and the remaining problems are up the stack, in integration and applications. That is not to say work does not need to be done at those layers. All I am proposing is that we not forget the storage layer that feeds the computational engine. Maybe I should call it the gas tank, battery, or fuel cell; either way, you won't get there without it. Continued R&D in storage software is a huge need.

The OrangeFS project, historically PVFS, is such a project. It is built around an open source agenda, providing production-ready, high-performing, distributed file storage, but it also embraces research through an extremely modular and extensible design (since it was rewritten as PVFS version 2). The community has built many features into the project since its rebirth. These include distributed metadata, not only for files but also for directory entries, and diverse client access, including an upstream Linux kernel client (as of kernel version 4.6), a Windows client, an HCFS (Hadoop Compatible File System) interface, and others such as WebDAV. Recently OrangeFS has also been extended to support multiple databases for metadata, including BDB and LMDB. The EMC Fast Data Group, with 2TIERS, is working on data movers between the file system and object storage, and on optimizing single-process IO for workloads such as 4K and 8K raw video streaming and processing.
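As one example of what an HCFS interface enables, the sketch below reads a file through the standard Hadoop FileSystem API, which is the generic entry point any HCFS-compliant store plugs into. The ofs:// URI scheme, the host name, the path, and the commented-out connector class name are assumptions for illustration only; the actual scheme and class come from the OrangeFS HCFS documentation.

```java
// Minimal sketch: read a file through the generic Hadoop FileSystem (HCFS) API.
// The URI scheme, host, port, and connector class below are hypothetical.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HcfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical: map the ofs:// scheme to the OrangeFS HCFS connector.
        // conf.set("fs.ofs.impl", "<OrangeFS HCFS FileSystem class>");

        // Any HCFS-compliant file system is reached through this same call.
        FileSystem fs = FileSystem.get(URI.create("ofs://orangefs-host:3334/"), conf);

        // Open and print a file exactly as you would on HDFS.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/example.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Because the application only sees the Hadoop FileSystem abstraction, analytics jobs written against HDFS can run unchanged against an HCFS-backed parallel file system.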

With all this progress, there is still much more that needs to be figured out to improve a foundational element that feeds computation: storage. Some of the larger problems that still need to be resolved include:

- How can we improve the small IO performance for distributed storage without affecting large IO performance?
- How can we improve the throughput for a single process?
- How can we find methods of file access that are more efficient than relying on directory structures?
- How can we improve security while still increasing performance?

While we have ideas for improving these areas, the pace at which that happens requires a broader community working on these problems. That community can help by contributing funding, time, and collaboration to projects such as OrangeFS. The more people who work on these improvements, the greater the impact on storage technology for everyone, through open collaboration.

This is a “reprint” of the LinkedIn article.

