GFFS – Global Federated File System

The GFFS was born out of a need to access and manipulate remote resources, such as file systems, in a federated, secure, standardized, scalable, and transparent manner, without requiring data owners, application developers, or users to change how they store and access data in any way.

The GFFS accomplishes this by employing a global path-based namespace, e.g., /data/bio/file1. Data in existing file systems, whether Windows, MacOS, AFS, Linux, or Lustre file systems, can then be exported, or linked, into the global namespace. For example, a user could export a local rooted directory structure on their “C” drive, C:\work\collaboration-with-Bob, into the global namespace at /data/bio/project-Phil. Files and directories on the user’s “C” drive under \work\collaboration-with-Bob would then, subject to access control, be accessible to users in the GFFS via the /data/bio/project-Phil path.
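Conceptually, resolving a global path means finding the export whose namespace prefix covers it and translating the remainder into a path under the exported local root. The sketch below is a minimal illustration of that idea only, not the GFFS implementation; the export table and paths are hypothetical.

```python
import os

# Hypothetical export table: global-namespace prefix -> exported local root.
# These paths are illustrative only, not real GFFS mount points.
EXPORTS = {
    "/data/bio/project-Phil": r"C:\work\collaboration-with-Bob",
}

def resolve(global_path: str) -> str:
    """Translate a global GFFS-style path into the local path it maps to."""
    for prefix, local_root in EXPORTS.items():
        if global_path == prefix or global_path.startswith(prefix + "/"):
            remainder = global_path[len(prefix):].lstrip("/")
            return os.path.join(local_root, *remainder.split("/")) if remainder else local_root
    raise FileNotFoundError(f"No export covers {global_path}")

print(resolve("/data/bio/project-Phil/results/run1.csv"))
```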

Transparent access to data (and resources more generally) is realized by using OS-specific file system drivers that understand the underlying standard security, directory, and file access protocols employed by the GFFS. These file system drivers map the GFFS global namespace onto a local file system mount. Data and other resources in the GFFS can then be accessed exactly the same way local files and directories are accessed – applications cannot tell the difference.
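Because the file system driver presents the global namespace as an ordinary local mount, an application needs no GFFS-specific API. The fragment below assumes a hypothetical local mount point (/mnt/gffs) and file; it simply uses standard file I/O.

```python
# Assuming the GFFS namespace is mounted locally (the /mnt/gffs mount point and
# the file below are hypothetical), an application uses ordinary file I/O and
# cannot tell that the bytes actually live on a remote resource.
gffs_file = "/mnt/gffs/data/bio/file1"

with open(gffs_file, "r") as f:      # standard open(); no GFFS-specific calls
    for line in f:
        print(line.strip())          # placeholder for real processing
```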

PLEASE NOTE: The GFFS is currently under development as part of XSEDE. Once deployed, it will operate as described below.

Three Examples of Typical GFFS Use Cases

Three cases illustrate typical GFFS use cases: accessing data at an NSF center from home or campus, accessing data on a campus machine from an NSF center, and directly sharing data with a collaborator at another institution. For each of these examples, suppose that Sarah is an Extreme Science and Engineering Discovery Environment (XSEDE) user at Big State U and that her students regularly run jobs on Ranger at TACC. (See Figure 1.)

Access to Non-File-System Resources

Not all resources are directories and flat files. The GFFS reflects this by facilitating the inclusion of non-file-system data: any resource type, including compute resources, databases, running jobs, and communication channels, can be modeled as a file or directory. (See Figure 2.)
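As a purely illustrative sketch of this modeling idea (the resource, field names, and values are hypothetical), a running job can be presented as a directory whose entries expose facets of the job as readable files.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical sketch: a non-file-system resource (here, a running job)
# modeled as a directory whose "files" expose facets of the resource.
@dataclass
class JobResource:
    job_id: str
    status: str = "RUNNING"
    stdout: str = ""

    def as_directory(self) -> Dict[str, str]:
        """Present the job as a directory of name -> file-content entries."""
        return {
            "status": self.status,   # read to poll the job's state
            "stdout": self.stdout,   # read to stream the job's output
            "job-id": self.job_id,
        }

job = JobResource(job_id="ranger-12345", stdout="step 1 complete\n")
for name, contents in job.as_directory().items():
    print(f"{name}: {contents!r}")
```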

An Aside on GFFS Goals and Non-Goals

The complexity of sharing resources between researchers creates a barrier to resource sharing and collaboration – an activation energy, if you like. Too often, the energy barrier is too high, and valuable collaborations that could lead to breakthrough science do not happen, or, if they do, they take much longer and cost more.

GFFS Implementation

The GFFS uses as its foundation standard protocols from the Open Grid Forum [8-20], OASIS [21-29], the W3C [30-32], and others [33]. As an open, standards-based system, any implementation can be used. The first realization of the GFFS at XSEDE is using the Genesis II implementation from the University of Virginia [34, 35]. Genesis II has been in continuous operation at the University of Virginia since 2007 in the Cross Campus Grid (XCG) [36]. In mid-2010, the XCG was extended to include FutureGrid resources at Indiana University, SDSC, and TACC.

Client Side – Accessing Resources

By “client side”, we mean the users of resources in the GFFS: for example, a visualization application that Sarah runs on her workstation and that accesses files residing at an NSF service provider such as TACC.
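Continuing the earlier sketch (the mount point and project path below are hypothetical), a client-side tool browses the shared namespace with nothing more than standard library calls against the local mount.

```python
from pathlib import Path

# Hypothetical mount point and project path; a client-side application browses
# the shared directory exactly as it would browse a local one.
project = Path("/mnt/gffs/data/bio/project-Phil")

for entry in sorted(project.iterdir()):
    kind = "dir " if entry.is_dir() else "file"
    print(f"{kind} {entry.name}")
```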

Sharing Resources

There are many resource types that can be shared: file system resources, storage resources, relational databases, compute clusters, and running jobs.

File System Resources – a.k.a. exports

An export takes the specified rooted directory tree, maps it into the global namespace, and thus provides a means for non-local users to access data in the directory via the GFFS.
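Conceptually, creating an export amounts to registering a local rooted directory tree under a chosen global path, after which path resolution (and access control checks) can route requests into that tree. The following is a sketch of that registration step only, not the Genesis II export mechanism; all names and paths are hypothetical.

```python
# Conceptual sketch only: "exporting" a local directory tree registers its
# root under a chosen global-namespace path. Paths below are hypothetical.
exports = {}

def export(local_root: str, global_path: str) -> None:
    """Register local_root so that global_path, and everything under it,
    resolves into the exported directory tree."""
    if global_path in exports:
        raise ValueError(f"{global_path} is already exported")
    exports[global_path] = local_root

export("/home/sarah/projects/genomes", "/data/bio/project-Phil")
print(exports)
```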

Storage Resources

New files and directories can be stored in a container’s own databases and file system resources.
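As a rough illustration of that idea (not the Genesis II container’s actual storage layer; the threshold, database schema, and paths are invented for the sketch), a container might keep small new files in its own database and place larger ones on a backing file system.

```python
import sqlite3
from pathlib import Path

# Hypothetical policy: small files go into the container's database,
# larger files spill to a backing file-system directory.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE blobs (path TEXT PRIMARY KEY, data BLOB)")
BACKING_DIR = Path("/tmp/container-store")
SMALL_FILE_LIMIT = 64 * 1024  # bytes; arbitrary threshold for this sketch

def store(global_path: str, data: bytes) -> str:
    """Store a new file and report which backing store received it."""
    if len(data) <= SMALL_FILE_LIMIT:
        DB.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (global_path, data))
        return "database"
    BACKING_DIR.mkdir(parents=True, exist_ok=True)
    (BACKING_DIR / global_path.strip("/").replace("/", "__")).write_bytes(data)
    return "file system"

print(store("/data/bio/project-Phil/notes.txt", b"hello"))
```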

Compute Resources

Compute resources such as clusters, parallel machines, and desktop compute resources can be shared in a similar manner. For example, an OGSA-BES resource can be created that proxies a PBS queue.
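OGSA-BES defines operations such as CreateActivity that accept a job description; a proxying resource translates these into submissions to the local batch system. The sketch below is a heavily simplified, hypothetical stand-in for that translation step (no JSDL parsing, no SOAP plumbing): it turns a small job specification into a PBS script and hands it to qsub.

```python
import subprocess
import tempfile

# Hypothetical, simplified stand-in for a BES CreateActivity handler:
# translate a small job description into a PBS script and submit it with qsub.
# A real OGSA-BES front end would parse a JSDL document and speak SOAP.
def create_activity(job: dict) -> str:
    script = "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job['name']}",
        f"#PBS -l nodes={job.get('nodes', 1)}",
        f"#PBS -l walltime={job.get('walltime', '01:00:00')}",
        job["command"],
        "",
    ])
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script)
        path = f.name
    # qsub prints the identifier of the newly queued job on stdout.
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Example (requires a PBS installation to actually run):
# print(create_activity({"name": "demo", "command": "./simulate --steps 1000"}))
```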

References