Access to Non-File-System Resources

Not all resources are directories and flat files. The GFFS reflects this by facilitating the inclusion of non-file-system data in much the same manner as Plan 9 [1]: any resource type can be modeled as a file or directory, including compute resources, databases, running jobs, and communication channels.

Compute resources: A compute resource, such as a PBS-controlled cluster, can be modeled as a directory (folder). To start a job, simply drag or copy a JSDL XML file describing the job into the directory. The job then logically begins executing. (We say “logically” because on some resources, such as queuing systems, the job is only scheduled for execution.) A directory listing of the folder shows a subfolder for each of the jobs “in” the compute resource. (A similar concept was first introduced with Choices [2, 3].) Within each job folder is a text file with the job status, e.g., Running or Finished, and a subfolder that is the current working directory of the job, containing all of the intermediate job files, input files, output files, stdout, stderr, etc. The user can interact with the files in the working directory while the job executes, both reading them to monitor execution and writing them to steer the computation.

Recall that in our earlier example Sarah’s group had a local compute cluster. Sarah could also export her compute cluster into the GFFS as a shared compute resource and give Bob’s group access to it. Bob’s group could then use Sarah’s resource without needing special accounts and without having to log in to Sarah’s machines. If Bob also had a cluster, he could export it into the GFFS as well. The two groups could then create a shared Grid Queue that includes both clusters and load balances jobs between them, effectively creating a mini compute grid.
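The copy-to-submit workflow described above can be sketched with ordinary file operations. This is a minimal sketch, not a GFFS API: it assumes the GFFS namespace is reachable as a local directory tree (for example through a file system mount), and the function names and the status file name `status` are hypothetical, since the text does not name them.

```python
import shutil
from pathlib import Path


def submit_job(jsdl_file: Path, resource_dir: Path) -> Path:
    """Submit a job by copying its JSDL description into the resource folder."""
    if not resource_dir.is_dir():
        raise NotADirectoryError(str(resource_dir))
    # shutil.copy returns the path of the newly created copy.
    return Path(shutil.copy(jsdl_file, resource_dir))


def list_job_folders(resource_dir: Path) -> list[str]:
    """Return the subfolder names, one per job "in" the compute resource."""
    return sorted(entry.name for entry in resource_dir.iterdir() if entry.is_dir())


def read_status(job_dir: Path, status_name: str = "status") -> str:
    """Read the job status text file (the default file name is an assumption)."""
    return (job_dir / status_name).read_text().strip()
```

With these helpers, monitoring a job amounts to calling `list_job_folders` on the resource folder and then `read_status` on the job folder of interest; steering a computation would be a plain write to a file in the job's working directory.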

RDBMS: Relational databases can similarly be modeled as a folder or directory containing a set of tables¹. Each table is itself a folder that contains sub-tables (created by executing queries against it) and a text file that holds a CSV representation of the table. Queries are executed by copying or dragging a text file containing a SQL query into the folder. The result of the query is itself a new subfolder.
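The table-as-folder model above can be illustrated the same way. The sketch below again assumes the GFFS namespace is visible as a local directory tree; the helper names are hypothetical, and because the text does not say what the CSV file or the result subfolder is called, the CSV path is passed in explicitly and the result is detected by comparing directory listings before and after the copy.

```python
import csv
import shutil
from pathlib import Path


def read_table_csv(csv_file: Path) -> list[dict[str, str]]:
    """Load a table's CSV text representation as a list of row dictionaries."""
    with csv_file.open(newline="") as handle:
        return list(csv.DictReader(handle))


def subfolder_names(folder: Path) -> set[str]:
    """Return the names of the folders directly inside `folder`."""
    return {entry.name for entry in folder.iterdir() if entry.is_dir()}


def submit_query(sql_file: Path, table_dir: Path) -> set[str]:
    """Copy a SQL query file into a table folder and report new subfolders.

    The returned set may be empty if the result folder has not appeared yet;
    how quickly results show up is not specified in the text.
    """
    before = subfolder_names(table_dir)
    shutil.copy(sql_file, table_dir)
    return subfolder_names(table_dir) - before
```

A caller would typically poll `subfolder_names` after `submit_query` until the result folder appears, then read its CSV file with `read_table_csv`.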

Named pipes: Often two or more applications need to communicate. Traditionally, applications communicate via files in the file system, e.g., application A writes file A_output and application B reads it; via message passing [4, 5]; or via sockets of some kind, e.g., by opening a TCP connection to a well-known address and sending bytes down the channel. On Unix, programs started on the same machine often use pipes as well. Unfortunately, in wide-area distributed systems many resources are behind NATs and firewalls, and simply opening a socket is not always an easy option.

To address this problem, the GFFS supports named pipes. GFFS named pipes are analogous to their Unix counterparts: they are buffered streams of bytes. Named pipes appear in the namespace just like any other file and have access control like any other file. As with Unix named pipes, GFFS named pipes may have many readers and writers, though the same caveats apply. Thus, an application can create a named pipe at a well-known location and then read from it, waiting for another application to write to it.
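The rendezvous pattern described above can be demonstrated with an ordinary Unix named pipe. This sketch uses a local FIFO created with `os.mkfifo` on a Unix-like system, not a GFFS pipe; the assumption is only that a GFFS named pipe behaves analogously when opened through the namespace. The pipe name `channel` and the function names are illustrative.

```python
import os
import tempfile
import threading


def write_message(pipe_path: str, payload: bytes) -> None:
    """Application A: open the pipe for writing and send bytes."""
    # Opening a FIFO for writing blocks until a reader has it open.
    with open(pipe_path, "wb") as pipe:
        pipe.write(payload)


def read_all(pipe_path: str) -> bytes:
    """Application B: open the pipe and read until every writer has closed."""
    # Opening a FIFO for reading blocks until a writer opens it.
    with open(pipe_path, "rb") as pipe:
        return pipe.read()


def demo() -> bytes:
    """Create a pipe at an agreed location, then exchange one message."""
    with tempfile.TemporaryDirectory() as tmp:
        pipe_path = os.path.join(tmp, "channel")  # stand-in for a well-known location
        os.mkfifo(pipe_path)
        sender = threading.Thread(
            target=write_message, args=(pipe_path, b"hello from A\n")
        )
        sender.start()
        data = read_all(pipe_path)
        sender.join()
        return data
```

Here a thread stands in for the second application; in practice the reader and writer would be separate programs that agree only on the pipe's path.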


Footnote

1 This capability has been demonstrated, but is not ready for production use.

References