Three Examples Illustrating Typical GFFS Use Cases

  • Accessing data at an NSF center from a home or campus
  • Accessing data on a campus machine from an NSF center
  • Directly sharing data with a collaborator at another institution

For each of these three examples, suppose that Sarah is an Extreme Science and Engineering Discovery Environment (XSEDE) user at Big State U and that her students regularly run jobs on Ranger at TACC. She and her students run many of the same sorts of jobs (though much smaller ones) on their local cluster, where they also do their software and script development. The software is a workflow (pipeline) composed of a number of programs that generate intermediate results used by subsequent stages of the pipeline. Further, Sarah and her students frequently need to check on the pipeline while it is executing by examining or visualizing intermediate files.

Accessing data at an NSF center from a home or campus

Using the GFFS, Sarah and her students can export their home or scratch directories at TACC into the global namespace. They can then mount the GFFS on their Linux workstations and on their cluster nodes. This lets them edit, view, and visualize application parameter files, input files, intermediate files, and final output files directly from their desktops. They can also run local applications that monitor application progress (for example, by checking for files in a directory or scanning an output file), all in real time against the actual data at TACC. There is no need to explicitly transfer (copy) files back and forth, nor to keep track of which version of which file has been copied; consistency with the data at TACC is assured.
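The kind of local monitoring application mentioned above can be sketched as a small polling script. This is a minimal sketch, assuming the GFFS is mounted on the workstation as an ordinary directory; the mount path and run directory name are hypothetical, and polling on a fixed interval is a choice made here for simplicity, not something the GFFS prescribes.

```python
import time
from pathlib import Path

# Hypothetical location of the exported TACC scratch directory once the GFFS
# is mounted locally; substitute the actual mount point and run directory.
WATCH_DIR = Path("/mnt/gffs/home/sarah/tacc-scratch/run-042")


def new_files(directory: Path, seen: set[str]) -> list[str]:
    """Return names of regular files in `directory` not yet in `seen`, and record them."""
    current = {entry.name for entry in directory.iterdir() if entry.is_file()}
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def last_line(path: Path) -> str:
    """Return the last non-empty line of a text file, e.g. a pipeline progress log."""
    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def watch(directory: Path, interval: float = 30.0) -> None:
    """Poll `directory` forever, reporting each intermediate file as it appears."""
    seen: set[str] = set()
    while True:
        for name in new_files(directory, seen):
            print(f"new intermediate file: {name}")
        time.sleep(interval)
```

After adjusting `WATCH_DIR`, one would start it with `watch(WATCH_DIR)`. Because the mounted GFFS looks like a normal file system, the script uses only standard-library file operations and needs nothing GFFS-specific.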

Accessing data on a campus machine from an NSF center

Similarly, Sarah and her students can access files on their clusters and desktops at Big State U directly from the centers. This means they can keep one set of sources, makefiles, and scripts at Big State U and compile and execute against them from any of the NSF service providers. For example, suppose that Sarah’s group keeps its sources and scripts in the directory /home/Sarah/sources on her departmental file server. She could export /home/Sarah/sources into the GFFS and access it in scripts or at the command line from any of the service providers. Any changes made to the files, either at Big State U or at any of the service providers, will be immediately visible to GFFS users, including her own jobs and scripts running at Big State U or at any of the service providers.¹
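Because, as the footnote at the end of this page notes, the GFFS is not always reachable from compute nodes, a job script that relies on the exported sources may want to confirm the directory is visible before using it. This is a minimal sketch, assuming a hypothetical mount point /mnt/gffs under which the exported /home/Sarah/sources appears; the function names are illustrative and not part of any GFFS tool.

```python
import os
from pathlib import Path

# Hypothetical path at which the exported /home/Sarah/sources directory
# appears once the GFFS is mounted at a service provider.
GFFS_SOURCES = Path("/mnt/gffs/home/Sarah/sources")


def gffs_available(path: Path) -> bool:
    """True if `path` is a directory this process can list on this node."""
    try:
        return path.is_dir() and os.access(path, os.R_OK | os.X_OK)
    except OSError:
        # Some errors (e.g. permission problems on a parent) raise instead of returning False.
        return False


def resolve_sources(candidates: list[Path]) -> Path:
    """Return the first reachable candidate directory, or raise if none is reachable."""
    for candidate in candidates:
        if gffs_available(candidate):
            return candidate
    raise FileNotFoundError(
        "no reachable source directory among: " + ", ".join(map(str, candidates))
    )
```

Calling `resolve_sources([GFFS_SOURCES])` at the top of a job script makes it fail fast with a clear message on a node where the mount is missing, rather than partway through a build.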

Next, consider the case where Sarah’s lab has an instrument that generates data files from experiments and places them in a local directory. As is so often the case, suppose the instrument comes with a Windows computer onto which the data is dumped. Sarah could export the directory in which the instrument places its data, e.g., c:\labMaster-1000\outfiles, into the GFFS. The data would then be directly accessible not only at her home institution but also at the service providers, without any need to copy it explicitly.

Sharing data with a collaborator at another institution

Finally, consider a multi-institution collaboration in which Sarah is working with a team led by Bob at Small State U. Suppose Bob’s team develops and maintains some of the applications used in the workflow, and that it also needs to access both Sarah’s instrument data and the data her team has generated at TACC. First, Bob can export his source and binary trees into the GFFS and give Sarah and her team access to those directories. Sarah can similarly give Bob and his team access to the necessary directories in the GFFS. Bob can then directly access Sarah’s data both at Big State U and at TACC. Interestingly, Bob accessing Sarah’s data at Big State U, and Sarah accessing Bob’s code at Small State U, need not involve XSEDE at all, even though they are using the XSEDE-provided GFFS as the medium.


Footnote

1 Access from login nodes is assured. The GFFS is not always accessible from the compute nodes.