GENESIS II DISTRIBUTION

Introduction

The Genesis II distribution provides a rich set of features, tools, interfaces, and services for participating in a Grid. Before you install and configure your Genesis II distribution, you should consider the features it offers and the activities you wish to perform. Your choice of activities affects both the configuration work and the security implications for your distribution. This primer provides background information to help you set up, configure, and use your Genesis II distribution. Links to more-detailed information on various topics and features are provided.

There are four basic activities that a computer may perform in a Genesis II Grid:

Role    Activities
Client  (1) Access remote data as a client. (2) Access remote compute resources (i.e., run jobs).
Server  (3) Host and share (publish) data into the Grid. (4) Serve as a BES job hosting endpoint (i.e., run other people's jobs on your machine).

Genesis II participants performing the first two activities are considered to have client roles because they do not expose data or computing resources into the Grid. Participants that expose data or computing resources are considered to have server roles.

The client and server roles are not mutually exclusive for a given Genesis II distribution. For example, you may choose to access remote compute and data resources and share some of your data, but not allow others to use your computing resources. However, the configuration options and requirements will likely be influenced by the role(s) you intend your distribution to perform.

Background

In this section we present some general information about the Genesis II distribution and the technologies that affect its usage.

Web Services

Genesis II is architected upon the Web Services paradigm. Interaction with Grid resources is achieved by communicating XML messages in accordance with the SOAP protocol over an HTTP or HTTPS network transport.
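To make the idea of "XML messages in accordance with the SOAP protocol" concrete, the following is a minimal sketch of how a request could be wrapped in a SOAP envelope. It assumes SOAP 1.1 (the source does not say which SOAP version Genesis II uses), and the operation name ListDirectory, its path parameter, and the urn:example:grid namespace are hypothetical names chosen for illustration; they are not part of the Genesis II interface.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumption: the source does not name a SOAP version).
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the illustrative operation below.
EXAMPLE_NS = "urn:example:grid"


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap a (hypothetical) operation and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    request = ET.SubElement(body, f"{{{EXAMPLE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(request, f"{{{EXAMPLE_NS}}}{name}")
        child.text = str(value)
    # The resulting bytes would be sent as the body of an HTTP or HTTPS POST.
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


message = build_soap_request("ListDirectory", {"path": "/home/grimshaw"})
print(message.decode("utf-8"))
```

In practice the Genesis II tools construct and send these messages for you; the sketch only shows the general shape of the traffic that a Genesis II Container has to accept.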

The distinction between client and server roles is largely an artifact of the Web Services paradigm. Grid resources are exposed via Web Service operations, and must be hosted within a lightweight “web application container” (Genesis II Container) that listens for incoming HTTP/HTTPS connections and manages a stateful database. Therefore, if you would like to use your Genesis II distribution to provision computing or data resources within the Grid, your distribution must operate a Genesis II Container that serves incoming Web Service requests for your resources. If you choose to operate a Genesis II Container, you may be subject to additional networking requirements (see the NAT and Firewall topics below). On the other hand, if you only wish to access remote computing and data resources, operating a Genesis II Container is not necessary.

NAT

NAT stands for “Network Address Translation”. A NAT sits between one or more hosts and the Internet. NATs are typically used to multiplex multiple hosts or devices onto a single IP address, and most home networks in the U.S. are behind one. If your machine is behind a NAT, it does not have a globally addressable IP address; in other words, other machines cannot initiate a conversation with your machine. On Windows, you can check with the ipconfig program: if your address falls in a private range (it starts with 10., with 172.16. through 172.31., or with 192.168., e.g., 192.168.1.1), you are behind a NAT. Note: without special effort you cannot operate a Genesis II Container behind a NAT; your distribution can perform client activities, but not server activities.
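The address check described above can also be automated. The following is a small sketch using Python's standard ipaddress module to test whether an IPv4 address lies in one of the private ranges reserved by RFC 1918. The function name is_rfc1918_private is our own; the sketch only classifies an address you already know (for example, one reported by ipconfig) and does not discover your address for you.

```python
import ipaddress

# Private IPv4 ranges reserved by RFC 1918; hosts with these addresses
# are not directly reachable from the public Internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918_private(address: str) -> bool:
    """Return True if the given IPv4 address is in an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for candidate in ["192.168.1.1", "10.1.2.3", "172.20.0.5", "192.0.2.1", "8.8.8.8"]:
    print(candidate, is_rfc1918_private(candidate))
```

Note that an address beginning with 192 is not necessarily private: only the 192.168 block is, which is why the check compares against whole networks rather than the first number alone.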

Firewall

A firewall is a hardware or software component that filters incoming network packets. If you are behind a firewall and wish to operate a Genesis II Container, your firewall administrator will need to open port XXX to your machine. On Windows machines, you must have administrator privileges to open the Windows firewall for a Genesis II Container. Note that NATs and firewalls do not usually affect the ability to be a Genesis II client.
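Once a port has been opened, one rough way to confirm it is to attempt a TCP connection to the container from a machine outside the firewall. The sketch below does this with Python's standard socket module. The helper name port_is_reachable is our own, and the host and port are parameters because this primer does not fix their values; a successful connection only shows that something is listening on that port, not that it is a correctly configured Genesis II Container.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example use, with a hypothetical host name and a port number you supply:
#     port_is_reachable("container.example.org", container_port)
```

Run the check from outside your own network; a test from the container host itself bypasses the firewall and will not tell you whether remote clients can connect.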

RNS Namespace

RNS stands for the “Resource Namespace Service” specification. RNS is a standard way of building directory structures in Web Services. An RNS namespace allows us to hierarchically organize Grid resources using familiar path strings, such as /home/grimshaw/myfile.

When you “connect” to a Grid, you are effectively telling your Genesis II distribution the location and identity of the “root” RNS directory.
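As an illustration of hierarchical naming, the following toy model resolves path strings against an in-memory directory tree that starts at a root. It is a sketch of the general idea only: the class and method names (Entry, Namespace, link, lookup, listdir) and the resource handle strings are hypothetical and do not reflect the RNS specification's operations or the Genesis II implementation, where directory entries refer to remote Web Service resources.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """A named entry in a toy namespace; it acts as a directory if it has children."""
    name: str
    resource: Optional[str] = None  # opaque, hypothetical handle to a Grid resource
    children: Dict[str, "Entry"] = field(default_factory=dict)


class Namespace:
    """Toy hierarchical namespace rooted at a single directory."""

    def __init__(self) -> None:
        self.root = Entry(name="/")

    def link(self, path: str, resource: str) -> None:
        """Bind the last path component to a resource, creating directories as needed."""
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("cannot link a resource at the root itself")
        node = self.root
        for part in parts[:-1]:
            node = node.children.setdefault(part, Entry(name=part))
        leaf = node.children.setdefault(parts[-1], Entry(name=parts[-1]))
        leaf.resource = resource

    def lookup(self, path: str) -> Entry:
        """Walk the tree from the root, one path component at a time."""
        node = self.root
        for part in (p for p in path.split("/") if p):
            if part not in node.children:
                raise KeyError(path)
            node = node.children[part]
        return node

    def listdir(self, path: str) -> List[str]:
        """Return the sorted names of the entries directly under the given path."""
        return sorted(self.lookup(path).children)


ns = Namespace()
ns.link("/home/grimshaw/myfile", "byteio:example-handle")
print(ns.listdir("/home/grimshaw"))
print(ns.lookup("/home/grimshaw/myfile").resource)
```

Every lookup begins at the root, which is why a client needs to know the location and identity of the root directory before any path can be resolved.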

Client Role

The Genesis II distribution provides a variety of tools and interfaces for accessing remote data and computing resources in the Grid from the client host. In this section, we address the activities of (a) accessing filesystem-like data and (b) running and managing remote computing jobs.

Access to Data Resources

Genesis II uses the RNS and ByteIO standards to provide familiar directory and file interfaces for datagrid resources. Genesis II provides four primary interfaces for creating, browsing, and accessing remote Grid data:

Grid Shell: The Grid Shell is a command-line tool that provides a text-based interface to the Grid RNS namespace, similar to most Unix and Windows command shells. The Grid Shell is started by running the Genesis II “grid” script from a Windows or Linux command shell. The Grid Shell persists a “session” of grid credentials and the “present working directory” in the Grid RNS namespace, and supports a number of familiar commands such as ls, cd, mkdir, cp, login, run, and qsub, as well as a simple scripting language.

Local FTP: Genesis II distributions can operate a simple, easy-to-use FTP daemon through which local FTP clients (e.g., Windows Explorer) can access the Grid namespace and file-like data resources. The Genesis II FTP daemon serves as a proxy between local FTP clients and the Grid. The security concerns of FTP are addressed by the restriction that FTP access is only granted to local programs.

Windows Installable Filesystem (IFS): The Genesis II Installable File System for Windows XP allows users to map the Grid RNS namespace as a Network Drive (e.g., the G: drive). Users already familiar with the traditional File Explorer interface provided in Windows XP will have little trouble adjusting to the IFS. When the namespace is mounted as an IFS filesystem, Windows programs can transparently create, delete, read, and write Grid files and directories. The IFS also provides a convenient interface for copying Grid data to and from the local filesystem.

OGRSH Linux I/O Shim: OGRSH is a Linux “library shim” that traps library calls made by existing applications and redirects them to the Genesis II Grid. Native programs and applications (e.g., bash, cp, ls, etc.) can transparently interface with remote Grid resources when run “on top of” OGRSH.

Access to Computing Resources

Applications can be executed on remote computing hardware using Basic Execution Service (BES) Grid resources. BES is a standard Web Services interface for creating and managing jobs. To run a job on a BES resource, you need a JSDL (XML) document that describes your application in terms of where to obtain the program executable and input data sources, and where to place any results. Genesis II Grids may also contain Queue resources that can be used to throttle and schedule jobs on multiple BES resources.
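To give a sense of what such a document contains, the sketch below generates a minimal JSDL job description with Python's standard XML library, using the JSDL 1.0 and JSDL POSIX application namespaces. The job name, file names, and example.org URIs are hypothetical placeholders, and the helper names (build_jsdl and the underscore-prefixed functions) are our own. Whether a given BES accepts exactly this set of elements is an assumption; consult the Genesis II FAQ for documents known to work.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
JSDL_POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

# Use readable prefixes in the serialized output.
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", JSDL_POSIX_NS)


def _el(parent, ns, tag, text=None):
    """Append a namespaced child element, optionally with text content."""
    child = ET.SubElement(parent, f"{{{ns}}}{tag}")
    if text is not None:
        child.text = text
    return child


def _add_staging(description, filename, uri, direction):
    """Add a DataStaging element; direction is 'Source' (stage in) or 'Target' (stage out)."""
    staging = _el(description, JSDL_NS, "DataStaging")
    _el(staging, JSDL_NS, "FileName", filename)
    _el(staging, JSDL_NS, "CreationFlag", "overwrite")
    _el(staging, JSDL_NS, "DeleteOnTermination", "true")
    endpoint = _el(staging, JSDL_NS, direction)
    _el(endpoint, JSDL_NS, "URI", uri)


def build_jsdl(job_name, executable, arguments, stdout_file, stage_in, stage_out):
    """Return a JSDL document (as a string) describing one POSIX-style job."""
    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    description = _el(job, JSDL_NS, "JobDescription")
    identification = _el(description, JSDL_NS, "JobIdentification")
    _el(identification, JSDL_NS, "JobName", job_name)
    application = _el(description, JSDL_NS, "Application")
    posix = _el(application, JSDL_POSIX_NS, "POSIXApplication")
    _el(posix, JSDL_POSIX_NS, "Executable", executable)
    for argument in arguments:
        _el(posix, JSDL_POSIX_NS, "Argument", argument)
    _el(posix, JSDL_POSIX_NS, "Output", stdout_file)
    for filename, uri in stage_in.items():    # where to obtain executable and inputs
        _add_staging(description, filename, uri, "Source")
    for filename, uri in stage_out.items():   # where to place results
        _add_staging(description, filename, uri, "Target")
    return ET.tostring(job, encoding="unicode")


print(build_jsdl(
    job_name="demo",
    executable="simulate",
    arguments=["--steps", "10"],
    stdout_file="stdout.txt",
    stage_in={"simulate": "http://example.org/bin/simulate"},
    stage_out={"stdout.txt": "ftp://example.org/results/stdout.txt"},
))
```

The three parts of the document map onto the description above: the Application element names the program, the Source staging entries say where to obtain the executable and input data, and the Target staging entries say where to place results.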

BES and Queue resources are typically linked into the Grid’s RNS namespace. Users can run jobs on a specific BES or Queue resource using one of several interfaces:

  • The Grid Shell’s run command. The primary arguments of the run command are the location of a JSDL document and the RNS path to a BES resource.
  • The Grid Shell’s qsub command. The primary arguments of the qsub command are the location of a JSDL document and the RNS path to a Queue resource.
  • By “copying” a JSDL file “into” a Genesis II BES or Queue resource. What does this mean? Genesis II BES and Queue resources also implement the RNS directory interface, but they overload the normal file-creation semantics: when a JSDL file is “copied into” a BES or Queue resource, the job it describes is submitted to that resource. Thus jobs can be started on Grid resources by simply using any one of the “data access” interfaces from the previous section. For example, one can use the IFS interface to “drag-and-drop” JSDL files into BES resources.

Because BES and Queue resources implement the RNS directory interface, their contents can also be “listed” using any of the namespace-browsing interfaces from the previous sections. In their case, the semantics of a directory-listing request are to return the job activities currently managed by that BES/Queue resource. The Genesis II Queue resources also can be manipulated using a variety of command line tools (e.g., qstat, qlist, qkill, qcomplete, etc.).
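The overloaded directory semantics described above can be pictured with a toy model in which creating a file submits a job and listing the directory returns the managed activities. This is a sketch of the idea only: the class ToyBesDirectory, its methods, the activity identifiers, and the simple JobDefinition check are hypothetical and are not the Genesis II implementation.

```python
import itertools
from typing import Dict, List


class ToyBesDirectory:
    """Toy 'directory' whose file-creation operation means job submission."""

    def __init__(self, rns_path: str) -> None:
        self.rns_path = rns_path
        self._activities: Dict[str, str] = {}  # activity id -> submitted JSDL text
        self._ids = itertools.count(1)

    def create_file(self, filename: str, content: str) -> str:
        """'Copying' a JSDL file in submits a job instead of storing a file."""
        if "JobDefinition" not in content:  # crude stand-in for real JSDL validation
            raise ValueError("only JSDL documents can be copied into this resource")
        activity_id = f"activity-{next(self._ids)}"
        self._activities[activity_id] = content
        return activity_id

    def listdir(self) -> List[str]:
        """Listing returns the activities currently managed, in submission order."""
        return list(self._activities)


bes = ToyBesDirectory("/bes-containers/example")  # hypothetical RNS path
bes.create_file("job.jsdl", "<JobDefinition/>")
bes.create_file("job.jsdl", "<JobDefinition/>")
print(bes.listdir())
```

Note that copying the same file name twice yields two separate activities: the name of the copied file is not what the listing reports.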

More information regarding the deployment and execution of jobs can be found in the Genesis II FAQ.

Server Role

When hosting data or computing resources, a distribution must operate a Genesis II Container. The Genesis II Container is a lightweight web application container that responds to Web Services messages on incoming HTTP/HTTPS connections.

Hosting arbitrary Grid Resources

By operating a Genesis II Container, you can allow Grid resources (e.g., job Queues, RNS and ByteIO resources, IDP resources, etc.) to be hosted by your container. The ability to create such resources is dependent on the Genesis Security access-control policies that you configure, of course. For example, you can use the mkdir Grid Shell tool to place a new subdirectory on a particular Genesis II Container. Any RNS and ByteIO resources created within that subdirectory will also be located within your Genesis II Container.

Hosting and Sharing Data

In many cases, the file and directory data that you wish to share already exists on a local filesystem. In this case, it is much more efficient to “share” the data directly into the Grid, as described below.

What does it mean to “share” or “export” data? The basic idea is simple. Suppose you have a directory (also called a “folder” in Windows) on your hard drive that you want to share with a friend or colleague. You can use the Genesis II export Grid Shell command. Invoked without any command-line options, the export command starts up a simple GUI (which requires X Windows in Linux environments). The GUI allows you to specify a local directory path that you want to export, and the directory path where you want to place it in the global RNS namespace. The process is similar to mapping or mounting a drive, except that you are mounting a portion of your local filesystem into the Grid namespace.

Once exported, the directory and its contents can be manipulated both by local users and by Grid users (see Section 3). Updates made by local users are visible to remote users, and vice versa. Access by remote users is governed by access-control mechanisms. The user-principal who performed the export is given initial, exclusive Grid access to the exported resources. After performing an export, you will typically modify the access-control policies for the newly exported resources to specify which users and groups can read, write, and execute (i.e., make subdirectories) these exported resources.

Hosting Computing Resources

By default, a BES resource is also started within a Genesis II Container. Users who do not wish to allow jobs to be run on their machine need not worry: the BES resource is inert by default. Access to the BES resource is initially granted only to the user who executed the installation, and may be extended to other users (or services, such as Queues). If such access is not extended, no jobs from other users may be scheduled on it.