Genesis II Omnibus Reference Manual

With Focus on XSEDE EMS & GFFS

Version 10.9 ― Updated June 1 2016

 


 


Table of Contents

A.       Welcome to Grid Computing. 13

B.       Document Overview... 14

B.1.        Intended Readers. 14

B.2.        Author List. 15

B.3.        Document Notation.. 15

B.4.        Common Environment Variables Referenced in the Document. 15

B.4.1.       HOME.. 16

B.4.2.       TMP.. 16

B.4.3.       JAVA_HOME.. 16

B.4.4.       GENII_INSTALL_DIR.. 16

B.4.5.       GFFS_TOOLKIT_ROOT.. 16

B.4.6.       GENII_USER_DIR.. 16

B.4.7.       The “grid” Command.. 17

B.5.        Glossary.. 17

C.       Introductory Topics. 21

C.1.         Learning the Grid Metaphor. 21

C.2.         System Administrator Tutorial 21

C.3.         Public Key Infrastructure (PKI) Tutorial 21

C.4.         Documentation Updates. 21

D.      Genesis II Installation.. 23

D.1.        Installer User Considerations. 23

D.2.        Installing the Grid Client Using the Graphical Installer. 23

D.3.        Installing the Grid Client at the Command-Line. 26

D.4.        Installing a Grid Container Using the Graphical Installer. 27

D.4.1.       OpenSSL Conversion.. 30

D.5.        Installing a Grid Container at the Command-Line. 31

D.6.        Automatic Container Start-up.. 32

D.6.1.       System-wide Installation Start-up.. 32

D.6.2.       Personal Installation Start-up.. 32

D.7.        Installation on Linux Using an RPM or DEB Package. 33

D.7.1.       Installation.. 33

D.7.2.       Upgrading RPM or Debian packages. 33

D.7.3.       Installation to Different Location.. 34

D.7.4.       RPM and DEB Upgrade Caution.. 34

D.8.        Unified Configuration for Containers. 34

D.8.1.       Creating a New Container That Uses a Unified Configuration.. 35

D.8.2.       Converting a Container in Split Configuration Mode to a Unified Configuration.. 36

D.8.3.       Changing Container’s Installation Source for Unified Configuration.. 37

D.8.4.       Updating Unified Configuration Container after Upgrade. 37

D.8.5.       Using a Grid Deployment Override. 38

D.8.6.       Unified Configuration Structure. 39

D.8.7.       Converting a Source-Based Container to a Unified Configuration.. 40

E.       Grid Usage Topics. 41

E.1.        Grid Basics. 41

E.1.1.       Built-in Help.. 41

E.1.2.       Essential Commands. 42

E.2.        Authentication and Authorization.. 42

E.2.1.       Credentials Wallet. 43

E.2.2.       How to Login & Logout. 43

E.2.3.       Grid Access Control Lists (ACLs). 46

E.3.        Data Files. 48

E.3.1.       Copying Data Into and Out of the GFFS. 48

E.3.2.       Exporting Local Filesystems to the Grid.. 52

E.3.3.       How to Mount the GFFS via a FUSE Filesystem... 52

E.3.4.       Other Staging Methods for Data Files. 54

E.4.        Grid Commands. 55

E.4.1.       Grid Command Set. 55

E.4.2.       Grid Paths: Local vs. RNS. 55

E.4.3.       Scripting the Grid Client. 56

E.4.4.       XScript Command Files. 57

E.5.        Submitting Jobs. 61

E.5.1.       How to Create a JSDL file. 61

E.5.2.       Using the Job-Tool 61

E.5.3.       Submitting a Job to a Grid Queue. 63

E.5.4.       Controlling or Canceling Jobs in a Queue. 63

E.5.5.       Cleaning Up Finished Jobs. 64

E.5.6.       The Queue Manager in Client-UI. 64

E.5.7.       Job Submission Point. 66

E.5.8.       Submitting a Job Directly to a BES. 67

E.5.9.       How to Run an MPI Job.. 67

E.6.        Client GUI. 69

E.6.1.       Client GUI Basics. 69

E.6.2.      Credential Management 70

E.6.3.       Client UI Panels and Menus. 73

E.6.4.       Drag-and-Drop Feature. 92

E.6.5.       File associations. 92

E.7.        Fastgrid Command.. 93

F.       Grid Configuration.. 95

F.1.         Structure of the GFFS. 95

F.1.1.        Linking a Container into a Grid.. 96

F.2.         Deployment of a New GFFS Grid.. 97

F.2.1.        Preparing the Environment for Generating Deployments. 97

F.2.2.        Creating the GFFS Root Deployment. 99

F.2.3.        Changing a Container’s Administrative or Owner Certificate. 103

F.2.4.        XSEDE Trust Store Customization.. 104

F.2.5.        Detailed Deployment Information.. 105

F.2.6.        Certificate Revocation Management (CRL files). 108

F.3.         Grid Containers. 109

F.3.1.        Container Structure. 110

F.3.2.        Where Do My Files Really Live?. 113

F.3.3.        Serving GFFS Folders from a Specific Container. 114

F.3.4.        Container Network Security.. 114

F.3.5.        Container Resource Identity.. 115

F.3.6.        User Quota Configuration.. 116

F.3.7.        Genesis Database Management. 117

F.4.         Grid Queues. 121

F.4.1.        Creating a Genesis II Queue. 121

F.4.2.        Linking a BES as a Queue Resource. 122

F.5.         Basic Execution Services (BES). 122

F.5.1.        How to Create a Fork/Exec BES. 123

F.5.2.        Running a BES Container With Sudo.. 123

F.6.         Grid Inter-Operation.. 128

F.6.1.        How to Create a BES using Construction Properties. 128

F.6.2.        Adding a PBS Queue to a Genesis II Queue. 131

F.6.3.        Adding a UNICORE BES to a Genesis II queue. 131

F.6.4.        Adding an MPI Cluster to a Grid Queue. 134

F.6.5.        Establishing Campus Bridging Configurations. 136

G.       Grid Management. 138

G.1.        User and Group Management. 138

G.1.1.       Creating Grid Users. 138

G.1.2.       Creating a Group.. 138

G.1.3.       Adding a User to a Group.. 139

G.1.4.       Removing a User from a Group.. 139

G.1.5.       Removing a User. 139

G.1.6.       Removing a Group.. 140

G.1.7.       Changing a User's Password.. 141

G.1.8.       Using a Kerberos STS. 141

G.1.9.       Creating XSEDE Compatible Users. 142

G.1.10.     Configuring Kerberos Authorization on a Container. 143

G.1.11.     Setting Up an InCommon STS. 145

G.2.        Container Management. 146

G.2.1.       How to Stop a Grid Container. 146

G.2.2.       How to Start a Grid Container. 147

G.2.3.       How to Backup a Genesis II Grid Container. 147

G.2.4.       How to Restore a Genesis II Grid Container. 148

G.2.5.       Replication of GFFS Assets. 150

G.3.        RNS & ByteIO Caching. 156

G.4.        Grid Accounting. 156

G.4.1.       Accounting Prerequisites. 157

G.4.2.       Background.. 157

G.4.3.       Accounting Database. 157

G.4.4.       Denormalized accounting data for usage graphs. 158

G.4.5.       The denormalization process. 158

G.4.6.       Linking the Accounting Database Into the Grid.. 160

G.4.7.       Migrating Accounting Info to a New Grid.. 160

G.4.8.       Usage Graph Web Site. 161

G.4.9.       Database Table Structure for Accounting. 162

G.4.10.     Creating the Accounting Database. 166

G.5.        Grid Inter-Operation.. 169

G.5.1.       Connecting a Foreign Grid.. 170

H.      XSEDE Development with Genesis II. 172

H.1.        Installing Java. 172

H.1.1.       Centos Build Dependencies. 173

H.1.2.       Ubuntu Build Dependencies. 173

H.2.        Getting the Genesis II Source Code. 173

H.3.        Building Genesis II from Source on the Command Line. 173

H.4.        Developing Genesis II in Eclipse. 174

H.4.1.       Getting Eclipse. 174

H.4.2.       Getting Subclipse. 174

H.4.3.       Eclipse Package Explorer. 175

H.4.4.       Ant Builds. 176

H.4.5.       User Directory for Genesis II Grid State. 177

H.4.6.       Run Configurations. 177

H.4.7.       Running Genesis II. 179

H.5.        Building Genesis II GFFS Installers. 179

H.6.        Genesis Debugging Tips. 180

H.6.1.       Jstack.. 180

H.6.2.       Yourkit Profiler. 180

H.7.        Editing the Container Database within Eclipse. 180

I.        Genesis II Testing via the GFFS Toolkit. 182

I.1.          Getting the GFFS Toolkit. 182

I.1.1.         Preparing the GFFS Toolkit on Linux. 182

I.1.2.         Preparing the GFFS Toolkit on Mac OS X.. 182

I.1.3.         Preparing the GFFS Toolkit on MS-Windows. 182

I.1.4.         Grid Permissions Required for Running the Tests. 184

I.2.          Running the GFFS Tests on a Grid.. 185

I.2.1.         Setting up the GFFS test suite. 185

I.2.2.         Initializing the test environment. 186

I.2.3.         How to Bootstrap a Miniature Test Grid.. 187

I.3.          Running the XSEDE Regression Test. 187

I.3.1.         What to Expect From the Test Run.. 188

I.3.2.         Reporting Bugs Seen in a Test. 189

I.4.          Helpful Notes for the Tests. 189

I.5.          More Information on the Bootstrap Process. 189

I.6.          Grid Administrator Steps to Enable Testing. 189

I.6.1.         Cloud VM for Replication Tests. 190

J.        Appendix 1:  FAQs and Troubleshooting Guide. 192

J.1.          Grid Client Problems. 192

J.1.1.         What does “Internal Genesis II Error -- Null Pointer Exception” mean?. 192

J.1.2.         Why Can't I Login With My Valid XSEDE Portal ID?. 192

J.1.3.         Why Does Client-UI Get out of Sync?. 193

J.1.4.         Why Is My Grid Client Logged in Differently than Client-UI?. 193

J.1.5.         How Can I Deal With Memory Problems?. 193

J.1.6.         Why Can’t the Installer or Grid Client Connect to the Grid?. 194

J.1.7.         What does “Unable to locate calling context information” mean?. 196

J.2.          Problems with the Grid Container. 196

J.2.1.         Why is My Container Consuming So Much CPU?. 196

J.2.2.         How Can I Improve File Transfer Bandwidth?. 196

J.2.3.         How Do I Open a Port in the Firewall for My Container?. 197

J.2.4.         How Can an Expired Container TLS Certificate Be Replaced?. 198

J.2.5.         How Can I Restart My Container on Windows?. 201

J.2.6.         How Can I Update the Construction Properties for a Genesis II BES?. 202

J.2.7.         Why is the GFFSContainer script missing after my RPM install?. 202

J.3.          How Can I Modify Container and Client Logging?. 202

J.4.          How Can I Compress My Container Database?. 204

J.4.1.         Configure Embedded Derby.. 204

J.4.2.         Verify Derby.. 205

J.4.3.         Start up ij 206

J.4.4.         Connect to GenesisII database. 206

J.4.5.         Execute SQL statements. 206

J.4.6.         Disconnect from a database. 206

J.4.7.         Exiting from ij 206

J.4.8.         Run SQL Scripts to compress GenesisII state directory.. 207

J.4.9.         Extracting Tables and Indexes from the DB.. 207

K.       Appendix 2: Genesis II Deployment Arcana. 209

K.1.        Intended Audience. 209

K.2.        Genesis II Model Overview... 209

K.2.1.       Genesis II “Object” Model 209

K.2.2.       Grid Containers, Services, and Resources. 210

K.2.3.       Global Namespace. 211

K.2.4.       Security Model 212

K.2.5.       Understanding a Genesis II Grid Container. 212

K.2.6.       Storage. 213

K.3.        Pre-Installation.. 218

K.4.        Installation.. 219

K.4.1.       Planning for Installation Storage. 219

K.4.2.       Run Installer. 220

K.4.3.       Start/Stop Grid Container. 221

K.5.        Post Installation: Configuring the Grid Container. 221

K.5.1.       Deployments and Deployment Inheritance. 221

K.5.2.       Grid Container Configuration Options. 222

K.6.        Post Installation: Setting Grid Container Service Permissions. 231

K.6.1.       VCGRContainerPortType Service. 231

K.6.2.       Other services. 231

K.6.3.       Tips. 232

K.7.        Creating and Configuring Genesis II BES Resources. 232

K.7.1.       BES Resource Attributes and Matching Parameters. 233

K.7.2.       Creating a BES Resource. 234

K.7.3.       Configuring Genesis II BES Resources. 235

K.7.4.       Genesis II BES Resource Security.. 239

K.8.        Creating and Configuring Genesis II Queue Resources. 241

K.8.1.       Creating a Hierarchy of Genesis II Queues. 242

K.9.        Structure of the Genesis II Installation Directory.. 242

K.9.1.       Configuration Directories/Files. 243

K.9.2.       Executable Code. 244

K.9.3.       Log Files. 244

K.9.4.       Supporting Libraries, Jar Files and Other Software. 245

K.10.      Structure of the Grid Container State Directory.. 245

K.10.1.     Grid Resource-Specific Storage. 246

K.11.      Cross-Campus Grid (XCG) Global Namespace. 248

K.12.      Security in the Grid.. 249

K.12.1.     Supported Security Tokens. 249

K.12.2.     Genesis II User/Group Resources and GAML Tokens. 249

L.       Appendix 3: XScript Language Reference. 250

L.1.         Introduction – What is XScript?. 250

L.2.         Namespaces. 251

L.3.         Running XScript Scripts. 251

L.4.         XScript Variables/Macros. 251

L.5.         XScript High-level Description.. 252

L.5.1.        Grid Command Elements. 252

L.5.2.        XScript Language Elements. 253

M.      GFFS Exports Explicated.. 267

M.1.       What is an Export?. 267

M.2.       Types of Exports. 268

M.2.1.      ACL Exports. 268

M.2.2.      ACLAndChown Exports. 268

M.2.3.      ProxyIO Exports. 268

M.3.       Users View - Exporting File Systems Directory Trees to the GFFS. 270

M.3.1.      Creating an Export in the GFFS. 270

M.3.2.      Lightweight vs. Heavyweight Exports. 271

M.3.3.      Setting Extended ACLs for ACL and ACLAndChown Exports. 272

M.3.4.      Preferred Identity Management. 273

M.4.       System Administrator Considerations. 275

M.4.1.      Establishing the Container's Export Configuration.. 275

M.4.2.      ACL Export Mode Configuration.. 276

M.4.3.      ACLAndChown Export Mode Configuration.. 276

M.4.4.      ProxyIO Export Mode Configuration.. 277

M.4.5.      Enabling Export Creation for Grid Users. 277

M.4.6.      Configuration of the Grid-Mapfile. 278

M.4.7.      Configuration of the “sudoers” File for Sudo Access. 279

M.4.8.      Creating Exports for Other Grid Users. 280

M.4.9.      Configuration Files and Scripts. 281

N.      Central Administrator’s Guide for the XSEDE Production Grid.. 286

N.1.        Installation Support. 286

N.2.        Deployment Perspective. 286

N.3.        Creating an XSEDE GFFS Grid of Four Central Containers. 286

N.3.1.       Deployment Prerequisites. 286

N.3.2.       Time and Effort Estimates. 290

N.3.3.       Deploying the XSEDE GFFS Central Containers. 290

N.4.        Updating the XSEDE GFFS Production Grid to SDIACT-149.. 297

N.4.1.       Update Installations to RPM Packages. 297

N.4.2.       Enable Users to Store Files on Root and Root Replica Containers. 298

N.4.3.       Add a Pattern-based ACL for MyProxy users. 298

N.4.4.       Configure Periodic Certificate Package Upload.. 299

N.5.        XSEDE GFFS Central Container Administrative Procedures. 299

N.5.1.       Container Startup/Shutdown.. 299

N.5.2.       Revert/Undo/Rollback.. 300

N.5.3.       Container Backup and Restore. 301

N.5.4.       Creating XSEDE User Accounts. 301

N.5.5.       Container log files. 305

N.5.6.       User and group management. 305

N.5.7.       Prepare for Service Provider GFFS Deployment. 306

N.5.8.       Link an External Grid into XSEDE Grid.. 306

O.      References. 308

P.       Document History.. 309

 

 


 

List of Figures

Figure 1. Installer Welcome Dialog. 24

Figure 2. Installation Location Dialog. 24

Figure 3. Installation Type Dialog. 25

Figure 4. Active Copying of Files. 25

Figure 5. Concluding Dialog After Installation.. 26

Figure 6. Installation Type as Container. 28

Figure 7. Container Web-Service Configuration.. 28

Figure 8. Owner Selection Dialog. 29

Figure 9. Grid Keypair Generation Choice. 29

Figure 10. Choosing Password for Generated TLS Keypair. 30

Figure 11. Specifying Existing Keypair and Password.. 30

Figure 12. The client UI with RNS Tree and ACL List. 50

Figure 13. Directory Context Menu.. 51

Figure 14. Job tool basic information tab. 62

Figure 15. Job tool data staging tab. 62

Figure 16. Job tool resources tab. 63

Figure 17. Launching the queue manager. 65

Figure 18. Queue manager’s job list. 65

Figure 19. Queue manager’s resource tab. 66

Figure 20. Removing a job from the queue. 66

Figure 21. Job history detail window. 66

Figure 22. Setting matching parameters in resources tab. 68

Figure 23. XCG3 viewed in client-ui 69

Figure 24. Credential Management->Login->Standard Grid User. 71

Figure 25. Showing grid credentials using mouse hover. 72

Figure 26. Highlighting a specific credential to logout from... 73

Figure 27. Major User Interface Panels. 74

Figure 28. Drag RNS Resource to Trash.. 75

Figure 29. Drag-and-drop a user to ACL list on a resource. 76

Figure 30. Dragging ACL Entry Into Trash.. 77

Figure 31. Changing UI Shell font and size. 78

Figure 32. Setting UI to show detailed credential information.. 79

Figure 33. Viewing detailed credential information.. 80

Figure 34. Displaying resource information as tree structure. 81

Figure 35. File->Create New File option.. 82

Figure 36. Job tool, creating simple ls job.. 83

Figure 37. Job-tool showing project number/allocation.. 84

Figure 38. Job Tool, Data tab showing Output and Error Files. 85

Figure 39. Jobs->Queue Manager. 86

Figure 40. Displaying Job History.. 87

Figure 41. Job Definition Using Variables. 88

Figure 42. Variable Usage in Output Filename. 89

Figure 43. Defining Job Variable Values. 90

Figure 44. Queue View with Sweep Jobs. 91

Figure 45. Invoking grid shell via Tools->Launch Grid Shell option.. 92

Figure 46. User request form... 161

Figure 47. Example of daily usage graph.. 162

Figure 48 Remote Client Interaction with Container. 267

 


 

 

A.    Welcome to Grid Computing

Welcome to the world of grid computing!  This field combines ideas from high performance computing, web services, networking, and computer security in order to provide the user with a powerful, virtual supercomputer for running large workloads across heterogeneous systems and clusters.  Grid computing can offer a uniform interface that glosses over the multifarious details of the distributed computing systems that are brought together to form the grid.  Users can submit their jobs to grid queues that will automatically parcel out the workload to available execution services.  Users' computing jobs can rely on data located around the world, brought together by the uniform filesystem view provided by the grid.

The Genesis II GFFS (Global Federated File System) is the topic of this reference manual.  The GFFS is part of the XSEDE project (http://xsede.org) and provides a globally accessible grid filesystem as well as grid queues that leverage the high performance computing infrastructure of the XSEDE project.

 


B.    Document Overview

B.1.         Intended Readers

The main body of the document consists of the following sections:

  1. Installation
    Describes both graphical and command-line versions of installers for both the grid client and container.
  2. Grid Usage
    Surveys the basics of authentication and authorization in the grid, running jobs on compute resources, exporting local file system paths to the GFFS, and copying data files into and out of the grid.
  3. Configuration
    Discusses the deployment of the root GFFS container, the deployment of secondary containers, creation of Basic Execution Services to run jobs, creation of grid queues, and establishing campus bridging configurations.
  4. Management
    Covers creating users and groups in the grid, removing users and groups, stopping and restarting containers, backing up containers, and restoring containers from a backup.
  5. Development
    Provides links to the Genesis II code repository, describes the command-line build process, and covers debugging Genesis II source with the Eclipse IDE.
  6. Testing
    Discusses how to create a small bootstrapped grid for testing, how to run the GFFS test scripts in the GFFS Toolkit, and what results to expect from the tests.
  7. Appendices
    Contains a FAQ & troubleshooting guide.  Also provides a detail-oriented reference for extended deployment issues and other configuration considerations.

This document is intended for the following classes of users (also known as personas):

1.       XSEDE System Administrators

2.       Scientific Users

3.       Campus Grid Administrators

4.       Grid Testers

5.       XSEDE Developers

Membership in a particular user class does not necessarily limit an individual’s interest in any of the information documented here.  That said, the Installation and Grid Usage chapters will be especially relevant to the Scientific User.  The Configuration and Management chapters will be of more interest to the XSEDE System Administrators and Campus Grid Administrators.  Finally, the Grid Tester and XSEDE Developer personas each have a chapter devoted to their particular viewpoint.

B.2.         Author List

This document is a group effort.  It incorporates text from many contributors who, over an extended period of time, wrote various documents about the Genesis II grid functionality.  These contributors include:

Editor: Chris Koeritz

This omnibus document was originally accumulated and edited for “XSEDE Activity 43 – Genesis II Documentation” during the spring of 2012.  Chris has served as the Omnibus editor for ongoing edits through XSEDE Increment 5.

Special Thanks

Thanks to Jessica Otey for her help in editing this document.

B.3.         Document Notation

In this document, command-lines appear as follows:

·         The command-line font in this document will be 9 point monospace bold

·         The dollar sign followed by a name specifies an environment variable: $VAR

·         Curly brackets indicate a parameter that must be filled in: {parmX}

·         Pound signs (#, aka octothorpes) indicate comments that accompany code
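As an illustration of this notation, a command might appear as follows (the command shown is purely illustrative):

# set the temporary folder; {new temp path} must be replaced with a real directory
export TMP={new temp path}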

B.4.         Common Environment Variables Referenced in the Document

A few environment variables are used consistently in this document.

B.4.1.             HOME

The HOME variable is expected to already exist in the user environment; this points at the home folder for the current user on Linux and Mac OS X.  On Windows, the home directory is composed of two variables instead: ${HOMEDRIVE}${HOMEPATH}

B.4.2.             TMP

The TMP variable should point at a location where temporary files can be stored.  If TMP is not set, the tool and test scripts will default to /tmp on Linux and Mac OS X.

B.4.3.             JAVA_HOME

The JAVA_HOME variable is used to specify the top-level of the Java JDK or JRE.  This variable is not widely used in the Genesis II software but may be used in a few specific scripts.  If the “java” executable is found on the application path, then JAVA_HOME is not usually needed.

B.4.4.             GENII_INSTALL_DIR

The GENII_INSTALL_DIR variable is a Genesis II specific variable that points at the top folder of the Genesis II software installation.  This variable is not needed by the Genesis II Java software, although it may be relied on by some scripts and is used extensively in this document.

B.4.5.             GFFS_TOOLKIT_ROOT

The GFFS_TOOLKIT_ROOT variable points at the top-level of the GFFS tool and test scripts within the Genesis II installation package.  It is also not needed by the Java software of Genesis II, but will be relied on heavily within the provided tool and test scripts.  It is established automatically by the set_gffs_vars script (described below in section B.4.7).

B.4.6.             GENII_USER_DIR

The GENII_USER_DIR variable points at the path where client and container state are stored.  This is also referred to as the “state directory”.  This variable is used within the Genesis II Java software and by many of the tool and test scripts.  The variable is optional in general and will default to “$HOME/.genesisII-2.0”.  However, if a Genesis II client or container is intended to use a different state directory than the default, then the variable must be defined before the client or container software is started.  It is recommended that any non-default value for the variable be set in the user’s script startup file (such as $HOME/.bashrc) to avoid confusion about the intended state directory.

For users on NFS (Network File System), it is very important that container state directories (aka GENII_USER_DIR) are not stored in an NFS mounted folder.  Corruption of the container state can result if this caution is disregarded.  To avoid the risk of corruption, the GENII_USER_DIR variable can be set to a directory location that is on a local hard disk.
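For example, a user whose home directory lives on NFS might add a line such as the following to $HOME/.bashrc (the local-disk path shown is only an illustration):

# keep Genesis II client and container state on a local disk instead of NFS
export GENII_USER_DIR=/localdisk/fred/genesisII-state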

B.4.7.             The “grid” Command

Throughout the document, we will often reference the “grid” command from Genesis II.  It is shown as just “grid” in example commands, which assumes that the grid command is in the PATH variable.  The path can be automatically updated for Genesis II GFFS by running a script included with the install called “set_gffs_vars”.  For example, this loads the important Genesis II variables into the current bash environment:

source /opt/genesis2-xsede/set_gffs_vars

The above command assumes an XSEDE production grid installation from the Genesis II GFFS RPM package; other installations may have a different install path.  The script loads GENII_INSTALL_DIR, GFFS_TOOLKIT_ROOT, and other important variables, and puts the Genesis II grid command into the PATH.  This statement can be added to .bashrc for automatic execution in each bash shell if desired.  There are many other methods for getting the grid command into the path, including Environment Module files, or even just adding the environment variables manually.
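For instance, the following appends that statement to the user's .bashrc (assuming the XSEDE RPM install location shown above):

# load the GFFS environment automatically in every new bash shell
echo 'source /opt/genesis2-xsede/set_gffs_vars' >> $HOME/.bashrc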

To add the GFFS grid command manually, one can set the value of $GENII_INSTALL_DIR and add it into the PATH variable:

export GENII_INSTALL_DIR=/opt/genesis2-xcg
export PATH=$PATH:$GENII_INSTALL_DIR

This command assumes an RPM install for the Cross-Campus Grid (XCG) at the University of Virginia.

B.5.         Glossary

These terms will be used throughout the document.

ACL                       Access Control List

A security feature that specifies a set of rights for particular users.  Any object stored in the GFFS has three ACLs (one each for read, write, and execute permissions).  Each ACL can have zero or more rights in the set.

BES                       Basic Execution Services

The component that offers computational resources to a grid.  A BES can accept jobs, run them on some resource, and then provide the job's results.

CRUD                   Create Read Update Delete

An acronym for the four most common file operations.

EMS                      Execution Management Services

The general category for grid computation services.  This is implemented by the grid's available BES components, which can all be of different types.

EPI                        EndPoint Identifier

A short unique pointer (across time and space) to an EPR (see below).  EPIs provide for a simple identity comparison, such that if object A has an identical EPI to object B, then they are in fact the same object.

EPR                       EndPoint Reference

A pointer to a web-service, including network location (such as URL), security policies, and other facts needed for a client to connect to the service.

Export                               

To “export” a file system directory structure is to make it available (subject to access control) to other users in the grid.  One exports a local rooted directory tree, e.g., sourceDir, and maps it into a target directory in the GFFS directory space, e.g., /home/Alice/project1/sourceDir.  The files and directories in “sourceDir” are still accessible using local mechanisms and are also accessible via the grid.

Export Implementation

The realization of the export functionality in the Genesis II container, as implemented in source code.

Export Owner

The local user at the SP or campus who owns the data being exported.

Export Owner User ID

The Unix user ID (also called ‘account name’) of the export owner.

FUSE mount    File system in User SpacE

FUSE is a file system framework for Linux and Mac OS X that allows users to define and write their own user-space (non-kernel) file system drivers.  Genesis II has a grid-aware FUSE driver that maps the GFFS into the user's local file system using a FUSE mount.

Genesis II          The Genesis System version 2

A grid computing project developed at the University of Virginia.  Genesis II provides the GFFS component for XSEDE.

Genesis II GFFS Container

The Genesis II GFFS implements a “Web Services” container architecture. The container is the process running the Genesis II source code in Java with which clients interact. The container receives requests to operate on exported data, and the container‘s export implementation carries out those requests (subject to authorization).

GFFS                     Global Federated File System

The filesystem that can link together heterogeneous computing resources, authentication and authorization services, and data resources in a unified hierarchical structure.

GFFS Container User ID

The container currently executes with a non-privileged Unix user id. For example, a normal Unix account named ‘gffs’ might be used to run the GFFS container. In the remainder of the document, the Unix user that is running the container will be referred to as “GffsUser”.

GORM                  Genesis II Omnibus Reference Manual

This reference manual.

GIU                        Grid Interface Unit

A Grid Interface Unit (GIU) is the hardware component on which the Genesis II container runs. The required elements of the GIU are defined in the XSEDE Architecture Level 3 Decomposition document (L3D) in section 8.1.2.

IDP                        IDentity Provider

A service that can create or authenticate user identities.

L3D                       XSEDE Architecture Level 3 Decomposition document.

PBS                       Portable Batch System

A queuing service for job processing on computer clusters.  PBS queues can be linked to Genesis II grid queues.

PKCS#12           Public Key Cryptography Standard Number 12

A file format for storing key-pairs and certificates with password protection.

PKI                        Public Key Infrastructure

The general category of all services that rely on asymmetric encryption, where a key owner has two parts to their key: the public part that can be shared with other users, and the private part that only the owner should have access to.  Using the public key, people can send the owner encrypted documents that only he can decrypt.  The owner can also create documents using his private key that only the public key can decrypt, offering some proof of the document's origin.  With this one essential feature of enabling communication without giving away private keys (unlike symmetric encryption algorithms), a number of important authentication schemes have been developed (such as SSL, TLS, and SSH).

Principle of least privilege

The term was coined by Jerome Saltzer: “Every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job.” (Saltzer, 1974).  What this means here is that software that does not need root should not have it, and that if it does need root, it should hold that privilege for the least amount of time and in the most encapsulated way possible (sometimes called privilege bracketing).

RNS                       Resource Namespace Service

A web services protocol that provides a directory service for managing EPRs.

Root Squash

Because of the way many early distributed file systems handled trust and authentication, processes running as root on file system client hosts may not actually have root privilege with respect to a network mounted file system. When root squash is in effect, the network file server squashes, or ignores, requests that arrive from clients asserting root privileges. This is done to prevent compromised clients from attacking the file system with root privilege. Note that root squash can be selectively applied by adding exceptions in /etc/exports.
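As a hypothetical illustration, an NFS server's /etc/exports entry might apply root squash to most clients while exempting a single trusted host (the paths and host names below are made up):

# root squash (the default) for the client subnet; disabled for one trusted admin host
/export/data  192.168.1.0/24(rw,root_squash)  adminhost(rw,no_root_squash)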

SSH                       Secure SHell

A remote login program that allows users to connect to remote computers while providing an encrypted communication channel that keeps their passwords, command history, and so forth private.

SSL                        Secure Sockets Layer

A protocol for connecting to a web service or web site using encrypted transmissions.  This protocol is considered deprecated now in favor of TLS.

STS                       Secure Token Service

The STS offers a method for a user to authenticate against a known service in order to log in to the grid.  Configuring an STS is usually a task for the grid administrator.

Sudo privilege

In Unix there are two important privilege levels, user and root. Root can do anything. Processes with “user” level privilege have very limited capabilities. For example, root can read and write any file, change user id to any user, change file ownership, etc. Users cannot.  Sometimes, though, one wants a user-level process to have enhanced capability without having the unlimited capability of root. This is consistent with the principle of least privilege. To support a restricted (and temporary) extension of privilege, a user may be given sudo privilege to execute certain commands as either root or as another user.

TLS                       Transport Layer Security

A protocol for connecting to a web service or web site using encrypted transmissions.  TLS is the more modern incarnation of SSL.

Trust Store     A set of certificates that are “trusted”

A trust store can be a file (or directory) with one or more certificates that are trusted for a particular purpose.  For example, in Genesis II as of XSEDE Increment 1, there is a trust store in a PFX format file that contains the certificates a grid client will trust when connecting via TLS.  If a container presents an identity that is not present in the trust store and is not signed by a certificate in the trust store, then no connection will be made.

UNICORE           UNiform Interface to COmputing REsources

The primary EMS for XSEDE is provided by the UNICORE software, an EU open source grid computing project initially funded by the German Ministry for Education and Research.


 

C.     Introductory Topics

C.1.         Learning the Grid Metaphor

If the reader has not been involved in scientific computing before or would like an overview of the XSEDE GFFS and EMS implemented in Genesis II, this tutorial may be very helpful:

Getting Started with Grids [http://genesis2.virginia.edu/wiki/uploads/Main/gettingstartedall102011.pdf]

C.2.         System Administrator Tutorial

Readers who are new to administering a grid or who would like an introduction to the system administrator topics in Genesis II may find this tutorial to be a good introduction:

System Administrator Tutorial [http://genesis2.virginia.edu/wiki/uploads/Main/xsedesystemadmin2012.pdf]

C.3.         Public Key Infrastructure (PKI) Tutorial

There are a number of excellent guides that discuss the basic topics of authentication and encryption using modern PKI technologies.  The following is just one example:

Everything You Never Wanted to Know About PKI but Were Forced to Find Out [http://www.cs.auckland.ac.nz/~pgut001/pubs/pkitutorial.pdf]

The Genesis II project relies on TLS for authentication and encryption of all grid communications.  The following provides a basic summary of TLS:

http://en.wikipedia.org/wiki/Transport_Layer_Security

This is in contrast to other security mechanisms, such as the MyProxy server, which uses proxy certificates for authentication and authorization.  A survey of proxy certificates and related technologies can be found here:

Globus Toolkit Key Security Concepts
[http://www.globus.org/toolkit/docs/4.0/security/key-index.html]

C.4.         Documentation Updates

The official HTML version of the document is available at the following location:

http://genesis2.virginia.edu/wiki/uploads/Main/GenesisII_omnibus_reference_manual.htm

Other formats are available at:

http://genesis2.virginia.edu/wiki/Main/Documentation

This document is also available via a subversion repository and can be downloaded with the following:

svn co svn://svn.xcg.virginia.edu:9002/XSEDEDOCS/trunk/omnibus

 


D.    Genesis II Installation

Genesis II GFFS is a standards-based web-services application. The Genesis II installers can provide both the client-side and server-side of the software.  The server side of the web service is called the container.  There are interactive installers available for both client and container, and the interactive installer can function with a graphical user interface or in console-mode (text only).  The former is intended for most users, while the latter is intended for users who wish to script the install or who do not have access to graphical capabilities during installation.

Genesis II is also available in RPM and DEB package formats on Linux.  Unlike the interactive installer, these installation packages are installed by the system administrator once per host.  All clients and containers configured by users utilize the same installation.

Currently, the container installation is available for 32-bit and 64-bit Linux, and for 32-bit MS-Windows.  Client installers are available for 32-bit and 64-bit Linux, for 64-bit Mac OS X (Intel Platform), and for 32-bit MS-Windows.

The Genesis II GFFS software relies on the Java Runtime Environment (JRE) and officially supports Oracle Java 8 (aka version 1.8).  The interactive installers include a recent JRE version, but the RPM/DEB packages do not provide a Java JRE.

The Genesis II GFFS software is released under the Apache license agreement, which is available at: http://www.apache.org/licenses/LICENSE-2.0

The installers for Linux, Mac OS X and MS-Windows are available at:
http://genesis2.virginia.edu/wiki/Main/Downloads

D.1.        Installer User Considerations

The average user who wishes to use the Genesis II container or client does not need to have administrative access to the computer where the installation will occur.  In general, a user who has a home directory with write access can just run the installer as their own personal identity, and there are no special permissions required for running either the container or the client on one's own computer.

In some institutional or corporate settings, administrators may prefer to install the software at a single location per computer or per network.  This is also supported by the Genesis II installer (in both interactive and Linux package formats).   The container owner needs to perform additional tasks to configure Genesis II, which are documented in the sections below.

A common requirement is for the grid tools to be available in the user’s application path.  One solution for this is a TCL Modules file, and a sample file is provided in the GFFS Toolkit in the folder called “tools/genesis_module”.  There are many other ways to address the path issue, including modifying environment variables (per user or per system).

D.2.        Installing the Grid Client Using the Graphical Installer

This section will walk through the installation process for the Genesis II GFFS client using the interactive installer in its GUI mode.

The GUI-mode installer can be launched by double-clicking the installation executable on Windows and Mac OS X.  On Linux, the installer can be launched with bash:

bash genesis2-gffs-linux64-v2_7_503.bin

This will begin an interactive install process where graphical dialogs are displayed to request configuration input.

01-client-welcome.png

Figure 1. Installer Welcome Dialog

Clicking “Next” leads to picking the installation location.

02-client-installlocat.png

Figure 2. Installation Location Dialog

The next dialog allows one to choose the type of installation, client-only or full GFFS container.  Leave it at the default choice of client-only if you do not need a container installation that will provide GFFS services of your own.

03-client-typeinstall.png

Figure 3. Installation Type Dialog

Note that during an upgrade, the installation type dialog will default to the previously selected choice of client vs. container.

After picking the type of installation to perform, files are copied into place.

04-client-filesinstalling.png

Figure 4. Active Copying of Files

Once the Genesis II software files are stored in the target location, the installer will use the GFFS software to connect to the configured grid.  If the grid connection does not succeed and an error message is printed, please refer to the FAQ, Section J, for possible solutions.

When the installation is finished, the completion dialog is displayed.

06-client-final.png

Figure 5. Concluding Dialog After Installation

D.3.        Installing the Grid Client at the Command-Line

The console version of the installer is available from the same install program that does the graphical installs.  On Linux, the command-line version requires passing a '-c' flag to the installer at run time:

bash {installer filename} -c

For MS-Windows, run the installer .exe instead.  This assumes the user is in the same directory as the installer:

{installer filename}.exe -c

This will begin an interactive install process where prompts are displayed to request configuration input.  The same prompts shown in the graphical install dialogs are shown on the console instead. 

Interactive install process for the grid client

$ bash genesis2-gffs-linux64-v2_7_503.bin -c

Unpacking JRE ...

Preparing JRE ...

Starting Installer ...

This will install Genesis II GFFS on your computer.

OK [o, Enter], Cancel [c]

   (hit enter)

Where should Genesis II GFFS be installed?

[/home/fred/GenesisII]

   (type a different location or use suggested one, then hit enter)

Please Select the Type of Install to Perform

Installing grid deployment for Freds internal grid

This installation can provide GFFS client-only support or it can function as

a GFFS container. Which would you prefer to install?

Client-Only GFFS [1, Enter], GFFS Client and Container [2]

(type 1 for client install or just hit enter)

Extracting files ...

   (…filenames flash by…)

Connecting to the Grid
   (slight pause occurs while connecting…)

Setup has finished installing Genesis II GFFS on your computer.

Finishing installation...

If it is important to automatically script the grid client installer, one technique that can help is called a 'here document'.  The here document, denoted by the << below, answers the installer prompts using a list of canned responses.  The word ‘eof’ below is used to end the stream of commands:

bash genesis2-gffs-linux64-v2_7_503.bin -c <<eof

o

/home/fred/GenesisII

1

eof

More information about here documents can be found at http://www.tldp.org/LDP/abs/html/here-docs.html.

D.4.        Installing a Grid Container Using the Graphical Installer

This section will walk through the installation process of the Genesis II GFFS container using the interactive installer in its GUI mode.

The installation begins with the same dialogs as the client installer, but the process diverges at the third dialog: on the installation type dialog, choose a GFFS container install type.

03a-container-type.png

Figure 6. Installation Type as Container

Once the “Next” button is clicked, the files will be installed similarly to the client installer.

After the files are in place, the container installation prompts for items specific to the container configuration.  The next dialog asks for the web services configuration for the container.

04-container-port-host.png

Figure 7. Container Web-Service Configuration

The port number is the TCP port on which the container will listen on the current host.  This port number should not be in use by any other service, and it must not be blocked by a firewall.  The hostname should be a publicly visible DNS name (or IP address, although DNS names are preferred).  This host must be reachable using that name from potentially anywhere in the world, or the container cannot be linked into a grid.

After the web service configuration, the installer will attempt to connect to the configured grid.  Once this completes, the container-specific configuration continues with a dialog asking which grid user will own the container.

06-container-owner.png

Figure 8. Owner Selection Dialog

The user specified must be an existing user in the GFFS grid in question (the location of which is packaged in the installer).  If you do not currently have a valid grid user, you will need to request one that can own your container.

The grid user in question will completely “own” the container and will be given full administrative rights.  This allows the user to add, configure and remove resources on this container.  The grid user can also link the container into the grid’s RNS hierarchy in locations where the user has appropriate access rights.

After a valid grid user is provided, the installation offers to generate certificates for the container or to let the user provide her own certificate.  This certificate is used for TLS (SSL) communication by the container; all outgoing and incoming web service calls use this certificate for identification and all encryption is done with the associated private key.

07-container-generate.png

Figure 9. Grid Keypair Generation Choice

If the keypair generating service is used, as depicted, then a certificate for TLS communication is automatically generated.  In that case, the next dialog asks for the password that will protect the generated TLS certificate.

09-container-gen-keypass.png

Figure 10. Choosing Password for Generated TLS Keypair

The default password for the TLS keystore is ‘container’, but this can be changed as desired.  After the TLS keystore and certificate are generated, the installer finishes with the final dialog.

Alternatively, if one chooses not to use the keypair generator, one must supply a TLS keypair in PFX format that can be used for the communication.  The keypair dialog prompts for the keypair and supports browsing for it on the local computer.

11-container-keyentry.png

Figure 11. Specifying Existing Keypair and Password

Once an appropriate PFX file has been provided with the proper password, the installation continues to the final dialog.

D.4.1.             OpenSSL Conversion

The following commands may be helpful for converting certificates and keypairs among the DER, PEM, and PKCS#12 formats.

To convert a certificate from DER format to PEM format:

openssl x509 -inform der -in certificate.cer -out certificate.pem

To convert a certificate from PEM format to DER format:

openssl x509 -outform der -in certificate.pem -out certificate.cer

To convert a keypair from PKCS#12 format to PEM format:

openssl pkcs12 -nodes -in keypair.pfx -out keypair.pem

To convert PEM format certificate and private key to PKCS#12 format:

openssl pkcs12 -export -out keypair.pfx -inkey private.key -in certificate.pem

If the CA certificate is not in the certificate.pem file, then add this flag: -certfile CA.pem
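To double-check the result of a conversion, the contents of a PKCS#12 file can be listed (the keystore password will be prompted for):

openssl pkcs12 -info -in keypair.pfx -noout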

 

D.5.        Installing a Grid Container at the Command-Line

The console mode installation process for the container is very similar to the client install.  There are just a few more questions to answer than for the client, all regarding the container configuration. 

Interactive container install in console mode

(Installation is shown after container install type is selected and files have been installed…)

Port Number

By default the XCG container listens for incoming messages on TCP port 18443. You can override this behavior here.

XCG Container Port Number

[18443]

(select a port number and hit Enter)

Specify the hostname or IP address where your container will run.
Hostnames should be globally resolvable via DNS.

Host Name

[]

(type in the world-reachable DNS host name for the computer and hit Enter)

Connecting to the Grid

Owner Information

Please select a user to manage the container

User Name

[]

(enter an XSEDE portal id or other grid user name here)

This service will generate and sign your container keypair with your supplied credentials

Use Grid Keypair Generating Service?

Yes [y], No [n, Enter]

      (choose and hit enter.  Remainder assumes keypair was not generated.)

Select path for container keypair (.pfx) to be used for this container (will be copied)

Keypair Path

[]

(enter the path to a key-pair to use as the TLS key for the container)

Keystore Password

[]

(enter the key-pair and keystore password for the pfx file; these must both be the same password to use the pfx with the GFFS installer.)

Start Container Service?

Yes [y], No [n, Enter]

(hit Y and then Enter to start the container)

Configuring Container

Preparing GFFSContainer Script

Starting Container Service

Setup has finished installing Genesis II GFFS on your computer.

Finishing installation...

Note that the same approach used for scripting the grid client (a 'here document') can be used to script this install.

D.6.        Automatic Container Start-up

A system administrator can configure a Genesis II container installation to automatically restart when machines are rebooted.

D.6.1.             System-wide Installation Start-up

Genesis II installations provide a sample init.d-style service script in a file called “GFFSContainer”.  This file can be deployed on some Linux systems in /etc/init.d to automatically restart the container.  Once the file is installed, the system administrator must configure the script to run at an appropriate “run level” so that it starts on reboot.
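A minimal sketch of that deployment on a SysV-init style host follows; the exact enablement command varies by distribution, the install path assumes the XSEDE RPM package, and the service script may need standard init-script headers for these tools to accept it:

# copy the provided service script into place and enable it at the default run levels
sudo cp /opt/genesis2-xsede/GFFSContainer /etc/init.d/GFFSContainer
sudo chkconfig --add GFFSContainer          # Red Hat style systems
# or, on Debian style systems:
# sudo update-rc.d GFFSContainer defaults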

D.6.2.             Personal Installation Start-up

Users who wish to automatically start their personal containers can do so with a “cron job”.  This method usually does not require administrator assistance.  The Genesis II installation provides a script called GFFSContainer which can be used to restart the service if the computer is restarted or if the service inadvertently stops.  The following is an example cron job that uses the container restart script to launch the container if it is not already running.

Example of a cron job that restarts the GFFS container

# checks every 5 minutes for the Genesis II container service and restarts it if missing.
# (the state directory is set on the command itself, since crontab variable
#  assignment lines are not expanded by a shell)
*/5 * * * * GENII_USER_DIR=$HOME/container_state $HOME/GenesisII/GFFSContainer start

Cron jobs run with a different environment than your normal login shell, and thus it is important to provide the state directory (GENII_USER_DIR) to the cron job.  Otherwise, the default state directory ($HOME/.genesisII-2.0) will be used.

D.7.        Installation on Linux Using an RPM or DEB Package

The installation of one of the Linux-based packages is much simpler than the interactive process, mainly because the configuration steps have been moved out to a script-based process.  This is necessary because the RPM and DEB package formats are intended to be installed once per host and shared among multiple users.  In these package formats, all user and container state must reside in the state directory (unlike the interactive installation, where some of the configuration can reside in the installation directory).

D.7.1.             Installation

To install or upgrade the Linux RPM for the Genesis II GFFS, use sudo (or login as root) to call the rpm installer:

sudo rpm -Uvh genesis2-xsede-2.7.503-1.x86_64.rpm

To install the Linux DEB package for the Genesis II GFFS, use sudo (or login as root) and run the dpkg program:

sudo dpkg -i genesis2-xsede-2.7.503-1.x86_64.deb

Each of these actions will install the Genesis II GFFS software to “/opt/genesis2-xsede” by default when using the XSEDE production grid install package.  Installers for other grids follow a similar form; for example, the European Grid (GFFS.EU) package installs to “/opt/genesis2-european” and the XCG package installs to “/opt/genesis2-xcg”.

D.7.2.             Upgrading RPM or Debian packages

An RPM installation can be upgraded to a newer version via this command:

sudo rpm -Uvh genesis2-xsede-2.7.503-1.x86_64.rpm

Debian installations can be upgraded to a newer version using this command:

sudo dpkg -i genesis2-xsede-2.7.503-1.x86_64.deb

(These are actually both identical to the first time installation commands provided above.  The -U flag for RPMs is technically an upgrade flag, but also works for installing a new package.)

D.7.3.             Installation to Different Location

To install to a different location when using RPMs, add the “prefix” flag to the command:

sudo rpm -Uvh --prefix {new-location} genesis2-xsede-2.7.503-1.rpm

To uninstall the RPM or DEB package, use the appropriate package manager’s removal procedure:

sudo rpm -e genesis2-xsede

or

sudo apt-get remove genesis2-xsede

If needed, the RPM install can be forced to upgrade a package with identical version information despite already being installed:

rpm -Uvh --force genesis2-xsede-2.7.503-1.rpm

The process of configuring container installations and converting older installations is documented in the following sections.  The configuration scripts documented below can also be used with interactive installs (on Linux only), which is especially useful when those are installed by the root user for host-wide usage.

D.7.4.             RPM and DEB Upgrade Caution

Multiple containers can be configured on a host using the system-wide RPM or DEB package for Genesis II.  This poses an issue at upgrade time, since the running containers will become unavailable when the Java jar files and configuration directories are replaced.  The system administrator may want to institute a procedure for alerting users to shut their containers down before the installation and to restart the containers again afterwards.  An alternative is to require users to register their container installations in a way that allows a site-implemented, sudo-based process to automatically stop all of them before the installation and start them again afterwards.  A mechanism for automating this process may be developed in a future release.
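A hypothetical sketch of such a site-implemented procedure is shown below.  It assumes a simple registry file listing one container state directory per line, assumes the GFFSContainer script accepts “stop” and “start” actions, and glosses over running each command as the appropriate container owner:

# stop every registered container before upgrading the package
while read state_dir; do
  GENII_USER_DIR="$state_dir" /opt/genesis2-xsede/GFFSContainer stop
done < /etc/gffs-containers.list

sudo rpm -Uvh genesis2-xsede-2.7.503-1.x86_64.rpm

# restart the registered containers after the upgrade
while read state_dir; do
  GENII_USER_DIR="$state_dir" /opt/genesis2-xsede/GFFSContainer start
done < /etc/gffs-containers.list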

D.8.        Unified Configuration for Containers

The interactive installers provide what is termed a “Split Configuration” installation mode, where the container configuration partially resides in the installation folder itself.  In the newer “Unified Configuration” mode, the client-specific and container-specific configuration is stored entirely in the state directory.  This is a more flexible configuration, which can operate based on the RPM/DEB packages as well as on the interactive installer.  The following sections describe the procedures used for managing containers with the Unified Configuration.

In general, the Unified Configuration is most useful on Linux when the RPM or DEB package is installed.  However, these same approaches can be used directly on Mac OS X as well.  On MS-Windows, a Linux compatibility environment such as Cygwin is required (see section I.1.3 for more information on Cygwin).

In all of the Unified Configuration scripts documented below, the environment variables GENII_INSTALL_DIR and GENII_USER_DIR must be set.  The former variable specifies the install location (such as /opt/genesis2-xsede), and the latter specifies the state directory for the container (such as $HOME/.genesisII-2.0).  The install directory and the state directory do not need to exist before running the installer, but the two environment variables must be established.
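For example, using the default RPM install location and the default state directory mentioned above:

# environment required by all of the Unified Configuration scripts below
export GENII_INSTALL_DIR=/opt/genesis2-xsede
export GENII_USER_DIR=$HOME/.genesisII-2.0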

D.8.1.             Creating a New Container That Uses a Unified Configuration

To configure a new container, first install a version of the Genesis II GFFS that provides the Unified Configuration (2.7.500+).  Run the configuration script with no parameters to get full help instructions:

bash $GENII_INSTALL_DIR/scripts/configure_container.sh

The instructions provided by the script should be complete, if a bit terse.  This section will explain some finer points of the required parameters, but the script’s built-in help should be consulted as the most authoritative and up to date reference.

There are potentially six parameters for the script, and it requires at least five of these.  They are:

1.       The container host name.  This is the globally visible name at which the new container can be reached over the internet.  It is alright for this to be an IP address, although a textual host name is preferred.

2.       The network port number where the container service provides TLS connection services.  This port number must not be blocked by a firewall.  The GFFS container must also be the only user of this port number.

3.       An existing grid user who will be given complete control over the new container.  This should be your grid user name, and it can be an XSEDE-style MyProxy/Kerberos user or a GFFS X509 style user.  If you do not have a user name yet, contact your grid administrator to acquire one.  In some rare cases, the grid administrator may provide a special user name that will own your container, rather than your own grid user.

4.       The keypair file in PFX format (that is, PKCS#12) which holds the TLS certificate and key-pair that the container will use to communicate over the internet.  This can either be an already existing PFX file or it can be the word “generate”, which will cause a new TLS keypair to be created by the grid’s certificate generator (where available).

5.       The password on the keystore itself.  This password is used to unlock the keystore and get at the keypair inside it.  If parameter 4 was “generate”, then this password will be used to secure the newly generated keystore.

6.       A password for the TLS key within the keystore.  This is optional and will only be necessary when the keystore and key password differ.  This parameter is not used for the “generate” keystore option.

Note that the script will produce diagnostic output during configuration which can include passwords, so it may be wise to run “clear” or “cls” in that terminal afterwards.
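
As an illustration only, an invocation with the parameters described above might resemble the following; the host name, port, user name, and password are placeholders, and it is assumed here that the parameters are passed positionally in the order listed (the script's built-in help remains the authoritative reference):

bash $GENII_INSTALL_DIR/scripts/configure_container.sh \
  gffs-host.example.org 18080 fred \
  generate myKeystorePassword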

After running the configuration script with the appropriate parameters, the container’s configuration will be built in the GENII_USER_DIR directory.  The script prints out a command that will start the container running.  For example, the new container might be started up using the default RPM package location:

/opt/genesis2-xsede/GFFSContainer start

After launching the container, its output can be watched with the “tail” command (assuming the default logging location):

tail -f $HOME/.GenesisII/container.log

If that shows no errors, then the container is now configured and could be linked into the grid provided by the installer (see Section F.3.1 for more details about linking the container).

D.8.2.             Converting a Container in Split Configuration Mode to a Unified Configuration

Users who have previously configured a container with the interactive installer may want to free themselves from the Split Configuration mode.  Typically, this involves installing an RPM or DEB package to provide the new installation.  The existing container can be converted into the Unified Configuration mode with a provided script, which will acquire configuration items from the interactive installation (which must still exist at conversion time).  To see the built-in help for the conversion script, run the following:

bash $GENII_INSTALL_DIR/scripts/convert_container.sh

This will show the required parameters and some example execution sequences.  This script is considerably simpler than the configure script (last section), as all of the configuration information should already exist and just needs to be extracted from the old installation directory.

It is important to back up the container state before the conversion process, in order to defend against any unexpected problems during the conversion.  Both the installation directory (pointed to by the GENII_INSTALL_DIR variable) and the state directory (specified by the GENII_USER_DIR environment variable or residing in the default location of $HOME/.genesisII-2.0) should be archived.  For example, this will create an archive of both directories, assuming the environment variables are set:

tar -czf container_backup.tar.gz  $GENII_INSTALL_DIR  $GENII_USER_DIR

The most common use of the container conversion script is to migrate an old interactive installation to the RPM/DEB package format.  It is important to set GENII_INSTALL_DIR to point at the newer install location before running the convert script, e.g.:

export GENII_INSTALL_DIR=/opt/genesis2-xsede

bash $GENII_INSTALL_DIR/scripts/convert_container.sh $HOME/GenesisII

The script will produce diagnostic output during the conversion which can include passwords, so it may be prudent to run “clear” or “cls” in that terminal afterwards.

It is possible to convert to a Unified Configuration even if there is only one installation of the newer interactive installer (e.g., if the old installation was upgraded in place).  In this situation, pass the current $GENII_INSTALL_DIR as the parameter to the script.

After the conversion script has run successfully, the container’s configuration will be unified under the state directory.  The older interactive installation can be removed, and the container will rely on the new package location for the GFFS software.

During the execution of the script, you will be offered a chance to create a copy of your deployment folder from the old installation.  This is only necessary if you have manually modified the deployment, or if the deployment is non-standard.  This is the case, for example, when upgrading a source-based container to use an RPM, which is documented further in section D.8.7.

D.8.3.             Changing Container’s Installation Source for Unified Configuration

After converting a container to the Unified Configuration, it is sometimes necessary to adapt to changes in the installation location.  This may occur if the container was initially converted from an older interactive install to the newer interactive install, but then later the RPM install is used instead.  The installation might also need to change location due to organizational or hardware changes.

In these cases where there is no other configuration change required for the container, the location can be fixed with the “update_install_location” script.  Running the script prints out the built-in help:

bash $GENII_INSTALL_DIR/scripts/update_install_location.sh

This is a very simple script.  The GENII_INSTALL_DIR should point at the new install location, and the older location is passed on the command line.  Below is an example of switching to the RPM package as the new installation source, after having previously relied on the interactive installation to support the container’s Unified Configuration.

# if the old installation is still active, stop that container…
$GENII_INSTALL_DIR/GFFSContainer stop

# update the installation directory variable to the new path…
export GENII_INSTALL_DIR=/opt/genesis2-xsede

# fix the install paths…
bash $GENII_INSTALL_DIR/scripts/update_install_location.sh $HOME/GenesisII

Again, this is only appropriate for switching the location of a container that already has the Unified Configuration (see prior section for information about converting to the Unified Configuration).

D.8.4.             Updating Unified Configuration Container after Upgrade

A GFFS deployment provides the information needed to connect to a grid, such as the grid location on the internet and the associated certificates for that grid.  Occasionally some characteristics of the grid deployment are updated, and these are pushed out in a new deployment package or in a new installer.  For containers with a Split Configuration mode that are set up by interactive installers, this usually poses no problem, as the installer can update the deployment when the new version is installed.  But containers with a Unified Configuration are more independent from the installation directory and are not automatically updated to the latest deployment.  This is a consequence of the RPM/DEB installation model, where the root user installs the package, but many other users can base their container on the installed package.  These types of containers require a deployment update in order to use the latest grid deployment.

If you have just updated your Genesis II installation by using a new Linux package or installer (on any supported platform), then it is important to update your container’s state directory by following the steps below.  As always with the unified configuration model, it is crucial that the GENII_USER_DIR variable is set to the container state directory before managing the container.  Given the appropriate GENII_USER_DIR, the “deployment updater” script does not take any parameters on the command line, and it can be started using these commands:

source /opt/genesis2-xsede/set_gffs_vars  # assuming default path for XSEDE grid.
bash $GENII_INSTALL_DIR/scripts/update_deployment.sh

The script will automatically find the deployment information in the installation directory and update the container state directory in GENII_USER_DIR to reflect the latest deployment information from the system-wide installation.

If one is using a specialized deployment, then the current deployments folder can be pointed at by the “$GENII_DEPLOYMENT_DIR” variable.  If that variable is not set, then the deployments folder falls back to the default of “$GENII_INSTALL_DIR/deployments”.  The use of a GENII_DEPLOYMENT_DIR variable is uncommon but useful if one’s deployments are not located under the GFFS installation directory.

D.8.5.             Using a Grid Deployment Override

There are two methods for using a different deployment than the deployment provided by the Genesis II install package.

The first method is to set the variable GENII_DEPLOYMENT_DIR in the environment before starting the container.  This causes the container to use that folder as the root of the deployments hierarchy, rather than the default of $GENII_INSTALL_DIR/deployments.

The second method is to store the specialized deployment hierarchy in a folder called “deployments” under the container’s state directory (in $GENII_USER_DIR).  If the container finds a folder named “deployments” in its state directory at start-up, then it will use that one instead of the one stored in the installation directory.

The order of precedence for finding the deployment folder is first to check the GENII_DEPLOYMENT_DIR variable, then to look for “deployments” in the container state directory (GENII_USER_DIR), and finally to look for deployments under the GENII_INSTALL_DIR.
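
For example, a minimal sketch of each method follows; the paths shown are placeholders:

# Method 1: point the container at a different deployments hierarchy.
export GENII_DEPLOYMENT_DIR=$HOME/my-deployments
$GENII_INSTALL_DIR/GFFSContainer start

# Method 2: place a specialized deployments hierarchy in the state directory.
cp -r $HOME/my-deployments $GENII_USER_DIR/deployments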

D.8.6.             Unified Configuration Structure

The unified configuration mode for the installer provides a method for overriding values that were previously always provided by the installed deployment.  This allows all of a container’s unique information to be managed in the container’s own state directory.

The unified configuration adds these files and directories to the state directory:

installation.properties
certs/
webapps/
wrapper/
deployments/   (optional)

D.8.6.1.                       installation.properties file

The installation.properties file provides override values for configuration properties that are otherwise provided by the “configuration” directory of a deployment.  This includes the files “security.properties”, “server-config.xml” and “web-container.properties”.  The following is an example of a real “installation.properties” file for a container that relies on the installed deployment:

gffs-sts.kerberos.keytab.TERAGRID.ORG=KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG.gffs-sts.keytab
gffs-sts.kerberos.principal.TERAGRID.ORG=gffs-sts/KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG
edu.virginia.vcgr.genii.container.external-hostname-override=surya.gruntose.blurgh
edu.virginia.vcgr.genii.container.listen-port=18080
edu.virginia.vcgr.genii.container.security.ssl.key-password=**
edu.virginia.vcgr.genii.container.security.ssl.key-store-password=**
edu.virginia.vcgr.genii.container.security.resource-identity.key-password=**
edu.virginia.vcgr.genii.container.security.resource-identity.key-store-password=**
edu.virginia.vcgr.genii.container.security.resource-identity.container-alias=signing-cert
edu.virginia.vcgr.genii.container.security.certs-dir=/home/fred/.surya_grid_state_dir/certs
edu.virginia.vcgr.genii.container.security.ssl.key-store=tls-cert.pfx
edu.virginia.vcgr.genii.container.security.ssl.key-store-type=PKCS12
edu.virginia.vcgr.genii.container.security.resource-identity.key-store=signing-cert.pfx
edu.virginia.vcgr.genii.container.security.resource-identity.key-store-type=PKCS12
edu.virginia.vcgr.genii.gridInitCommand="local:/opt/genesis2-xsede/deployments/surya-grid/surya_context.xml" "surya-grid"
edu.virginia.vcgr.genii.container.deployment-name=surya-grid

Note that there will be significantly fewer fields if the container installation carries its own “deployments” folder in the state directory.  In that case, the security properties come from the deployments folder rather than the installation.properties file.

As the above shows, installation.properties is formatted as a Java properties file and provides “name=value” definitions of variables.  Each of the entries corresponds to a setting that would otherwise have come from the deployment’s configuration files.

Generally this file should not be edited by hand, although that remains an option if additional overrides are needed or if values must be corrected to adapt to changes.
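
For example, a quick way to confirm a container's effective settings is to inspect the file directly (the property names are those shown in the example above):

# show the TLS listen port and external host name overrides.
grep -E 'listen-port|external-hostname' $GENII_USER_DIR/installation.properties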

D.8.6.2.                       certs directory

This directory is used to store container-specific certificates and Kerberos keytab files for authentication and authorization.  It has a structure mirroring the “security” folder from the installed deployment, and thus can contain a “default-owners” and a “trusted-certificates” directory.

The container configuration and conversion scripts automatically store the container’s certificate files in PFX format in this directory when using the unified configuration mode.
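
As a quick check, listing the directory should show the stored keypairs and any trust material; the exact entries vary by grid, but might resemble the following:

ls $GENII_USER_DIR/certs
# e.g.: default-owners/  trusted-certificates/  tls-cert.pfx  signing-cert.pfx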

D.8.6.3.                       webapps directory

This directory supports the Apache Axis web services software and provides a storage place for temporary files.

D.8.6.4.                       wrapper directory

This directory is used by the Java Service Wrapper for the container’s service management.  It provides the wrapper configuration file in “wrapper.conf” and is also the location where the service wrapper tracks the container’s active process id in “GFFS.pid”.
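
For example, the process id file can be used to check whether the wrapped container process is still alive (a sketch only, using standard shell tools):

# succeeds quietly if the container process exists.
kill -0 $(cat $GENII_USER_DIR/wrapper/GFFS.pid) && echo "container is running"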

D.8.6.5.                       deployments directory

If a directory called deployments is found in the state directory, and there is no GENII_DEPLOYMENT_DIR environment variable established, then this folder is used as the deployments folder, rather than the default of $GENII_INSTALL_DIR/deployments.  The convert_container script offers to create this directory (as a copy of the previous installation’s deployments folder) during conversion.

D.8.7.             Converting a Source-Based Container to a Unified Configuration

Converting a container that is built from Genesis II source code is a special case of the conversion process in Section D.8.2.  This usually only applies to the bootstrap container for a grid, or to experimental containers used by developers.  For these cases, the conversion script should perform the proper actions, but there are a few important choices to make during this process.

To convert the source-based container, follow the steps described above in Section D.8.2 to convert the source folder from “split configuration” to “unified configuration”, but with the following additions:

1.       If the source-based container is still running when executing the convert_container script, then the script will display a message along the lines of “There are still Java processes running…”.  If the script finds any such processes, answer “Y” to the question of whether to shut them down.  This will only stop Java processes that are detected as running Genesis II containers or clients.  Care should be taken if the same user account is running more than one Genesis II container; in that case, stop the source-based container manually.

2.       When the convert_container script asks whether to copy a specialized “deployments” folder, tell it to do so by answering “Y”.  This is crucial for a root container's specialized deployment to be preserved and is also needed in cases when the deployment generator was used to create the deployments folder.

Both of these choices can be automated by using optional flags to the convert_container script, as in the following script execution example (replace the path for {genesis2-trunk} with your container’s source code location):

# Switch installation dir variable, for example to xsede install location:
export GENII_INSTALL_DIR=/opt/genesis2-xsede
# Perform the conversion:
bash $GENII_INSTALL_DIR/scripts/convert_container.sh {genesis2-trunk} \
  stop+depcopy

The “stop” phrase will cause any Genesis II Java processes to be stopped.  The “depcopy” phrase causes the deployments folder to be copied from the installation directory into the container state directory.

After the conversion is successful, the source code should no longer be needed to run the container, and it can be removed.

E.     Grid Usage Topics

This section describes how to get computational work done with a grid based on Genesis II GFFS software (such as the XSEDE and XCG grids).  It is assumed that the grid is already configured, and that the user has already been issued a grid user account by the grid administrator.

E.1. Grid Basics

This section provides some helpful basic commands to get started.  More extensive details will be provided in future sections.

E.1.1.               Built-in Help

Genesis II has built-in help available for most commands.  The command grid help prints a list of the available commands. Additionally, each individual grid command has a short help description for usage and also a longer man-page style description.

Running the grid command requires having previously loaded the appropriate Genesis II environment variables, as per section B.4.7.  The required command looks like this (using the bash shell on any supported platform):

source /opt/genesis2-xsede/set_gffs_vars

After the grid command is available in the path, the built-in help can be accessed:

# print a list of the available commands.
grid help  

# show usage information for a command.
grid help {command}             

# show the manual page for a command.
grid man {command}
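
For instance, to see the usage summary and the full manual page for the cp command discussed later in this document:

grid help cp
grid man cp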

E.1.2.              Essential Commands

The list below shows the commands that are used frequently with Genesis II.  Note that most grid commands will require that the user has already logged into the grid (covered in the next section).

grid whoami
     # Prints out the user’s current grid credentials.
grid pwd
     # Shows the grid client’s current working directory.
grid ls X
     # Lists the grid path X, e.g. “grid ls /” will show the contents of the
     # root directory of the grid.
grid ui
     # Starts the grid client User Interface (requires X11 or other java-supported
     # windowing environment).
grid cat X
     # Prints the contents of file X on the console.
grid rm X
     # Removes the file (or empty directory) X.
grid rm -r X
     # Removes the directory X recursively, even if not empty.
grid ping X
     # Tests the accessibility and liveness of grid asset X (a service, file,
     # directory, or other type).

E.2. Authentication and Authorization

In the grid, a user's capabilities are based on who they are and what they've been granted permission to do.  Authentication is the process that a user goes through to show who they are, at least in terms of an identity that the grid will accept.  This proof is limited; the user has merely presented a certificate or a valid login that the grid recognizes.  It is not proof that the user actually is a particular person; it just proves that she possesses the credentials associated with that person.

On the other hand, authorization is the full set of capabilities that specify what a particular identity is allowed to do.  In the case of the GFFS, the user's authorization is specified by access control lists on resources that the user has the right to use in some particular manner.  For example, the user may have authorization to submit a compute job to a particular queue.

The following sections detail the processes of grid authentication and grid resource authorization.

E.2.1.              Credentials Wallet

Genesis II uses what is termed a “credentials wallet” to store user identity for grid operations.  The wallet contains all of the identities with which a user has authenticated to the grid using a supported protocol, such as providing a username and password or logging into a Kerberos domain.

Users may require a collection of identities for their work, rather than just one.  For example, the user may have allocations at a Supercomputing Center as well as having a local campus identity.  The credentials wallet allows the user to present all of her valid identities with a single login.

E.2.1.1.                        Who Are You?

A grid client instance that is not connected to a grid container initially has no identity at all.  As part of making the secure connection to a grid container, the client creates a self-signed certificate to represent its own identity.  Upon attempting to connect to a grid container, the grid client examines the identity of the container and compares it with the client's own “trust store”.  The trust store is a set of server certificates that the grid administrator has instructed the client to “trust”.  Trust here just means that the client will connect to containers that identify themselves via one of those certificates, and it will not connect to any containers that are not in the trust store.  More details about the trust store are available in the section on GFFS Deployments.

# show the initial certificate on a client that has never
# authenticated as a user before.
grid whoami

When the client has no previously cached identity, this command shows just the certificate that the grid client created to represent its side of the secure TLS connection.  This is an example of the “whoami” output for a client in that state.

Client Tool Identity:
(CONNECTION) "Client Cert 90C75E64-D5F9-DCC2-A11F-584339FD425F"

Once the client has decided to trust the container (and possibly, based on configuration, the container has made a similar decision to trust the client), the secure TLS connection is made and services can be requested by the grid client.  The first of the requested services is generally a login request, because the client must authenticate as an identity of some sort to obtain any authorization for grid resources.  Different methods for logging in are discussed in the next section.

E.2.2.              How to Login & Logout

Genesis II supports a variety of authentication mechanisms, including username & password, Kerberos, MyProxy, InCommon, and direct use of a key-pair.  Each of these methods may be appropriate for a different reason.  Thanks to the credentials wallet, the user does not need to pick just one approach, but can attain whatever collection of identities that are needed to get the work done.

E.2.2.1.                        Logging Out of the Grid

Although it may seem counter-intuitive to log out before having logged in, this can be done and is not a null operation; logging out always clears at least the self-signed client certificate.  If the user had previously authenticated to any grid identities, those identities are dropped as well.

# logout of all identities.
grid logout --all

It is possible to log out of just one identity by specifying its “alias” on the command-line.  Identities each have a unique alias name, and the alias is shown in the whoami listing. 

For example:

Example of grid whoami result

Client Tool Identity:
(CONNECTION) "Client Cert 90C75E64-D5F9-DCC2-A11F-584339FD425F"
 Additional Credentials:
(USER) "drake" -> "Client Cert 90C75E64-D5F9-DCC2-A11F-584339FD425F"
(GROUP) "uva-idp-group" -> "Client Cert 90C75E64-D5F9-DCC2-A11F-584339FD425F"
(USER) "skynet" -> "Client Cert 90C75E64-D5F9-DCC2-A11F-584339FD425F"

 

This alias can then be used to log the identity out:

# log out of a user identity.
grid logout --pattern=skynet

# or log out of the group.
grid logout --pattern=uva-idp-group

E.2.2.2.                        Login with Grid IDP

The grid's identity provider (IDP) supports standard username and password authentication for users to log in to the grid.  The username and password in question must already have been set up by the grid administrator.  To log in with a grid user identity, use:

grid login --username={drake}

In a graphical environment, this will pop up a dialog for filling in the password.  In a console environment, there will be a prompt asking for the password at the command shell.

Note that the password can be included in the login command if it is absolutely required.  This may be needed for scripting a grid login, but it is not generally recommended because the password will be visible in script files or in command history:

grid login --username={drake} --password={myPass}

E.2.2.3.                        Login With Kerberos

For users to log in using a Kerberos STS, the STS must already have been created according to the instructions in the section “Using a Kerberos STS”.  Once the Kerberos STS exists, users can log in with the following command:

grid login rns:{/containers/containerPath}/Services/KerbAuthnPortType/ {userName}

This will bring up a password dialog for the userName specified.

E.2.2.4.                        Login From a Keystore

In some cases, user identity may need to come from a key-pair stored in a file.  This is often the case when a user needs to authenticate as a grid administrator.  It is also possible that a key-pair will be issued by a resource owner to control access to the resource.  In order to obtain authorization on that resource, merely being logged in as a known grid user would not suffice and the user must add the key-pair credentials to the wallet.

To authenticate using a keystore file (such as a PKCS#12 format PFX file):

# using a keystore on a local disk.
grid keystoreLogin local:{/path/to/keyFile.pfx}

# or using a keystore in the grid.
grid keystoreLogin grid:{/home/drake/keyFile.pfx}

E.2.2.5.                        Login Using xsedeLogin

The xsedeLogin command is a special purpose login for users of the XSEDE grid.  It authenticates to the XSEDE Kerberos server and the XSEDE MyProxy server in order to obtain both types of identities for grid services.  It is very similar to the simple login command.

# log in to the grid.
grid xsedeLogin --username={drake}

If login is successful, the “whoami” listing may appear as follows:

Client Tool Identity:
(CONNECTION) "Drake Valusic"
Additional Credentials:
(USER) "drake" -> "Drake Valusic"

In the case of the XSEDE-style login, there is no self-signed certificate for the client.  The client's identity is instead dependent on the Kerberos authentication using a real XSEDE portal ID for login.

E.2.2.6.                        Logging in with InCommon

The iclogin command uses the Enhanced Client or Proxy (ECP) protocol to authenticate to an InCommon identity provider (IDP), and then uses that authentication to acquire grid credentials. Any of the previous STS types may be the target of an InCommon login, as long as it has been set up according to the section “Setting up an InCommon STS” (Section G.1.11).

Once the InCommon STS link exists, users can log in with the following command:

grid iclogin

There are five parameters to log in using InCommon:

1.       The URL of the IDP's ECP service endpoint,

2.       The user id,

3.       The password for the user at that identity provider,

4.       (optional) An SSL public/private keypair, and

5.       (optional) An associated SSL certificate signing request (CSR).

In a graphical environment, dialogs will be displayed to retrieve these parameters. In a console environment, the user will be prompted in the command shell. Alternatively, all of these parameters, or any subset of them, may be specified at the command line, as follows:

Using a keypair file on a local disk

grid iclogin --idp={https://url/of/IDP/endpoint} --username={drake} \
  --password={myPass} --key=local:{/path/to/local/key/file} \
  --csr=local:{/path/to/local/CSR/file}

Using a keypair file in the grid

grid iclogin --idp={https://url/of/IDP/endpoint} --username={drake} \
  --password={myPass} --key=grid:{/path/to/grid/key/file} \
  --csr=grid:{/path/to/grid/CSR/file}

If the user does not wish to specify an existing SSL keypair, a new keypair and CSR will be generated by the client. If the user does specify a keypair file, he may also provide a CSR or have one generated that contains the provided public key.

The iclogin tool uses the InCommon authentication service at CILogon.org to generate an authentication request for the provided or generated CSR, forwards the request to the selected IDP with the provided credentials for a signed assertion of identity, and then returns the assertion to CILogon.org to retrieve a X.509 certificate. As in the xsedeLogin, the self-signed session certificate is discarded, and the certificate from CILogon.org becomes the current client session certificate. Finally, the iclogin tool contacts the STS corresponding to the InCommon credentials provided to acquire additional grid identity certificates, which are delegated to the CILogon.org session certificate.

E.2.3.              Grid Access Control Lists (ACLs)

Upon authentication, the user may perform all actions she is “authorized” to perform.  In Genesis II, authorization is implemented using a technique called Access Control Lists.  Every resource in the Genesis II GFFS has three access control lists which are called Read, Write, and Execute ACLs.  Each type of ACL can have from zero to an arbitrary number of grid identities listed.  This associates the decision-making information about whether a resource is accessible onto the resource itself, rather than associating it with a user or a group (as might be done in a capability model rather than an ACL model).

There are a few generally applicable attributes for the Read, Write and Execute ACLs, but specific resources can vary in how these ACLs are interpreted.  In general though, Read access grants a user identity the right to see a resource.  Without Read access, the user cannot even list the contents of that resource in the GFFS.

Generally speaking, Write access is often considered to grant administrative access to the resource.  For example, a queue that lists a user X in its Write ACL is granting user X the right to completely control the queue, even to the extent of removing queued jobs of other users or changing the properties of the queue.

The general interpretation of the Execute ACL is to make a resource available to a user for whatever primary purpose the resource provides.  For example, a user with Execute access on a queue is allowed to submit jobs to it, and to cancel her own jobs.  That user cannot however manage the jobs of other users or change the attributes of the queue.

E.2.3.1.                        How to See ACLs in the Grid Client

Genesis II provides two ways to display the ACL lists for a resource: the console grid client and the graphical client UI.  The graphical client provides a summary of the permissions for user ids, whereas the console client displays the full authorization data (including the EPIs that uniquely describe user identities in the ACLs).

You can show the authorization information for any resource in the GFFS using the grid authz command.

grid authz {/path/to/resource}

 

Example listing of just the Read ACL for a grid path

  Read-authorized trust certificates:

    [0] (X509Identity) "CN=EnhancedRNSPortType, SERIALNUMBER=urn:ws-naming:epi:41A37E3B-8E0A-0502-9DDA-BCA21C8E0008, OU=Genesis II, O=GENIITEST, L=Charlottesville, ST=Virginia, C=US"  [06/04/12 11:05:45, 06/05/13 11:05:45]

    [1] (X509Identity) "CN=X509AuthnPortType, CN=admin, SERIALNUMBER=urn:ws-naming:epi:B0D0624B-9939-9A8E-4682-52A416657D88, OU=Genesis II, O=GENIITEST, L=Charlottesville, ST=Virginia, C=US"  [06/04/12 11:04:28, 06/05/13 11:04:28]

    [2] (X509Identity) "CN=X509AuthnPortType, CN=drake, SERIALNUMBER=urn:ws-naming:epi:2A9784BC-2DF8-42D0-2C34-00CE2857B9D9, OU=Genesis II, O=GENIITEST, L=Charlottesville, ST=Virginia, C=US"  [06/04/12 11:05:52, 06/05/13 11:05:52]

    [3] (X509Identity) "CN=X509AuthnPortType, CN=uva-idp-group, SERIALNUMBER=urn:ws-naming:epi:5CF49A70-88F8-C08B-2DAF-ED0029C8D2F5, OU=Genesis II, O=GENIITEST, L=Charlottesville, ST=Virginia, C=US"  [06/04/12 11:04:00, 06/05/13 11:04:00]

    [4] EVERYONE

Note that this particular resource allows “everyone” to read it.  This is often the case for top-level GFFS folders and other assets that are part of the “grid commons” available to all users.  Also of interest are the EPIs (listed after urn:ws-naming:epi:) that uniquely specify a particular grid identity.

To use the client-ui for viewing ACLs, launch the client (grid client-ui) and navigate to the file or directory of interest in the RNS Tree.  Once an item has been selected (by left-clicking with the mouse), the ACL pane on the right will show the Read, Write and Execute permissions for that resource.

E.2.3.2.                        Meaning of Read, Write, and/or Execute Permissions

The interpretation of the Read ACL is constant within the grid for all resources.  It always specifies visibility of the resource to a particular user.

The Write and Execute permissions can however be interpreted differently by different resources.  This section provides a summary of what those permissions mean for the different types.

E.2.3.2.1.         ByteIO Files and RNS Directories

For ByteIO files and RNS directories in the GFFS, the write permission simply indicates that a user can change the contents of the file or directory.  The execute permission is not really used internally for files and directories, but could be set for use within FUSE mounts (to make a grid file executable when mounted on a Linux filesystem).

E.2.3.2.2.         Queue Resources

Having write permission on queue resources indicates that the user is an administrator of that queue.  Having execute permission gives the user the ability to submit jobs to the queue.

E.2.3.2.3.         BES Resources

Having write permission on BES resources indicates that the user is an administrator of the BES.  Having execute permission gives the user the ability to directly submit jobs to the BES.  Queues also need execute permission on the BES before they can successfully submit jobs to it.

E.2.3.2.4.         IDP Resources

Having write permission on an IDP or other STS object in the GFFS indicates that the user is an administrator of that particular entry (but not necessarily of the server providing security services).  Having execute permission enables a user to behave as “a member” of an IDP, which is especially relevant for users being members of groups.

E.3. Data Files

Data files that feed into computational results are an integral component of any grid computing software.  Genesis II provides a variety of methods for specifying the locations of data files.  Most jobs can rely on stage-in and stage-out files that are available via the GFFS.  This section describes a number of methods for loading data into, and retrieving data from, the GFFS.

E.3.1.              Copying Data Into and Out of the GFFS

The need to access data files arises when a user's job needs input files for computation and when the job produces output files.  There are three main approaches for copying resources in and out of the GFFS: using the command-line grid client, using the graphical grid client, and using a FUSE mounted filesystem.

E.3.1.1.                        Copying Data Files Using the Console Grid Client

Similar to cp in the UNIX operating system, the grid’s cp command can copy the contents of multiple source files and directories to a target location.  The source files can be any mix of local and grid locations.  The target must be a directory, unless the source is a single file to copy to another location.

# copy a file from the local filesystem.
grid cp local:/home/drake/File1.txt grid:/home/drake/File2.txt

# copy a grid file to a local file.
grid cp grid:/home/drake/File2.txt local:/home/drake/File1.txt

# copy a folder from the local filesystem to the grid.
grid cp -r local:/home/drake/myDir grid:/home/drake/newPlace

# copy a folder from the grid to the local filesystem.
grid cp -r grid:/home/drake/myDir local:/home/drake/newPlace

 

Note that many commands, such as cp, assume the “grid:” prefix if it is not provided.  For local paths, the “local:” prefix (or the synonym “file:”) must be used.

E.3.1.2.                        Copying Data Files Using the GUI Client

The grid client-ui tool has been updated to support a variety of methods for copying data files, including drag&drop functionality.  These may be helpful for users who are more familiar with graphical user interfaces.

To copy files into the grid with the client-ui, first start the GUI:

grid client-ui

 

When the graphical client is running, a window similar to the one below is displayed.  The window shows a view of the grid filesystem (labeled as RNS Space) and a view of the ACLs for the object currently focused in the tree.


Figure 12. The client UI with RNS Tree and ACL List

E.3.1.2.1.         Drag&Drop Files Into the Grid

The client-ui supports dragging and dropping files into the grid using the standard file browser application for the user’s operating system.  On Windows, Windows Explorer (explorer.exe) is the recommended browser, and on the Mac, the Finder is recommended.  For Linux, the Nautilus or Konqueror applications can be used for file browsing.

Once the file browser has been opened, one performs drag and drop copying by dragging the file or directory of interest out of the file browser and into the grid tree (in the RNS Space tab of the client-ui) at the desired location.  A progress dialog will open and show progress as the files and directories are copied.

E.3.1.2.2.         Drag&Drop Files out of the Grid

The grid client-ui can also copy files to the operating system's file browser via drag&drop.  In this case, the user drags the file or directory of interest from the RNS tree view in the client-ui into the desired folder in the file browser.

There is an important caveat for dragging files out of the grid.  The drag&drop mechanism requires that the drop can only occur once all of the files to be dropped are available locally.  In the case of the grid’s client-ui, making the files available locally involves copying them to a temporary location in the local filesystem.  Once copied, the files can be dropped into the desired location.

This impacts the behavior for drag and drop significantly.  The user must wait until the icon changes to the operating system’s “drop okay” icon before letting go of the mouse.  If the contents to be dropped are sizeable, then the copy process can take quite a while, and the user must hold the mouse button down that entire time.  In the case of larger transfers, it is recommended to use the “Save To” technique from the next section instead of drag&drop.

E.3.1.2.3.         Copying Files out of the Grid with “Save To”

Due to the potential for large data files to cause unacceptable delays in a drag&drop operation, the grid client provides another method to copy files and directories in and out of the grid.  This feature is used by right-clicking on the grid path (e.g. a directory) to be copied and selecting either the “Copy to Local File System From GFFS” or the “Copy From Local File System to GFFS” option.  The former opens a directory browser for the local file system; the user selects the target location and hits “save”.  When copying to the GFFS, a GFFS directory browser is opened and the user selects the target location in the GFFS.  Once the target location is selected, a dialog opens and shows the copy operation’s progress.

The advantage of this feature is that the contents do not need to be copied locally before the operation can be started, unlike drag&drop.  The user simply selects where the data files should be saved, and the client-ui manages the copying process after that point.

Directory Operations

When a directory is highlighted, the following options are available from the drop-down Directory Menu:


Figure 13. Directory Context Menu

E.3.1.3.                        Copying Data Files Using a FUSE Mount

FUSE is a method for mounting the grid filesystem onto a local path, so that a portion of the grid namespace is available on the user's computer.  This enables the user to copy data to and from the mounted grid directory as if it were present in the local filesystem.

Creating a FUSE mount is detailed in the next section.  But using a FUSE mounted GFFS to copy data files is very simple.  Assuming the grid has been mounted at /home/drake/gridfs, the following will copy a directory tree in or out of the grid:

# copy a directory hierarchy up into the grid.
cp -r {/a/directory/tree/} {/home/drake/gridfs/home/drake/newDir}

Note that when the gridfs is mounted at the root folder of the grid, the extra /home/drake path is necessary to get down to the user's home directory.

# copy a hierarchy down from the grid to local filesystem.
cp -r {/home/drake/gridfs/home/drake/toCopy} {/local/path/for/directory}

 

Note that the commands above use just cp and not grid cp, because in these cases the operating system’s native copy command is used.

E.3.2.              Exporting Local Filesystems to the Grid

The GFFS provides a feature called “exports” for sharing data into the grid.  Exports allow data to reside on one’s own machine, but be shared with other users and used as staging data for job processing.  This may be very helpful for large data sets, where one does not want to make a secondary copy of the data; the original data can be served on demand within the grid.

A simple export command to share a path under one’s local home folder might resemble this:

grid export --create \
    /resources/xsede.org/mason.iu.xsede.org/containers/mason-gffs \
    local:/home/xd-fred/myData  grid:/home/xsede.org/fred/mason-data

In the above, the local path on the Mason machine is “/home/xd-fred/myData”.  This folder will show up in the GFFS at the path “/home/xsede.org/fred/mason-data”.  This relies on a container that is already established at Mason and is linked into the grid at “/resources/xsede.org/mason.iu.xsede.org/containers/mason-gffs”.

The GFFS exports feature is supported by two different web services with varying properties and is a fairly large topic.  The exports feature is covered in detail in Appendix M.

E.3.3.              How to Mount the GFFS via a FUSE Filesystem

Genesis II provides a technique for mounting a portion of the grid namespace onto a local computer.  This relies on the FUSE subsystem, which allows user-space drivers to manage filesystems rather than requiring the kernel to do so.  FUSE enables the user to copy files in and out of the mounted directory as if it were simply another directory in the local filesystem.

To fuse mount the top level of the GFFS onto a local path:

grid fuse --mount local:{/local/path} &

This makes the root folder of the GFFS available as the local path specified.

To fuse mount a specific folder in the GFFS locally, use the “sandbox” flag.

grid fuse --mount --sandbox={/path/in/grid} local:{/local/path} &

The “--sandbox=X” portion of the command specifies where the fuse mount should be rooted in the GFFS RNS tree.

After the fuse mount is created, the user can copy files using the /local/path.  Most file and directory operations provided by the operating system can be used on the contents of the path.
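
As a concrete sketch (the grid path and the local mount point are placeholders), mounting only a user’s grid home folder and copying a file into it might look like this:

mkdir -p $HOME/gffs-home
grid fuse --mount --sandbox=/home/xsede.org/fred local:$HOME/gffs-home &

# ordinary OS commands now operate on the mounted grid folder.
cp results.txt $HOME/gffs-home/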

To unmount the fuse mounted directory:

# Unmount using the grid command.
grid fuse --unmount local:{/local/path}

# This alternative OS-level command can be used on Linux.
fusermount -u /local/path

Note that on CentOS, the following command was required to make fusermount available to all users:

sudo chmod 4755 /usr/bin/fusermount

E.3.3.1.                        How FUSE Mounts Are Different From Unix Filesystems

The FUSE mounted grid filesystem does not behave exactly like a standard Unix filesystem.  It does support most standard operations (copying files & directories, deleting them, and so forth), but there are a few caveats described in the next sections.

E.3.3.1.1.         No Replacements

One important distinction is that the Genesis II FUSE filesystem does not currently support overwriting a directory with a move (mv) operation.  Due to the GFFS representation of files and directories as EPRs, the meaning of substituting out an RNS folder in that way is not well defined.  Genesis II requires that a directory can only be moved onto a target location in a FUSE mount if that location does not already exist.  This may require some special treatment in scripts using FUSE such that the existing directory is deleted before a directory with the same name is moved into that location.
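
For example, a script working within a FUSE mount can handle this by removing the existing target before the move (the paths are hypothetical):

# replacing a directory inside a FUSE-mounted grid path.
rm -r /home/drake/gridfs/home/drake/results
mv ./results /home/drake/gridfs/home/drake/results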

E.3.3.1.2.         No Links

The standard Unix filesystem feature of symbolic links does not operate as expected inside of FUSE mounts.  This is due to the basic difference in mechanisms providing the filesystem between the Unix local filesystems and the mounted grid filesystem.  Links do exist in the grid, but they are an entirely different creature from the filesystem symbolic links.

Due to that implementation difference, making a link from the FUSE client side between grid asset A and grid asset B will not work.  Linking local asset A into grid asset B will also not work, because the grid does not interpret a symbolic link properly in the FUSE mount.  It is possible, however, to link from grid asset A in a FUSE mount to a local filesystem asset B.  Asset B will remain usable as long as the FUSE filesystem is mounted.

E.3.3.1.3.         Permission Differences

Another important distinction between Genesis II FUSE filesystems and the standard Unix filesystem is that not all permission attributes are used.  In the standard filesystem, permission attributes are usually structured as User/Group/Other triples of Read/Write/eXecute ACL settings (e.g. rwx|rwx|rwx for user|group|other).  These control what the user owning the file can do to it, what other members of the file's group can do to it, and what the general populace can do to the file.

In Genesis II FUSE, the “group” RWX is not used at all; the group portion of ls listings will always show up as '---'.  This is due to the different interpretation of groups in Genesis II versus the Unix interpretation.  Group access control is managed uniformly with user access control in Genesis II.

The “other” portion of the permissions is also slightly different.  Genesis II uses the other permissions to describe the rights for “everyone” on the file, so that is quite similar to the Unix interpretation.  But Genesis II only allows the permissions to be changed if the user who mounted the grid with FUSE has write permissions on the file, whereas merely being the file's owner enables changing permissions in Unix file systems.  Because of this difference, users should never take away their write permissions on their own files and directories in FUSE mounts, or they lose the ability to give write permissions back again.

E.3.3.2.                        Operating System Dependencies for FUSE

The FUSE file system for the Genesis II GFFS is only available on Linux operating systems.

If FUSE is not provided by the Linux distribution as a default, these steps may be needed to install it:

For CentOS:

sudo yum install fuse fuse-libs
sudo chmod a+rx /bin/fusermount

For Debian/Ubuntu:

sudo apt-get install fuse fuse-utils gvfs-fuse libfuse2

E.3.4.              Other Staging Methods for Data Files

Many compute jobs can rely directly on the GFFS for staging data files.  However, there are cases where the data must remain at its original location rather than being copied to or exported from the GFFS.  For these cases, the grid’s job-tool application supports additional stage-in and stage-out server types: web servers, ftp servers, and ssh-based servers (using either the scp or sftp protocol).  All of these support data file stage-in; all except web servers also support data file stage-out.

More information about creating JSDL files is available in the section on Submitting Jobs.

E.4. Grid Commands

The Genesis II software offers a number of methods for issuing commands to the grid.  One method is to run the grid client program (called “grid”) and enter commands manually or via a script.  Another method to issue commands is to write an XScript file with grid commands in an XML format.

E.4.1.              Grid Command Set

There are quite a few commands available to users in the grid client.  A list of the available commands can be printed by issuing the command grid help.  Many of the commands will be familiar to Unix and Linux users, but some are very specific to the Genesis II grid.

A detailed reference for the Genesis II command set is available at the Genesis II wiki at http://genesis2.virginia.edu/wiki/uploads/Main/Gridcommandusage.pdf.

E.4.2.              Grid Paths: Local vs. RNS

Before discussing the various ways commands may be executed through the Genesis II client interface, it is important to understand the distinction between local resources and grid resources.  The grid client can perform many analogous commands on grid resources (like ByteIO and RNS services) and local resources (files and directories).  For example, the cat command, which is used to output the contents of a ByteIO resource, can also output the contents of a local file.  Similarly, using the ls command on an RNS service will list the RNS entries contained by that service, while that same ls command used on a local directory will list that directory's contents.

Distinguishing between grid and local resources is accomplished by prefacing the path of the resource with a prefix to denote its location.

E.4.2.1.                        Local Resources

For resources on the local system, preface the path (in the local file system) with local: or file:, as in the following example:

ls local:/home/localuser

This will cause the ls tool to list the contents of the directory /home/localuser on the local file system.  The prefixes local: and file: are interchangeable; that is, they have the same semantic meaning, and users may use either or both according to preference.

E.4.2.2.                        Grid Resources

For resources in the grid namespace (the GFFS), preface the RNS path with grid: or rns:, as in the following example:

ls grid:/home/griduser

This will cause the ls tool to list the contents of the RNS entry /home/griduser in the grid namespace. As with the local equivalents, the prefixes grid: and rns: are interchangeable; that is, they have the same semantic meaning, and users may use either or both according to preference.

E.4.2.3.                        Combining Local and Grid Prefixes

Some commands available to the grid client require multiple arguments, and in such cases it may be useful to mix grid and local resource prefixes. For example, suppose the user wishes to copy a file example.txt from the local file system into the grid, creating a new ByteIO resource with the contents of that file.  The cp command can be invoked for this purpose as follows:

cp local:/home/localuser/example.txt grid:/home/griduser/example-grid.txt

This will instruct cp to copy the contents of /home/localuser/example.txt on the local file system into a grid ByteIO resource named example-grid.txt listed in the RNS resource /home/griduser.  The semantics of the command will adjust to reflect the locations of the source and destination provided.

Note that the default is the grid namespace, i.e., /home and rns:/home are equivalent.

E.4.3.              Scripting the Grid Client

One of the features of the grid client is the ability to invoke the client to execute a single grid command and then exit without further user interaction.  For example, from the local command line, the user may enter

grid ls /home/griduser

This will start the grid client, execute the command ls /home/griduser, and then print the results of the command to the screen and return to the local command line prompt.  If the command requires user interaction, the standard input and output streams will work in the standard way; this means that the standard input can be redirected to a file using the local operating system's existing semantics.
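
For example, the output of a single grid command can be captured in a local file using ordinary shell redirection (the file name is a placeholder):

# save a directory listing from the grid into a local file.
grid ls /home/griduser > listing.txt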

This feature is particularly helpful for performing multiple non-interactive commands in succession through scripting on the local command line.  The user may write useful scripts, which can invoke commands on both the local system and in the grid, in whatever scripting dialect is already available.  Take the following example, written for Linux's bash:

#!/bin/bash

# example.sh: scripting local and grid commands

echo "This is a local command"

for I in {1..5}; do
     str="This is grid command number $I"
     grid echo "$str"
done

echo "End of script"

In the example script, the local operating system is instructed to print a message, then to loop over the values 1 to 5, assigned to the variable I.  For each of these loop iterations, a string variable str is composed, and a grid command to echo the contents of that variable is invoked.  Finally, the local echo command is used to signal the end of the script.

In this fashion, command-line scripting may be employed to create arbitrarily complex series of commands, mixing local and grid commands as needed.

E.4.4.              XScript Command Files

The XScript scripting language is an XML-based scripting language developed by the Virginia Center for Grid Research (then the Global Bio Grid research group) at the University of Virginia for use with Genesis II.  Originally the language was designed to support only minimal capabilities – enough to get the project started until something better could be developed – but it has since grown into a more sophisticated and fully featured language in its own right.  Today, the XScript language supports many of the language features that are expected from a real programming language, including loops, conditionals, and exceptions.

XScript is used to script commands from within the grid client, as opposed to the previous section which discussed running scripts that repeatedly invoked the grid client to execute commands.  This section will provide an overview of the features and use of XScript; a complete documentation of XScript is available in the Documentation section of the Genesis II wiki (which is available at http://genesis2.virginia.edu/wiki/Main/XScriptLanguage Reference).

E.4.4.1.                        XScript High-level Description

In XScript, every XML element (other than the root document element) represents a single language statement. These statements may or may not themselves contain other statements depending on the element type in question. For the most part, those statements which can support inner statements are the language feature elements such as conditionals and loops, while those that cannot generally represent simple statement types like echoes, grid commands, and sleep statements.

In XScript, every XML element falls into one of two categories. The first category is for language elements and uses the first namespace shown in the example below, abbreviated as gsh. The second category is for Genesis II grid commands and uses the second namespace shown in the example, abbreviated as geniix. We will use the first of these, gsh, as the default namespace for all XML in this section and thus assume that the root element of all XScript scripts looks like the following:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <gsh:script
        xmlns:gsh="http://vcgr.cs.virginia.edu/genii/xsh/script"
        xmlns:geniix="http://vcgr.cs.virginia.edu/genii/xsh/grid"
        xmlns="http://vcgr.cs.virginia.edu/genii/xsh/script">
     ...
 </gsh:script>

E.4.4.2.                        XScript Language Elements

XScript has been designed to include most of the control flow structures used in modern programming languages. There are also command elements common to many scripting languages, such as “echo” and “sleep”. The following is a list of the basic control elements and commands available in XScript. Note that this list is subject to change as the language matures or additional features are added.

For usage of a specific element, or the particular semantics of its use, see the external documentation on the Genesis II wiki.

·         Echo – Prints to the terminal

·         Define – Defines a new variable

·         Sleep – Pause script execution

·         Exit – Terminate the script

·         Param – Indicate parameters to grid command elements

·         Comparisons (Equals, Matches, Compare) – Operators for comparing variables

·         Conditionals (And, Or, Xor, Not, IsTrue, IsFalse) – Operators for manipulating Boolean variables

·         If, Then, Else – Conditional execution

·         Switch, Case, Default – Choose from a set of values

·         For, Foreach – Loop over an index variable or a set of values

·         Throw, Try, Catch, Finally – Exception handling statements

·         Function, Call, Return – Defining and using functions/subroutines within the script

·         Parallel-job, Parallel – Allow parallel execution of collections of statements

E.4.4.3.                        Grid Command Elements

The simplest form of statement in an XScript script is a grid command. Grid commands are identified by belonging to the geniix namespace. Any time an XML element exists in this namespace, the XScript engine attempts to find a grid command with the same name as the element's local name. If it finds such a command, the statement is assumed to represent that command; otherwise an exception is thrown. Parameters (command-line arguments to the grid command) are indicated with XScript param elements. Below we show example XScript statements for the grid commands ls (list the contents of an RNS directory) and cp (copy files/resources to another location).

 ...

 <geniix:ls/>

 <geniix:cp>

        <param>--local-src</param>

        <param>/etc/passwd</param>

        <param>/home/passwd</param>

 </geniix:cp>

 ...

E.4.4.4.                        XScript Variables/Macros

Every attribute value and text content node of an XScript script can include a reference to a variable. If included, the value of this variable will be inserted at run time as a macro replacement. Further, variables are scoped by their statement level, which makes it possible to write scripts that contain multiple variables of the same name without inner definitions interfering with outer definitions.

Variables in XScript documents are indicated by surrounding the variable name with ${ and }. Thus, to indicate the value of the NAME variable, the string ${NAME} should appear anywhere that text was expected (such as for an attribute value or as the text content of an appropriate XScript statement).

Arrays are also supported in the XScript language, though at the time of the writing of this document, only for accessing parameters passed in either to the script itself, or to functions. The length of an array in XScript is indicated with the ${ARRAY_VARIABLE} expression syntax, while the elements inside of the array are indicated with the ${ARRAY_VARIABLE[INDEX]} syntax. Thus, to echo all elements of the ARGUMENTS array, the following XScript code can be used:

 ...

 <for param-name="i" exclusive-limit="${ARGUMENTS}">

        <echo message="Argument ${i} is ${ARGUMENTS[${i}]}."/>

 </for>

 ...

Arguments passed in to the script as well as those passed in to functions are contained in the ARGV array variable (for command-line arguments passed in to the script, the first element is the name of the script file itself).

E.4.4.5.                        An Example XScript

Below is a complete example XScript script. The functionality of the script is trivial, but the file is syntactically correct and provides a concrete example of some of the concepts discussed previously in this section. The script takes a single argument from the command line, compares it to a set of switch cases, and then executes a different grid command based on that input (along with a few echo statements for good measure). Note the if test at the outset, which determines whether a command-line argument was provided. We will call this example file example.xml.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<gsh:script

        xmlns:gsh="http://vcgr.cs.virginia.edu/genii/xsh/script"

        xmlns:geniix="http://vcgr.cs.virginia.edu/genii/xsh/grid"

        xmlns="http://vcgr.cs.virginia.edu/genii/xsh/script">

    <condition property="NOARGS">

         <compare numeric="true" arg1="${ARGV}" arg2="2" comparison="lt"/>

    </condition>

    <if test="NOARGS">

        <then>

            <echo message="You must include at least one argument"/>

            <exit exitcode="1"/>

        </then>

    </if>

    <echo message="Starting the script"/>

    <for param-name="i" exclusive-limit="${ARGV}">

        <echo message="Argument ${i} is ${ARGV[${i}]}."/>

    </for>

    <switch value="${ARGV[1]}">

        <case pattern="who">

            <geniix:whoami/>

        </case>

        <case pattern="where">

            <geniix:pwd/>

        </case>

        <case pattern="jobs">

            <geniix:qstat>

                <param>/queues/grid-queue</param>

            </geniix:qstat>

        </case>

        <default>

            <echo message="What do you want to know?"/>

        </default>

    </switch>

    <echo message="Script complete"/>

</gsh:script>

E.4.4.6.                        Running XScript Scripts

Before we describe how to execute a script, a word about Genesis II's script handling is in order. Genesis II supports multiple scripting languages through the use of the Java Scripting API. In order to differentiate between the various scripting languages, Genesis II uses filename extensions to determine the correct language to use when running scripts. Thus, to run a JavaScript script, the filename must end in the .js extension. Similarly, to run an XScript script file, the filename must end with the .xml filename extension.

To execute a script within the Genesis II client, use the script command, passing in the path to the script and any parameters to the script. For example, if the example script above were located at the RNS path /home/griduser/example.xml, the following command would launch the script with an input parameter of who:

grid script /home/griduser/example.xml who
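The same mechanism selects other script engines by file extension; for example, a JavaScript file (this path is purely illustrative) would be run the same way:

grid script /home/griduser/example.js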

E.5. Submitting Jobs

The main point of any grid software is to provide a means for processing computational jobs on the compute resources that are available in the grid.  This is true for Genesis II also; many features are provided for creating jobs in JSDL, sending them to a grid queue or BES, and managing the jobs while queued.  This section discusses the basics of creating a job and submitting it for processing.

E.5.1.              How to Create a JSDL file

The purpose of a JSDL file is to specify a compute job in terms of the executable that the job should run, the resources that it will consume in terms of memory and CPU, and any special requirements for processor type or other attributes.  The JSDL specification requires that the file be stored in XML format with particular elements and attributes for specifying job attributes.  This makes it fairly difficult and unpleasant to write JSDL files from scratch.  One common way to generate a new JSDL file is to modify an existing well-formed JSDL file to fit the job at hand.

A better way to generate a JSDL file is to use the Genesis II JSDL file creation tool to specify the job's requirements.  This is available as a standalone install called the Grid Job Tool (located at http://genesis2.virginia.edu/wiki/Main/GridJobTool), with versions for most common operating systems.  Alternatively, the job-tool is also provided by the Genesis II client installation and can be executed this way:

grid job-tool

It can also be executed by right-clicking on an execution service such as a BES or Grid Queue and selecting “Create Job”.

E.5.2.              Using the Job-Tool

From within the client-ui RNS Tree view, select the directory where the JSDL project file should be located, or select the execution container (BES or queue) where the job should be executed.  Right click on that location and select 'Create Job'.  The tool has provisions to give the job a name and description.  Any arguments that the executable or script needs for running the job can be provided in the first tab (under Basic Job Information).


Figure 14. Job tool basic information tab.

In the data tab, the data to be staged in/out can be provided (see figure below).  It is worth noting that data files are usually staged in and out via the GFFS; BESes that do not support the GFFS may need to use staging types other than grid: paths (such as data files on a local file system or a web server).  These can also be specified in the data tab.


Figure 15. Job tool data staging tab.

The other major component for the job-tool is the resources tab, where any specific expectations of the job in terms of hardware configurations and preferred operating system can be specified.  This is depicted in the figure below.


Figure 16. Job tool resources tab.

E.5.3.              Submitting a Job to a Grid Queue

The qsub command is used to submit a new job to a queue for processing.  Although jobs may be submitted to a BES (and bypass a queue), submitting to queues is recommended since it allows better resource allocation and job handling.

# submit a job to the queue, with a job description file.
qsub {/queues/queuePath} local:/path/to/job.jsdl

The qsub command returns a job ticket number after successfully submitting the job.  This ticket number can later be used to query the job, kill it, and so forth.
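For example, a submission followed by a later control operation might look like the following (the JSDL path is a placeholder, and the ticket shown is the sample value used later in this section):

# submit the job; qsub prints the job ticket on success
grid qsub /queues/grid-queue local:/path/to/job.jsdl

# later, use that ticket to act on the job (here, to kill it)
grid qkill /queues/grid-queue DF2DD56D-B220-FFA8-8D35-589F65E016DE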

E.5.4.              Controlling or Canceling Jobs in a Queue

The qkill command allows grid users to terminate any managed job (not already in a final state) that they previously submitted.  To kill one job in the queue, use:

grid qkill {/queues/theQueue} {jobTicket#}

The ticket here is obtained when a job is submitted using any of the recommended methods.

The qreschedule command returns an already-running job to the queue and ensures it is not rescheduled on the same BES.  The slot count for that resource must be manually reset later.  This command is useful when the queue consists of BESes that interface to a queuing system such as PBS: a job may be in the Running state on the grid but in a Queued state on the back-end PBS.  Such a job can be moved to an alternate BES where it can be executed immediately.  To reschedule a job:

grid qreschedule {/queues/theQueue} {jobTicket#}

Both qkill and qreschedule have variants that allow multiple job tickets to be killed or rescheduled with one command.
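The exact multi-ticket syntax is best confirmed with the built-in help (see section E.1.1); one plausible form, shown here only as an assumption, is to list several tickets after the queue path:

# assumed form: kill several jobs at once by listing their tickets
grid qkill /queues/theQueue ticket1 ticket2 ticket3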

E.5.5.              Cleaning Up Finished Jobs

The queue manages every job submitted to it from the time of submission until the job has executed, failed, or been cancelled by the user.  Even jobs in the final states of FINISHED, CANCELLED, or FAILED are held by the queue until they are cleaned up.  The process of cleaning a no-longer-active job out of the queue is called 'completing' the job.  Completing a job performs the garbage collection of removing the job from the queue.

# Removes all jobs that are in a final state
# (i.e., FINISHED, CANCELLED, or FAILED) from the grid queue.
grid qcomplete {/queues/queuePath} --all

# Removes a specific job from the queue, where the ticketNumber
# is the job-identifier provided at queue submission time.
grid qcomplete {/queues/queuePath} {ticketNumber}

E.5.6.              The Queue Manager in Client-UI

After the client-ui has been launched, the “Queue Manager” can be opened to control jobs in the queue or, given sufficient permissions, to change the queue's characteristics.  The figure below shows the client-ui about to launch the queue manager on a selected queue:


Figure 17. Launching the queue manager.

After launching, the queue manager window will appear as depicted below:


Figure 18. Queue manager’s job list.

In the first tab, called Job Manager, the queue manager shows the current set of jobs that are in the selected queue.  The jobs can be in a number of non-final states, such as QUEUED and EXECUTING, or they may be in a final state, such as FINISHED or FAILED.

The second tab of the Queue Manager, called Resource Manager, shows the resources associated with the queue.  The view presents what is known about each BES resource, such as its operating system and other parameters.  Only a user with sufficient permissions on the queue can modify this tab, and “Max Slots” is the only modifiable field.  The number of slots controls how many concurrent jobs the resource is expected to handle, and the queue will allow at most that many jobs onto that particular resource.  An example Resource Manager is shown below:


Figure 19. Queue manager’s resource tab.

 To control jobs that are in the queue, look at the Job Manager window again.  When a job is selected in that view (with a right-click), a context menu for controlling that specific job is displayed. This is shown in the figure below:


Figure 20. Removing a job from the queue.

Using the choices available, a user can stop a job with “End Jobs”, clean up finished jobs with “Remove Jobs”, and examine the “Job History” for a job.  Job History brings up the following window with information about the selected job:


Figure 21. Job history detail window.

This history shows that the job had been added to the queue, and then that it had been submitted to a PBS called “pbs-long-india”.  The job was last seen being processed on that BES.

E.5.7.              Job Submission Point

A user can also submit jobs by copying JSDL files into the “submission-point” directory under the queue.  This is an extremely simple method for job submission, and jobs submitted this way still show up in qstat output.

grid cp {local:/path/to/job.jsdl} grid:{/queues/queuePath}/submission-point

grid qstat {/queues/queuePath}

 

Sample output from the above qstat command:

Ticket                                 Submit Time             Tries   State  
DF2DD56D-B220-FFA8-8D35-589F65E016DE   16:21 EDT 05 Jun 2012   1       QUEUED

E.5.8.              Submitting a Job Directly to a BES

Another method to run a job is to submit the job directly to the BES.  This is a helpful method for testing JSDL files as they are being developed, or when the user is sure that the BES supports the requirements of the job:

grid run --jsdl={local:/home/drake/ls.jsdl} {/bes-containers/besName}

The above command is synchronous and will wait until the job has run.

There is an asynchronous variant that allows job status notifications to be stored in a file in the grid namespace.  Note that this feature is currently only available for the Genesis II BES and is not supported on the UNICORE BES.  The user can check on the status of the job by examining the status file.  This is an example of an asynchronous direct submission to the BES:

grid run --async-name={/path/to/jobName} \
 --jsdl={local:/home/drake/ls.jsdl} \
 {/bes-containers/besName}

In the above, the command returns immediately after submission.  The job’s status is stored in the file specified by the grid path /path/to/jobName. Eventually this file should list the job as FINISHED, FAILED or CANCELLED depending on the circumstances.
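For example, the status file can be read later with the grid cat command, using the same path given to --async-name:

grid cat /path/to/jobName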

E.5.9.              How to Run an MPI Job

To run an MPI job, the JSDL file needs to specify that the job requires MPI and multiple processors.  The job executable needs to have been compiled with an MPI library (e.g., MPICH, MVAPICH, OpenMPI).

When using the Genesis II JSDL file creation tool, these job requirements can be specified under the “Resource” tab (depicted in the figure below). The “Parallel Environment” field permits selection of the MPI library (e.g., MPICH1, MPICH2) that the executable was compiled with.  The “Number of Processors” field lets the user specify how many total processes are needed to run the job.  The “Process per Host” field lets the user specify how many of these processes should be run per node.


Figure 22. Setting matching parameters in resources tab.

If the user manually creates a JSDL file, the JSDL SPMD (single program multiple data) Application Extension must be used to define the requirements of the parallel application in JSDL. Please consult the specification document for details.  The SPMD application schema essentially extends the POSIX application schema with four elements: NumberOfProcesses, ProcessesPerHost, ThreadsPerProcess, and SPMDVariation.  The NumberOfProcesses element specifies the number of instances of the executable that the consuming system must start when starting this parallel application. The ProcessesPerHost element specifies the number of instances of the executable that the consuming system must start per host. The ThreadsPerProcess element specifies the number of threads per process. This element is currently not supported by the Grid Job Tool. The SPMDVariation element defines the type of SPMD application. An example of a parallel invocation using the “MPICH1” MPI environment is provided below.

<jsdl:Application>
  ...
  <jsdl-spmd:SPMDApplication>
    <jsdl-posix:Executable>a.out</jsdl-posix:Executable>
    <jsdl-posix:Input>input.dat</jsdl-posix:Input>
    <jsdl-posix:Output>output.dat</jsdl-posix:Output>
    <jsdl-spmd:NumberOfProcesses>8</jsdl-spmd:NumberOfProcesses>
    <jsdl-spmd:SPMDVariation>http://www.ogf.org/jsdl/2007/02/jsdlspmd/MPICH1</jsdl-spmd:SPMDVariation>
  </jsdl-spmd:SPMDApplication>
  ...
</jsdl:Application>

 

E.6. Client GUI

The Client UI is a windowed application that provides a graphical view of the grid RNS space; it is launched by running the grid client-ui command.

E.6.1.              Client GUI Basics

Terminology

·         RNS – Resource Namespace Service

·         XCG3 – Cross Campus Grid, Version 3

·         (G)UI – (Graphical) User Interface

·         EPR – Endpoint Reference

·         ACL – Access Control List

·         XML - Extensible Markup Language

·         JSDL – Job Submission Description Language

Figure 23. XCG3 viewed in client-ui

In this document, we will use the XCG3 grid as an example to explain Genesis II client UI features. Refer to Figure 23 for an example of the Client UI.

In this window, you will see:

·         7 menus (File, Edit, View, Jobs, Security, Tools and Help) on the top

·         Left Panel named RNS Space with grid name space represented as tree structure

·         Right Panel with resource information tabs (Security, Resource Properties and EPR Display), with an error information text box at the bottom

·         Recycle bin icon in the right panel (Bottom right)

·         Tear symbol icons in both right and left panels (Top right, looks like torn paper).

·         Username/Password Token Text boxes

·         Pattern Icon and Everyone Icon

·         Credential Management Button

Tabs, menus and their options will be represented as Tab/Menu->SubTab/SubMenu->option in this document.  References to grid commands will be in the format grid command.

E.6.2.            Credential Management

A user can manage their grid credentials by clicking on the Credential Management button in the client-ui window and selecting the appropriate option (Login, Logout or Logout All).  Clicking on the Credential Management->Login->Standard Grid User tab opens a separate window prompting for a username, password and grid path; this logs you into the grid using your grid credentials (not the same as grid xsedeLogin); refer to Figure 24.  If you select the Credential Management->Login->Local keystore tab, you can log in using a keystore file: select the keystore file (usually in .pfx form) from your local file system and enter its password.  You can also log in with a username/password token by selecting the Credential Management->Login->Username/password tab.

Figure 24. Credential Management->Login->Standard Grid User

 

You can see your current login credentials when you hover the mouse pointer over the Credential Management button.  Refer to Figure 25.

 

 

Figure 25. Showing grid credentials using mouse hover

You can log out of the grid by selecting the Credential Management->Logout option, which lets you select the credential you want to log out of.  This is helpful if you have multiple credentials in your credential wallet and want to log out of a specific one.  Refer to Figure 26.

Figure 26. Highlighting a specific credential to logout from

 

Alternatively, you can log out of all your grid credentials by selecting the Credential Management->Logout All option.

E.6.3.              Client UI Panels and Menus

E.6.3.1.                        RNS Space (Left Panel)

Here the grid name space is presented as a tree structure, with the root of the name space represented by '/' and the other sub-directories below it.  You can browse the tree by clicking on the toggle symbol next to a resource, and you can select a resource simply by clicking on it.  Clicking on a resource highlights it, and the security information in the right panel changes accordingly.  You can also view the Resource Properties and EPR Display of the resource in the right panel.  You must have at least 'Read' permissions on a resource to view its security and other information; if you do not, you will get an error in the error panel below (e.g., No Authorization info for target path: {grid resource name}).  Launch grid client-ui, log in as a grid user and then browse the RNS tree (highlighted with a red box in Figure 27) by clicking on the toggle symbol next to the root directory '/' (and then descend by expanding further toggle symbols).  This expands the tree; you can now browse to your grid home directory or any other grid resource that you have at least read permissions on.  You can also collapse the tree (if already expanded) by clicking on the toggle symbol next to the grid resource.  If you try to browse a resource without read permissions on that resource, you will get an error message in the Error Messages box (highlighted with a blue box in Figure 27).

 

Figure 27. Major User Interface Panels

 

Dragging an RNS Resource to Trash

 

To delete an RNS resource (such as a file or directory), browse the RNS tree structure and select the object you want to delete.  Then, still holding the mouse button, drag the object to the recycle bin and release.  This is depicted in the figure below, where the red-circled resource on the left will be dragged to the trash can on the bottom right.

Figure 28. Drag RNS Resource to Trash

E.6.3.2.                       Right panel

Here you will find three tabs: Security, Resource Properties and EPR Display.  This area is highlighted with a green box in Figure 27.

Security tab: This is selected by default when you first open the client-ui. This tab displays the read/write/execute ACLs for the selected resource. More information on grid ACLs can be found in section E.2.3. If you grant read/write/execute ACLs on a resource and refresh the client-ui, the new permissions will be shown in the respective ACL text box after the refresh.

There is also a Username/Password Token sub-panel, which is used to issue username/password access on the selected resource to users.  These users may or may not have a grid account; all they need is the Genesis II client installed and the username/password information to access that resource (and, of course, if the resource is inside a subtree, they must have access to browse to that part of the tree structure).

Figure 29. Drag-and-drop a user to ACL list on a resource

Drag and Drop Permission Management

You can give permissions to everyone (grid and non-grid users) on a selected resource by dragging and dropping the Everyone icon onto the desired ACL box (i.e., the read, write or execute text box).  You can grant access to individual grid users using two methods: the grid chmod command in the grid shell, or the drag-and-drop method in the UI.  In the client-ui window, select the resource you want to grant permissions on.  You must have write (or admin) permissions on that resource to be able to grant R/W/X access to other users.  Locate the 'Tear' icon on the left panel (top right corner; it looks like a torn piece of paper), left-click on it and drag it while still holding the button.  This creates another window showing the tree structure; browse to the /users directory in the new window and left-click the username you need.  Now drag that username and drop it onto the read, write or execute text box in the main client-ui window.  In Figure 29, the tear icon, the grid resource (the hello file), the username (/users/andrew) in the new window, and the write ACL text box are highlighted.  If you then select the resource in the RNS tree, the new credentials should be listed in the corresponding ACL text box in the right panel.
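For comparison, the same grant can be made from the grid shell with the chmod command; the invocation below is only illustrative (the paths are hypothetical, and the exact chmod syntax is documented in section E.2.3):

# grant write access on the "hello" file to the grid user /users/andrew
grid chmod /home/xsede.org/joe/hello +w /users/andrew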

Dragging ACLs to trash

 

Browse the RNS tree structure and select the resource (file/directory) whose ACLs you want to modify.  Then, in the Security panel on the right, select the ACL entry from the read/write/execute list and, still holding the mouse button, drag it to the recycle bin and release.

 

Figure 30. Dragging ACL Entry Into Trash

Resource Properties tab: displays detailed information about the selected resource, including the resource type, permissions, creation time, the Resource Endpoint Reference (the address where the resource is physically located), etc.

EPR Display tab: displays only the Address, Reference Parameters and Metadata for that resource.

E.6.3.3.                        Menus

Figure 31. Changing UI Shell font and size

File Menu: Drops down to present multiple options, most of which are intuitive.  Selecting the File->Preferences option opens another frame where you can set your client-ui and shell preferences; after every change, make sure you refresh the client-ui by selecting View->Refresh.  Some of the options include the font size and style in the client-ui's shell window.  The File->Quit menu option quits the client-ui window.  Selecting File->Make Directory creates a new directory and File->Create New File creates a new file in the grid name space.  To change the shell font, select File->Preferences->Shell, select the font style, change the font size (up arrow to increase, down arrow to decrease) and click OK.  Launch a grid shell (Tools->Launch Grid Shell) and type a grid command (e.g., grid ls); you will see the changes in font style and size in this grid shell window.  Refer to Figure 31.

You can view a resource’s security ACL information at a low, medium or high level of detail.  Select File->Preferences->Security->HIGH, then refresh the client-ui by selecting View->Refresh (or pressing F5 on your keyboard); refer to Figure 32.  Select any grid resource that you have at least read permissions on, and in the right panel ACL text box you can now see the ACL information in more detail.

Figure 32. Setting UI to show detailed credential information

 

If you select the low level, the ACL box will just list the users in the ACL.  If you select medium, you can see additional information, such as what type of resource each entry is, along with some additional information on the users' ACLs.  By selecting the high level, you can see still more information about the ACLs, such as the identity type, resource type, DN, etc.  This is shown in Figure 33.

 

Figure 33. Viewing detailed credential information

To control the amount of job history information, select File->Preferences->Resource History and set the history information level to the desired option: Trace, Debug, Information, Warning or Error.  Selecting the Trace option provides maximum information about the job; if you just want to see errors or warnings, select those options instead.  This is useful when a user wants to debug jobs after submitting them to a queue resource.  Select a queue resource in the grid RNS name space, then select Jobs->Queue Manager.  Select a job from the list of jobs that you submitted, right-click and select the Job History option.  In the new window, the Minimum Event Level will be set to Trace (or whichever option you selected earlier).

To control the XML display, select File->Preferences->XML Display and choose whether to view grid resource information as a flat XML file or as a tree structure.  In Figure 34, the Resource Properties are displayed as a tree structure; if you selected File->Preferences->XML Display->Display XML as a tree, the information will be displayed as shown.

Figure 34. Displaying resource information as tree structure

To create a new file or directory, select a directory in the RNS tree where you have write permissions (e.g., /home/joe).  Select the File->Create New File option; this will pop up a new window and prompt for a file name.  After entering the file name (e.g., abc.txt), click the OK button.  The new file should appear in the RNS directory you selected (e.g., /home/joe/abc.txt).  Refer to Figure 35.

 

 

 

 

 

 

Figure 35. File->Create New File option

 

View Menu: The View->Refresh option can be used to refresh the client-ui window.  Click on this option after you make changes to the client-ui (e.g., changing preferences), or after you create/delete/move/copy files or directories in the grid name space, so that the client-ui window reflects the changes.  You can also refresh a particular resource by highlighting it and pressing F5 on your keyboard, although this may depend on how the F5 key is configured.

Jobs Menu: From this menu, a user can create a new job, view existing jobs in a queue resource, or view the saved job history of a job from the queue.  The Jobs->Queue Manager and Jobs->Create Job options are inactive until you select a queue resource from the tree structure in the left panel.  Information on how to create a job, submit it and check its status in the queue manager can be found in section E.5.  You can also create a job using grid job-tool; this opens a new window where you can enter your job information.  Most fields in the job-tool are intuitive.

To create a job, select the Jobs->Create Job option.  In the new window, create a new job or open an existing one.  Submitting a job from this window submits the job to the selected queue resource.  Refer to Figure 36 for an example job.  Saving the new job as a project saves the file on your local file system with a .gjp file extension.

Figure 36. Job tool, creating simple ls job

 

Figure 37 below shows how you can enter a project number (such as an allocation number you would get on Kraken) or some other XSEDE-wide project number.  In the job-tool, click on the 'Job Projects' text box and you will get a pop-up; click on the '+' sign and you will get another pop-up.  Enter the project number and click 'OK', click 'OK' again, and your project number will appear in the main job-tool window.  Also, if you forget to enter a necessary field, such as the executable name or a data file, you will get a warning/error in the bottom pane of the job-tool window.

Figure 37. Job-tool showing project number/allocation

In the Basic Job Information tab, Job Name can be any meaningful name you want to give your job.  Executable is the executable file that your job will run.  This can be a system executable like /bin/ls, a shell script, an MPI program executable, or any other form of executable that can be run (others include a Java class file, a C/C++ executable, Matlab, Namd, etc.).  The Arguments list is the list of arguments your executable may need; here it is the '-l' option for /bin/ls (essentially /bin/ls -l).  You can add arguments by clicking on the '+' button in the Arguments frame.  You can also pass environment variables to your job by clicking on the '+' button in the Environment frame.  If you decide to delete one or more arguments or environment variables after adding them, select that argument/environment variable and click on the '-' button in the respective frame.

 

Figure 38. Job Tool, Data tab showing Output and Error Files

You can save the job's output and error information to files either in the grid name space (using the grid protocol) or via other protocols (scp/sftp, ftp or mailto); this is configured in the Data tab of the job tool.  To save the standard output and standard error from a job, enter the file names in the Standard Output and Standard Error text boxes; refer to Figure 38.  Then, to save these files to the grid or other locations, add the files in the Output Stages section and select the appropriate Transfer Protocol and the corresponding Stage URI path.  Note that the file names you enter in the Standard Output and Standard Error text boxes should match the Filename area in Output Stages, but these names can differ in the Stage URI area.  The '+' and '-' buttons are used for adding or deleting an entry.  Similarly, you can stage in the files needed to execute your program in the Input Staging frame.

 

 

Once a queue resource is selected, you can select Jobs->Queue Manager to view the jobs you submitted to the queue and to manage resources (if you are the owner of those resources).  Selecting Jobs->Queue Manager opens a new window displaying your jobs and resource information.  Selecting Jobs->View Job History opens a file-browsing frame displaying your machine's local file system (the machine where the Genesis II client software is running); you must have previously saved a job's history to be able to select and view it here.

To view jobs with the Queue Manager, select the queue resource in the RNS tree and select Jobs->Queue Manager; this opens a Queue Manager window listing the jobs you submitted to that queue or have permission to view.  Refer to Figure 39.

 

Figure 39. Jobs->Queue Manager

 

To see the job history of a particular job, select a job in the Queue Manager window and right-click on it.  Select the Job History option and a new window will open with the job history for that job.  Here you can select different levels of history information from the Minimum Event Level menu (Trace, Debug, Information, Error or Warning).  This can also be set via the File->Preferences->Resource History tab.  Refer to Figure 40.

Figure 40. Displaying Job History

 

Parameter Sweep Job

 

To create and submit a parameter sweep job, open the job-tool either by right-clicking on a queue resource and selecting 'Create Job' or by typing job-tool in the grid shell.  This will bring up the job tool shown below.

 

 

Figure 41. Job Definition Using Variables

 

 

By default, the tab “Grid Job Variables” is disabled. To add a parameter sweep variable, just use ${var_name} ($ sign followed by open curly brace followed by variable name and close curly brace) in any of the following fields in job-tool.

Job Name - e.g. Ls-job-${i}
Executable arguments - e.g. /bin/ls dir-name-${i}
Data Input/Output Stages - e.g. /home/xsede.org/vana/ls-out-${i}.txt

Figure 42. Variable Usage in Output Filename
Once you specify ${var_name} in any one of the above locations, the 'Grid Job Variables' tab becomes active and you can define var_name as an integer, double or string.  You can also specify the starting value, end value and step value (interval) for your variable.

 

Figure 43. Defining Job Variable Values

After the job is submitted, the actual values for ${var_name} are substituted everywhere the variable was used (job name, arguments, file names).  The screenshot below shows the queue where the job name has been substituted with the actual integer values (from i=1 to i=20).  For the above example, the output files generated will be /home/xsede.org/vana/ls-out-1.txt, /home/xsede.org/vana/ls-out-2.txt … /home/xsede.org/vana/ls-out-20.txt.

Figure 44. Queue View with Sweep Jobs

 

Tools Menu: Selecting Tools->Launch Grid Shell opens a shell window where you can run grid shell commands like ls, cat, cp, etc.; refer to Figure 45.  You can also invoke the grid shell directly from the command line using grid shell.  The UI shell interface supports tab completion, whereas the command-line shell interface does not.  More information on grid commands can be found in section E.4.

Figure 45. Invoking grid shell via Tools->Launch Grid Shell option

To launch the shell and list your home directory, log in to the grid using your grid credentials and launch a grid shell from the Tools->Launch Grid Shell option.  Run grid pwd to make sure you are in your home directory (by default you will be in your home directory after you log in to the grid).  Then run grid ls; this should list all the files and directories in your grid home directory.

E.6.4.              Drag-and-Drop Feature

This method can be used for copying data files using the GUI client.  To copy a file or directory out of the grid, select it in the left panel tree structure, drag it while holding the mouse button, and release the mouse button to drop the file/directory onto your local computer's file system.  The reverse also works: you can select a file/directory from your local machine and drop it into your grid name space.  For this you will need appropriate permissions on the grid resource (i.e., write permissions to copy files into it).  Refer to section E.3.1.2 for a detailed explanation.

E.6.5.              File associations

This helps you to set up the GUI to open files with a particular application.  Note that the client-ui has recently been updated to use the launching capabilities of the Operating System.  In most cases, the default behavior is sufficient to edit and open assets in the grid.  For situations where the default is not sufficient, this section documents how to override the default applications.

The file called .grid-applications.xml should go in the user's home directory on the local file system.  This file lists the programs to launch for certain MIME types, extending the basic launching support in the client-ui.  On Windows XP the home directory will be "c:\Documents and Settings\myUserName", and on Windows 7 it will be "c:\Users\myUserName".  Note that this file currently uses short names for the first argument, which should be a program name; if your PDF editor or Word document editor is not on the path, you will need to put the full path to the appropriate executable.  The file called .mime.types should also go into the user's home directory.  It gives Java an association between file extensions (like .DOC) and the MIME type that will be reported for files of that type.

To open a PDF file in the grid namespace, create the .grid-applications.xml and .mime.types files and copy them to your $HOME directory (or the equivalent location on Mac and Windows).  Launch the client-ui, browse the grid RNS space and select a PDF file you wish to open.  Double-click on the file and it will open in the Acrobat viewer.

Sample .grid-applications.xml file

<external-applications>
<mime-type name="application/acrobat">
     <application-registration type="Common"  factory-class="edu.virginia.vcgr.externalapp.DefaultExternalApplicationFactory">
            <configuration name="acroread">
                   <argument>acroread</argument>
                   <argument>%s</argument>
            </configuration>
     </application-registration>
</mime-type>

<mime-type name="application/msword">
     <application-registration type="Common" factory-class="edu.virginia.vcgr.externalapp.DefaultExternalApplicationFactory">
            <configuration name="word">
                   <argument>libreoffice</argument>
                   <argument>%s</argument>
            </configuration>
     </application-registration>
</mime-type>
</external-applications>

Sample .mime.types file:

application/acrobat pdf

application/msword doc docx

E.7. Fastgrid Command

The GFFS “grid” command is implemented in Java and loads several libraries at startup.  Thus it can take a few seconds to start “grid” on some platforms.  This leads to fairly annoying slowness when repeatedly running the grid command at the command line or when using it within scripts, if one invokes “grid” for every separate command.

One can also start the grid command once, leave it running, and enter commands into the same grid prompt to avoid repeatedly waiting for Java to load.  This approach works fine, but there is now a command called “fastgrid” that can make even separate invocations of the “grid” command very speedy.
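For example, the interactive approach amounts to starting the grid shell once and typing commands at its prompt (see the grid shell discussion in sections E.4 and E.6.3.3):

# start one long-lived grid shell; subsequent commands such as ls, cp and
# whoami are typed at its prompt without re-loading Java each time
grid shell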

The fastgrid script is available in the Genesis II “bin” directory of the installation.  Once the “set_gffs_vars” script has been loaded, fastgrid can be invoked in place of the normal “grid” command, for example:

fastgrid ls /home
fastgrid cp local:./makefile grid:build/
fastgrid whoami

The fastgrid script has built-in help that can be printed by running:

fastgrid -h

Fastgrid is implemented by starting the regular “grid” client in the background, and passing the user’s commands to that background grid process via named pipes.  The same “grid” process is used for all subsequent command invocations, and the status of each command is gathered from the named pipe to return as the fastgrid command’s exit value.

The --quitServer (or -q) flag can be passed to fastgrid to terminate that “real” grid client’s background process.
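For example:

# shut down the background grid process when it is no longer needed
fastgrid --quitServer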

Fastgrid can also be run on a command stream of multiple lines by passing the --stdin (or -s) flag to it.  For example, this invocation relies on the Unix “here document” to pass several commands to fastgrid:

fastgrid -s <<eof
echo hello world '>' /home/xsede.org/koeritz/world.hello
cat /home/xsede.org/koeritz/world.hello
rm /home/xsede.org/koeritz/world.hello
eof

 

 


F.     Grid Configuration

This section describes how to create a distributed grid using Genesis II components.  The main components of such a grid are: (1) the GFFS, which provides the file-system linking all the components together, (2) the Grid Queues, which support submitting compute jobs to the computational elements in the grid, and (3) the BESes, which represent each computational element.  Each of these services lives inside a container, which is a Genesis II installation that provides one or more services to the grid via a web-services interface.

Every GFFS grid has one “root container” that provides the root of the GFFS file system, similar to the traditional Unix file system root of “/”.  The remainder of the GFFS can be distributed across other containers, which are then “linked” into the root container.  Usually, the root container serves all of the top-level folders such as /home, /users and /resources.

This chapter will describe the overall structure for the GFFS filesystem and will provide steps for building a new grid, starting with the root container.

F.1. Structure of the GFFS

There is no definite requirement for any particular structure of the GFFS.  It starts as a clean slate, with only the root node ('/').  All of the top-level directories are defined by convention and generally exist to provide a familiar structure around the grid resources.

This section will describe the purpose of each of these directories.  Note that most of these are created by the GFFS root container deployment, which is described later in this chapter.

The following table documents the basics of the XSEDE namespace that is used for the XSEDE production grid.  The full definition of the XSEDE namespace is provided by SD&I Activity 126 (https://software.xsede.org/viewvc/xsede/sdi/activities/sdiact-126/trunk/Plans/SDIACT-126_XSEDE_Global_GFFS_Namespace_Design-v8final.docx?view=co).

XSEDE Grid Namespace

/resources/xsede.org/containers

                Stores the containers installed on the grid.  Each of these is usually a resource fork where a container is linked into the GFFS.

/etc

                Stores the signing certificate generator for container resource identifiers.

/etc/resolvers

                Stores the RNS resolvers for the grid that enable fail-over and replication.

/groups/xsede.org

                Stores group identities for the grid.

/home/xsede.org

                Stores the home folders for users.

/resources/xsede.org/queues

                Stores the queues that are available within the grid.

/users/xsede.org

                Stores the user identities in a convenient grid-wide location.
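As a quick sanity check, listing the root of a grid that follows this namespace design should show these conventional top-level folders (the exact entries vary from grid to grid):

grid ls /
# typically shows entries such as: etc  groups  home  resources  users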

Other grids can use the XSEDE namespace design for their structure, but the portions of the namespace that mention “xsede.org” are replaced by a more locally appropriate name.  For example, the new XCG (Cross-Campus Grid) namespace at the University of Virginia has folders for /users/xcg.virginia.edu and /resources/xcg.virginia.edu and so forth.  The European GFFS grid has “gffs.eu” in those second tier names.  This approach supports federating multiple grids within the same structure; for example, the XSEDE grid can provide a link to /resources/xcg.virginia.edu within the XSEDE grid in order to reach the resources of the XCG grid.

F.1.1.              Linking a Container into a Grid

Assuming a Genesis II container is installed (either via the interactive installer or using the unified configuration scripts, both discussed in section D), this container can be “linked” into the grid.  A container that is linked into the grid can then be used for services, based on the path where it has been linked.

Linking a container into the grid requires knowing the “service URL” for that container.  The interactive installer creates a “service-url.txt” file in the install folder for the new container, whereas the unified configuration model creates a “service-url.txt” file in the container’s state directory (pointed at by GENII_USER_DIR).  The contents of this file look similar to this:

https://khandroma.cs.virginia.edu:19235/axis/services/VCGRContainerPortType

This says that the container is running on a host named “khandroma” on port 19235.
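For a container using the unified configuration, the file can be read directly from the state directory mentioned above:

cat $GENII_USER_DIR/service-url.txt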

Now that the service URL is known for the container, it can be linked into the grid with the following command template:

grid ln --service-url={container’s service URL} \
  /home/xsede.org/{my username}/MyContainer

The last parameter to the link command is the location where the container will appear within the grid.  This location must be writable by your user in order to create the container link.

For example, given my khandroma service URL, this command links my container into my home directory at a path called “khandro” (command should all be on one line):

grid ln --service-url=https://khandroma.cs.virginia.edu:19235/axis/services/VCGRContainerPortType /home/xsede.org/koeritz/khandro

To test whether the container is really available at the new link location, try an “ls” command on it (here using the template name MyContainer).  The results should appear similar to this:

$ grid ls MyContainer
MyContainer:
resources
filesystem-summary.txt
Services
container.log

If instead only the name “MyContainer” is printed out, then the link has failed.  This can be due to a number of reasons, such as the container not having been started or the container port being blocked by a firewall.

F.2. Deployment of a New GFFS Grid

The deployment of the GFFS requires two major components: a set of containers that are deployed on a host or set of hosts, and a deployment configuration package that enables a grid client or container to connect to the GFFS.  A “grid deployment package” is a directory of configuration items that is required to connect to an existing grid as a client.  This package is also required for configuring a Genesis II container as a server that allows secure connections.  The deployment package is constructed when building the root container.  The client is provided a limited version of this package which does not contain any of the private keys used by the root container.

There is a “default” deployment shipped with the source code that contains a basic set of configurations necessary to run Genesis II.  A new deployment is created when “bootstrapping” a grid that inherits the “default” deployment.  This enables the basic security configuration of “default” to be extended to provide a secure grid.

Below are instructions to create the “Bootstrap” container that serves as the root of the RNS namespace and the primary source of GFFS services.  Secondary containers (i.e., not the GFFS root) are created using an installer that contains the deployment package produced during the Bootstrap configuration process.  Using the installer enables new containers to be deployed very quickly.

Note that the following steps for the Bootstrap Container assume that the grid administrator is working with the Genesis II software as source code, rather than via an installer.  When using the Genesis II installer, these steps are not required for setting up clients or secondary containers.   Building the installer requires some working knowledge of Install4j, an Install4j license, and the root container’s deployment package (created below).  If you would like an installer built for your grid, it is recommended to contact xcghelp@cs.virginia.edu for assistance.

F.2.1.              Preparing the Environment for Generating Deployments

The deployment generation process requires a copy of the Genesis II source code (see Section H.2 if you need to obtain the source code and Section H.1 about installing Java and other prerequisites).  These steps use the GFFS Toolkit for the root container deployment, especially the deployment generator tool (see Section I for more information about the GFFS Toolkit).  The source code includes a copy of the GFFS Toolkit (in a folder called “toolkit”).

F.2.1.1. Configuration Variables for Bootstrapping

The deployment generator uses the same scripting support as the GFFS Toolkit, although it requires a smaller set of configuration items.  This section will describe the critical variables that need to be defined for the bootstrapping process.

The first choice to be made is which namespace the grid should support.  In the description of the process below, we will assume the use of the XSEDE production namespace for bootstrapping the grid.  This step copies the example configuration file for the XSEDE namespace into place as the configuration file for the GFFS Toolkit:

cd $GENII_INSTALL_DIR/toolkit
cp examples/toolkit_config_files/gffs_toolkit.config-xsede gffs_toolkit.config

Establish these variables in the bash shell environment:

·         GENII_INSTALL_DIR: point this at the location of the Genesis II source code.

·         GENII_USER_DIR: set this if you want to store the grid state in a different location than the default.  The default state directory is “$HOME/.genesisII-2.0”.

·         JAVA_HOME: specifies the top-level of the Java JDK or JRE.  This is required during deployment for running the keytool file, which is not always on the application path.

·         NEW_DEPLOYMENT: set this to the intended name of the root container’s deployment.  This name should be chosen carefully as a unique and descriptive name for bootstrapping the root container.  For example, it could be called “xsede_root” for the root container of the XSEDE grid.  It should not be called “default”, “current_grid” or “gffs_eu” which are already in use within the installer or elsewhere.

Important: For users on NFS (Network File System), it is critical that container state directories (pointed at by the GENII_USER_DIR variable) are not stored in an NFS mounted folder.  Corruption of the container state can result if this caution is disregarded.  Instead, the GENII_USER_DIR should be pointed at a folder that is on local storage to avoid the risk of corruption.

Modify the new gffs_toolkit.config for the following variables:

·         DEPLOYMENT_NAME: ensure that the chosen NEW_DEPLOYMENT from above is also stored in the configuration file for this variable.

The other variables defined in the gffs_toolkit.config can be left at their existing values (or can remain commented out) when generating the new grid deployment.
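For example, assuming the deployment name chosen above and that gffs_toolkit.config uses simple shell-style assignments, the relevant line would read:

DEPLOYMENT_NAME=xsede_root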

The remainder of the chapter will refer to GENII_INSTALL_DIR, GENII_USER_DIR and NEW_DEPLOYMENT as variables defined in the bash shell environment.  It is very convenient to load the required environment variables using a script rather than typing them again.  Often it makes the best sense to add the variables to the user’s shell startup script, such as $HOME/.bashrc.  Here are some example script commands that set the required variables:

export GENII_INSTALL_DIR=$HOME/genesis2-trunk
source $GENII_INSTALL_DIR/set_gffs_vars
export NEW_DEPLOYMENT=xsede_root
export GENII_USER_DIR=$HOME/root-state-dir

F.2.2.              Creating the GFFS Root Deployment

The six main steps to create the root container for GFFS are: (1) set up the trust store for the deployment, (2) generate key-pairs for the various identities needed for a grid container, (3) start up the GFFS root container, (4) create the root of the RNS name space, (5) archive the deployment, and (6) package the deployment for others.  These steps are documented in the following sections.

Prerequisites for Generating a Deployment

·         These procedures assume that the Genesis II code has been acquired and is already compiled.  To build the Genesis II code, refer to Section H.3 on “Building Genesis II from Source on the Command Line” to compile the codebase.  It is very important that the unlimited JCE jars are installed on any machine running the GFFS; refer to section H.1 for more information.

·         The procedures also require the GFFS Toolkit for execution.  The previous section describes configuring the test suite.

·         In the following steps, it is crucial that no user state directory exist before the GFFS container creates it.  If you have $HOME/.genesisII-2.0, then delete it beforehand.  (Or if $GENII_USER_DIR points at a different state directory, be sure to delete that.)

·         The user state directory must not be stored on an NFS file system.  One should point the GENII_USER_DIR at a directory on a local file system.

F.2.2.1. Setup the Trust Store

The basic GFFS security configuration for the root container is established in the deployment generator.  This involves setting up a resource signing keypair, a TLS keypair, an administrative keypair and the container’s trust store.

The first configuration feature is the “override_keys” folder, which allows the deployment to be built with a pre-existing “admin.pfx” and/or “tls-cert.pfx” file.  These files should be in PKCS#12 format with passwords protecting them.  If “admin.pfx” is present in “override_keys”, then it will be used instead of auto-generating an administrative keypair.  If “tls-cert.pfx” is present, then it will be used for the container’s TLS keypair rather than being auto-generated.  The passwords on these PFX files should be incorporated into the “passwords.txt” file discussed in a later section.

The next trust store component is the “trusted-certificates” directory in the deployment_generator.  This should be populated with the most basic CA certificates that need to be present in the container’s trust store.  The CA certificate files can be in DER or PEM format.  Any grid resource whose certificate is signed by a certificate found in this trust store will be accepted as valid resources within the GFFS.  Also, the GFFS client will allow a connection to any TLS certificate that is signed by a certificate in this trust store.  For example:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
cp known-grids/uva_xcg_certificate_authority.cer trusted-certificates

The third component of the GFFS trust store is the “grid-certificates” directory, where the bulk of well-known TLS CA certificates are stored for the grid.  This directory will be bound into the installation program for the GFFS grid, but at a later time, the automated certificate update process may replace the installed version of those certificates for appropriate clients and containers.  The “grid-certificates” directory can be populated from the official XSEDE certificates folder when building an XSEDE grid as shown:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
cp /etc/grid-security/certificates/* grid-certificates

The deployment generator will use the given configuration to create the complete trust store.  This includes generating a resource signing certificate (“signing-cert.pfx”) for the grid which is built into the trust store file (“trusted.pfx”).  If not provided, the deployment generator will also automatically create the root container’s TLS certificate (“tls-cert.pfx”) and administrative certificate (“admin.pfx”).  The trusted-certificates and grid-certificates folders are included verbatim rather than being bound into trusted.pfx, which permits simpler certificate management later if changes are needed.

F.2.2.1.1.          XSEDE GFFS Root

Building an XSEDE compatible GFFS root requires additional steps.  Because the XSEDE grid uses MyProxy authentication (as well as Kerberos), the deployment generator needs some additional configuration to support it.

MyProxy Configuration

To authenticate MyProxy logins, an appropriate “myproxy.properties” file must reside in the folder “deployment-template/configuration” in the deployment generator.  Below is the default myproxy.properties file that is compatible with XSEDE’s myproxy servers; it is already included in the configuration folder:

edu.virginia.vcgr.genii.client.myproxy.port=7514
edu.virginia.vcgr.genii.client.myproxy.host=myproxy.teragrid.org
edu.virginia.vcgr.genii.client.myproxy.lifetime=950400

A directory called “myproxy-certs” should also exist under the deployment generator.  This directory should contain all the certificates required for myproxy authentication.  The provided configuration template includes a myproxy-certs directory configured to use the official XSEDE MyProxy server; this should be replaced with the appropriate CA certificates if the grid is not intended for use with XSEDE MyProxy.
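
For example, a non-XSEDE grid could replace the shipped contents with its own MyProxy CA certificates; the source path below is only a placeholder:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
rm -f myproxy-certs/*
cp /path/to/site/myproxy-ca-certificates/* myproxy-certs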

F.2.2.2. Generate Key-Pairs and Create Deployment

The deployment can be created automatically using the script “populate-deployment.sh” in the deployment_generator folder.  Do not do this step unless it is okay to completely eliminate any existing deployment named $NEW_DEPLOYMENT (which will be located under $GENII_INSTALL_DIR/deployments).

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
cp passwords.example passwords.txt

cp certificate-config.example certificate-config.txt

Edit the passwords specified in “passwords.txt”.  These passwords will be used for newly generated key-pairs and should be guarded carefully.

Edit the certificate configuration in “certificate-config.txt” to match the internal certificate authority you wish to create for the grid.  The root certificate created with this configuration will be used to generate all container “signing” certificates, which are used to create resource identifiers inside of containers.  Container TLS certificates can also be generated from that root certificate, or they can be provided manually (and their CA certificate should be added to the trust store as described above).  Consult the sections “Container Network Security” and “Container Resource Identity” for a discussion of TLS and signing certificates.

The next step generates the necessary certificate files and copies them into the deployment.  Again, this step will *destroy* any existing deployment stored in the $GENII_INSTALL_DIR/deployments/$NEW_DEPLOYMENT folder.

bash populate-deployment.sh $NEW_DEPLOYMENT \
  {containerPortNumber} {containerHostName}

The container port number is at the installer’s discretion, but it must be reachable through any firewalls if grid clients are to connect to it.  The hostname for the GFFS root must also be provided, and this should be a fully qualified DNS hostname.  The hostname must already exist in DNS records before the installation.
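
For example, a hypothetical invocation might look like the following (the deployment name, port number and hostname are illustrative only):

export NEW_DEPLOYMENT=my_grid
bash populate-deployment.sh $NEW_DEPLOYMENT 18443 gffs-root.example.edu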

F.2.2.3. Starting Up the GFFS Root Container

These steps can be used to get the GFFS root container service running.  They actually will work for any container built from source:

cd $GENII_INSTALL_DIR

bash runContainer.sh &>/dev/null &

Note that it can take from 30 seconds to a couple minutes before the container is finished starting up and is actually online, depending on the host.  A container is ready to use once the log file mentions the phrase “Done restarting all BES Managers”.  The log file is located by default in $HOME/.GenesisII/container.log.
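
The following sketch waits for that phrase to appear, assuming a freshly created container and the default log location:

until grep -q "Done restarting all BES Managers" $HOME/.GenesisII/container.log 2>/dev/null; do
  sleep 5
done
echo "Container is online."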

F.2.2.4. Create the Root of the RNS Name Space

The previous steps have set up the deployment and started a container running.  That container will now be configured as the root container of the GFFS:

cd $GENII_INSTALL_DIR

./grid script local:deployments/$NEW_DEPLOYMENT/configuration/bootstrap.xml

Once the bootstrap process has succeeded, it’s important to clean up the bootstrap script, since it contains the admin password:

rm deployments/$NEW_DEPLOYMENT/configuration/bootstrap.xml

At this point, a very basic grid has been established.  The core directories (such as /home/xsede.org and /resources/xsede.org) have been created.  Any standard groups are created as per the definition of the namespace; for example, the XSEDE bootstrap creates groups called gffs-users for normal users and gffs-admins for administrators.  However, there are no users defined yet (besides the administrative keystore login).

The steps above also generate a crucial file called “$GENII_INSTALL_DIR/context.xml”.  The context.xml file needs to be made available to grid users before they can connect clients and containers to the new root container.  For example, this file could be uploaded to a web-server, or it could be manually given to users, or it could be included in a new Genesis II installation package.
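
For example, one hypothetical way to publish the file is to copy it to a web server that prospective grid users can reach (the host and path are illustrative):

scp $GENII_INSTALL_DIR/context.xml webadmin@www.example.edu:/var/www/html/mygrid/context.xml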

F.2.2.5. Archive the Root Container Deployment

Now that a deployment has been created, it is important to make an archive of all the generated keys and trusted certificates:

# Stop the container.
bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

# Back up the grid key-pairs and certificates.
cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator
bash archive-full-deployment.sh

# Back up the runtime directory for the container.
tar -czf $HOME/grid_runtime_backup.tar.gz $GENII_USER_DIR

# Restart the container.
bash $GENII_INSTALL_DIR/runContainer.sh &>/dev/null &

The contents of the root name space for the grid should be backed up regularly; consult the document section How to Backup a Genesis II Grid Container for more details.

These steps will create archives in the $HOME folder.  These archives should be stored carefully and not shared with anyone but the grid's administrator group.
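
For example, assuming the archives land directly in $HOME as tar.gz files (as the backup steps above arrange), their permissions can be tightened so that only the administrator’s account can read them:

chmod 600 $HOME/*.tar.gz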

F.2.2.6. Package the Deployment Generator for Others

The grid administrator can make a package from the root container's deployment generator directory that other grid clients and containers can use to connect to the grid.  The package will provide the same trust store that the root container uses, and it provides new containers with a TLS certificate that will be trusted by grid clients:

cd $GFFS_TOOLKIT_ROOT/tools/deployment_generator

bash package-deployment.sh

This will create a file called deployment_pack_{datestamp}.tar.gz in the user’s home folder.  This archive can be shared with other users who want to set up a container or a grid client using the source code.  The package includes the container’s context.xml file, the trust store (trusted.pfx and other directories), and the admin certificate for the grid.

It is preferred to use a Genesis II installer for all other container and client installations besides the root (bootstrap) container.  The above deployment package should be provided to the person building the installer.  The package building script uses the deployment package to build an installer that can talk to the new root container.

F.2.3.              Changing a Container’s Administrative or Owner Certificate

There is an administrative certificate provided by the installation package for grid containers.  Changing the admin certificate has wide-ranging effects: it determines who can remotely administer the grid container and whether the grid’s administrator can perform operations on the container (such as accounting data collection).  Changing the admin certificate for a grid container should not be undertaken lightly.

A related concept to the administrative certificate for a container is the “owner” certificate of the container.  The owner is a per-container certificate, unlike the admin certificate that is usually distributed by the installation program.  The owner certificate can also be changed from the choice that was made at installation time.

Clients whose credentials contain either the admin or owner certificate are essentially always given permission to perform any operation on any of that grid container’s services or on grid resources owned by the container.

For the discussion below, we will refer to the container’s security folder as $SECURITY_FOLDER.  It will be explained subsequently how to determine where this folder is located.

The grid container admin cert is located in $SECURITY_FOLDER/admin.cer.  The .cer file ending here corresponds to a DER-format or PEM-format certificate file.  Replacing the admin.cer file changes the administrative keystore for the container.

The container owner certificate is instead located in $SECURITY_FOLDER/owner.cer, and can also be in DER or PEM format.

The owner and admin certificates are also commonly stored in the $SECURITY_FOLDER/default-owners directory.  The default-owners directory is used to set default access control for a grid resource during its creation when no user security credentials are present.  This is a rather arcane piece of the Genesis II grid container and is mostly used by the grid container during certain container bootstrapping operations.

However, if either certificate is to be changed, then it makes sense to change default-owners too.  Otherwise some resources created during container bootstrapping will be “owned” (accessible) by the original certificates.  Because of this, if you wish to change the admin or owner certificate for a grid container, it is best to prevent the grid container from starting during installation and to immediately change the admin.cer and/or owner.cer files before starting the grid container for the first time.

If the container has inadvertently been started already but still has no important “contents”, then the default-owners can be changed after the fact.  The container should be stopped (e.g. GFFSContainer stop) and the GENII_USER_DIR (by default stored in $HOME/.genesisII-2.0) should be erased to throw out any resources that had the prior administrator certificate associated with them.  Again, only do this if there is nothing important installed on this container already!  Once the admin.cer and/or owner.cer file is updated, restart the container again (e.g. GFFSContainer start).

If the container has been inadvertently started but does have important contents, then the ACLs of affected resources and services can be edited to remove the older certificate.  The easiest method to edit ACLs is to use the client-ui (documented in Section E.6) to navigate to the affected resources and drag the old credential into the trash bin for ACLs it is present in.

Occasionally, a user certificate that owns a container may become invalid or the administrative certificate may need to be swapped out.  To swap the certificates into the proper location, we need to resolve the SECURITY_FOLDER variable to a real location for the container.  This has become more difficult than in the era when there were only interactive Genesis II installations, because containers can be configured differently when using a host-wide installation of the software.  To assist the grid maintainers, a new tool called “tell-config” has been added that can report the security folder location:

grid tell-config security-dir
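
Assuming the command prints only the directory path, the result can be captured directly into the shell variable used below:

SECURITY_FOLDER=$(grid tell-config security-dir)
echo "Container security folder: $SECURITY_FOLDER"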

Given that one has located the proper SECURITY_FOLDER (and has set a shell variable of that name), these steps take a new certificate file ($HOME/hostcert.cer) and make that certificate both the administrator and owner of a container:

# replace administrative certificate:
cp $HOME/hostcert.cer $SECURITY_FOLDER/admin.cer
cp $HOME/hostcert.cer $SECURITY_FOLDER/default-owners/admin.cer
# replace owner certificate:
cp $HOME/hostcert.cer $SECURITY_FOLDER/owner.cer
cp $HOME/hostcert.cer $SECURITY_FOLDER/default-owners/owner.cer

If only the admin or only the owner certificate needs to be updated rather than both, then just perform the appropriate section of commands from above.

F.2.4.              XSEDE Trust Store Customization

When a GFFS container or client is deployed on a host that supports the official XSEDE CA certificates, it is desirable to use the official certificates directory rather than the static copy of the certificates provided by the install package.  This affects two configuration items: the myproxy certificates and the grid’s TLS certificates.

To use the official certificates location for MyProxy, update the “security.properties” file in the container’s deployment configuration folder.  The full path to the file should be $GENII_INSTALL_DIR/deployments/current_grid/configuration/security.properties for most installations.  Editing this file requires root or sudo permissions if the installation is system-wide (e.g. installed from the RPM).  For the root container or other containers with specialized deployments, the path will be based on the active deployment name, such as $GENII_INSTALL_DIR/deployments/xsede_root/configuration/security.properties.  The active deployment folder can be shown by running “grid tell-config active-deployment-dir”.

A helper script called “use_official_trust_store.sh” has been developed and is available in “$GFFS_TOOLKIT_ROOT/tools/xsede_admin”.  This script performs the necessary edits on the security.properties file given that the GENII_INSTALL_DIR variable is set.  Run it without any flags to cause it to point the security.properties at the official certificate locations:

bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/use_official_trust_store.sh

After this script is run, the remaining edits below are not needed, but the container should be restarted so that it starts using the modified trust store (see section G.2.2 regarding restarting a container).  Containers and clients also refresh the trust store periodically, but restarting the application picks up the new trust store immediately.  This command restarts the local container:

$GENII_INSTALL_DIR/GFFSContainer restart

These re-configuration steps are required again after an RPM install is upgraded, since the deployment’s security.properties file will be replaced with the default version.

If for some reason using the script is not appropriate (or if one has an older installation without the script), the trust store modification can also be performed manually with the following steps.  Modify security.properties to change this entry from the default:

edu.virginia.vcgr.genii.client.security.ssl.myproxy-certificates.location=myproxy-certs

Replace “myproxy-certs” above with the official XSEDE certificate directory location:

edu.virginia.vcgr.genii.client.security.ssl.myproxy-certificates.location=/etc/grid-security/certificates

Similarly, the TLS trust store can be configured to use the official XSEDE CA certificates.  The grid-certificates folder is defined in the same security.properties file, where the original entry looks like this:

edu.virginia.vcgr.genii.client.security.ssl.grid-certificates.location=grid-certificates

The new version should reference the official location of the XSEDE certificates:

edu.virginia.vcgr.genii.client.security.ssl.grid-certificates.location=/etc/grid-security/certificates

Afterwards, the container/client install will rely on the official XSEDE certificates for MyProxy and TLS authentication.
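
A quick way to confirm the changes (assuming the standard installed deployment path mentioned above) is to display both location entries:

grep "certificates.location" \
  $GENII_INSTALL_DIR/deployments/current_grid/configuration/security.properties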

F.2.5.              Detailed Deployment Information

To create a new deployment from scratch, a directory should be created under “$GENII_INSTALL_DIR/deployments” with the name chosen for the new deployment.  That directory should be populated with the same files that the populate-deployment.sh script puts into place.  The deployment should inherit from the default deployment.

There are a few important requirements on certificates used with Genesis II:

·         Signing certificates (set in security.properties in 'resource-identity' variables) must have the CA bit set.  Container TLS certificates do not need the CA bit enabled.

·         Clients can only talk to containers whose TLS identity is in their trust stores (i.e., the CA certificate that created the TLS certificate is listed).

·         When acting as a client, a container also will only talk to other containers whose TLS certificates are in its trust store.

The deployment directory consists of configuration files that specify the container's properties and trusted identities. For an interactive install using the Split Configuration model (see Section D.8.6), these files are rooted in the deployment directory provided by the installer: $GENII_INSTALL_DIR/deployments/{DEPLOYMENT_NAME}. The following files and directories can usually be found in that folder (although changing properties files can change the name expected for particular files):

·         configuration/security.properties: Specifies most key strength and keystore parameters.

o   Determines lifetime of resource certificates.

o   Note that variables in the form of ${installer:X} should have been replaced by the deployment_generator tool.

·         configuration/server-config.xml: Configuration of the container as an internet server.

o   Provides the hostname on which the container will listen, and which other containers will use in links (EPRs) to that container.

·         configuration/web-container.properties: Defines the network location of the container.

o   This configures whether to trust self-signed certificates at the TLS layer; it is recommended to leave this as “true” for test environments.

o   If the deployment_generator tool is not used, then the container's “listen-port” value must be replaced with the appropriate port number.

·         security/admin.cer: Super-user certificate for the container.

o   This certificate is normally generated by deployment_generator tool.

o   The admin.pfx that corresponds to this certificate is capable of bypassing access control on the container and should be guarded extremely carefully.

o   Some grids may configure all containers to be controlled by the same administrative certificate.

·         security/owner.cer: The certificate representing the container’s owner.

o   This is the certificate for a grid user who has complete control over the container.

o   Ownership can be changed by swapping out this certificate and restarting the container.

·         security/default-owners: A directory holding other administrators of this container.

o   Can contain DER or PEM encoded .cer certificates.

o   Any .cer file in this directory is given default permissions on creating services for this container.

·         security/signing-cert.pfx: Holds the container’s signing key that is used to create resource identifiers.

·         security/tls-cert.pfx: Holds the TLS certificate that the container will use for all encrypted connections.

·         security/trusted.pfx: Contains certificates that are members of the container’s trust store.

o   This file encapsulates a set of certificates in PKCS#12 format.

·         security/trusted-certificates: A directory for extending the container’s trust store.

o   Certificate files can be dropped here and will automatically be part of the container trust store after a restart.

o   This is an easier-to-use alternative to the trusted.pfx file.

·         security/grid-certificates: Similar to trusted-certificates, this is a directory that extends the container’s trust store.

o   These certificates are part of the automatic certificate update process.

o   In the XSEDE grid, this directory often corresponds to the /etc/grid-security/certificates folder.

·         security/myproxy-certs: Storage for myproxy certificates.

o   This directory is the default trust directory used by the myproxyLogin and xsedeLogin commands for MyProxy integration.

·         configuration/myproxy.properties: Configuration of the myproxy server.

o   This file is necessary for the myproxyLogin and xsedeLogin commands.

Overrides for the above locations in the Unified Configuration model (for more details see Section D.8.6):

·         $GENII_USER_DIR/installation.properties: Property file that overrides some configuration attributes in the Unified Configuration model.  These include certain elements from security.properties, web-container.properties and server-config.xml.

·         $GENII_USER_DIR/certs: Local storage of container certificates in Unified Configuration.

o   The certificate (CER) and PFX files for a container with the Unified Configuration are stored here (unless the container uses a specialized deployment folder, see below).

o   The “grid-certificates” folder can be located here, and overrides the “security” folder of the deployment.

o   A “local-certificates” folder can be stored here to contain additional elements of the container’s trust store.

·         $GENII_USER_DIR/deployments: Storage for specialized container deployment in the Unified Configuration model.

o   This folder is absent unless the user chose to convert a container with a specialized deployment.

o   This case is similar to the Split Configuration described above, except that it resides under the state directory ($GENII_USER_DIR) rather than the install directory ($GENII_INSTALL_DIR).

F.2.6.              Certificate Revocation Management (CRL files)

The Genesis II clients and containers will process Certificate Revocation Lists (CRLs) according to the official certificates directory provided by XSEDE.  Genesis II will use CRL files if they are found in the “grid-certificates” trust store folder (see section F.2.2.1).  This folder can be pointed at an absolute path, such as the official XSEDE certificates directory (see section F.2.4).  The CRL files must end in the characters “.r0” to be recognized as CRL files, and they are expected to be in PEM format as encoded by the fetch-crl tool (http://linux.die.net/man/8/fetch-crl).

The CRL files found in the configured grid-certificates folder will be loaded and used to block connections to containers that are found to be running one of the revoked certificates.  This applies to both a Genesis II client connecting to a container, and also to a container connecting to another container for services.

F.2.6.1. Certificate Package Uploader

On official XSEDE hosts, the grid-certificates configuration should be pointed at the official location, which relies on regular updating of the CRL lists using the fetch-crl tool.  On non-XSEDE hosts, the grid-certificates will initially be provided by the install package, but they can be updated automatically using a copy of the certificates package from within the GFFS grid.  The certificates package can be built using the “upload_grid_certs.sh” script provided by the install package.  This script creates a copy of the certificates and CRL files found in the official location (/etc/grid-security/certificates) and uploads that package to the GFFS grid (in grid:/etc/grid-security/certificates/grid-certificates-X.tar.gz, where X is replaced by a timestamp).  Example usage of the script:

bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/upload_grid_certs.sh

The script requires that the logged-in grid user has permission to write the new certificate file as well as permission to create the /etc/grid-security/certificates folder if it does not already exist.  The simplest way to obtain the proper rights is to log in as a member of the gffs-admins group, or to request that a grid administrator enable the permission for the particular grid user that will run the upload process.

The upload script can be added to a cron job in order to regularly update the certificate package in the grid.  Here is an example cron file that runs the upload script every day at 3am:

GENII_INSTALL_DIR=/opt/genesis2-xsede
# m h dom mon dow command
0 3 * * * grid xsedeLogin --username=myUser --password=xq9etc ; bash $GENII_INSTALL_DIR/toolkit/tools/xsede_admin/upload_grid_certs.sh

This cron job logs in as a grid user with appropriate permissions for running the upload script and then runs the upload script.  Afterwards, a new copy of the certificates package should be stored in the grid, and grid clients will periodically update their own copy of the grid-certificates as described in the next section.  If any errors occur during the upload process, messages will be printed to the console and the script’s exit code will be non-zero.

F.2.6.2. Automated Certificate Download and Update

The Genesis II client will periodically check for the presence of a new certificate package file in the grid, and if it is found, the client downloads that file locally and updates the state directory’s copy of the grid-certificates folder (in $GENII_USER_DIR/grid-certificates).  This folder automatically overrides the shipped version of the grid-certificates in order to use the latest CRL lists.

When the grid-certificates configuration is pointed at an absolute path, the certificate update process will not be performed by the client.  This allows the containers and clients to use the official certificates in a local filesystem, as described in section F.2.4.  If the grid-certificates configuration is left as a relative path (by default it is just set to “grid-certificates”), then the automatic certificate update process is enabled.

The file “$GENII_USER_DIR/update-grid-certs.properties” tracks the last runtime of the update process and the last package file that was used.  To force an update of the grid-certificates for the client, remove that file and run a new instance of the Genesis II client.  The state directory’s copy of the grid-certificates in “$GENII_USER_DIR/grid-certificates” will have a recent timestamp after the certificates have been updated successfully.  One can also examine the client log (in $HOME/.GenesisII/grid-client.log) to see information from the update process.
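
For example, the following sketch forces a refresh and then confirms that the local copy has a fresh timestamp:

# Forget the last update and trigger a new check with any grid command.
rm -f $GENII_USER_DIR/update-grid-certs.properties
grid ls / > /dev/null
# The directory timestamp should now be recent.
ls -ld $GENII_USER_DIR/grid-certificates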

It is important for the grid-certificates to also be kept up to date on Genesis II containers, if they are not running on official XSEDE hosts.  Due to the implementation differences between Genesis II clients and containers, the automated certificate update processing used in the client code cannot be re-used in the container.  However, if the grid client updates the local certificates folder, then a container running as the same Unix user can take advantage of this; that is because the state directory is shared between the client and container running on the same account, and the container handles any CRL files found just as the client does.  The following cron job uses the grid client to regularly update the grid-certificates for the container:

GENII_INSTALL_DIR=/opt/genesis2-xsede
# m h dom mon dow command
0 5 * * * grid ls /

This job just runs the client every day at 5am to list the root directory of GFFS.  The side effect is that the client will also test whether there is a new certificates package and update its local copy of the grid-certificates if a new package is found.  Afterwards, the container will start using the new grid-certificates, which include the latest CRL files.  The container will start using the files after its process is restarted, but running containers will also periodically reload the trust store (every 4 hours by default).

F.3.Grid Containers

The notion of a container is simple at its root.  A container encapsulates the services that a particular Genesis II installation can provide on the web.  In other service models, the container might have been referred to as a “server”.  However, in the field of web services, any program that can host web services may be called a “container”; this includes application servers such as “tomcat” and “glassfish”.

In the case of Genesis II, commodity container features are provided by the Jetty Application Server and Apache Axis 1.4, which route requests from clients to the specific services in Genesis II.  But in general terms, we refer to any separate Genesis II install that can provide web services to the grid as a “container”.  If a Genesis II installation just uses containers and provides no services of its own, then it is referred to as a “client”.

F.3.1.              Container Structure

Grid containers based on Genesis II have a representation in the GFFS as “resource forks”.  The resource fork provides a handle for manipulating the actual internals of the container, yet it resembles a normal file or directory.  The notion of resource forks provides an easy-to-use mapping that represents the capabilities of the container (and other resource types) within the GFFS filesystem.

The top-level resource fork is VCGRContainerPortType, which provides access to the container itself.  Once a container is linked into the grid via the VCGRContainerPortType, the other services can be viewed under its Services folder.  This command shows the basic step for linking a container into the grid.  The target location where the container resides must be writable by the user creating this link:

grid ln \
  --service-url=https://{hostname}:{port#}/axis/services/VCGRContainerPortType \
  {/target/path/in/grid}

# example for a personal container in the user’s home folder.
grid ln \
  --service-url=https://cs.vogon.edu:18443/axis/services/VCGRContainerPortType \
  /home/xsede.org/fred/MyContainer

# show the services available on the container:
grid ls /home/xsede.org/fred/MyContainer/Services

The ‘ln’ command links the container into the grid, at which point the owner or administrator of the container can make the container’s services available to other users.

As an example of a container service, the X509AuthnPortType service on a container is where basic X509 grid identities are created.  A user on the XSEDE grid (with appropriate permissions) can list the directory contents for that port-type in the resource fork for the primary STS container by executing:

grid ls /resources/xsede.org/containers/sts-1.xsede.org/Services/X509AuthnPortType

All of the X509 users and groups that have been created based on that port type are shown.  An alternative identity is the Kerberos port type, which can be listed similarly:

grid ls /resources/xsede.org/containers/sts-1.xsede.org/Services/KerbAuthnPortType

However, these user entries are not simple files that can be copied to someplace else in the grid and used like normal files are used.  They only make sense in the context in which the container manages them, which is as IDP entities.  That is why we create a link to the user identity in the /users/xsede.org folder (see the Creating Grid Users section for more details) rather than just copying the identity; the link maintains the special nature of the identity, whereas a copy of it is meaningless.
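
For example, a hypothetical X509 identity named “jed” on that STS container could be linked (not copied) into the users folder as follows; consult the Creating Grid Users section for the complete procedure:

grid ln \
  /resources/xsede.org/containers/sts-1.xsede.org/Services/X509AuthnPortType/jed \
  /users/xsede.org/jed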

There are several resource forks under the container's topmost folder.  Each item's name and purpose are documented below.  Note that visibility of these items is controlled by the access rights that have been set, and not every user will be able to list these folders.

resources                                                            This is a directory that lists all the resources, categorized by their types, that have been created in this container.  The directory is accessible to only the administrator of the container.

filesystem-summary.txt                              This file reports the current free space of the container's filesystem.

Services                                                               This directory holds all of the port-types defined on Genesis II containers.  Each port-type offers a different type of service to the grid.

container.log                                                     This is a mapping for the file container.log under the GENII_INSTALL_DIR.  Note that if that is not where the container log is being written, then this GFFS entry will not be accessible.  This is mainly a concern for containers that are built from source, as the graphical installer sets up the configuration properly for the container log.

Within the Services directory, one will see quite a few port types:

ApplicationDeployerPortType                 Deprecated; formerly part of preparing an application to use in the grid (e.g. unpacking from zip or jar file, etc).

ApplicationDescriptionPortType            Deprecated; associated with ApplicationDeployerPortType.

BESActivityPortType                                   OGSA port-type for monitoring and managing a single activity that has been submitted into a basic execution service.

CertGeneratorPortType                              A certificate authority (CA) that can generate new container TLS certificates for use within a grid.  Used in XCG, not expected to be used in XSEDE.

EnhancedRNSPortType                               The Resource Namespace Service (RNS) port-type, which provides directory services to the grid.  In addition to the standard RNS operations, this port-type supports a file creation service (createFile) that creates a file resource in the same container that the RNS resource resides in.

ExportedDirPortType                                  Port-type for accessing and managing a directory that lies inside an exported root.  It is not needed when using the light-weight export mechanism.

ExportedFilePortType                                 Port-type for accessing a file inside an exported directory.  Like the ExportedDirPortType, it is not needed for the light-weight export.

ExportedRootPortType                               Port-type for exporting a directory in the local file-system into the GFFS namespace.  This port-type is extended by its light-weight version, LightWeightExportPortType, which we recommend using for exports instead of ExportedRootPortType.

FSProxyPortType                                          Similar to the LightweightExportPortType, but allows any filesystem-like subsystem that has a Genesis II driver to be mounted in the grid.  Examples include ftp sites, http sites, and so forth.

GeniiBESPortType                                         The BES (Basic Execution Management) Service provides the capability to execute jobs submitted to the container.

GeniiPublisherRegistrationPortType    An extension of the WS-Notification web service allowing data providers for subscriptions to register with the grid.  Interoperates with GeniiResolverPortType, GeniiSubscriptionPortType and GeniiWSNBrokerPortType.

GeniiPullPointPortType                              Unimplemented.

GeniiResolverPortType                               A service that provides WS-Naming features to the grid.  Clients can query the location of replicated assets from a resolver.

GeniiSubscriptionPortType                       Port-type for managing a subscription for web-service notifications.  All Genesis II resources extend the WSN NotificationProducer port-type and can therefore publish notifications.  This port-type is used to pause, resume, renew, and destroy any subscription for a Genesis II resource's notifications.

GeniiWSNBrokerPortType                        Deprecated; implements a form of subscription forwarding.

JNDIAuthnPortType                                     An authentication provider based on the Java Naming and Directory Interface (JNDI), allowing grid logins via LDAP servers and other directory types that JNDI wraps.

KerbAuthnPortType                                     This port-type supports Kerberos Authentication.

LightWeightExportPortType                    Port-type for exposing a directory in the local file-system into the GFFS namespace.  Users with appropriate credentials can access and manipulate an exported directory just like a typical RNS resource.  The port-type also supports a quitExport operation that detaches the exported directory from the GFFS namespace.

PipePortType                                                   A non-standard port-type for creating a unidirectional, streamable ByteIO communications channel.  Once the pipe is created, the client can push data into one end and it will be delivered to the other end of the pipe.  This is a less storage-intensive way to transfer data around the grid, because there does not need to be any intermediate copy of the data stored on a hard-drive or network location.

QueuePortType                                               A meta-scheduler port-type for submitting and managing jobs on multiple basic execution services (BES).  The user can submit, query the status of, reschedule, or kill one or more activities through this port-type, and can also configure how many slots of each BES the meta-scheduler will use.

RExportDirPortType                                    An extension of the ExportedDirPortType that supports replication.

RExportFilePortType                                  An extension of the ExportedFilePortType that supports replication.

RExportResolverFactoryPortType        This port-type creates instances of the RExportResolver port-type.

RExportResolverPortType                        A port-type whose EPR is embedded into an exported directory or file's EPR to support resolving to a replica on failure.

RandomByteIOPortType                            Port-Type for accessing a bulk data source in a session-less, random way.  The user can read and write blocks of data starting at any given offset.  In other words, the port-type exposes a data-resource as a traditional random-access file in a local file-system.

StreamableByteIOPortType                      Port-type for accessing a bulk data source via a stateful session resource.  It supports the seekRead and seekWrite operations.

TTYPortType                                                   An earlier implementation of the PipePortType that was used at one time for managing login sessions.

VCGRContainerPortType                            The container port-type provides the top-level handle that can be used to link containers into the GFFS.  This port-type represents the container as a whole.  When this port-type is linked into the grid, users can see the container structure under that link (including, eventually, the VCGRContainerPortType).

WSIteratorPortType                                     Port-type for iterating over a long list of aggregated data instead of retrieving it in its entirety in a single SOAP response.  This port-type is used in conjunction with other port-types, such as the entry listing of an RNS resource or the job listing in a queue.  Its interface has exactly one operation: the iterate operation.

X509AuthnPortType                                    This port-type is the Identity Provider (IDP) for the container.  New identities can be created under this resource fork, and existing identities can be listed or linked from here.

F.3.2.              Where Do My Files Really Live?

When a user looks at the contents of the GFFS, the real locations of the files and directories are hidden by design.  The EPR for files (ByteIO) and directories (RNS resources) can be queried to find where they really reside, but usually this is not of interest to users during their daily activities in the grid.  It is much more convenient to consider the files as living “in the grid”, inside the unified filesystem of the GFFS.

However, this convenient view is not always sufficient, and a user may need to be very aware of where the files really reside.  For example, streaming a large data file from the root container of the GFFS to a BES container that is half-way around the world is simply not efficient.  In the next section, we describe how to store files on whatever container is desired.  But first, it is important to be able to determine where the files really reside.  For example, a user is given a home directory when joining a grid, but where does that home directory really live?

To determine which container is providing the storage location for a directory, use the following command:

# Show the EPR for a directory:
grid ls -e -d {/path/to/directory}

This will produce a lengthy XML description of the EPR, and included in that will be an XML element called ns2:Address that looks like the following:

<ns2:Address xsi:type="ns2:AttributedURIType">
https://server.grid.edu:18230/axis/services/EnhancedRNSPortType?genii-container-id=52451897-8A90-5BE4-1FAD-5D983AD2224C
</ns2:Address>

This provides several useful pieces of information.  The hostname server.grid.edu in this example is the host where the container really lives.  The port number after the colon (18230 in this example) is the port where the container provides its web services.  In addition, the unique id of the RNS resource itself (the directory being queried) is shown.
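
As a convenience, the address element can be filtered out of the EPR dump with grep; this is only a sketch (the path is illustrative and the exact output layout may vary):

grid ls -e -d /home/xsede.org/fred | grep -A 1 "<ns2:Address"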

A file's presence in a particular directory in the GFFS (i.e., an RNS path) does not necessarily mean that the file actually resides in the same container as the directory.  That is because files linked into the directory from other containers are still actually stored in the original location.  To show which container a file or directory is really stored on, use the following command to display the item’s EPR:

# Show the EPR for a file.
grid ls -e {/path/to/file}

This produces another EPR dump, which will again have an ns2:Address entry:

<ns2:Address xsi:type="ns2:AttributedURIType">
https://wallaby.grid.edu:19090/axis/services/RandomByteIOPortType?genii-container-id=46D0CE0D-7F85-C5BA-5AC8-695A77E7668A
</ns2:Address>

Here again, we can see the host and port of the container storing the file.  In this case, the file is stored on the container at wallaby.grid.edu on port 19090.

F.3.3.              Serving GFFS Folders from a Specific Container

By default, the mkdir command will create directories using the GFFS root container as the storage location.  It is desirable to store files on other containers to reduce load on the root container.  It is also faster to access data files when they are closer geographically.  To create a directory on a particular container, use the following steps.  Afterwards, any files or directories stored in the new folder will be physically stored on the {containerPath} specified:

# Create the directory using the container's RNS port-type.
grid mkdir --rns-service={containerPath}/Services/EnhancedRNSPortType {/path/to/newDir}

Note that if links are created within the newDir, their storage locations are dictated by where the real file is stored, and not where the newDir is stored.  Only the link itself is stored in newDir.
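
For example, using the personal container linked earlier at /home/xsede.org/fred/MyContainer (the paths are illustrative):

grid mkdir \
  --rns-service=/home/xsede.org/fred/MyContainer/Services/EnhancedRNSPortType \
  /home/xsede.org/fred/local-data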

F.3.4.              Container Network Security

All connections between GFFS containers use TLS (Transport Layer Security) to encrypt the SOAP communications and avoid exposing critical data on the network.  The basis of the TLS connection is a certificate file called the “TLS certificate”, which is configured in a container’s deployment in the “security.properties” file.

The TLS certificate represents the container’s identity on the network.  Incoming connections to a container will see that certificate as “who” they are connecting to.  When the container makes outgoing connections, it will use this certificate as its outgoing identity also.

In the case of some grids, the TLS certificates can be created automatically for a container at installation time using the grid’s Certificate Generator.  This is handled automatically by the Genesis II GFFS installer.

Other grids may have stricter security requirements, such that they provide their own TLS certificates from a trusted CA.  The installer can support such a container when the user is already in possession of an approved TLS certificate; there is an install dialog for adding the certificate.  If the certificate is not yet available, the user can go ahead and generate a temporary TLS certificate, and replace that later with the official certificate when available.

The TLS certificate for a container can be replaced at any time.  After switching to a different TLS certificate and updating the configuration for the container, one must restart the container to cause the new TLS certificate to take effect.

A GFFS grid client will only connect to a container if the TLS certificate of the container is known to the client, by its presence in the client’s trust store.  This ensures that the container is intentionally part of the grid, rather than being from some unknown source.  GFFS containers also follow this restriction when they act as clients (to connect to other containers for services).

Grid clients for a given grid will automatically trust the TLS certificates generated by the grid’s Certificate Generator.  If specific TLS certificates are used for each container, then each of the CA certificates that created the TLS certificates must be added to the installation’s trust store.  Once those CA certificates are present, grid clients and containers will then allow connections to be made to the affected container.  Further information on configuring the TLS certificate is available in Section F.2.5 as well as in the internal documentation in the deployment’s security.properties file.
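
As described in Section F.2.5, the simplest way to trust such a CA on an existing container is to drop its certificate into the deployment’s trusted-certificates directory and restart the container.  The certificate file name below is only illustrative:

cp site-tls-ca.pem $SECURITY_FOLDER/trusted-certificates/
$GENII_INSTALL_DIR/GFFSContainer restart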

F.3.5.              Container Resource Identity

Besides the TLS certificate described in the last section, there is another type of certificate used by containers.  This certificate is called the “Signing Certificate” and it is used for generating resource identifiers for the assets owned by a container.

The signing certificate is always created by the grid’s Certificate Generator.  It is an internal certificate that will not be visible at the TLS level, and so does not participate in the network connection process.  Instead, the Signing certificate is used to achieve a cryptographically secure form of GUID (Globally Unique IDentifier) for each resource in a container.  Each resource has a unique identity generated for it by the container using the Signing certificate.  This allows a container to know whether a resource was generated by it (e.g., when the resource is owned by the container and “lives” inside of it), or if the resource was generated by a different container.

All of the Signing certificates in a grid are “descended from” the root signing certificate used by the Certificate Generator, so it is also clear whether a resource was generated inside this grid or generated elsewhere.

The resource identifiers created by a container’s Signing certificate are primarily used in SAML (Security Assertion Markup Language) Trust Delegations.  Each resource can be uniquely identified by its particular certificate, which allows a clear specification for when a grid user has permitted a grid resource to act on her behalf (such as when the user delegates job execution capability to a Queue resource, which in turn may delegate the capability to a BES resource).

The Signing certificate thus enables containers to create resources which can be described in a standardized manner, as SAML assertions, in order to interoperate with other software, such as UNICORE EMS services.

F.3.6.              User Quota Configuration

It is possible to restrict the ability of grid users to create any files on a container.  It is also possible to permit file creations according to a quota system.  Either approach can be done on a per-user basis.

All files stored in a grid container (that is, “random byte IO” files) are located in the $GENII_USER_DIR/rbyteio-data folder.  Each grid user’s name is the first directory component under the rbyteio-data folder, allowing individualized treatment of the user’s ability to create files.

F.3.6.1. Blocking user ability to create files on a container:

If a user is to be disallowed from storing any byteio type files on the container, then it is sufficient to change the user’s data file folder permission to disallow writes for the OS account running the container.

For example: The container is running as user “gffs”.  The user “jed” is to be disallowed from creating any files in that container.  The user’s random byte IO storage folder can be modified like so:

chmod 500 $GENII_USER_DIR/rbyteio-data/jed

To enable the user to create files on the container again, increase the permission level like so:

chmod 700 $GENII_USER_DIR/rbyteio-data/jed

F.3.6.2. Establishing quotas on space occupied by user files:

Limits can be set on the space occupied by a user’s random byte IO files, enabling the sysadmin to prohibit users from flooding the entire disk with their data.  The following is one approach for establishing a per-directory limit for the user’s data files.

Assuming that a user named “jed” is to be given a hard quota limit of 2 gigabytes, the following steps will restrict jed’s total file usage using a virtual disk approach:

# Removal assumes the user has no important data yet!
rm -rf $GENII_USER_DIR/rbyteio-data/jed
# Recreate the folder so it can serve as the mount point.
mkdir -p $GENII_USER_DIR/rbyteio-data/jed
# Create a 2 GB virtual disk image.
dd if=/dev/zero of=/var/virtual_disks/jed-store.ext3 bs=1M count=2048
# Format the virtual disk (-F allows formatting a regular file non-interactively).
mkfs.ext3 -F /var/virtual_disks/jed-store.ext3
# Mount the virtual disk in place of jed’s random byte IO folder.
mount -o loop,rw,usrquota,grpquota /var/virtual_disks/jed-store.ext3 \
   $GENII_USER_DIR/rbyteio-data/jed

The /var/virtual_disks path above must be accessible by the account running the container.
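
Note that the freshly formatted filesystem’s root directory will typically be owned by root after mounting.  As a sketch (assuming the container runs as the “gffs” account from the example above), transfer ownership of the mount point to that account:

chown gffs:gffs $GENII_USER_DIR/rbyteio-data/jed
chmod 700 $GENII_USER_DIR/rbyteio-data/jed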

F.3.7.              Genesis Database Management

The Genesis II GFFS containers rely on the Apache Derby Embedded Database implementation for their database support.  Much of the time the database engine is trouble free, but occasionally it does need maintenance.  This section covers topics related to the management of the GFFS database.

F.3.7.1. Using the Derby Database Tools

The following procedure requires the Java Development Kit (JDK) and the Apache Derby software.  Java is available from Oracle at http://www.oracle.com/technetwork/java/javase/downloads/index.html.  The Derby tools can be downloaded from http://db.apache.org/derby/derby_downloads.html.

Decompress the Derby tools and set the DERBY_INSTALL environment variable to point to the path where the Derby tools reside.
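
For example (the extraction path shown here is only illustrative):

export DERBY_INSTALL=$HOME/db-derby-10.10.1.1-bin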

F.3.7.2. Configure Embedded Derby

This link provides the material that is summarized below: http://db.apache.org/derby/papers/DerbyTut/install_software.html#derby_configure

To use Derby in its embedded mode set your CLASSPATH to include the jar files listed below:

derby.jar: contains the Derby engine and the Derby Embedded JDBC driver

derbytools.jar: optional, provides the ij tool that is used by a couple of sections in this tutorial

You can set your CLASSPATH explicitly with the command shown below:

Windows:

C:\> set CLASSPATH=%DERBY_INSTALL%\lib\derby.jar;%DERBY_INSTALL%\lib\derbytools.jar;.

UNIX:

$ export CLASSPATH=$DERBY_INSTALL/lib/derby.jar:$DERBY_INSTALL/lib/derbytools.jar:.

The Derby software provides another way to set CLASSPATH, using shell scripts (UNIX) and batch files (Windows). This tutorial shows how to set CLASSPATH explicitly and also how to use the Derby scripts to set it.

Change directory now into the DERBY_INSTALL/bin directory. The setEmbeddedCP.bat (Windows) and setEmbeddedCP (UNIX) scripts use the DERBY_INSTALL variable to set the CLASSPATH for Derby embedded usage.

You can edit the script itself to set DERBY_INSTALL, or you can let the script get DERBY_INSTALL from your environment. Since you already set DERBY_INSTALL, you don't need to edit the script, so go ahead and execute it as shown below:

Windows:

C:\> cd %DERBY_INSTALL%\bin
C:\Apache\db-derby-10.10.1.1-bin\bin> setEmbeddedCP.bat

UNIX:

$ cd $DERBY_INSTALL/bin
$ . setEmbeddedCP.ksh

F.3.7.3. Verify Derby

Run the sysinfo command, as shown below, to output Derby system information:

java org.apache.derby.tools.sysinfo

Successful output will look something like this:

------------------ Java Information ------------------
Java Version:    1.7.0_11
Java Vendor:     Oracle Corporation
Java home:       /Library/Java/JavaVirtualMachines/jdk1.7.0_11.jdk/Contents/Home/jre
Java classpath:  /Users/me/src:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derby.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derby.war:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_cs.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_de_DE.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_es.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_fr.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_hu.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_it.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_ja_JP.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_ko_KR.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_pl.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_pt_BR.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_ru.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_zh_CN.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyLocale_zh_TW.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyclient.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbynet.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyrun.jar:/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbytools.jar:/Users/me/sw/db2jcc/lib/db2jcc.jar:/Users/me/sw/db2jcc/lib/db2jcc_license_c.jar:/Users/me/src:/Users/me/sw/demo/tableFunctionWhitePaper/jars/vtis-example.jar
OS name:         Mac OS X
OS architecture: x86_64
OS version:      10.7.5
Java user name:  me
Java user home:  /Users/me
Java user dir:   /Users/me/derby/mainline
java.specification.name:    Java Platform API Specification
java.specification.version: 1.7
java.runtime.version:       1.7.0_11-b21
--------- Derby Information --------
[/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derby.jar] 10.10.1.1 - (1458268)
[/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbytools.jar] 10.10.1.1 - (1458268)
[/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbynet.jar] 10.10.1.1 - (1458268)
[/Users/me/sw/z/10.10.1/db-derby-10.10.1.1-bin/lib/derbyclient.jar] 10.10.1.1 - (1458268)
------------------------------------------------------
----------------- Locale Information -----------------
Current Locale : [English/United States [en_US]]
Found support for locale: [cs] version: 10.10.1.1 - (1458268)
Found support for locale: [de_DE] version: 10.10.1.1 - (1458268)
Found support for locale: [es] version: 10.10.1.1 - (1458268)
Found support for locale: [fr] version: 10.10.1.1 - (1458268)
Found support for locale: [hu] version: 10.10.1.1 - (1458268)
Found support for locale: [it] version: 10.10.1.1 - (1458268)
Found support for locale: [ja_JP] version: 10.10.1.1 - (1458268)
Found support for locale: [ko_KR] version: 10.10.1.1 - (1458268)
Found support for locale: [pl] version: 10.10.1.1 - (1458268)
Found support for locale: [pt_BR] version: 10.10.1.1 - (1458268)
Found support for locale: [ru] version: 10.10.1.1 - (1458268)
Found support for locale: [zh_CN] version: 10.10.1.1 - (1458268)
Found support for locale: [zh_TW] version: 10.10.1.1 - (1458268)
------------------------------------------------------

The output on your system will probably be somewhat different from the output shown above, but it should reflect the correct location of jar files on your machine and there shouldn't be any errors. If you see an error like the one below, it means your class path is not correctly set:

$ java org.apache.derby.tools.sysinfo

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/derby/tools/sysinfo

F.3.7.4. Start up ij

Start up ij with this command:

java org.apache.derby.tools.ij

You should see the output shown below:

ij version 10.4
ij>

The error below means the class path isn't set correctly:

java org.apache.derby.tools.ij
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/derby/tools/ij

F.3.7.5. Connect to Genesis II database

Start up ij again and connect to the database in Genesis II state directory:

 

java org.apache.derby.tools.ij

ij> connect 'jdbc:derby:/<path-to-genesis2-state-dir>/derby-db';
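For example, assuming the default state directory location described earlier ($HOME/.genesisII-2.0) for a hypothetical container account named “gffs”.  Note that the container should be stopped first, since embedded Derby normally allows only one JVM to open the database at a time:

ij> connect 'jdbc:derby:/home/gffs/.genesisII-2.0/derby-db';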

 

F.3.7.6. Execute SQL statements

Once you connect to a database, you can execute SQL statements. ij expects each statement to be terminated with a semicolon (;); for example:

ij> create table derbyDB(num int, addr varchar(40));

F.3.7.7. Disconnect from a database

The disconnect command disconnects from the current database:

ij> disconnect;

F.3.7.8. Exit

The exit command quits out of ij and, in embedded mode, shuts down the Derby database:

ij> exit;

F.3.7.9. Run SQL Scripts to compress Genesis II state directory

Derby does not provide automatic database compaction, and hence the database can grow quite large over months of operation.  This section provides the techniques needed to compact the database.

You can execute SQL scripts in ij as shown below:

ij> run 'compress.db';

Here is a sample compress.db script (the script name need not end in .db; it can have any extension):

call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ACCOUNTINGRECORDS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ACCTCOMMANDLINES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ACCTRECCREDMAP', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ALARMTABLE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'BESACTIVITIESTABLE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'BESACTIVITYFAULTSTABLE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'BESACTIVITYPROPERTIESTABLE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'BESPOLICYTABLE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'CLOUDACTIVITIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'CLOUDRESOURCES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'CONTAINERPROPERTIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'CONTAINERSERVICESPROPERTIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'CREDENTIALS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ENTRIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'EXPORTEDDIR', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'EXPORTEDDIRENTRY', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'EXPORTEDENTRYATTR', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'EXPORTEDFILE', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'HISTORYRECORDS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'ITERATORS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'MATCHINGPARAMS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'PERSISTEDPROPERTIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'PERSISTENTOUTCALLS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'PROPERTIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2EPRS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2ERRORS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2JOBHISTORYTOKENS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2JOBLOGTARGETS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2JOBPINGS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2JOBS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2LOGS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'Q2RESOURCES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'RESOLVERMAPPING', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'RESOURCES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'RESOURCES2', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'REXPORT', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'REXPORTENTRY', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'REXPORTENTRYATTR', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'SWAPMGRDIRECTORIES', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'SWAPMGRDIRECTORYRESERVATIONS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'UNKNOWNATTRS', 0);
call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('SA', 'WSNSUBSCRIPTIONS', 0);
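
The script can also be run non-interactively.  The following is a minimal sketch; the file name full-compress.db is just a placeholder, and note that Derby's embedded driver allows only one process to open the database at a time, so the container should normally be stopped before compressing:

# Prepend the connect statement, append an exit, and feed the result to ij:
echo "connect 'jdbc:derby:/<path-to-genesis2-state-dir>/derby-db';" > full-compress.db
cat compress.db >> full-compress.db
echo "exit;" >> full-compress.db
java org.apache.derby.tools.ij < full-compress.db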

F.4. Grid Queues

The Genesis II system provides a queuing feature for scheduling jobs on a variety of different types of BES services.  The queue matches the job requirements (in terms of number of CPUs, required memory, parameters for matching types of service required, and other factors) with a BES that is suited to execute the job.  When other jobs are already executing on the necessary resources, the queue keeps the job waiting until the resources become available.  Queues also provide services to users for checking on their jobs' states and managing their jobs while in the queue.

F.4.1.              Creating a Genesis II Queue

A queue in the GFFS generally does not do any job processing on its own.  It does all of the processing via the BES resources that have been added to the queue.  The following shows how to create the queue itself; later sections describe how to add resources of different types to the queue:

# Create the queue resource on a container.
grid create-resource {containerPath}/Services/QueuePortType {/queues/queueName}

Note that, by convention, publicly available queues are stored in the /queues folder in GFFS, but they can be created anywhere in the GFFS that the user has access rights.

# Give the owner full administrative rights on the new queue.
grid chmod {/queues/queueName} +rwx {/users/ownerName}

# Or, give a group rights to use the queue.
grid chmod {/queues/queueName} +rwx {/groups/groupName}

It is generally considered wiser to give a particular group rights to the queue; members can then be added to and removed from the group without repeated maintenance on the queue itself.

Example of Steps in Action:

grid create-resource /containers/poe/Services/QueuePortType /queues/poe-queue

grid chmod /queues/poe-queue +rwx /users/drake

grid chmod /queues/poe-queue +rwx /groups/uva-idp-group

F.4.2.              Linking a BES as a Queue Resource

The computational elements in the grid are represented as Basic Execution Services (BES) containers.  These will be discussed more in the next section, but assuming that a BES is already available, it can be added as a resource on a grid queue with the following steps.  Once added as a resource, the queue can start feeding appropriate jobs to that BES for processing.

Make the BES available on the queue:

grid ln {/bes-containers/besName} {/queues/queueName}/resources/{besName}

The mere presence of the BES as a resource indicates to the queue that it should start using the BES for job processing.

# Set the number of queue slots on the resource:
grid qconfigure {/queues/queueName} {besName} {queueSlots}

Example of Steps in Action:

grid ln /bes-containers/poe-bes /queues/poe-queue/resources/poe-bes

grid qconfigure /queues/poe-queue poe-bes 23

F.5. Basic Execution Services (BES)

To configure a Genesis II BES on a Linux machine, the grid administrator should first install a Genesis II container on that machine.  Usually this machine will be a submit node on a cluster, and it should have a batch job submission system (such as UNICORE, PBS, SGE, etc.) set up on the cluster.  Once the BES is configured, it talks to the underlying batch job submission system and submits users' jobs to the nodes on the cluster.  The grid administrator can also configure attributes specific to that machine while setting up the BES.

F.5.1.              How to Create a Fork/Exec BES

The native BES type provided by Genesis II is a fork/exec BES.  This type of BES simply accepts jobs and runs them, and offers no special functionality or cluster support.  It offers a very basic way to build a moderately-sized computation cluster, if needed.

Adding a BES service requires that a Genesis II container already be installed on the host where the BES will be located.  To create a BES on that container, use the following steps:

# Create the BES on the container.
grid create-resource {containerPath}/Services/GeniiBESPortType {/bes-containers/besName}

# Give the BES owner all rights on the BES.
# (Write permission makes the user an administrator for the BES):
grid chmod {/bes-containers/besName} +rwx {/users/ownerName}

# Give the appropriate queue permissions to use the BES for submitting jobs.
# This includes any queue where the BES will be a resource:
grid chmod {/bes-containers/besName} +rx {/queues/queueName}

# Give the managing group rights to the BES.
grid chmod {/bes-containers/besName} +rx {/groups/managingGroup}

Example of Steps in Action:

grid create-resource /containers/poe/Services/GeniiBESPortType /bes-containers/poe-bes

grid chmod /bes-containers/poe-bes +rwx /users/drake

grid chmod /bes-containers/poe-bes +rx /queues/poe-queue

grid chmod /bes-containers/poe-bes +rx /groups/uva-idp-group

F.5.2.              Running a BES Container With Sudo

In the previous example, the BES was created on a machine to run jobs submitted by grid users.  These jobs execute on the local machine (Fork/Exec) or on the system's compute nodes through the local queuing system interface (PBS, Torque, etc). From the local machine's standpoint, all of these jobs come from one user; that is, all of the local processes or job submissions are associated with the same local uid.  The (local) user that submits the jobs is the same (local) user that owns the container.

This situation can lead to a security vulnerability: because a (grid) user can submit jobs to the BES that run arbitrary code as the same (local) user as the container, and depending on the local configuration of the disk resources, such a job may have access to all of the state (including grid credentials, files, etc.) stored on the local file system.

To protect the container and other grid resources from the jobs, the BES may be configured to run the jobs as a unique local user account, which has limited permissions within the file system and execution environment.  This account can be configured to have access to only the files and directories specifically for that job, and thereby protect the container and the local operating system. This is accomplished using Linux's built-in command “sudo”, which changes the effective user for the process as it runs.

In the following, we will assume that the container was installed and runs as the user “besuser”. The effective user for running jobs will be “jobuser”. Setting up the local system (sudo, users, etc) requires administrative rights on the local machine, so it is assumed that “besuser” has the ability to execute administrative commands. All commands that require this permission will start with “sudo” (e.g: sudo adduser jobuser). Some aspects of configuration are common among any deployment using this mechanism, while other aspects depend on the type of BES (Fork/Exec or Native Queue) or the capabilities of the local operating system. These are described below.

F.5.2.1. Common Configuration

To enable execution of jobs as a unique user, that user must first exist on the local operating system.  It is recommended that a new user be created specifically for the task of running jobs, with minimal permissions on system resources.

Within the local operating system, create “jobuser”:

sudo adduser jobuser

Set jobuser's default umask to enable group access by adding “umask 0002” to $HOME/.bashrc or a similar shell start-up file.  This ensures that any files created by a running job can be managed by the container once the job has terminated.

Grant jobuser access to any shared resources necessary to execute jobs on the current system, such as job queues, or shared file-systems.
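
Taken together, the common configuration might look like the following sketch on a Debian-style system (the user-creation command and the shell start-up file vary by distribution):

# Create the dedicated job user:
sudo adduser jobuser

# Append the umask setting to jobuser's shell start-up file:
echo "umask 0002" | sudo tee -a /home/jobuser/.bashrc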

F.5.2.2. Extended ACLs vs. Groups

When a BES container sets up the working directory for a job, the files it creates/stages in are owned by the besuser. The jobuser must have access permissions to these files to successfully execute the requested job. There are two mechanisms by which the system may grant these permissions: groups or extended access control lists (Extended ACLs).

Extended ACLs are the preferred method for extending file permissions to another user and are available in most modern Linux deployments.  They give a file's owner the ability to grant read, write, and execute permissions on a per-user basis for each file.  Compare this to Linux groups, where every user in the group receives the same permissions.

In the following commands, we assume the Job state directory for the BES will be the default location at $GENII_USER_DIR/bes-activities. If it is configured to be located somewhere else in the file system, adjust the commands below accordingly.

If the BES is to be configured using Extended ACLs:

# Set the default access on the Job state directory and its children,
# so permission propagates to new job directories:
sudo setfacl -R -m d:u:besuser:rwx,d:u:jobuser:rwx $GENII_USER_DIR/bes-activities

# Grant jobuser access to the Job state directory and its existing children:
sudo setfacl -R -m u:besuser:rwx,u:jobuser:rwx $GENII_USER_DIR/bes-activities
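
As an optional sanity check, the resulting entries can be inspected with getfacl:

# The output should list both besuser and jobuser in the access and default ACLs:
getfacl $GENII_USER_DIR/bes-activities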

# If the BES is to be configured using Groups:

# Create a new group, for this example “g2”.
sudo addgroup g2

# Add jobuser and besuser to group “g2”:
sudo usermod -aG g2 besuser
sudo usermod -aG g2 jobuser

# Depending on the local operating system's configuration, it may be necessary
# to set group “g2” as the default group for besuser and jobuser:
sudo usermod -g g2 besuser
sudo usermod -g g2 jobuser

# Change the owning group of the Job state directory
# and its children to group “g2”:
sudo chgrp -R g2 $GENII_USER_DIR/bes-activities

# Set the set-group-ID (setgid) bit on the Job state directory,
# so permission propagates to new job directories:
sudo chmod g+s $GENII_USER_DIR/bes-activities

# Grant explicit permission to the job process wrapper executable:
sudo chmod -f g+rx $GENII_USER_DIR/bes-activities/pwrapper*
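
As an optional sanity check, the group ownership and setgid bit can be verified by listing the directory:

# The owning group should be “g2” and the mode should include the “s” flag:
ls -ld $GENII_USER_DIR/bes-activities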

F.5.2.3. Fork/Exec vs. Native Queue

Once the Job state directory has been configured, either with groups or Extended ACLs, the ability to execute jobs as the jobuser must be granted to the besuser.  This is accomplished using Linux's built-in “sudo” command.  To enable a user to use “sudo”, an administrator must add an entry to the “sudoers” file.  This entry should limit the set of commands that the user may execute using “sudo”; otherwise no actual security is gained by creating another user.  It is recommended that “sudo” be granted only for the specific commands required to launch the job.

F.5.2.3.1.          Fork/Exec BES

If the BES is to be a Fork/Exec flavor BES, the ability to run with “sudo” should be granted only to the job process wrapper executable. This executable is included with the Genesis II deployment and is located in the Job state directory. The executable used depends on the local operating system, but the filename will always begin with “pwrapper”. To grant “sudo” ability for this executable, add an entry like the following to the file “/etc/sudoers”:

besuser ALL=(jobuser) NOPASSWD: {jobStateDir}/{pwrapperFile}

Where {jobStateDir} is the full path to the Job state directory, and {pwrapperFile} is the filename of the process wrapper executable.  Note that “sudo” does not dereference environment variables, so the full path must be specified in the entry.  For example, if the Job state directory is located at “/home/besuser/.genesisII-2.0/bes-activities” and the operating system is 32-bit Linux, the process wrapper executable will be “pwrapper-linux-32”, and the “sudoers” entry should be:

besuser ALL=(jobuser) NOPASSWD: /home/besuser/.genesisII-2.0/bes-activities/pwrapper-linux-32

Once “sudo” has been granted to besuser, it may be necessary to restart the operating system before the changes take effect.
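
To reduce the risk of a syntax error locking out “sudo” entirely, it is safest to edit “/etc/sudoers” with visudo, which validates the file before saving.  Afterwards, the grant can be confirmed from the besuser account by listing the rules that apply to it:

# Edit the sudoers file safely (as an administrative user):
sudo visudo

# As besuser, list the permitted sudo commands; the pwrapper entry should appear:
sudo -l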

Once the local configuration is complete, a BES should be created which utilizes the “sudo” capability. This is accomplished by specifying a “sudo-pwrapper” cmdline-manipulator type in the construction properties for the new BES. An example construction properties file is included below, which we will call sudo-pwrapper.xml.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<construction-parameters xmlns="http://vcgr.cs.virginia.edu/construction-parameters" xmlns:ns2="http://vcgr.cs.virginia.edu/construction-parameters/bes" xmlns:ns3="http://vcgr.cs.virginia.edu/GenesisII/bes/environment-export" xmlns:ns4="http://vcgr.cs.virginia.edu/cmdline-manipulators" xmlns:ns5="http://vcgr.cs.virginia.edu/native-queue">

    <ns2:fuse-directory xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

    <ns3:environment-export xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

    <ns2:cmdline-manipulators>

        <ns4:manipulator-variation name="sudo_pwrapper" type="sudo-pwrapper">

            <ns4:sudo-pwrapper-configuration>

                <ns4:target-user> {jobuser} </ns4:target-user>

                <ns4:sudo-bin-path> {sudo} </ns4:sudo-bin-path>

            </ns4:sudo-pwrapper-configuration>

        </ns4:manipulator-variation>

        <ns4:call-chain>

            <ns4:manipulator-name>sudo_pwrapper</ns4:manipulator-name>

        </ns4:call-chain>

    </ns2:cmdline-manipulators>

<ns2:post-execution-delay xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

<ns2:pre-execution-delay xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

<ns2:resource-overrides/>

</construction-parameters>

Note that there are two parameters in the above example that require system-specific values: the “target-user” element, shown with value {jobuser}, and the “sudo-bin-path” element, shown with value {sudo}.  {jobuser} should be the user name of the account under which the jobs will execute (the “jobuser” in all of the examples provided), and {sudo} should be the absolute path to the sudo executable (e.g. “/bin/sudo”).

To create the BES with these properties, execute the following:

grid create-resource --construction-properties=local:./sudo-pwrapper.xml {containerPath}/Services/GeniiBESPortType {/bes-containers/newBES}

F.5.2.3.2.          Native-Queue BES

If the BES is to be a Native Queue flavor BES, the ability to run with “sudo” should be granted only to the queue executables, e.g. “qsub”, “qstat”, and “qdel” on PBS-based systems. To grant “sudo” ability for these executables, add an entry like the following to the file “/etc/sudoers”:

besuser ALL=(jobuser) NOPASSWD: {bin-path}/{qsub}, {bin-path}/{qstat}, {bin-path}/{qdel}

Where {bin-path} is the full path to the directory where the queuing system executables are located, and {qsub}, {qstat}, and {qdel} are the filenames of the queuing system executables for submitting a job, checking a job's status, and removing a job from the queue, respectively. Note that “sudo” does not dereference environment variables, so the full path must be specified in the entry. For example, if the queue executables are installed in the directory “/bin”, and the native queue is PBS, the “sudoers” entry should be:

besuser ALL=(jobuser) NOPASSWD: /bin/qsub, /bin/qstat, /bin/qdel

Once “sudo” has been granted to besuser, it may be necessary to restart the operating system before the changes take effect.
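
The grant can be verified from the besuser account by running one of the permitted queue commands as jobuser.  For a PBS-based system, qstat is a harmless choice; the -n flag makes sudo fail rather than prompt if the NOPASSWD entry is missing:

sudo -n -u jobuser /bin/qstat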

Once the local configuration is complete, a BES should be created which utilizes the “sudo” capability. This is accomplished in the construction properties for the BES by prefacing the paths for the queue executables with the sudo command and parameters to indicate the jobuser. An example snippet from a construction properties file is shown below. Substituting these elements for the corresponding “pbs-configuration” element in the cons-props.xml file (shown in the section “How to Create a BES using Construction Properties” below) will result in construction properties for a sudo-enabled variant of the native-queue BES, which we will call sudo-native-queue.xml.

<ns5:pbs-configuration xmlns="" xmlns:ns7="http://vcgr.cs.virginia.edu/native-queue" queue-name="My-Queue">

    <ns7:qsub path="{sudo}">

        <ns7:additional-argument>-u</ns7:additional-argument>

        <ns7:additional-argument> {jobuser} </ns7:additional-argument>

        <ns7:additional-argument> {bin-path}/qsub </ns7:additional-argument>

        <ns7:additional-argument>-W</ns7:additional-argument>

        <ns7:additional-argument>umask=0007</ns7:additional-argument>

    </ns7:qsub>

    <ns7:qstat path="{sudo}">

        <ns7:additional-argument>-u</ns7:additional-argument>

        <ns7:additional-argument> {jobuser} </ns7:additional-argument>

        <ns7:additional-argument> {bin-path}/qstat </ns7:additional-argument>

    </ns7:qstat>

    <ns7:qdel path="{sudo}">

        <ns7:additional-argument>-u</ns7:additional-argument>

        <ns7:additional-argument> {jobuser} </ns7:additional-argument>

        <ns7:additional-argument> {bin-path}/qdel </ns7:additional-argument>

    </ns7:qdel>

</ns5:pbs-configuration>

Note that there are three parameters in the above example that require system-specific values:  {jobuser}, {bin-path} and {sudo}. {jobuser} should be the user name of the account under which the jobs will execute (the “jobuser” in all of the examples provided), {bin-path} should be the absolute path to the directory where the queue executables are located (same as the “sudoers” entry above), and {sudo} should be the absolute path to the “sudo” executable (e.g. “/bin/sudo”).

To create the BES with these properties, execute the following:

grid create-resource --construction-properties=local:./sudo-native-queue.xml {containerPath}/Services/GeniiBESPortType {/bes-containers/newBES}

F.6. Grid Inter-Operation

One of the strengths of the Genesis II GFFS software is its ability to connect heterogeneous resources into one unified namespace, which provides access to the full diversity of scientific computing facilities via a standardized filesystem interface.  This section describes how to link resources into the GFFS from other sources, such as the UNICORE BES implementation of EMS and PBS-based queues for job processing.

F.6.1.              How to Create a BES using Construction Properties

To set up the BES wrapper on a machine that will submit to a queuing system, the user should know the cluster configuration properties, such as memory, the number of cores on each node, the maximum number of slots that can be used to submit jobs on the cluster, and any other relevant options.  These properties are specified in the construction-properties file that is used while creating a BES resource.  The grid administrator should also have already installed a Genesis II container on the head or submit node of the PBS or similar job submission system.  A sample construction properties file is shown below, which we will call cons-props.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<construction-parameters xmlns="http://vcgr.cs.virginia.edu/construction-parameters" xmlns:ns2="http://vcgr.cs.virginia.edu/construction-parameters/bes" xmlns:ns3="http://vcgr.cs.virginia.edu/GenesisII/bes/environment-export" xmlns:ns4="http://vcgr.cs.virginia.edu/cmdline-manipulators" xmlns:ns5="http://vcgr.cs.virginia.edu/native-queue">

    <ns2:fuse-directory xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

    <ns3:environment-export xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>

    <ns2:cmdline-manipulators>

        <ns4:manipulator-variation name="default_pwrapper" type="pwrapper"/>

        <ns4:manipulator-variation name="mpich2" type="mpich">

            <ns4:mpich-configuration>

                <ns4:exec-command>mpiexec</ns4:exec-command>

                          <ns4:additional-arg>-comm mpich2-pmi</ns4:additional-arg>

                <ns4:supported-spmd-variation>

                   http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPICH2

                </ns4:supported-spmd-variation>

            </ns4:mpich-configuration>

        </ns4:manipulator-variation>

        <ns4:call-chain>

            <ns4:manipulator-name>mpich2</ns4:manipulator-name>

            <ns4:manipulator-name>default_pwrapper</ns4:manipulator-name>

        </ns4:call-chain>

    </ns2:cmdline-manipulators>

    <ns2:nativeq shared-directory="/nfs/shared-directory/bes" provider="pbs">

        <ns5:pbs-configuration xmlns="" xmlns:ns7="http://vcgr.cs.virginia.edu/native-queue" queue-name="My-Queue">

            <ns7:qsub/>

            <ns7:qstat/>

            <ns7:qdel/>

        </ns5:pbs-configuration>

    </ns2:nativeq>

    <ns2:post-execution-delay>15.000000 Seconds</ns2:post-execution-delay>

    <ns2:pre-execution-delay>15.000000 Seconds</ns2:pre-execution-delay>

    <ns2:resource-overrides>

        <ns2:cpu-count>16</ns2:cpu-count>

        <ns2:physical-memory>100000000000.000000 B</ns2:physical-memory>

        <ns2:wallclock-time-limit>168 hours</ns2:wallclock-time-limit>

    </ns2:resource-overrides>

</construction-parameters>

The BES can be created using the following command in grid command line:

# Create the actual BES resource.
grid create-resource --construction-properties=local:./cons-props.xml {containerPath}/Services/GeniiBESPortType {/bes-containers/newBES}

# Link the new BES container into the queue as a resource:
grid ln {/bes-containers/newBES} {queuePath}/resources/{newBES}

Once the BES is created, users can be added to the read and execute ACLs of the BES to allow those users to run jobs on that BES.

F.6.1.1. Job state in shared directory

In the above construction-properties file, the shared-directory attribute of the <ns2:nativeq> element specifies the shared directory where the state of every job submitted to that BES is stored.  A unique subdirectory is created for each job when it is scheduled on the BES and is destroyed when the job completes.  This path must be visible to all of the nodes on the cluster and hence should be on a cluster-wide shared directory.

F.6.1.2. Scratch space in shared directory

To configure scratch space on a BES, create a special file called ScratchFSManagerContainerService.xml that specifies the path to the scratch space in the deployment configuration's cservices directory ($GENII_INSTALL_DIR/deployments/$DEPLOYMENT_NAME/configuration/cservices).

Below is a sample file to configure scratch space:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<container-service class="edu.virginia.vcgr.genii.container.cservices.scratchmgr.ScratchFSManagerContainerService">

<property name="scratch-directory" value="{/nfs/shared-directory/scratch}"/>

</container-service>

If scratch space is configured after the BES is configured, the BES container must be restarted once the scratch directory is set up.

F.6.1.3. Download Directory (Download manager) in shared directory

When users submit jobs that stage in and stage out files, the BES download manager downloads these files to a temporary download directory.  If it is not explicitly configured while setting up the BES, it is created in the container's state directory at $GENII_USER_DIR/download-tmp.  The container state directory is usually on a local path, whereas the download directory, like the job and scratch directories, should be on a shared directory.  Also, if the download directory and the scratch directory are not on the same partition, the BES may not copy or move the stage-in/stage-out files properly between them; it is highly advised that they be on the same partition.

To configure the download directory, the path should be specified in a special file called DownloadManagerContainerService.xml that is located in the deployment's cservices directory ($GENII_INSTALL_DIR/deployments/$DEPLOYMENT_NAME/configuration/cservices).

Below is a sample file to configure the download directory:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<container-service class="edu.virginia.vcgr.genii.container.cservices.downloadmgr.DownloadManagerContainerService">

<property name="download-tmpdir" value="{/path/to/download-tmpdir}"/>

</container-service>

If the download directory is configured after the BES is configured, then the BES container must be restarted once the download directory is established.

F.6.1.4. BES attributes

A BES can be configured with matching parameters that describe properties of the underlying resource, so that jobs requiring those properties are directed to it.  For example, some clusters may support MPI, while some clusters may be 32-bit compatible and others 64-bit compatible.  A job that needs certain requirements to be met specifies those requirements in its JSDL, and the queue matches the job to BESes where the corresponding attributes are available.  To set a matching parameter, use the grid 'matching-params' command.

For example, to add or specify that a particular BES supports MPI jobs, run this command on the queue:

grid matching-params {/queues/queuePath}/resources/{newBES} "add(supports-mpi, true)"

Some other matching parameters that are supported are:

add(x86_64_compatible, true)

add(GLIBC_2_4_COMPATIBLE, true)

add(blastp-2.2.24, true)
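
Several parameters can be attached to the same BES resource by repeating the command.  For example (a sketch reusing the parameter names listed above), a 64-bit cluster with MPI support could be described with:

grid matching-params {/queues/queuePath}/resources/{newBES} "add(supports-mpi, true)"
grid matching-params {/queues/queuePath}/resources/{newBES} "add(x86_64_compatible, true)"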

F.6.2.              Adding a PBS Queue to a Genesis II Queue

The main consideration for adding a PBS Queue to a Genesis II Queue is to wrap the PBS Queue in a Genesis II BES using a construction properties file.  The example cons-props.xml above shows such a file for a PBS Queue with MPI capability.  Given an appropriate construction properties file for the system, these steps will create the BES and add it as a resource to the queue.

# Create the actual BES resource.
grid create-resource --construction-properties=local:./cons-props.xml {containerPath}/Services/GeniiBESPortType {/bes-containers/newBES}

# Link the new BES container into the queue as a resource.
grid ln {/bes-containers/newBES} {/queues/queuePath}/resources/{newBES}

# Adjust the number of jobs that the queue will submit
# to the BES simultaneously.
grid qconfigure {/queues/queuePath} {newBES} {jobsMax}

F.6.3.              Adding a UNICORE BES to a Genesis II queue

F.6.3.1. UNICORE Interoperation with XSEDE approved certificates

The following assumes that the Genesis II Queue container is using an approved XSEDE certificate for its TLS certificate(s) and that only XSEDE MyProxy-based users need to be supported.  This also assumes that the UNICORE gateway, UNICORE/X and TSI components are already installed and available for use.

UNICORE 6: The Genesis II interoperation package “unicore-in-xsede-1.6.x.zip” needs to be installed to enable UNICORE to stage data in and out from the GFFS grid.  This is available at the SD&I Activity 123 subversion repository (which is listed on the Activity 123 page at https://www.xsede.org/web/staff/staff-wiki/-/wiki/Main/Activity+-+Integrated+EMS+and+GFFS+Increment+3+Update+-+SDI+Plan+Sep+2012).

UNICORE 7: The Genesis II interoperation package “unicore-in-xsede-7.3.0.tar” needs to be installed to enable UNICORE to stage data in and out from the GFFS grid.  This is available at the SD&I Activity 148 subversion repository (which is listed on the Activity 148 page at https://jira.xsede.org/browse/SDIACT-148).  There is additional information in the GFFS Service Provider Guide, available at https://software.xsede.org/viewvc/xsede/sdi/activities/sdiact-220/trunk/Plans/Service_Provider_Guide.pdf?view=log.

Note that the maximum SOAP header size may need to be modified, if the grid users sending jobs to UNICORE will be in a substantial number of groups.  When the maximum size is too small, there will be complaints in the UNICORE gateway log.  The header size can be adjusted by changing the line in the gateway/conf/gateway.properties file as follows (below permits approximately a 400 kilobyte SOAP header):

gateway.soapMaxHeader=409600

Additionally, the user configuring the UNICORE BES must be logged in with a grid account that permits creating the necessary links in the GFFS (such accounts include the keystore login for the grid, grid administrator accounts, or groups created for this purpose by the grid administrators).

Add the CA certificate of all valid TLS identities for SAML credentials into the UNICORE/X directory-based trust store.  This should include the MyProxy CA certificates, at the very least:

# copy the PEM into the UNICORE trust store.
cp CAcert1.pem CAcert2.pem . . . \
   $UNICORE_INSTALL_PATH/unicorex/conf/trustedIdentityProviders

A simpler method than the above can be used to point the UNICORE/X trust store at all valid XSEDE CA certs.  This involves editing “unicorex/conf/uas.config” to add this line:

genii.trusted.dir=/etc/grid-security/certificates

Given that the container’s own TLS certificate is an official XSEDE-approved certificate, the UNICORE gateway trust store should already allow connections from the container.  However, the UNICORE/X component requires one of the above methods to trust the XSEDE user identity certificates before the queue can successfully submit jobs on behalf of XSEDE users.

XSEDE users are mapped to local operating system users as part of UNICORE authentication.  To enable a new XSEDE user, add the XSEDE portal certificate into the grid map file.  (This may already have been done on official XSEDE hosts.)

# edit the grid-mapfile and add a line similar to this (the grid-mapfile is
# usually found in /etc/grid-security/grid-mapfile):
"/C=US/O=National Center for Supercomputing Applications/CN={XSEDE_NAME}" {UNIX_USER_NAME}

The {UNIX_USER_NAME} above is the identity that will be used on the local system for running jobs.  This should be the same user that installed the UNICORE software.  The {XSEDE_NAME} above is the XSEDE portal user name for your XSEDE MyProxy identity.  This information can be obtained by authenticating with xsedeLogin with the grid client and then issuing a whoami command:

# user and password will be prompted for on console or in graphical dialog:
grid xsedeLogin
# alternatively they can both be provided:
grid xsedeLogin --username=tony --password=tiger

# try this if there are weird problems with console version of login:
unset DISPLAY
# the above disables a potentially defective X windows display; try logging
# in again afterwards.

# finally… show the XSEDE DN.
grid whoami --oneline

Acquire the CA Certificate that generated the certificate being used for the UNICORE Gateway and for UNICORE/X (this can be one certificate, or two if they are generated separately by different CAs).  Add that into the trusted-certificates directory on the Genesis Queue container.  Repeat this step on any containers or clients where you would like to be able to directly connect to the UNICORE BES.  If all users will submit jobs via the queue, then only the queue container needs to be updated:

# Example with unicore container using u6-ca-cert.pem and a Genesis
# deployment named ‘current_grid’.  In reality, this may involve more than
# one host, which would lead to a file transfer step and then a copy.
cp $UNICORE_INSTALL_DIR/certs/u6-ca-cert.pem \
 $GENII_INSTALL_DIR/deployments/current_grid/security/trusted-certificates

Alternatively, since the UNICORE TLS certificate is assumed to be generated using XSEDE CA certificates, the following step is sufficient (rather than copying individual certificates):

# assumes the XSEDE certificates are installed at /etc/grid-security/certificates.
mv $GENII_INSTALL_DIR/deployments/current_grid/grid-certificates \
  $GENII_INSTALL_DIR/deployments/current_grid/old-grid-certificates
ln -s /etc/grid-security/certificates \
  $GENII_INSTALL_DIR/deployments/current_grid/grid-certificates

Now that mutual trust between the UNICORE BES and the Genesis II GFFS has been established, link the UNICORE BES into the Genesis II namespace with the following command:

grid mint-epr --link={/bes-containers/u6-bes} \
 --certificate-chain=local:{/path/to/unicorex-tls-cert} {bes-url}

The unicorex-tls-cert is the certificate used by the UNICORE/X container for TLS (aka SSL) communication.  Note that this is different from the UNICORE CA certificate in the last step; this should be the actual TLS certificate and not its CA.  Also, be sure to provide the UNICORE/X TLS certificate rather than the Gateway TLS certificate (if these are different); otherwise trust delegations cannot be extended by UNICORE/X (in the uas-genesis component) and grid stage-in and stage-out will not work in submitted jobs.  The bes-url has the following form (as documented in the UNICORE manual installation guide, at http://www.unicore.eu/documentation/manuals/unicore/):

https://{u6GatewayHost}:{u6GatewayPort}/{u6XserverName}/services/BESFactory?res=default_bes_factory

This is an example of the UNICORE EPR minting process combined in one command:

grid mint-epr --link=/bes-containers/unicore-bes \
 --certificate-chain=local:$HOME/unicorex-tls-cert.pem \
 https://u6.del2.com:20423/U6Test/services/BESFactory?res=default_bes_factory

Once the UNICORE BES is linked into the grid, it can be linked into an existing queue as a resource:

grid ln {/bes-containers/u6-bes} {/queues/theQueue}/resources/{resource-name}

The resource-name above can be chosen freely, but is often named after the BES that it links to.

Adjust the number of jobs that the queue will submit to the BES simultaneously (where jobsMax is an integer):

grid qconfigure {/queues/theQueue} {resource-name} {jobsMax}

To remove the UNICORE BES at a later time, it can be unlinked from the queue by calling:

grid unlink {/queues/theQueue}/resources/{resource-name}

F.6.3.2. UNICORE Interoperation with non-XSEDE Certificates

The steps taken in the previous section are still necessary for setting up a UNICORE BES when one does not possess XSEDE-approved certificates.  However, to configure security appropriately to let users and GFFS queues submit jobs, there are a few additional steps required for non-XSEDE grids.

For each TLS identity that will connect directly to the BES, add the CA certificate that issued the certificate into the Gateway and UNICORE/x trust stores.  If there are multiple certificates in the CA chain, then each should be added to the trust stores.

# Add CA certificate to the gateway’s trust store folder.  Consult the
# gateway/conf/security.properties file to determine the actual location
# of the gateway trust store directory.
cp users-CA-cert.pem {/gateway/cert/store}

# Add the CA certificate to the directory trust store for UNICORE/X.
cp users-CA-cert.pem \
  $UNICORE_INSTALL_PATH/unicorex/conf/trustedIdentityProviders

Once the users’ CA certificates and the queues’ CA certificates have been added, the UNICORE BES can be configured as described in the prior section, and then it should start accepting jobs directly from users as well as from the queue container.

F.6.3.3. Debugging UNICORE BES Installations

If there are problems inter-operating between the GFFS and UNICORE, then it can be difficult to determine the cause given the complexity of the required configuration.  One very useful tool is to increase logging on the UNICORE servers and GFFS containers involved.

For UNICORE, the “debug” level of logging provides more details about when connections are made and why they are rejected.  This can be updated in the gateway/conf/logging.properties file and also in the unicorex/conf/logging.properties file.  Modify the root logger line in each file to enable DEBUG logging as follows:

log4j.rootLogger=DEBUG, A1

For the Genesis II GFFS, the appropriate logging configuration files are in the installation directory in lib/production.container.log4j.properties and lib/production.client.log4j.properties.  For each of those files, debug-level logging can provide additional information about job submissions by changing the rootCategory line accordingly:

log4j.rootCategory=DEBUG, LOGFILE

F.6.4.              Adding an MPI Cluster to a Grid Queue

F.6.4.1. Create a BES with MPI configuration for the MPI-enabled cluster

Create a Native Queue BES using a construction properties file that specifies the MPI types supported by the cluster along with the syntax for executing MPI jobs and any special command-line arguments.  The “manipulator-variation” structure under the “cmdline-manipulators” structure specifies the MPI-related details for the cluster.  The “supported-spmd-variation” field gives the SPMD type as per the specification.  The “exec-command” field specifies the execution command for running MPI jobs on the cluster.  The “additional-arg” field specifies any additional command-line arguments required to run an MPI job on the cluster.  An example construction properties file for the Centurion Cluster is provided below.

<?xml version="1.0" encoding="UTF-8"?>

<genii:construction-parameters

        xmlns:genii="http://vcgr.cs.virginia.edu/construction-parameters"

        xmlns:bes="http://vcgr.cs.virginia.edu/construction-parameters/bes"

        human-name="PBS-Centurion BES with MPI and Pwrapper">

        <bes:post-execution-delay>15.000000 Seconds</bes:post-execution-delay>

        <bes:pre-execution-delay>15.000000 Seconds</bes:pre-execution-delay>

        <bes:resource-overrides>

                <bes:cpu-count>2</bes:cpu-count>

                <bes:physical-memory>2060000000.000000 B</bes:physical-memory>

        </bes:resource-overrides>

        <bes:nativeq provider="pbs"

                shared-directory="/home/gbg/shared-directory"

                xmlns:nq="http://vcgr.cs.virginia.edu/native-queue">

                <nq:pbs-configuration queue-name="centurion">

                        <nq:qsub/>

                        <nq:qstat/>

                        <nq:qdel/>

                </nq:pbs-configuration>

        </bes:nativeq>

        <bes:cmdline-manipulators

                xmlns:clm="http://vcgr.cs.virginia.edu/cmdline-manipulators">

                <clm:manipulator-variation

                        type="pwrapper"

                        name="pwrapper">

                </clm:manipulator-variation>

                <clm:manipulator-variation

                        type="mpich"

                        name="mpi1">

                        <clm:mpich-configuration>

                                <clm:supported-spmd-variation>http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPICH1</clm:supported-spmd-variation>

                                <clm:exec-command>mpiexec</clm:exec-command>

               <clm:additional-arg>-p4</clm:additional-arg>

                        </clm:mpich-configuration>

                </clm:manipulator-variation>

                <clm:call-chain>

                        <clm:manipulator-name>mpi1</clm:manipulator-name>

                        <clm:manipulator-name>pwrapper</clm:manipulator-name>

                </clm:call-chain>

        </bes:cmdline-manipulators>

</genii:construction-parameters>

F.6.4.2. Add MPI-Enabled BES to Queue

Add the BES resource to the queue like any other BES (see the earlier section “Linking a BES as a Queue Resource”).

F.6.4.3. Set matching parameters

Set up matching parameters (as described in the matching parameters section) and use them for matching jobs to this BES.  This is required because Genesis II queues are not MPI-aware.

F.6.5.              Establishing Campus Bridging Configurations

Educational institutions may wish to participate in the XSEDE grid to share computational resources, either by utilizing the resources XSEDE already has available or by adding resources from their campus computing clusters to the XSEDE grid for others’ use.  There are a few requirements for sharing resources in this way, and they are described in the following sections.

F.6.5.1. Obtaining an XSEDE Portal ID

One primary requirement for using XSEDE resources is to obtain an XSEDE portal ID.  The portal ID can be obtained from the XSEDE website at http://xsede.org.  Once the ID is obtained, the user’s grid account needs to be enabled by a grid admin.  The XSEDE ID can then be used to log into the XSEDE grid.

A grid user can create files and directories within the GFFS, which is required for adding any new resources to the grid.  Further, the XSEDE grid account enables the grid user to be given access to existing XSEDE grid resources.

F.6.5.2. Link Campus Identity Into XSEDE Grid

Another primary requirement for campus bridging is to link the campus identity for a user into the XSEDE grid.  After the campus user has obtained an XSEDE grid account, she will have a home folder and a user identity within the XSEDE grid.  However, at this point the XSEDE grid has no connection to the user’s identity on campus.   Since the campus identity may be required to use the campus resources, it is important that the user’s credentials wallet contain both the campus and XSEDE identities.

For example, campus user identities may be managed via a Kerberos server.  By following the instructions in the section on “Using a Kerberos STS”, an XSEDE admin has linked the STS for campus user “hugo” at “/users/hugo”.  Assuming that the user’s XSEDE portal ID is “drake” and that identity is stored in “/users/drake”, the two identities can be linked together in the XSEDE grid with:

# give drake the right to use the hugo identity.
grid chmod /users/hugo +rx /users/drake 

# link campus id under xsede id.
grid ln /users/hugo /users/drake/hugo 

The XSEDE user drake will thus automatically attain the identity of the campus user hugo when drake logs in.  After this, drake will seamlessly be able to utilize both the XSEDE grid resources as drake and the campus resources as hugo.

F.6.5.3. Link Campus Resources Into XSEDE Grid

Campus researchers may wish to share their local compute resources with others in the XSEDE grid.  In order to do this, the campus user should wrap the resource as a BES service and link it to the grid as described in the section on “How to Create a BES with Construction Properties”.  That resource can then be added to a grid queue or queues by following the steps in the section “Linking a BES as a Queue Resource”.

Assuming that the BES is successfully linked to a grid queue, users with rights on the grid queue should be able to send compute jobs to the linked campus resource automatically.  If it is desired to give an individual user the privilege to submit jobs directly to the BES, this can be done with the “chmod” tool.  For example, the user “drake” could be given access to a newly-linked PBS-based BES as follows:

grid chmod /bes-containers/GridU-pbs-bes +rx /users/drake

F.6.5.4. Utilizing XSEDE Resources

Campus researchers may wish to use the resources already available in the XSEDE grid.  At its simplest, this is achieved by adding the user to a grid group that has access to the queue possessing the desired resources.  The user can also be given individual access to resources by using chmod, as detailed in the last section.

This situation can become more complex when the resources are governed by allocation constraints or other jurisdictional issues.  This may require the user to obtain access through consultation with the resource owners, or to take other steps that are generally beyond the scope of this document.

 


G.    Grid Management

Once a grid configuration has been established and the queuing and computational resources are set up, there are still a number of topics that come up in day-to-day grid operation.  These include managing users and groups, performing backups and restores of container state, grid accounting, and other topics.  These are discussed in the following sections.

G.1.        User and Group Management

User and group identities exist in the GFFS as they do for most filesystems.  These identities can be given access rights on grid resources using some familiar patterns from modern day operating systems.  It is useful to keep in mind that a user or a group is simply an identity provided by an IDP (Identity Provider) that the grid recognizes.  IDPs can be provided by Genesis II, Kerberos and other authentication servers.

Creating and managing user identities in the GFFS requires permissions on the {containerPath} in the following commands.

G.1.1.             Creating Grid Users

To create a new user that can log in to the grid, use the following steps:

# Create the user's identity.
grid create-user {containerPath}/Services/X509AuthnPortType \
 {userName} --login-name={userName} --login-password={userPassword} \
 --validDuration={Xyears|Ydays}

# Link the user identity into the /users folder (this is a grid convention).
grid ln {containerPath}/Services/X509AuthnPortType/{userName} \
 /users/{userName}

# Take away self-administrative rights for user.
grid chmod /users/{userName} 5 /users/{userName}

# Create a home folder for the user's files.
grid mkdir /home/{userName}
grid chmod /home/{userName} "+rwx" /users/{userName}

Example of Steps in Action:

grid create-user /containers/poe/Services/X509AuthnPortType drake \
--login-name=drake --login-password=pwdx --validDuration=1years

grid ln /containers/poe/Services/X509AuthnPortType/drake /users/drake
grid chmod /users/drake 5 /users/drake
grid mkdir /home/drake
grid chmod /home/drake "+rwx" /users/drake

G.1.2.             Creating a Group

Group identities in the grid are very similar to user identities.  They can be given access to other resources and identities.  They are created using the following steps:

# Create the group identity
grid idp {containerPath}/Services/X509AuthnPortType {groupName}

# Link the group identity to the /groups folder (by convention).
grid ln {containerPath}/Services/X509AuthnPortType/{groupName} \
 /groups/{groupName}

Example of Steps in Action:

grid idp /containers/poe/Services/X509AuthnPortType uva-idp-group

grid ln /containers/poe/Services/X509AuthnPortType/uva-idp-group \
 /groups/uva-idp-group

G.1.3.             Adding a User to a Group

When they are created, groups have no members.  Users and other groups can be added to a group using the following commands:

# Give user permissions to the group.
grid chmod /groups/{groupName} +rx /users/{userName}

The userName will have read and execute access to the groupName afterwards.  Instead of /users/{userName}, a group could be used instead.

# Link the group identity under the user identity to enable automatic login.
grid ln /groups/{groupName} /users/{userName}/{groupName}

The next time the userName logs in, she will also automatically log into the group identity. Note that this will fail if the first step above (to grant access) was not performed.

Example of Steps in Action:

grid chmod /groups/uva-idp-group +rx /users/drake

grid ln /groups/uva-idp-group /users/drake/uva-idp-group

G.1.4.             Removing a User from a Group

Taking a user back out of a group is basically the reverse process of adding:

# Unlink the group identity under the user's identity.
grid unlink /users/{userName}/{groupName}

# Remove permissions on the group for that user.
grid chmod /groups/{groupName} 0 /users/{userName}

Example of Steps in Action:

grid unlink /users/drake/uva-idp-group

grid chmod /groups/uva-idp-group 0 /users/drake

G.1.5.             Removing a User

Occasionally a grid user needs to be removed.  These steps will erase their identity:

# Remove the home folder for the user's files.
grid rm -rf /home/{userName}

Note that this will destroy all of the user's files!  Do not do this if their data is intended to be retained.

# Unlink any groups that the user was added to.
grid unlink /users/{userName}/{groupName}

# Unlink the user identity from the /users folder.
grid unlink /users/{userName}

# Delete the user's identity.
grid rm -f {containerPath}/Services/X509AuthnPortType/{userName}

Example of Steps in Action:

grid rm -rf /home/drake

grid unlink /users/drake/uva-idp-group

grid unlink /users/drake

grid rm -f /containers/poe/Services/X509AuthnPortType/drake

G.1.6.             Removing a Group

It is much more serious to remove a group than a simple user, because groups can be used and linked in numerous places.  This is especially true for resources, which administrators often prefer to control access to using groups rather than users.  But in the eventuality that a group must be removed, here are the steps:

Unlink any users from the group:

grid unlink /users/{userName}/{groupName}

Omitting the removal of a group link from a user's directory may render the user unable to log in if the group is destroyed.

# Clear up any access control lists that the group was involved in.
grid chmod {/path/to/resource} 0 /groups/{groupName}

# Remove the group identity from the /groups folder.
grid unlink /groups/{groupName}

# Destroy the group identity itself.
grid rm -f {containerPath}/Services/X509AuthnPortType/{groupName}

Example of Steps in Action:

# repeat per occurrence of the group under every user...
grid unlink /users/drake/uva-idp-group
grid unlink /users/joe/uva-idp-group

# repeat per occurrence of the group in object ACLs...
grid chmod /queues/poe-queue 0 /groups/uva-idp-group

# finally, remove the group.

grid unlink /groups/uva-idp-group

grid rm -f /containers/poe/Services/X509AuthnPortType/uva-idp-group

G.1.7.             Changing a User's Password

It is often necessary to change a user's password after one has already been assigned.  For the XSEDE logins using Kerberos and MyProxy, this cannot be done on the Genesis II side; the user needs to make a request to the XSEDE administrators (for more information, see http://xsede.org).  But for standard Genesis II grid IDP accounts, the password can be changed using the following steps:

First remove the existing password token using the grid client, started with:

grid client-ui

Navigate to the appropriate user in the /users folder, and remove all entries that are marked as (Username-Token) in the security permissions.

(Alternatively, this command would normally work for the same purpose, but currently there is a bug that prevents it from removing the existing username&password token:

grid chmod {/users/drake} 0 --username={drake} --password={oldPassword}

This may be fixed in a future revision, and would work for scripting a password change.)

After removing the previous username token, add a new token for the user that has the new password:

grid chmod {/users/drake} +x --username={drake} --password={newPassword}

Once this is done, the new login can be tested to ensure it works:

grid logout --all
grid login --username={drake} --password={newPassword}

G.1.8.             Using a Kerberos STS

The grid administrator can create an STS based on Kerberos that will allow users to use the Kerberos identity as their grid identity.  This requires an existing Kerberos server and an identity on that server.  To create an STS for the grid that uses the server, do the following:

# Create the Kerberos STS.
grid idp --kerbRealm={KERB-REALM.COM} --kerbKdc={kdc.kerb-realm.com} {containerPath}/Services/KerbAuthnPortType {kerberosUserName}

# User can then log in like so...
grid login {containerPath}/Services/KerbAuthnPortType/{kerberosUserName}

The first step created an STS object in the GFFS under the specified Kerberos service and user name.  This path can then be relied upon for logins as shown.  Linking to a /users/kerberosUserName folder (as is done for IDP logins) may also be desired.  See the next section for a more complete example of how an XSEDE login is created using both Kerberos and MyProxy.

Example of Steps in Action:

grid idp --kerbRealm=FEISTYMEOW.ORG --kerbKdc=serene.feistymeow.org \
  /containers/khandroma/Services/KerbAuthnPortType drake

grid login /containers/khandroma/Services/KerbAuthnPortType/drake \
  --username=drake

G.1.9.             Creating XSEDE Compatible Users

This procedure is used to create user identities that are suitable for use with the XSEDE grid.  Users of this type must log in using the “xsedeLogin” command.  It is necessary for the user's account to be enabled on both the XSEDE Kerberos server (which requires an XSEDE allocation) and the XSEDE MyProxy server.

To create an XSEDE compatible user as an administrator, follow these steps (if there is no administrator for Kerberos Users yet, see the end of this section):

# Login as an existing grid user with appropriate permissions to create new users.
grid login --username={adminUser}
or
grid xsedeLogin --username={adminUser}

# Create new xsede user STS:
grid idp --kerbRealm=TERAGRID.ORG --kerbKdc=kerberos.teragrid.org \
 {containerPath}/Services/KerbAuthnPortType {portalID}

# Link user into grid.
grid ln {containerPath}/Services/KerbAuthnPortType/{portalID} \
 /users/xsede.org/{portalID}

# Take away self-administrative rights for user.
grid chmod /users/xsede.org/{portalID} 5 /users/xsede.org/{portalID}

# Create user's home directory.
grid mkdir /home/xsede.org/{portalID}

# Give user all rights on home directory.
grid chmod /home/xsede.org/{portalID} +rwx /users/xsede.org/{portalID}

Example of the steps in action:

grid idp --kerbRealm=TERAGRID.ORG --kerbKdc=kerberos.teragrid.org \
  /resources/xsede.org/containers/poe/Services/KerbAuthnPortType drake

grid ln /resources/xsede.org/containers/poe/Services/KerbAuthnPortType/drake \
  /users/xsede.org/drake
grid chmod /users/xsede.org/drake 5 /users/xsede.org/drake

grid mkdir /home/xsede.org/drake

grid chmod /home/xsede.org/drake +rwx /users/xsede.org/drake

Once the user identity has been created using the above process, the user can be added to groups or given access rights on resources exactly like other grid users.

These steps have been encapsulated for the XSEDE grid in a script in the toolkit:

grid script local:$GFFS_TOOLKIT_ROOT/tools/xsede_admin/create-xsede-user.xml \
  {portalUserName}

The process for removing an XSEDE compatible user is identical to the process for removing a standard grid user (Section G.1.5), except for the last step.  For an XSEDE compatible user, the last step is:

# delete the user’s identity.
grid rm -f {containerPath}/Services/KerbAuthnPortType/{portalID}

G.1.9.1.                       Creating an administrator for XSEDE users

The following is only applicable for grid administrators.  To enable an existing user (referred to as {newadmin} below) to create new XSEDE users, follow these steps:

# Login as the super-user keystore login for the grid.
grid keystoreLogin local:admin.pfx --password={thePassword}

# Give the user permissions to create Kerberos accounts:
grid chmod /groups/xsede.org/gffs-admins +rx /users/xsede.org/{newadmin}
grid ln /groups/xsede.org/gffs-admins /users/xsede.org/{newadmin}/gffs-admins
grid chmod /groups/xsede.org/gffs-amie +rx /users/xsede.org/{newadmin}
grid ln /groups/xsede.org/gffs-amie /users/xsede.org/{newadmin}/gffs-amie

After these steps, the user is capable of Kerberos user administration.
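
As a hedged sanity check (using commands shown elsewhere in this document), the new administrator can log in again and confirm that the gffs-admins and gffs-amie groups now appear among their credentials:

grid xsedeLogin --username={newadmin}
grid whoami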

G.1.10.        Configuring Kerberos Authorization on a Container

By itself, authenticating to a Kerberos KDC for a user is not enough to ensure that the user is properly vetted.  Kerberos authorization to a service principal is also needed for the STS to fully authenticate and authorize the user against the Kerberos realm.

This needs to be configured on each container that participates as a Kerberos STS.  For a small grid, this may only be the root container of the RNS namespace, or a complex grid may have several STS containers.  Each of these STS containers must have a separate service principal created for it, and the container must use the keytab corresponding to that service principal.  Both the service principal and the keytab file must be provided by the realm’s Kerberos administrator.

Once the keytab and service principal have been acquired, the container owner can set up the container to use them by editing the “security.properties” file found in the deployment’s “configuration” folder.  This assumes the container is using the “Split Configuration Model” provided by the interactive installer (see below for configuring RPM installs with the “Unified Configuration Model”).  The keytab file should be stored in the deployment’s “security” folder, rather than the “configuration” folder.   The security.properties file has internal documentation to assist configuring, but this section will go over the important details.

When using an RPM or DEB install package in the “Unified Configuration Model”, the file “$GENII_USER_DIR/installation.properties” should be edited instead of the deployment’s “security.properties”.  The storage folder for keytab files based on an RPM or DEB install is “$GENII_USER_DIR/certs” instead of the deployment’s security folder.

When requesting a service principal from the Kerberos realm’s administrator, it is recommended to use the following form:

gffs-sts/STS_HOST_NAME@KERBEROS_REALM_NAME

This naming convention makes it clear that the service in question is “gffs-sts”, the GFFS Secure Token Service (STS).  It includes the hostname of the STS container as well as the Kerberos realm in which the service principal is valid.  An example of a “real” service principal is below:

gffs-sts/KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG

This is the service principal that a testing machine uses to authenticate to the TERAGRID.ORG realm maintained by XSEDE.

The Kerberos administrator will also provide a keytab file for this service principal.  It is crucial that this keytab file be used on just a single STS host.  The file does not participate in replication of Kerberos STS resources within the GFFS, and it should not be copied off the machine or replicated by other means.

The container’s security.properties file (or installation.properties) records the container’s Kerberos authorization configuration in two lines per STS host.  One specifies the service principal name, and the other the keytab file.  Each entry has the Kerberos realm name appended to the key name, making them unique in case there are actually multiple Kerberos realms being used by the same container.

 The key name “gffs-sts.kerberos.keytab.REALMNAME” is used to specify the keytab file.  The keytab should be located in the “security” folder of the deployment.

The key name “gffs-sts.kerberos.principal.REALMNAME” is used to specify the service principal name for the realm.

Here is another real-world example for the “khandroma” service principal (lines have been split for readability, but these should each be one line in the configuration and should not contain spaces before or after the equals sign):

gffs-sts.kerberos.keytab.TERAGRID.ORG=\
  KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG.gffs-sts.keytab
gffs-sts.kerberos.principal.TERAGRID.ORG=\
  gffs-sts/KHANDROMA.CS.VIRGINIA.EDU@TERAGRID.ORG

The name of the keytab file is provided without any path; the file will automatically be sought in the same deployment’s “security” folder (or state directory “certs” folder for RPM/DEB install).
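
For a container installed from RPM or DEB packages (Unified Configuration Model), the same two entries go into $GENII_USER_DIR/installation.properties and the keytab file is placed in $GENII_USER_DIR/certs, as noted earlier.  A hypothetical sketch with placeholder realm and host names:

gffs-sts.kerberos.keytab.EXAMPLE.ORG=sts.example.org.gffs-sts.keytab
gffs-sts.kerberos.principal.EXAMPLE.ORG=gffs-sts/STS.EXAMPLE.ORG@EXAMPLE.ORG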

Testing the Kerberos configuration can be done by creating an XSEDE compatible user (see prior section) and attempting to log in as that user.  This requires a portal account with XSEDE and an XSEDE allocation.  Warning: it may not be appropriate to test the Kerberos authentication yet if setting up an XSEDE-style grid; testing should not be done until after the STS migration process has occurred.

Note that even after divulging all of this information about the khandroma container in the discussion above, no breach of security has occurred.  The keytab for this service principal has not been provided, and without it one cannot successfully authenticate as this service principal.

If a keytab is accidentally divulged, that is not a total calamity, but it is important to immediately stop that container from authorizing the Kerberos realm affected by the exposed keytab file and to request a new keytab from the Kerberos administrator.  Once the new keytab is deployed in the container, normal authorization can resume.  After the Kerberos administrator has generated the new keytab, the older one will no longer authorize properly and so the security risk has been mitigated.

G.1.11.        Setting Up an InCommon STS

As described in the section “Logging in with InCommon” (Section E.2.2.6), the iclogin tool allows a user to log in using credentials for an InCommon IDP. In order to accommodate this tool, a link must be established between the InCommon identity and another existing grid identity which has access permissions on the intended resources. The target identity may be any of the existing STS types (Kerberos, X509, etc).

The first step is to determine the InCommon identity's Certificate Subject, as follows:

Navigate a browser to https://cilogon.org, and log on with the InCommon credentials. For example, the user might select the ProtectNetwork identity provider, and then click the “Log On” button. This will redirect the browser to that IDP's login page. The user then provides the appropriate credentials for that IDP to log in. The browser will then redirect to the CILogon page. At the top of the page is listed the Certificate Subject for the current user. For an example user “inCommonTester”, this string might be:

/DC=org/DC=cilogon/C=US/O=ProtectNetwork/CN=IC Tester A1234

This information may also be retrieved from an instance of the user's certificate, if the administrator has been provided with a copy for this purpose.

Next, the administrator should obtain a copy of the CILogon.org “Basic” CA certificate from https://cilogon.org/cilogon-basic.pem. From the command line, run the following command:

wget https://cilogon.org/cilogon-basic.pem

This will place a copy of the certificate in the current local directory.

Assuming the administrator is currently logged in to his own grid credentials, the next step is to add execute permissions to the target credentials for the CILogon certificate. Using the example certificate subject above, and an example XSEDE STS at “/users/xsede.org/xsedeTester”, the administrator would run the following command:

grid chmod /users/xsede.org/xsedeTester +x local:cilogon-basic.pem \
  --pattern="DC=org,DC=cilogon,C=US,O=ProtectNetwork,CN=IC Tester A1234"

Note how the “pattern” string is the Certificate Subject returned by cilogon.org, with the leading slash removed, and all other slashes replaced with commas.

Finally, place a link to the target grid credentials in the InCommon user directory. Using the example credentials, the administrator would run the following command from the command line:

grid ln /users/xsede.org/xsedeTester /users/incommon.org/inCommonTester

The user may now authenticate using the iclogin tool and their InCommon IDP's credentials.

Note that, at this time, the STS link must be in the “/users/incommon.org” directory, and must be named with the InCommon IDP username used to log in. The iclogin tool assumes this location when looking up the grid credentials once the IDP authentication is complete. A more robust solution for linking InCommon identities with grid credentials is in development.

G.2.        Container Management

The grid containers are important assets that the grid administrator must keep operating, even in the face of hardware failures.  It is therefore important to have backups of each container's run-time state, especially for containers that hold critical assets for the GFFS.  Backing up the root container is particularly important, because there really is no grid without it.  The following sections discuss how to stop a container, how to back it up and restore it, and how to start the container running again.  The backup procedure should be performed regularly on all critical containers.

G.2.1.             How to Stop a Grid Container

The grid container process does not have a shutdown command as such, but it responds to the control-C (break) signal and stops operation.  There are several methods that will cause the container to shut down.  The easiest case is when the Genesis II Installation Program was used to install the container; for source-based installs, this section also documents how to use Linux command-line tools and how to use a script in the GFFS Toolkit to stop the container.

G.2.1.1.                       Stopping a Container When Using the Interactive or RPM/DEB Installer

If the container was installed via the interactive installation program or from Linux packages, then stopping it is quite simple:

$GENII_INSTALL_DIR/GFFSContainer stop

G.2.1.2.                       Stopping a Container Using Linux Commands

If the container was installed from source, it can be stopped with this procedure.  This queries the process list in the operating system and sends a break signal to the Genesis II Java processes:

# List the Java processes that are running:
ps wuax | grep -i java | grep -i $USER

# Among those processes, find ones that mention ‘genii-container-application.properties’

# For each of those process numbers, run the command:
kill {processNumber}

# Afterwards, there should be no processes listed when checking again:
ps wuax | grep -i java | grep -i $USER

# There should be no output from the command if the processes are all gone.
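
A hypothetical one-line equivalent uses pkill to match the same property file in the processes' command lines (not part of the documented procedure; verify the match before relying on it):

pkill -u $USER -f genii-container-application.properties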

G.2.1.3.                       Stopping a Container Using the GFFS Toolkit

The GFFS Toolkit makes terminating the Genesis II processes a bit easier than the above shell commands:

bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

The above finds the running processes similarly to the manual steps (in the previous section) and stops the processes.  To ensure they are gone, one can run this command:

bash $GFFS_TOOLKIT_ROOT/library/list_genesis_javas.sh

If that command has no output, then no Genesis II processes are left.

G.2.2.             How to Start a Grid Container

The method for starting a grid container depends on what type of container is installed.  If the container is installed from source, then the commands for starting it are:

cd $GENII_INSTALL_DIR

bash runContainer.sh &>/dev/null &

If instead, the container was installed using the Genesis II installation program or an RPM package, then starting the container is done using this step:

$GENII_INSTALL_DIR/GFFSContainer start

If the container is already running and needs to be restarted, then execute this command:

$GENII_INSTALL_DIR/GFFSContainer restart

 

G.2.3.             How to Backup a Genesis II Grid Container

Archiving the data from the root GFFS container can take hours, or even days, depending on the amount of data stored on the root.  This may need to be taken into account for scheduling the container down time.

G.2.3.1.                       Automated Container Backup

The backup process has been automated in a script available in the GFFS Toolkit (documented in section I), located in $GFFS_TOOLKIT_ROOT/library/backup_container_state.sh.  The container should manually be stopped before running the script, and manually restarted afterwards.  For example:

source ~/GenesisII/set_gffs_vars    # replace ~/GenesisII with install path.
$GENII_INSTALL_DIR/GFFSContainer stop
bash $GFFS_TOOLKIT_ROOT/library/backup_container_state.sh
$GENII_INSTALL_DIR/GFFSContainer start

G.2.3.2.                       Manual Container Backup

The procedure below describes how to save a snapshot of a Genesis II container's run-time state.  This includes all of its databases, which in turn contain the RNS folders and ByteIO files that live on the container.  These steps should work with any container.

When backing up the root GFFS container's data, note that this can be a truly huge amount of data.  If users tend to rely on storing their data files in their home folder, and that folder is located on the root GFFS container, then the administrator is backing up all of those data files when the root container is backed up.  This is one reason it is recommended to share the load for home folders by storing them across other containers (see the section on “Where Do My Files Really Live” for more details).

To backup a container, use the following steps.  Note that it is expected that GENII_USER_DIR is already set to the right location for this container:

First, stop the container as described in the previous section.

Then back up the container:

# Zip up the container's state directory.
zip -r $HOME/container_bkup_$(date +%Y-%m-%d-%H%M).zip $GENII_USER_DIR

# or use tar instead...
tar -czf $HOME/container_bkup_$(date +%Y-%m-%d-%H%M).tar.gz $GENII_USER_DIR
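
Before relying on the archive, its contents can be listed as a quick sanity check (a hypothetical step, not part of the documented procedure):

tar -tzf $HOME/container_bkup_{backupDate}.tar.gz | head
# or, for the zip variant...
unzip -l $HOME/container_bkup_{backupDate}.zip | head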

Start up the container again, as described in section G.2.2.

G.2.4.             How to Restore a Genesis II Grid Container

G.2.4.1.                       Automated Container Restore

The restore process has been automated in a script available in the GFFS Toolkit (documented in section I), located in $GFFS_TOOLKIT_ROOT/library/restore_container_state.sh.  The restore script relies on the backup having been produced by the corresponding backup_container_state.sh script.  The container should be stopped manually before running the restore script, and restarted manually afterwards.

There are two restoration scenarios that may be encountered: either the container data has been trashed, or the installation itself has been trashed.  The first situation, where only the container data needs to be restored, is handled by the basic restoration process:

source ~/GenesisII/set_gffs_vars    # replace ~/GenesisII with install path.
$GENII_INSTALL_DIR/GFFSContainer stop
bash $GFFS_TOOLKIT_ROOT/library/restore_container_state.sh \
  $HOME/gffs_state_backup....tar.gz
$GENII_INSTALL_DIR/GFFSContainer start

If the installation itself has been damaged, then additional steps may be needed.  Note that this should only ever be a concern for an interactive installation with the “Split Configuration” model; for the RPM installation or Unified Configuration installation, the above process is sufficient.  But the split configuration approach stores some configuration data for the container in the installation directory, and more steps are needed to completely restore both the damaged data and configuration.

Before the “split configuration” restore is attempted, a healthy installation of the appropriate version of Genesis II GFFS should be installed.  This installation does not need to be configured identically to the container being restored, as the configuration information will be put back into place in the next steps.  Once the installation is available, these steps should perform a full repair of the configuration:

source ~/GenesisII/set_gffs_vars    # replace ~/GenesisII with install path.
$GENII_INSTALL_DIR/GFFSContainer stop
bash $GFFS_TOOLKIT_ROOT/library/restore_container_state.sh \
  $HOME/gffs_state_backup....tar.gz
cp $GENII_USER_DIR/breadcrumbs/container.properties $GENII_INSTALL_DIR/lib
pushd $GENII_INSTALL_DIR/deployments
mv current_grid current_grid.old
cp -R $GENII_USER_DIR/breadcrumbs/deployments/current_grid .
popd
$GENII_INSTALL_DIR/GFFSContainer start

For deployments other than XCG or XSEDE, the actual deployment name of “current_grid” may differ.  The real deployment name will be visible in the deployments folder of the install.

G.2.4.2.                       Manual Container Restore

When the grid container has been backed up and saved at an external location, the grid administrators are protected from catastrophic hardware failures and can restore the grid to the state of the last backup.  This section assumes that the administrator is in possession of such a backup.

First, stop the container as described in section G.2.1, “How to Stop a Grid Container”.

# Make a temporary folder for storing the state.
mkdir $HOME/temporary
cd $HOME/temporary

# Clean up any existing run-time state and recreate the state directory.
rm -rf $GENII_USER_DIR
mkdir $GENII_USER_DIR

# Extract the container's state directory from the archive.
unzip $HOME/container_backup_{backupDate}.zip
# or...
tar -xf $HOME/container_backup_{backupDate}.tar.gz

# Move the files into the appropriate place.
mv {relative/path/to/userDir}/* $GENII_USER_DIR

# Clean up the temporary files.
cd
rm -rf $HOME/temporary

Start up the container again, as described in section G.2.2.

G.2.5.             Replication of GFFS Assets

Replication in the GFFS can be used for fault tolerance and disaster recovery.  For example, replication can be used to create a fail-over system, where the loss of services of a crucial container does not necessarily mean that the grid is down.  Replication can also be used to create a backup system that automatically copies assets that are modified on one container onto a container at a different physical location, ensuring that even the total destruction of the first container's host does not lead to data loss.

This section describes how to set up replicated files and directories, and how to create the resolvers that are used to locate replicated assets.

G.2.5.1.                      Replicating a New Directory Hierarchy

USE CASE: The user is creating a new project. The project starts with an empty home directory, such as /home/project. The project’s home directory should be replicated.

In this case, run these commands:

mkdir /home/project

resolver -p /home/project /containers/backup

replicate -p /home/project /containers/backup

The “resolver” command defines a “policy” that whenever a new file or subdirectory is created under /home/project, that new resource will be registered with a resolver in /containers/backup.

The “replicate” command defines a “policy” that whenever a new file or subdirectory is created under /home/project, that new resource will be replicated in /containers/backup.

That’s it. Whenever a file or directory is created, modified, or deleted in the directory tree in the first container, that change will be propagated to the backup container. Whenever a security ACL is modified in the first container, that change will be propagated too. If the first container dies, then clients will silently fail-over to the second container. If resources are modified on the second container, then those changes will be propagated back to the first container when possible.

G.2.5.2.                       Replicating an Existing Directory Hierarchy

USE CASE: The project already exists. There are files and directories in /home/project. These resources should be replicated, as well as any new resources that are created in the directory tree.

In this case, simply add the -r option to the resolver command:

resolver -r -p /home/project /containers/backup

replicate -p /home/project /containers/backup

The “resolver” command registers all existing resources with a resolver, and it defines the policy for new resources. The “replicate” command replicates all existing resources, and it defines the policy for new resources.

G.2.5.3.                       Choose Specific Resources to Replicate

USE CASE: The user wants to replicate a handful of specific resources.  No new replication policies (or auto-replication of new resources) are desired.

In this case, omit the -p option:

resolver /home/project/filename /containers/backup

replicate /home/project/filename /containers/backup

This case is only useful for certain unusual setups involving hard links or other rarities.

In general, if fail-over is enabled for some file, then it should also be enabled for the file’s parent directory. In other words, the directions for replicating an existing directory hierarchy should be used.

G.2.5.4.                       Create a Named Resolver

USE CASE: The user wants to create a resolver for replicated files and directories.  Or the user wants to give other users access to a resolver, so that those users can create new replicas that can be used for failover.

In this case, create a resolver resource using the create-resource command:

create-resource /containers/primary/Services/GeniiResolverPortType /etc/myResolver

# Now, the resolver can be replicated.
replicate /etc/myResolver /containers/backup

# And the resolver’s ACLs can be modified.
chmod /etc/myResolver +rwx /users/sal

# To use a named resolver, specify the resolver (by pathname)
# rather than the container on the resolver command line.
resolver -p /home/project /etc/myResolver

G.2.5.5.                       Replicating Top-Level Folders for a Grid

USE CASE: The user is configuring a new grid using Genesis II and would like the top-level folders to be replicated, including the root folder (/) and the next few levels below (/home, /users, /groups, etc.).

Adding a replicated root makes the most important top-level folders available through the resolver.  Should the root GFFS container be unavailable, each of the items in the replicated folders is still available from the mirror container.  Currently only grid administrators may add replication in the RNS namespace for the XSEDE grid.

Prerequisites for Top-Level Folder Replication

·         These steps should be performed on a separate client installation, not on a container, to isolate the new context properly.

·         On the separate client install, remove the folder pointed at by $GENII_USER_DIR, which will start the process with a completely clean slate.  This is shown in the steps below.

·         This section assumes that the root container has already been deployed, and that a mirror container (aka root replica) has been installed, is running, but is not yet configured.

·         The user executing the following commands requires administrator permissions via an admin.pfx keystore login.  Note that if the admin certificates for the root and replica containers are distinct, then one should login with the keystore file for both the root and the replica container.  Only the applicable keystore logins for the containers involved should be performed; do not login as an XSEDE user or other grid user first.  For example:

grid logout --all
grid keystoreLogin local:$HOME/key1.pfx
# repeat for as many keys as needed.

Steps for Replicating Top-Level Grid Folders

This example sets up replication on the top-level grid folders within the XSEDE namespace.  Note that this example uses the official host names for XSEDE hosts (e.g. gffs-2.xsede.org) and the default port (18443).  These may need to vary based on your actual setup:

# GENII_INSTALL_DIR and GENII_USER_DIR are already established.
# This action is being taken on an isolated client install that points at the new grid;
# do not run this on the root or root replica container!

# Clean out the state directory beforehand.
rm -rf $GENII_USER_DIR

# Login as the administrative keystore; repeat for all applicable keys.
grid keystoreLogin local:$HOME/admin.pfx

# Run the replication script; replace hostname and port as appropriate for replica host.
bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/top_level_replication_setup.sh \
  gffs-2.xsede.org 18443

# If no errors occurred, the new replication-aware context file is stored in:
# $HOME/replicated-context.xml

Note: allow the containers 5-10 minutes to finish replicating before shutting any of them down.

After replication has finished (and all containers seem to be in a quiescent state), it is important to backup both the root and the mirror container data (see section G.2.3 for backup instructions).

The replicated-context.xml file created by the above steps needs to be made available to grid users within an installation package.  It is especially important to use this installation package for all future container installs.  Submit the file to UVa Developers (xcghelp@cs.virginia.edu) for binding into an updated version of the grid’s installer program.  Installations of the GFFS that are created using the new installer will automatically see the replicated version of the root.

Testing Basic Replication

It is important to validate the grid’s replication behavior before attempting to use any replicated resources.  The new replication configuration can be tested with the following steps:

·         On the host of the root container, stop the root container process:

bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

·         On a different host than the root, use the grid client and log out of all identities (the remainder of the steps also use this client host):

grid logout --all

·         List the root folder in RNS (/).  If this does not show the top-level replicated folders, then something is wrong with the replication configuration:

grid ls /

·         If the above test succeeds, try a few other publicly visible folders:

grid ls /users
grid ls /groups

Neither of the above commands should report any errors.

If the commands above work as described, then basic RNS replication is working properly.  This is assured by having shut down the root container; the only source of RNS records that is still active in the grid is the mirror container.
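
Once testing is complete, restart the root container on its host using the method appropriate to its installation type (section G.2.2); for an interactive or RPM/DEB install, for example:

$GENII_INSTALL_DIR/GFFSContainer start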

G.2.5.6.                       Replicating User (STS) Entries

The replicated STS feature is used similarly to the replicated RNS & ByteIO feature. Suppose Joe is a Kerberos or X.509-certificate STS resource created for a user named Joe. In addition, assume that Joe has access to a group Group1 (so Group1 should be a sub-directory under Joe in the global namespace).  Suppose Joe and Group1 reside in arbitrary containers and we want to replicate them. The sequence of steps for replication is as follows.

1.       Associate a resolver with the two resources.  This step is not required if the users folder already has a resolver established:

grid resolver -p -r /users/xsede.org/Joe \
  /resources/xsede.org/containers/{Container-Hosting-Resolver}

2.       Replicate the STS resource.  This can be done more than once to create multiple replicas:

grid replicate -p /users/xsede.org/Joe \
  /resources/xsede.org/containers/{Container-Hosting-Replica}

If we want only Joe to be replicated but not Group1 then we drop the -r flag (indicating “recursion”) from the resolver command.

Note that the container hosting the resolver and the container hosting replicas are different in the above example, but they do not have to be different containers. However, neither of them should be the primary container where Joe or Group1 are stored, as that defeats the goal of replication.

To replicate the entire /users/xsede.org users hierarchy, use similar steps:

1.       Associate a resolver with the users hierarchy.  Skip this step if a resolver already exists on the users hierarchy:

grid resolver -p -r /users/xsede.org \
  /resources/xsede.org/containers/{Container-Hosting-Resolver}

2.       Replicate the STS resource:

grid replicate -p /users/xsede.org \
  /resources/xsede.org/containers/{Container-Hosting-Replica}

G.2.5.7.                       Serving User and Group Identities from a Replicated STS Container

Initially all user and group identities will be stored on the Root container.  The authentication processing for the grid can be migrated to a different container, possibly one reserved for managing STS resources such as users.  The following sections describe how to accomplish the move in the context of the XSEDE namespace, where there are two STS containers (a primary and a secondary which replicates the primary).

Prerequisites for Migrating to an STS Container

·         Ensure the root container’s top levels are already replicated (see section G.2.5.5 if that has not already been done).

·         The steps for migrating the STS should be executed on an administrative client host that has been installed with the replication-aware installer (produced in section G.2.5.5).

·         The primary and secondary STS containers must be installed before executing the migration process (section D.4 or D.5).

·         The user executing the following commands requires administrator permissions via an admin.pfx keystore login.  One should login with the admin keystore for the root, the root replica, and the STS containers.  Only the applicable keystore logins for the containers involved should be performed; do not login as an XSEDE user or other grid user first.

Steps for Migrating to a New Primary STS Container

This section brings up both STS containers and configures them before any replication is added:

·         Run the script below with the host name and port number for the two STS servers.  In this example, we use the official hostnames.  Test systems should use the appropriate actual hostnames instead.

# variables GENII_INSTALL_DIR, GENII_USER_DIR and GFFS_TOOLKIT_ROOT have
# already been established.
# This action is being taken on an isolated client install that points at the new grid.

# Run the STS migration script.  The hostnames below are based on the official
# XSEDE hosts and should be modified for a test system.
bash $GFFS_TOOLKIT_ROOT/tools/xsede_admin/migrate_to_sts_containers.sh \
  sts-1.xsede.org 18443 sts-2.xsede.org 18443

# If no errors occurred, then the two STS containers are now handling any new
# user account creations and will authenticate users of these new accounts.

Add an XSEDE User and Test Replication

The GFFS Toolkit provides scripts for adding users to the namespace and for adding the users to groups.  Add an XSEDE MyProxy/Kerberos user for testing as follows (this step requires being logged in as an account that can manage other XSEDE accounts, such as by using the administrative keystores for the grid):

# create the user.
grid script \
  local:$GFFS_TOOLKIT_ROOT/tools/xsede_admin/create-xsede-user.xml \
  {myPortalID};
# link the user into gffs-users.
grid script \
   local:$GFFS_TOOLKIT_ROOT/library/link-user-to-group.xml \
  /users/xsede.org  {myPortalID}  /groups/xsede.org  gffs-users

The user specified in {myPortalID} needs to be an XSEDE user ID that can be logged into by the tester.  This account must be able to log in to the MyProxy and Kerberos servers at XSEDE.

Before testing replication below, ensure that the user account is working for login:

# relinquish administrative login:
grid logout --all
# try to login as the portal user:
grid xsedeLogin --username={myPortalID}
# show the result of logging in:
grid whoami

The “whoami” command should print out the actual XSEDE user name that was configured above, and should also show group membership in the “gffs-users” group.

Allow the containers a couple of minutes to finish replicating before shutting any of them down.  Once the user and groups have been replicated, the soundness of the replication configuration can be tested with the following steps:

·         On the host of the root container, stop the root container process:

bash $GFFS_TOOLKIT_ROOT/library/zap_genesis_javas.sh

·         Run the above command on the primary STS container also, to stop that container.

·         On an administrative client install with replication enabled (i.e., not on the primary containers), use the grid client and log out of all identities, then log in to the new user again:

grid logout --all
grid xsedeLogin --username={myPortalID}
grid whoami

If the login attempt above works and whoami still shows the details of the appropriate user and groups, then the STS replication configuration is almost certainly correct.  Having shut down the primary STS container, the only source of authentication active in the grid is the secondary STS container.  Similarly, all RNS requests must be served by the mirror container, since the root RNS container is down.

There are four cases that completely test the failover scenarios.  The above test is listed in the table as test 1.  To ensure that replication has been configured correctly, it is advisable to test the remaining three cases also:

1.       Root down & sts-1 down

2.       Mirror down & sts-1 down

3.       Root down & sts-2 down

4.       Mirror down & sts-2 down

If these steps are successful, then the new primary and secondary STS containers are now responsible for authentication and authorization services for the grid.  Any new STS identities will be created on the STS container rather than on the root container.  Even if the root container is unavailable, users will still be able to log in to their grid accounts as long as one root replica and one STS container is still available.  (Any other required login services, such as MyProxy or Kerberos, must also still be available.)

G.3.        RNS & ByteIO Caching

The Genesis II software supports client-side caching of GFFS resources to reduce the impact on the containers that actually store the files.  By enabling this feature on a container, an administrator can allow users to cache files on their own hosts rather than always accessing the files on the container.  This actually benefits both the administrator and the user, because the administrator will see fewer remote procedure calls requesting data from their containers and the users will see faster access times for the resources they are using frequently.

To enable the subscription based caching feature, it is necessary to add a permission on the port-type that implements the cache service:

grid chmod \
  {containerPath}/Services/EnhancedNotificationBrokerFactoryPortType \
  +x --everyone

After the port type is enabled, the grid client’s most frequently used files will automatically be cached in memory.  If the container's version of a file changes, the client is notified via its subscription to the cached data item, and the cached copy is automatically updated.

G.4.        Grid Accounting

The Genesis II codebase offers some accounting features to track usage of the grid queue and the number of processed jobs.  This section describes how to set up the accounting features and how to create a web site for viewing the accounting data.

This information is intended for local grid administrators (such as the XCG3 admins) and is not currently in use by the XSEDE project.

G.4.1.             Accounting Prerequisites

There are several pieces to collecting and processing job accounting data for the XCG.

G.4.2.             Background

Raw job accounting data is kept on the home container of the BES that runs the job.  It will stay there forever, unless someone signals that the container can delete accounting information up to a specified accounting record ID (we call this “committing” the records).  We use a grid command line tool named “accounting” to collect accounting records, put the collected data into a database (currently this is the department’s MySQL database), and to commit all records collected on the container so that the container can delete them from its local database.  In order to support our on-demand online accounting graphs, the raw accounting data collected from the containers must be processed into a format that the graphing pages can use. 

So, overall, the collection process has 2 parts:

  1. collect raw data from containers and store in raw accounting database tables.
  2. process the data to update tables supporting accounting graphs.

G.4.3.             Accounting Database

The raw accounting data is placed into 5 related tables by the accounting collection tool.  In order to make the data easier to process for usage graphs, a stored procedure named procPopulateXCGJobInfo crunches all of the data in the 5 raw accounting data tables and stores them in 2 derived “tmp” tables for use by the accounting graph php pages on the web site.

The vcgr database currently contains 15 tables, only about half of which are related to the new way of doing accounting.

Accounting related tables set by accounting collection tool

xcgaccountingrecords                                Each row holds the raw accounting information for a single XCG job.

xcgbescontainers                       Each row holds information about a BES container that has had accounting data collected for it.  The key for the BES record in this table is used to match accounting records in the xcgaccountingrecords table to the BES container it ran on.  Records in this table are matched during the accounting record collection process based on the BES’s EPI, so re-rolling a BES will cause a new BES entry to appear in this table.  The besmachinename field in this table is populated by the machine’s IP address when it is first created and this is used by our accounting graphs as the name of the BES.  However, the besmachinename field can be manually updated to put in a more human friendly name and to tie together records from two BES instances that have served the same purpose.  This is something I do periodically to make the usage graphs more readable and informative.

xcgcredentials                              Contains a list of every individual credential used by any job.  Multiple jobs using the same credential will share an entry in this table.  Since the relationship between jobs (xcgaccountingrecords) and credentials (xcgcredentials) is many-to-many, the xcgareccredmap table provides the relationship mapping between them.  The credentialtype field is set to NULL for new entries, but can be set to values “Client”, “Service”, “User” or “Group” manually.  I occasionally manually edit this field to set the proper designation for new entries. 

xcgareccredmap                          Associative table between xcgaccountingrecords and xcgcredentials.

xcgcommandlines                      Contains each portion of a job’s command line – one entry per argument (including the executable argument).  The accounting tool did not work properly at first and only recorded the last argument for jobs.  This was fixed sometime after the initial rollout of the new accounting infrastructure.

G.4.4.             Denormalized accounting data for usage graphs

In order to easily support reasonably fast creation of a wide range of usage graphs, we use a stored procedure to create two tables to store pre-processed denormalized information about jobs.

tmpXCGJobInfo            This table contains 1 row per job with pretty much everything in it that we can run a report against.  This includes our best guess at the job’s “owner” in human-friendly terms, the BES container’s name, various run-time information, and information about the day, week, month, and year of execution.

tmpXCGUsers                                This table is an intermediate table used by the stored procedure that creates the tmpXCGJobInfo table.  It really can be deleted as soon as the stored procedure finishes – not sure why it isn’t…

G.4.5.             The denormalization process

The denormalization process deletes and re-creates the tmpXCGJobInfo and tmpXCGUsers tables from data in the raw accounting tables.  Denormalization is done by running the procPopulateXCGJobInfo stored procedure. 

N.B. Besides denormalizing the data so that there is a single row per job, the procedure also tries to figure out which user “owns” the job and stores its best guess in the username field of the tmpXCGJobInfo table.   This field is used by the usage graphs to display/filter who ran a job.

The algorithm for doing so is imperfect, but must be understood to properly interpret the usage graph behavior.  A job’s owner is determined as the credential with the lowest cid (credential id) associated with the job that is designated as a “User” (it may also require that the credential has the string X509AuthnPortType in it).  It then assumes that the credential is in the format of those we mint for ordinary XCG users and extracts the owner’s name from the CN field of the credential.

This process will only work if these conditions are met:

·         The job has at least one credential minted as a normal user by XCG in the usual format.

·         The credential has been manually marked as a “User” credential in the xcgcredentials table.

Jobs run with a different type of credential (e.g. username/password or another outside credential), jobs run only by an admin (whose credential has a different format), or jobs whose credential entry in the xcgcredentials table has not been manually updated to type “User” will be labeled as owned by “unknown”.

Overall, there are two main procedures: 1) collection of the raw data and 2) processing the data for the online graphs.

1) The grid tool “accounting” is used to collect raw accounting data and store it in the CS department database.  The tool takes several options/arguments:

·         --collect: tells the tool to do the actual data collection.  NOTE: unless the “--no-commit” flag is specified, the --collect flag will commit all accounting records successfully collected from the target container(s).

·         --no-commit: optional flag to stop tool from committing (i.e. erasing) records from the target container(s).

·         --recursive: allows the user to specify a directory of containers; the tool will recursively go through all entries in the directory and collect from each one.

·         --max-threads=<# threads>: allows the collection to be multi-threaded

·         <source container> | <source container directory>: which container (directory) to collect.

·         <target database>: connect information for database that will store collected data.

Typical use (as admin – other users will not have permission to collect or commit accounting data from most/all containers):

accounting --collect --recursive /containers /accounting/CS-MySQL
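
To gather the data without committing (i.e. erasing) the records on the containers, for instance during a trial run, add the --no-commit flag described above:

accounting --collect --no-commit --recursive /containers /accounting/CS-MySQL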

The tool uses grid containers, not BES containers, as targets. Even though the accounting records do identify which actual BES container each job was associated with, the tool collects all of the data for all BESes a grid container hosts at once.

1.       We can use the directory /containers recursively because we try to maintain /containers such that it has all of our useful containers in it and no other extraneous entries. This helps simplify the process significantly.

2.       We use the RNS path /accounting/CS-MySQL as the target database.  Mark set up this RNS entry with the proper connection information for the vcgr database on the department server to help the collection process.  The tool can handle extracting the connection info from the EPR contained by the RNS entry.

3.       There will be a prompt for the password for the vcgr_mmm2a account on the CS department database server.

4.       The tool will collect data from each container in turn.  Note that exceptions may be reported – sometimes entries remain in the /containers directory for containers that no longer exist.

2) To process the data for the online graphs, run the procPopulateXCGJobInfo stored procedure described in section G.4.5.
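
A minimal sketch of running that step from the MySQL command-line client, assuming the stored procedure already exists in the vcgr database:

CALL procPopulateXCGJobInfo();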

G.4.6.             Linking the Accounting Database Into the Grid

To link a database into the grid namespace, use the following command:

mint-epr --link=/accounting/XCG-DB  \
  jdbc:mysql://mysql.cs.virginia.edu/vcgr?user=vcgr_mmm2a

G.4.7.             Migrating Accounting Info to a New Grid

For linking the XCG2 accounting info to the new XCG, all we need to do is create a DB EPR in the new XCG grid namespace. This is done using:

mint-epr --link=/accounting/XCG3-CS-MySQL  \
  jdbc:mysql://mysql.cs.virginia.edu/vcgr?user=vcgr_mmm2a

This will create a new EPR which can be used with the 'accounting' tool like this:

accounting --collect --recursive /<grid-path>/<containers-directory> \
  /accounting/XCG3-CS-MySQL

This will use the same database tables in the existing database, and all the information will be preserved from the previous grid.

G.4.8.             Usage Graph Web Site

The statistics gathered from the grid can be displayed in a web site that supports queries based on operating system and time ranges.  An example implementation is provided within the GFFS toolkit, which is bundled with the GFFS client installer and which is also available at the svn repository:

svn://svn.xcg.virginia.edu:9002/GENREPO/GenesisII/trunk/toolkit/tools/usage_graphs

These files are an example only, and would need to be configured appropriately for the site's Ploticus installation location, the usage graph site's location on the web server, and the login information for the statistics database.

This implementation uses PHP and the Ploticus application (http://ploticus.sourceforge.net/doc/welcome.html) to build graphs per user request.  The figure below shows the running site, with a form for requesting accounting info using the available filters.  Given a query with a given date range and a daily report, the output might resemble the usage graph in the next figure.

Figure 46. User request form

Figure 47. Example of daily usage graph

G.4.9.             Database Table Structure for Accounting

G.4.9.1.                       USERS Table

Field          Type           Null   Key   Default   Extra
id             bigint(20)     NO     PRI   NULL      auto_increment
name           varchar(128)   NO           NULL
title          varchar(128)   YES          NULL
department     varchar(128)   YES          NULL
organization   varchar(128)   YES          NULL
country        varchar(64)    NO           NULL
email          varchar(128)   NO           NULL
username       varchar(64)    NO           NULL
password       varchar(128)   NO           NULL
idppath        varchar(256)   NO           NULL
homedir        varchar(256)   NO           NULL

 

G.4.9.2.                       XCGCREDENTIALS Table

Field            Type           Null   Key   Default             Extra
cid              bigint(20)     NO     PRI   NULL                auto_increment
credential       blob           NO           NULL
credentialtype   varchar(16)    YES          NULL
credentialdesc   varchar(512)   NO           NULL
credentialhash   int(11)        NO           NULL
Timeadded        timestamp      NO           CURRENT_TIMESTAMP

 

G.4.9.3.                       GENIIJOBLOG Table

Field         Type            Null   Key   Default   Extra
eventid       bigint(20)      NO     PRI   NULL      auto_increment
eventtype     varchar(64)     NO           NULL
identities    varchar(1024)   NO           NULL
eventtime     datetime        NO     MUL   NULL
epi           varchar(256)    YES    MUL   NULL
jobid         varchar(256)    YES          NULL
processtime   bigint(20)      YES          NULL
hostname      varchar(256)    NO     MUL   NULL

 

G.4.9.4.                       GROUPS Table

Field         Type           Null   Key   Default   Extra
id            bigint(20)     NO     PRI   NULL      auto_increment
name          varchar(128)   NO           NULL
description   varchar(512)   YES          NULL
path          varchar(256)   NO           NULL

 

G.4.9.5.                       MEMBERSHIP Table

Field   Type         Null   Key   Default   Extra
id      bigint(20)   NO     PRI   NULL      auto_increment
uid     bigint(20)   NO     MUL   NULL
gid     bigint(20)   NO           NULL

 

G.4.9.6.                       tmpXCGJobInfo Table

Field                   Type            Null   Key   Default             Extra
arid                    bigint(20)      NO     PRI   NULL
besaccountingrecordid   bigint(20)      NO           NULL
besid                   bigint(20)      NO           NULL
exitcode                int(11)         NO           NULL
jobuserhrs              decimal(23,4)   NO           NULL
jobkernelhrs            decimal(23,4)   NO           NULL
jobwallclockhrs         decimal(23,4)   NO           NULL
maxrssbytes             bigint(20)      YES          NULL
recordtimestamp         timestamp       NO     MUL   CURRENT_TIMESTAMP   on update CURRENT_TIMESTAMP
jobyear                 smallint(6)     NO           NULL
jobmonth                smallint(6)     NO           NULL
jobmonthname            char(10)        NO           NULL
jobday                  smallint(6)     NO           NULL
jobdayofyear            smallint(6)     NO           NULL
jobmondaydate           date            NO     MUL   NULL
besmachinename          varchar(256)    YES    MUL   NULL
arch                    varchar(64)     YES          NULL
os                      varchar(64)     YES          NULL
ownercid                bigint(20)      YES          NULL
username                varchar(256)    YES    MUL   NULL
linuxuserhrs            decimal(23,4)   NO           NULL
linuxkernelhrs          decimal(23,4)   NO           NULL
linuxwallclockhrs       decimal(23,4)   NO           NULL
windowsuserhrs          decimal(23,4)   NO           NULL
windowskernelhrs        decimal(23,4)   NO           NULL
windowswallclockhrs     decimal(23,4)   NO           NULL
macosuserhrs            decimal(23,4)   NO           NULL
macoskernelhrs          decimal(23,4)   NO           NULL
macoswallclockhrs       decimal(23,4)   NO           NULL

 

G.4.9.7.                       tmpXCGUsers Table

Field      Type           Null   Key   Default   Extra
cid        bigint(20)     NO     PRI   NULL
username   varchar(256)   NO           NULL

 

G.4.9.8.                       XCGACCOUNTINGRECORDS Table

Field                    Type         Null   Key   Default   Extra
arid                     bigint(20)   NO     PRI   NULL      auto_increment
besaccountingrecordid    bigint(20)   NO     MUL   NULL
besid                    bigint(20)   NO     MUL   NULL
exitcode                 int(11)      NO           NULL
usertimemicrosecs        bigint(20)   NO           NULL
kerneltimemicrosecs      bigint(20)   NO           NULL
wallclocktimemicrosecs   bigint(20)   NO           NULL
maxrssbytes              bigint(20)   YES          NULL
recordtimestamp          timestamp    YES          NULL

 

G.4.9.9.                       XCGARECCREDMAP Table

Field   Type         Null   Key   Default   Extra
mid     bigint(20)   NO     PRI   NULL      auto_increment
cid     bigint(20)   NO     MUL   NULL
arid    bigint(20)   NO           NULL

 

G.4.9.10.                   XCGBESCONTAINERS Table

Field            Type           Null   Key   Default             Extra
besid            bigint(20)     NO     PRI   NULL                auto_increment
besepi           varchar(256)   NO     UNI   NULL
besmachinename   varchar(256)   YES          NULL
arch             varchar(64)    YES          NULL
os               varchar(64)    YES          NULL
timeadded        timestamp      NO           CURRENT_TIMESTAMP

 

G.4.9.11.                   XCGCOMMANDLINES Table

Field          Type           Null   Key   Default   Extra
clid           bigint(20)     NO     PRI   NULL      auto_increment
arid           bigint(20)     NO           NULL
elementindex   int(11)        NO           NULL
element        varchar(512)   NO           NULL

 

 


G.4.10.        Creating the Accounting Database

This SQL source code will create the temporary job tables after the user has run the accounting tool.  The database must already exist and be structured as above.

procPopulateXCGJobInfo routine definition:

BEGIN

DROP TABLE IF EXISTS tmpXCGUsers;
DROP TABLE IF EXISTS tmpXCGJobOwnerIds;
DROP TABLE IF EXISTS tmpXCGJobOwners;
DROP TABLE IF EXISTS tmpXCGJobInfo;

CREATE TABLE `tmpXCGUsers` (
 `cid` bigint(20) NOT NULL,
 `username` varchar(256) NOT NULL,
 PRIMARY KEY (`cid`)
) DEFAULT CHARSET=latin1;

DELETE FROM tmpXCGUsers;

INSERT INTO tmpXCGUsers
(
 `cid`,
 `username`
)
SELECT
 `xcgcredentials`.`cid`,
 substring_index(substr(`xcgcredentials`.`credentialdesc`,42),',',1)
FROM
 `xcgcredentials`
WHERE
 ((`xcgcredentials`.`credentialtype` = 'User') AND
  (`xcgcredentials`.`credentialdesc` like '%X509AuthnPortType%'));

CREATE TABLE `tmpXCGJobOwnerIds` (
 `arid` bigint(20) NOT NULL,
 `ownercid` bigint(20) NOT NULL,
 PRIMARY KEY (`arid`)
) DEFAULT CHARSET=latin1;

DELETE FROM tmpXCGJobOwnerIds;

CREATE INDEX `tmpXCGJobOwnerIds_ownercid_Idx`
ON `tmpXCGJobOwnerIds` (`ownercid`);

INSERT INTO tmpXCGJobOwnerIds
(
 `arid`,
 `ownercid`
)
SELECT
 `ar`.`arid`,
 min(`cred`.`cid`)
FROM
 `xcgaccountingrecords` `ar`,
 `xcgareccredmap` `map`
  LEFT JOIN `xcgcredentials` `cred` ON (((`map`.`cid` = `cred`.`cid`) AND
   (`cred`.`credentialtype` = 'User') AND
   (not((`cred`.`credentialdesc` like '%Admin%')))))
WHERE (`ar`.`arid` = `map`.`arid`)
GROUP BY
 `ar`.`arid`;

CREATE TABLE `tmpXCGJobOwners` (
 `arid` bigint(20) NOT NULL,
 `ownercid` bigint(20) NOT NULL,
 `username` varchar(256) NULL DEFAULT NULL,
 PRIMARY KEY (`arid`)
) DEFAULT CHARSET=latin1;

DELETE FROM tmpXCGJobOwners;

CREATE INDEX `tmpXCGJobOwners_ownercid_Idx`
ON `tmpXCGJobOwners` (`ownercid`);

INSERT INTO tmpXCGJobOwners
(
 `arid`,
 `ownercid`,
 `username`
)
SELECT
 `jo`.`arid`,
 `jo`.`ownercid`,
 `users`.`username`
FROM
 `tmpXCGJobOwnerIds` `jo`,
 `tmpXCGUsers` `users`
WHERE
 (`jo`.`ownercid` = `users`.`cid`);

DROP TABLE IF EXISTS tmpXCGJobOwnerIds;

DROP TABLE IF EXISTS tmpXCGJobInfo;

CREATE TABLE `tmpXCGJobInfo` (
 `arid` bigint(20) NOT NULL,
 `besaccountingrecordid` bigint(20) NOT NULL,
 `besid` bigint(20) NOT NULL,
 `exitcode` int(11) NOT NULL,
 `jobuserhrs` decimal(23,4) NOT NULL,
 `jobkernelhrs` decimal(23,4) NOT NULL,
 `jobwallclockhrs` decimal(23,4) NOT NULL,
 `maxrssbytes` bigint(20) DEFAULT NULL,
 `recordtimestamp` timestamp NOT NULL,
 `jobyear` smallint NOT NULL,
 `jobmonth` smallint NOT NULL,
 `jobmonthname` char(10) NOT NULL,
 `jobday` smallint NOT NULL,
 `jobdayofyear` smallint NOT NULL,
 `jobmondaydate` date NOT NULL,
 `besmachinename` varchar(256) NULL DEFAULT NULL,
 `arch` varchar(64) NULL DEFAULT NULL,
 `os` varchar(64) NULL DEFAULT NULL,
 `ownercid` bigint(20) NULL DEFAULT NULL,
 `username` varchar(256) NULL DEFAULT NULL,
 `linuxuserhrs` decimal(23,4) NOT NULL,
 `linuxkernelhrs` decimal(23,4) NOT NULL,
 `linuxwallclockhrs` decimal(23,4) NOT NULL,
 `windowsuserhrs` decimal(23,4) NOT NULL,
 `windowskernelhrs` decimal(23,4) NOT NULL,
 `windowswallclockhrs` decimal(23,4) NOT NULL,
 `macosuserhrs` decimal(23,4) NOT NULL,
 `macoskernelhrs` decimal(23,4) NOT NULL,
 `macoswallclockhrs` decimal(23,4) NOT NULL,
 PRIMARY KEY (`arid`)
) DEFAULT CHARSET=latin1;

CREATE INDEX `tmpXCGJobInfo_besmachinename_Idx`
ON `tmpXCGJobInfo` (`besmachinename`);

CREATE INDEX `tmpXCGJobInfo_username_Idx`
ON `tmpXCGJobInfo` (`username`);

CREATE INDEX `tmpXCGJobInfo_recordtimestamp_Idx`
ON `tmpXCGJobInfo` (`recordtimestamp`);

CREATE INDEX `tmpXCGJobInfo_jobmondaydate_Idx`
ON `tmpXCGJobInfo` (`jobmondaydate`);

INSERT INTO tmpXCGJobInfo
(
 `arid`,
 `besaccountingrecordid`,
 `besid`,
 `exitcode`,
 `jobuserhrs`,
 `jobkernelhrs`,
 `jobwallclockhrs`,
 `maxrssbytes`,
 `recordtimestamp`,
 `jobyear`,
 `jobmonth`,
 `jobmonthname`,
 `jobday`,
 `jobdayofyear`,
 `jobmondaydate`,
 `besmachinename`,
 `arch`,
 `os`,
 `ownercid`,
 `username`,
 `linuxuserhrs`,
 `linuxkernelhrs`,
 `linuxwallclockhrs`,
 `windowsuserhrs`,
 `windowskernelhrs`,
 `windowswallclockhrs`,
 `macosuserhrs`,
 `macoskernelhrs`,
 `macoswallclockhrs`
)
SELECT
 `ar`.`arid`,
 `ar`.`besaccountingrecordid`,
 `ar`.`besid`,
 `ar`.`exitcode`,
 (`ar`.`usertimemicrosecs` / 3600000000),
 (`ar`.`kerneltimemicrosecs` / 3600000000),
 (`ar`.`wallclocktimemicrosecs` / 3600000000),
 `ar`.`maxrssbytes`,
 `ar`.`recordtimestamp`,
 year(`ar`.`recordtimestamp`),
 month(`ar`.`recordtimestamp`),
 monthname(`ar`.`recordtimestamp`),
 dayofmonth(`ar`.`recordtimestamp`),
 dayofyear(`ar`.`recordtimestamp`),
 cast((`ar`.`recordtimestamp` - interval weekday(`ar`.`recordtimestamp`) day) as date),
 `bes`.`besmachinename`,
 `bes`.`arch`,
 `bes`.`os`,
 `owners`.`ownercid`,
 `owners`.`username`,
 IF(bes.os = 'LINUX', (`ar`.`usertimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'LINUX', (`ar`.`kerneltimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'LINUX', (`ar`.`wallclocktimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'Windows_XP', (`ar`.`usertimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'Windows_XP', (`ar`.`kerneltimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'Windows_XP', (`ar`.`wallclocktimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'MACOS', (`ar`.`usertimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'MACOS', (`ar`.`kerneltimemicrosecs` / 3600000000), 0),
 IF(bes.os = 'MACOS', (`ar`.`wallclocktimemicrosecs` / 3600000000), 0)
FROM
   ((`xcgaccountingrecords` `ar`
    LEFT JOIN `xcgbescontainers` `bes` ON((`bes`.`besid` = `ar`.`besid`)))
    LEFT JOIN `tmpXCGJobOwners` `owners` on((`owners`.`arid` = `ar`.`arid`)));

DROP TABLE IF EXISTS tmpXCGJobOwners;
END

G.5.        Grid Inter-Operation

The Genesis II software fully supports grid federation, where resources can be shared between multiple grids.  This enables researchers to connect to a low-latency grid that is geographically convenient while still sharing data and BES resources with researchers on other grids.  The XSEDE namespace provides a convenient method to achieve “grid isomorphism”, where the locations of other grids’ resources can be found at the identical location in RNS regardless of which grid one is connected to.

For example, the XSEDE Operations Grid is a Genesis II GFFS grid that is maintained by the XSEDE project.  The Cross-Campus Grid (XCG) is also a Genesis II GFFS grid, but it is maintained by the University of Virginia.  Despite these grids being within very different administrative domains, users on the XCG grid can log into their accounts and access their home directories on the XSEDE grid.  This is accomplished by linking parts of the XSEDE grid into the XCG namespace structure.

The interconnections from XCG to XSEDE were created by the XCG administrator. Each “foreign” grid is given a well-defined location in the /mount directory where the remote grid is linked.  For the XSEDE grid, the top-level (/) of the grid has been linked into /mount/xsede.org.  Listing the contents of that folder shows the root of XSEDE’s grid; note that this command is executed on an XCG grid client, not an XSEDE grid client:

# grid ls /mount/xsede.org
xsede.org:
bin
doc
etc
groups
home
mount
resources
users

This is the same list of folders one sees if one is connected to the XSEDE grid and lists the top-level of RNS, but in this case, it is visible via the link in the XCG grid.

To gain fully isomorphic grid folders, one makes links for each of the major items in the foreign grid under the appropriate folders in one’s own grid.  For example, XCG has a folder for its local users called /users/xcg.virginia.edu, but it also has a folder called /users/xsede.org for the remote users that live in the XSEDE grid.  From the XSEDE grid’s perspective, it would have a link for /users/xcg.virginia.edu that connects to the XCG grid.  Using the foreign path, one can authenticate against the XSEDE grid’s STS for a user even though one is connected to the XCG grid.  This provides for fine-grained access control across the multiple grids, and ensures that the user can acquire whatever credentials are needed to use the remote grid’s resources.
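
For instance, a researcher connected to the XCG could authenticate against the XSEDE STS entry linked under /users/xsede.org.  The sketch below is illustrative only: the user name is hypothetical, and the full syntax of the login command is covered in section E.2.2.

# Run from an XCG grid client; “jsmith” is a hypothetical XSEDE user.
grid login /users/xsede.org/jsmith --username=jsmith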

Similarly, the home folders of the XSEDE grid are available in the XCG grid, as /home/xsede.org.  This allows a person who is connected to the XCG to access their remote files and directories that reside in their XSEDE grid home directory.  Using this capability, researchers can submit jobs that stage files in and out from any grid that they have access to, and can share their data with other researchers on any of these interconnected grids.
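
As a hypothetical illustration (the file names and paths below are invented for the example; see section E.3.1 for the cp command and the local: prefix), a researcher connected to the XCG might move files to and from their XSEDE home directory like this:

# Copy a local input file into an XSEDE home directory from an XCG client.
grid cp local:./input.dat /home/xsede.org/jsmith/input.dat
# Retrieve a result file from the XSEDE home directory to the local machine.
grid cp /home/xsede.org/jsmith/output.dat local:./output.dat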

G.5.1.             Connecting a Foreign Grid

Making a non-local grid available on one’s home grid involves the following steps.  The actions below use the concrete example of linking the XSEDE grid into the XCG grid, but any two grids can be linked in this manner.  To achieve grid isomorphism, it is important to pick an appropriate name for the foreign grid and to use that name consistently across all federated grids.  Otherwise, a path on grid X may be named differently on grid Y, which makes it difficult to create grid jobs that run seamlessly on either of the two grids (expected stage-out paths may not exist without consistent naming).

1.       Acquire the root EPR of the foreign grid.  This is done while connected to the foreign grid, using its installer or grid deployment.  The command below is assumed to be run from an XSEDE grid client:

grid ls -ed / | tail -n +2 | sed -e 's/^\s//' >xsede_context.epr

2.       The above creates a context file that can be used to link the foreign grid.  This step must be performed using the grid client for one’s local grid (which is about to be augmented with a link to the foreign grid):

grid ln --epr-file=local:xsede_context.epr /mount/xsede.org

3.       Test the new link by listing its contents.  It should show the top-level folders of the foreign grid:

grid ls /mount/xsede.org

4.       If the prior step is unsuccessful, it is possible that the local grid does not trust the remote grid.  To establish trust between the grids, the CA certificate(s) that issued the remote grid’s TLS certificates should be added to the local grid’s trust store.  Below, it is assumed that “current_grid” is the specific deployment in use in the local grid and that “remoteCA.cer” is a CA certificate that issued the remote grid’s TLS certificates.  Adding more than one certificate is fine, and the certificates can be in either DER or PEM format:

cp $HOME/remoteCA.cer \
  $GENII_INSTALL_DIR/deployments/current_grid/security/trusted-certificates
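
As an optional sanity check (this assumes the standard openssl command-line tool is available on the client machine), the certificate’s subject and issuer can be inspected to confirm that the right CA certificate is being added:

# For a PEM-format certificate:
openssl x509 -in $HOME/remoteCA.cer -noout -subject -issuer
# For a DER-format certificate:
openssl x509 -inform DER -in $HOME/remoteCA.cer -noout -subject -issuer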

5.       Once the remote grid can be listed successfully at its location under /mount, the remote hierarchies can be linked into the local grid.  These links should continue the naming convention established for the mount, so that isomorphism is maintained between the grids:

grid ln /mount/xsede.org/users  /users/xsede.org
grid ln /mount/xsede.org/home  /home/xsede.org
grid ln /mount/xsede.org/resources  /resources/xsede.org
grid ln /mount/xsede.org/groups /groups/xsede.org

6.       Listing each of the new folders should show the appropriate type of resources.  With a successful top-level mount, this step should always succeed.
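
For example, each of the links created in step 5 can be checked with a listing; these commands use only paths established earlier in this procedure:

grid ls /users/xsede.org
grid ls /home/xsede.org
grid ls /resources/xsede.org
grid ls /groups/xsede.org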

This procedure can be repeated as needed to federate other grids alongside one’s own grid.  A grid structured isomorphically is a joy for researchers to use, since all paths are arranged and named in a way that makes their true home clear.  Jobs can be executed on any BES or queue that the researcher has access to, and staging output can be delivered to any of the connected grids the researcher chooses.  In addition, one’s colleagues on other grids can provide access to their research data in a regular, easy-to-understand structure, even when the grids are in completely different countries and administrative domains.


H.    XSEDE Development with Genesis II

This section focuses on building the XSEDE GFFS components from the Genesis II source code.  Support for basic EMS (via the fork/exec BES) is included in the Genesis II source; building the UNICORE EMS is not addressed here.  This section is aimed at advanced users who want to run their container from source and at developers who want to fix bugs or add features to the Genesis II software.

Note that at this time, development of the Genesis II software is supported only in Java.  The Genesis II components can be controlled by a variety of methods (grid client, XScript, client-ui), but the Genesis II software is extended by writing new Java classes or modifying existing ones.

H.1.        Installing Java

Configuring Java can be quite confusing to a newcomer, and a full treatment of that process is beyond the scope of this document.  This section explains the basics of setting up Java for building and running Genesis II.  A Genesis II developer is expected to have prior training in Java, but normal users of Genesis II should not need Java proficiency.

Genesis II currently requires Oracle Java 8 for building and running the codebase.  Java 6 and 7 (aka 1.6 and 1.7) are no longer supported for Genesis II builds.  The latest versions of Java can be downloaded at Oracle’s Java Downloads: http://www.oracle.com/technetwork/java/javase/downloads/index.html

The JRE can be difficult to install on CentOS, and this guide has been helpful in previous installations: http://wiki.centos.org/HowTos/JavaRuntimeEnvironment  If you intend to recompile the Genesis II code base, download a JDK (Java Development Kit) and not just the JRE (Java Runtime Environment).  Oracle’s installation guide for the Java 8 JDK is available here: https://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html

Some of the Genesis II scripts rely on the JAVA_HOME variable being set.  This should point to the top directory for the JDK being used to build Genesis II.  For example, if the Java JDK is installed at /usr/lib/jvm/java-8-oracle, then the JAVA_HOME variable could be set with:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

To determine which version of Java is on the PATH, run:

java -version

If that does not show the appropriate version, the PATH variable may need to be modified.  For example:

export PATH=$JAVA_HOME/bin:$PATH
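
The following quick check (a sketch that assumes a typical Linux shell with GNU coreutils) confirms that the java found on the PATH is the same JDK that JAVA_HOME points at:

# Show which java the shell finds and resolve any symbolic links.
which java
readlink -f "$(which java)"
# This should report the same version as the plain "java -version" above.
"$JAVA_HOME/bin/java" -version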

Enabling Strong Encryption in the JRE

Building and running GFFS containers requires that full-strength JCE security be enabled in the Java Runtime Environment (JRE).  Otherwise, the JRE will not allow certificates to be generated with the necessary key length.  The unlimited-strength JCE policy jars are available at: http://www.oracle.com/technetwork/java/javase/downloads/index.html

As an example, after downloading the unlimited JCE zip file for Oracle Java 8 on Linux, the security jars might be updated with these steps:

unzip UnlimitedJCEPolicyJDK8.zip

sudo cp UnlimitedJCEPolicy/*jar /usr/lib/jvm/java-8-oracle/jre/lib/security

The location of the JVM can vary widely on different platforms, but these jar files generally belong in a subdirectory ending in ‘jre/lib/security’.
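
One way to verify that the unlimited-strength policy is actually in effect (assuming the JDK’s jrunscript tool is on the PATH) is to query the maximum allowed AES key length; a result of 2147483647 indicates unlimited strength, while 128 indicates the default restricted policy:

jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'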

H.1.1.             CentOS Build Dependencies

On CentOS 6.4 (or later) Linux, the following packages may be needed to build Genesis II.

·         Ant and additional Ant packages can be installed with:

sudo yum install ant ant-contrib ant-nodeps ant-jsch ant-trax ant-junit

H.1.2.             Ubuntu Build Dependencies

On Ubuntu Linux, the following packages may be needed to build Genesis II.

·         Ant can be installed with:

sudo apt-get install ant
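
On either platform, a quick way to confirm that Ant is installed and on the PATH is:

ant -version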

H.2.        Getting the Genesis II Source Code

The Genesis II source is kept in a Subversion revision control repository. To retrieve the latest version, use the following:

Check out the Genesis II source code:

svn co svn://svn.xcg.virginia.edu:9002/GENREPO/GenesisII/trunk

Base the GENII_INSTALL_DIR on the new check-out:

export GENII_INSTALL_DIR=$HOME/trunk
# location depends on the check-out that occurred in the first step.
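
Later, the working copy can be brought up to date in place; a rebuild (see section H.3) is needed after updating:

cd $GENII_INSTALL_DIR
svn update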

H.3.        Building Genesis II from Source on the Command Line

Ensure that the GENII_INSTALL_DIR has been set to point at the source code location, that Java is installed with unlimited JCE encryption, and that JAVA_HOME is set (see section H.1).

To perform the main build of the Genesis II trunk, change to the source code location and run “ant -Dbuild.targetArch=32 build” (for 32-bit platforms) or “ant -Dbuild.targetArch=64 build” (for 64-bit platforms).  If neither targetArch flag is provided, a 64-bit build is assumed.

The following example rebuilds the source code for 64-bit platforms:

cd $GENII_INSTALL_DIR
# Set the ANT options to increase the memory limits:
export ANT_OPTS='-Xms512m -Xmx768m -XX:MaxPermSize=768m'
# clean generated files from any prior builds.
ant clean
# perform the build.
ant -Dbuild.targetArch=64 build

The ANT_OPTS above are required because the web-services build needs more memory than Ant allocates by default.

It is important to rebuild the source code on the target machine, rather than using a build from someone else, to ensure that any embedded script paths are regenerated properly.

After building the source code, one needs a grid to test against.  If you have an existing grid and the necessary deployment information, then that is sufficient.  If you want to test on an isolated grid that is under your control, consult section I.2 of the GFFS Toolkit chapter, “How to Bootstrap a Miniature Test Grid”, for details on setting up a local grid for testing.

H.4.        Developing Genesis II in Eclipse

Eclipse is an integrated development environment for Java and other languages, and many developers prefer to manage their coding process with Eclipse.  These instructions should help an Eclipse developer become comfortable with building and debugging the Genesis II codebase.

H.4.1.             Getting Eclipse

Download the newest version of the Eclipse IDE for Java Developers from http://www.eclipse.org/.  The Eclipse projects for Genesis II currently rely on features found in the “Mars” version of Eclipse.  As well as providing an excellent software development environment, Eclipse can be used for debugging with dynamic code injection, for call-hierarchy searches, and for Java code auto-formatting.

There is a plugin called Subclipse that integrates SVN support into Eclipse, with GUI features for diffing workspaces, files, and so on.

The Genesis II team has had success using a Java profiler called “YourKit Profiler” which can be integrated with Eclipse.

When first running Eclipse, the user will be asked to select a workspace. Do not specify a path that contains spaces (this just generally makes life easier, although it may not be strictly necessary).

H.4.2.             Getting Subclipse

Note that use of Subclipse is deprecated.  Using basic console “svn” commands to manage the checked-out source code is sufficient for most purposes.

Subclipse is a useful add-in for Eclipse that provides Subversion repository support.  To obtain Subclipse, go to http://subclipse.tigris.org/. Click on "Download and Install". Follow the instructions to install the Subclipse plugin in Eclipse. The best version of the SVN client to use with our SVN server is version 1.6.x.

If Eclipse fails to install Subclipse, then the user may need to install the "Mylyn" plugin. The Mylyn update site is http://download.eclipse.org/mylyn/releases/latest. With most versions of Eclipse, there is no need to worry about this.

If Eclipse complains about not finding JavaHL on Linux, it may be that /usr/lib/jni needs to be added to the Java build path in Eclipse.  The article “Failing to load JavaHL on Ubuntu” has more information about this issue.

H.4.3.             Eclipse Package Explorer

For easier browsing of the Genesis II source code, set up the Package Explorer view with the following options:

·         Right click on the package explorer menu button (down arrow) and under Package Presentation select Hierarchical.

·         Right click on the package explorer menu button (down arrow) and select Filters. Add the "Libraries from external" filter.

·         Add additional tags for “to-do” items to Eclipse’s list.  This causes all of the to-do / fix-it notes in the GFFS code to show up under the “Tasks” tab.  Open the Eclipse settings using the “Window | Preferences” menu item.  Within the settings, navigate to the “Java | Compiler | Task Tags” setting.  Add the following tags to the list along with their priorities:

future:     Low
hmmm:       High

H.4.3.1.                       Projects to Load in Eclipse

There is a main trunk project for Genesis II called GenesisII-trunk.  Once you have downloaded the Genesis II project source code, you can load it using Eclipse’s “Import Existing Projects into Workspace” choice.  In the “Select root directory” field, browse to the folder where the trunk resides.  Enable the option “Search for nested projects”.  Disable the option “Copy projects into workspace”.  Select “Finish” to complete the project import.  Several projects should now appear in the package explorer.

Loading the project will cause Eclipse to build its representation of the Java classes.  This will fail if an Ant build has not been done before (see section H.3 on building from the command line).  Once an Ant build has been done, select the “Project | Clean” menu item and clean all projects; this will cause Eclipse to rebuild the classes.

H.4.3.2.                       Setting the “derived” Type on Specific Folders

Eclipse will search for classes in any directory that is listed in a project.  This is sometimes irksome, as it will find matches for X.class as well as X.java, but X.class is a compiled class output file and is not useful for reading or setting breakpoints.  Luckily, Eclipse provides a way to force it to ignore file hierarchies: it ignores any folder that has the “derived” flag set.  The flag can be applied to a directory by right-clicking it and selecting “Properties”.  The resulting pop-up window shows a Resource tab for the folder, with a check-box for the derived attribute.

Note that each developer must set their own “derived” attributes on folders, since these attributes are not stored in the project file (they live in the user’s workbench).

It is recommended that each project’s generated-file folders be marked as derived; these include the following folders:

bin.ant (in every project)
codegen (in gffs-webservices)
genned-obj (in gffs-webservices)
libraries (in GenesisII-trunk)

The “libraries” folder is not generated, but its contents are already provided by other projects.

After marking all of the folders that contain generated .class files as derived, future searches for classes in Eclipse should match only .java files.  This can be tested with the “Open Resource” command (Ctrl-Shift-R).

H.4.4.             Ant Builds

To build Genesis II, we use Ant.  The two Ant targets that are most often used are build and clean.

The “ant build” target performs the following activities:

1.       Creates directories for generated sources

2.       Normalizes our extended form of WSDL (GWSDL) into proper service WSDL

3.       Runs Axis WSDL2Java stub generation on our service WSDL to create the Java stub classes used by client tools and the data-structure classes for representing operation parameters within both client and server code.

4.       Copies the generated .wsdd files into the "deployments/default/services" directory, so that the Axis web application can find the Java classes that implement the port types

5.