The UVa Cross Campus Grid (XCG)

The UVa Cross Campus Grid (XCG) is a computing and data sharing platform created and maintained jointly by researchers in the UVa Department of Computer Science and the UVa Alliance for Computational Science & Engineering (UVACSE). The XCG manages a large collection of computing resources across a number of departments at UVa as well as at a number of institutions outside of UVa.

Mission


The overall mission of the XCG is to improve the infrastructure available to UVa researchers who need access to significant computational resources and to simultaneously improve the ability of researchers to share data with collaborators.

  • Manage compute and storage resources efficiently and effectively -- to deliver more computational power to researchers at less cost.
  • Improve the effectiveness of researchers -- the XCG provides researchers with access to more computational power than would otherwise be available to them. The XCG also provides researchers with software tools and infrastructure to manage data sharing with collaborators and to manage running large numbers of computational tasks.
  • Further the state of the art of grid computing -- by acting as a platform for grid research and standardization efforts.

Who Can Use XCG?


The XCG is available to all UVa researchers and has been used by faculty, students, and research staff across a variety of disciplines, including Economics, Biology, Systems Engineering, Physics, Mechanical Engineering, Materials Science, and many others.

Getting Started


To get started on the XCG, you will need an XCG account, which is not the same as your ITC or other machine account. You can request an account via the XCG Help mailing list: Requesting an XCG Account (the XCG Account Request form is temporarily out of service). To use the XCG you will also need to install the GenesisII software on the machine or machines from which you will access the grid. GenesisII installers can be downloaded from the GenesisII downloads page; follow the instructions on that page and within the installer. Most users will only need to install the GenesisII client package, which contains a command-line tool and several GUI programs for accessing the XCG. Users who wish to export data from their local machine into the XCG will need to install the full GenesisII package.
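Once the client is installed, a first session typically happens inside the GenesisII command-line Grid Shell. The transcript below is an illustrative sketch only; the command names and paths shown are assumptions, and the GenesisII tutorials page is the authoritative reference for the real syntax:

```text
$ grid                        # start the GenesisII Grid Shell (command name assumed)
> login                       # authenticate with your XCG account (not your ITC account)
> ls /home/<your-xcg-user>    # browse your Grid home directory (path assumed)
> cd /queues                  # inspect the grid queues before submitting jobs
```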

Once you have an account and have the appropriate GenesisII software installed, you are ready to access the XCG. The GenesisII tutorials page provides details on a variety of topics including:

  • logging in
  • getting started
  • monitoring jobs

Getting Help


  • To request help with software problems or grid questions, send email to the XCG Help Request Mailing List.
  • To receive updates about the XCG, including when the XCG is scheduled for maintenance, subscribe to the XCG-Users mailing list.
  • To get personalized help using the XCG, send email to the UVa Alliance for Computational Science & Engineering (UVACSE) group at uvacse@virginia.edu to set up a meeting. UVACSE staff can help you determine whether the XCG is the right solution for you and can help you plan how it can best meet your needs.

  • UVACSE also provides a consulting service for computational science projects at the University through a process called a tiger team. If appropriate, you can request a tiger team to provide training and assistance in getting your project up and running on the Grid.

See the UVACSE Home Page for more information on UVACSE in general, and the Consultation Page to request starting a tiger team.

What Can I Do with the XCG?


There are two basic things you can do with the XCG: you can use it to access Grid compute resources, or to export and access data.

Need speed?

Suppose you have an application that you need to run for your research, and that you need to execute the application many times; for example, many executions with different parameters or input files, or simply a large number of runs to establish a statistical property. In this case, you have a high-throughput problem. If each execution takes many minutes or hours, and you have hundreds or thousands of jobs, this could take days or weeks to run on your desktop. The XCG compute queues can be used to solve this sort of problem. Descriptions of the jobs to be executed are submitted to the XCG, and the jobs are distributed throughout the XCG and executed concurrently. Data management and movement into and out of your local compute environment is managed by the XCG. As a result, you get your results tens to hundreds of times faster.
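The high-throughput pattern above amounts to generating one job description per parameter setting and submitting the whole batch. The sketch below is illustrative only: it writes a set of hypothetical job-description files for a parameter sweep. The template format and file names are assumptions for illustration, not the actual job documents the XCG consumes.

```python
# Illustrative sketch: generate one job-description file per parameter value
# for a high-throughput sweep. The template below is a hypothetical stand-in
# for a real job description, not XCG's actual format.
from pathlib import Path

TEMPLATE = """job-name: sweep-{index}
executable: ./simulate
arguments: --seed {seed}
stdout: results/out-{index}.txt
"""

def write_sweep(out_dir, seeds):
    """Write one job-description file per seed; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, seed in enumerate(seeds):
        p = out / f"job-{i:04d}.desc"
        p.write_text(TEMPLATE.format(index=i, seed=seed))
        paths.append(p)
    return paths

paths = write_sweep("sweep-jobs", seeds=range(8))
print(len(paths))  # 8 job descriptions, one per run, ready to submit as a batch
```

Each generated description would then be submitted to an XCG queue, which distributes the jobs across available resources.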

Similarly, if you already have an MPI application, you can use the XCG to select a cluster and run your job on it.

Need to share data?

Suppose you are working on a project with a colleague in another department or at another institution, and having direct access to each other's files would accelerate your research. For example, you have a colleague with an instrument that writes its data directly onto a hard disk in their lab, and you would like to be able to read those files as if they were local to your machine. You might post-process those results and store them locally, and your colleagues may in turn want to be able to read, modify, or view the results. This is a data sharing problem.

To solve this sort of problem using the XCG, the person who wants to share their data “exports” the data into the XCG, and the person(s) who want to access that data mount the Grid into their local Linux or Windows file system and then access the data as if it were in a local file system. Data access is fully secure using the latest Web Services security standards.

What does it mean to “share” or “export” data? The basic idea is simple. Suppose you have a directory (called a “folder” in Windows) on your hard drive that you want to share with a friend or colleague. You can use the Genesis II export Grid Shell command. Invoked without any command-line options, the export command starts a simple GUI (which requires X Windows in Linux environments). The GUI allows you to specify the local directory path that you want to export and the directory path where you want to place it in the XCG name space. The process is similar to mapping or mounting a drive, except that you are mounting a portion of your local file system into the Grid namespace.

Once exported, the directory and its contents can be manipulated by both local users and XCG users. Updates made by local users are visible to remote users, and vice versa. Access by remote users is governed by access-control lists that you establish. The identity (or identities) of the user who performed the export is given initial, exclusive Grid access to the exported resources. After performing an export, you will typically modify the access-control policies for the newly exported resources to specify which users and groups can read, write, and execute (i.e., make subdirectories in) them.
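The export-and-share workflow can be summarized as a short Grid Shell session. The transcript below is an illustrative sketch only: the export and chmod commands exist in the GenesisII Grid Shell, but the exact option syntax, paths, and principal names shown here are assumptions; consult the GenesisII tutorials for the real syntax.

```text
> export                      # with no options, opens the export GUI;
                              # choose a local directory and its target Grid path
> chmod /home/you/shared-data +r /users/colleague
                              # grant a collaborator read access (syntax assumed)
```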

How Does XCG Compare to Using ITC Clusters?


  • ITC manages compute clusters for UVa users
    • Linux
    • PBS for job management
  • XCG uses part of the ITC compute cluster for jobs
  • XCG incorporates a large number of additional compute resources outside of the ITC compute clusters
  • UVa users can use either or both systems

A Quick Comparison of ITS and XCG Clusters

  • Accounts
    • ITS cluster: requires an ITS account
    • XCG: requires an XCG account
  • Job Submission
    • ITS cluster: submit to PBS from the cluster front-end machine
    • XCG: submit to a grid queue from the GenesisII client
  • Resources
    • ITS cluster: the entire ITS Linux cluster
    • XCG: part of the ITS Linux cluster, plus CS clusters, the SURA cluster, the SEAS cluster, FutureGrid clusters, etc.
  • Application Deployment
    • ITS cluster: log in to the cluster; copy and configure the application in your home directory or another shared directory; the application is then available to machines that share that directory
    • XCG: create a package that can be copied to a destination machine, plus the scripts needed to unpack and configure it; specify the package and/or scripts during job submission, and the XCG will copy them to the target machine
  • Data Staging
    • ITS cluster: data must be copied to and from a shared directory on the cluster, or the program must stage data manually; the user must collect output data from the cluster unless the program copies it out
    • XCG: specify the data to copy into and out of the job, and the XCG will copy the specified files to and from the target machine
  • Job Monitoring
    • ITS cluster: log in to the cluster and use PBS to check job status
    • XCG: use the XCG client software on any XCG-enabled client machine

XCG Resources


The XCG draws its computational resources from a number of sources; it includes machines ranging from desktops to sizable clusters and spans the Windows, MacOS, and Linux operating systems.

The grid queue shows the jobs that are currently active. It can be accessed from the client-ui provided by GenesisII, or through the online summary view.

Summary


Since the XCG uses resources from a number of different departments and institutions, it must adhere to the policies of the resource owners. This often means restricting the total number of processors that the XCG will use on a particular machine or cluster at once, much as a single user is restricted in how many processors he or she can use at one time. The XCG uses the term "slots" to describe the maximum number of concurrent jobs the XCG will run on a resource.

So, to describe the size of the XCG system, two different metrics are important: the total number of processors on resources that the XCG can access, and the number of concurrent jobs to which the XCG restricts itself on those resources. Please note that these figures fluctuate as new resources are added and as current resources undergo maintenance or experience problems.

  • 721 Linux slots across more than 4000 total processors
  • 16 Windows slots across 16 Windows processors
    • Note: Until recently, the XCG had access to 200+ Windows processors in UVa public labs. A recent upgrade to Windows 7 forced us to remove these resources from the XCG until we can redeploy onto those machines.
  • 200 MacOS/Power PC slots across 1100 total processors
  • 2 MacOS/Intel slots across 2 iMac processors
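The distinction between the two metrics can be kept straight with a small bookkeeping sketch. The resource figures below are a hypothetical subset chosen for illustration, not the live XCG inventory:

```python
# Hypothetical resource inventory: (slots, processors) per resource.
# "slots" = max concurrent XCG jobs allowed on the resource by its owner;
# "processors" = total CPUs physically present on the resource.
resources = {
    "dept-cluster-a": (100, 672),
    "dept-cluster-b": (50, 384),
    "lab-desktops":   (16, 16),
}

total_slots = sum(s for s, _ in resources.values())
total_procs = sum(p for _, p in resources.values())
print(total_slots, total_procs)  # 166 1072
```

The slot total bounds how many jobs the grid will run at once; the processor total describes the raw hardware the grid can reach.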

Within UVa

Windows XP

  • 16 Windows slots across 16 Windows processors

Linux

  • ITC: 256 slots across 1576 processors
  • ITC/SURA: 76 slots across 76 processors
  • Department of Computer Science: 113 slots across 212 processors (more to be added when maintenance completed on cluster)
  • School of Engineering & Applied Science: 26 slots across 64 processors
  • Astronomy Department: 50 slots across 384 processors

MacOS/Intel

  • CS Department Desktops: 2 slots across 2 processors

Outside UVa

Linux

  • VT:
    • Hess Cluster: 1 slot across 768 processors
  • Indiana University
    • Cray XT5m: 100 slots across 672 processors
    • India Cluster: 0 slots across 400 processors (still in testing)
  • San Diego Supercomputing Center
    • Sierra Cluster: 100 slots across 280 processors

MacOS/Power PC

  • VT:
    • System X Cluster: 200 slots across 1100 processors

Coming Soon

  • 100+ Linux slots from University of Chicago Hotel Cluster
  • 100+ Linux slots from Texas Advanced Computing Center (TACC) Alamo Cluster

Software


The XCG is implemented using Genesis II, developed at the University of Virginia. Genesis II is the first integrated implementation of the standards and profiles coming out of the Open Grid Forum (OGF) Open Grid Services Architecture (OGSA) Working Group. The OGF is the standards organization for Grids. Genesis II is a complete set of Grid services for users and applications which not only follows our maxim “by default the user should not have to think”, but is also a from-scratch implementation of the standards and profiles. Genesis II is open source under the Apache license.

History


XCG version 1.0 ran from approximately January 2008 to January 2009. The current XCG version 2.0 has been running continuously since January 2009.

XCG Demonstrations


Clicking on a link below will take you to an MP4 movie demonstrating some of the features of the XCG.