Grid Computing Toolbox Shell Scripts and Batch Files
This section discusses using the Grid modes, hpc and mpi, and the shell scripts supporting them. These scripts are intended to be run from the operating system command interpreters.
Using Mode = HPC
Using Mode = MPI
The launcher script can be used to submit a file containing Maple code directly to a running Grid from an operating system command line.
Before running the launcher script make sure that Grid Servers are running on the remote machines that are to be used.
The remote servers must all be configured similarly for port number and licenses.
Starting and Stopping the Grid Server
There are two methods to start/stop the Grid Server:
1) interactively calling functions in the Grid Library that is part of the Grid Computing Toolbox
2) using script files at the operating system level.
Interactive Start/Stop of the Grid Server via the Grid Library
One method to start/stop the server is interactively using the Grid library functions StartServer and StopServer. These functions are Maple commands and can be used when interactively running or testing grid code from within a Maple Worksheet.
Shell Script and Batch File to Start/Stop the Grid Server
The script, gridserver.sh (Linux) and batch file, gridserver.bat (Windows) start or stop an instance of the Grid Server on a local computer. These scripts are run from the operating system command interpreter.
The syntax of the gridserver script/batch file is one of
gridserver [options] startmultiple [port [count]]
gridserver [options] start [port [cpu_index]]
gridserver [options] stop [port [cpu_index]]
where one of the following modes can be specified:
start - Causes a single Node to start on the indicated port.
startmultiple - Causes count Nodes to be started on a single computer beginning at the indicated port.
stop - Halts the Node associated with port. If the Node was started with startmultiple then all Nodes are halted.
If no mode is specified then startmultiple is assumed. The modes take the following optional parameters:
cpu_index - The index. Should be set to zero to allow AutoDiscovery with broadcast.
port - The port associated with the Node. If not specified, the port from the -p option or from the configuration file is used.
count - The number of Nodes to start. If not specified, the value from the -n option or from the configuration file is used.
Options may be one or more of:
-a - UDP broadcast address (e.g. 192.168.255.255) for AutoDiscovery. Use 255 to indicate the portion of the subnet to use. This example will broadcast to all computers on the 192.168.*.* address.
-b - UDP broadcast port (e.g. 4400). Set to 0 to disable AutoDiscovery.
Enable debug messages to the log file
Path and filename of log file (e.g. logs/grid.log). The path can be relative to the Grid Computing Toolbox directory.
The full path to commandline Maple binary
-n - Number of Nodes to create
-p - TCP/IP base port for the Nodes
Any option not specified will be taken from the configuration file conf/grid.properties
The startmultiple mode starts the specified count of Grid Servers on the same machine. Count should generally not exceed the number of CPUs on the machine, because several instances of Maple would then likely share the same CPU instead of being distributed over multiple CPUs. However, one reason to set count higher than the number of CPUs is initial testing of grid code on a single machine, for example when a network of Grid Servers is not available. To perform testing using a single machine, the following command could be used at the operating system prompt:
gridserver.sh -a 127.0.0.255 -b 4401 startmultiple 2000 5 (Linux)
gridserver.bat -a 127.0.0.255 -b 4401 startmultiple 2000 5 (Windows)
This would start 5 instances of the Grid Server on the local machine each instance using a port from 2000 to 2004.
Jobs would be submitted using the Launch library function after issuing a Setup using localhost (127.0.0.1) and port 2000 as the parameters.
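The port assignment described above is plain arithmetic: each additional Node takes the next port after the base port. As a minimal sketch (the variable names base_port and count are illustrative and not part of the toolbox), the following shell snippet lists the ports occupied by startmultiple 2000 5:

```shell
# Illustrative only: list the ports that "startmultiple <base_port> <count>"
# would occupy, assuming consecutive ports starting at the base port.
base_port=2000   # same base port as in the example command
count=5          # same node count as in the example command

ports=""
i=0
while [ "$i" -lt "$count" ]; do
    ports="$ports $((base_port + i))"
    i=$((i + 1))
done
ports=${ports# }   # drop the leading space
echo "$ports"
```

This can be handy for checking that none of the ports in the range is already in use before starting the servers.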
Example Code Accessing the Running Servers
Example for the Server running on port 2000 on the local computer.
Setup("hpc", host="localhost", port=2000);
A String representing Maple code to execute on the remote nodes is Launched.
code := "printf(\"Hello from node %a\\n\", Grid:-MyNode());";
result := Launch(code, numnodes=2);
printf("Hello from node %a\n", Grid:-Util:-MyNode());
Node 0: Hello from node 0
Node 1: Hello from node 1
The start command would be used where AutoDiscovery is not practical (e.g. Grid Servers are not on a common Internet subnet). In this case each server would be started using an operating system command similar to:
gridserver.sh -b 0 start 2000 (Linux)
gridserver.bat -b 0 start 2000 (Windows)
The same port must be used on each computer where the Grid Server is started, as well as on the client machine from which the grid code is sent. Grid Servers started with the start mode must be accessed from either the Launcher script or the PBS script described below, because these scripts receive a file listing the nodes to use in the grid computations.
The stop parameter would be used to terminate the running Grid Server on the local machine.
gridserver.sh stop 2000 (Linux)
gridserver.bat stop 2000 (Windows)
If the server was started with the startmultiple parameter then all instances of the Grid Server on that machine are stopped. For servers started with the start parameter, each instance requires a matching stop call to halt that instance of the Grid Server.
Submitting Maple Code to the Grid Server via Script or Batch Files
These scripts can be used to submit a file containing Maple code directly to a running "HPC" Grid from an operating system command line interface.
Launcher can be invoked using either of the following 2 forms:
launcher.bat [-p port] maplefile nodefile (Windows)
launcher.sh [-p port] maplefile nodefile (Unix/Linux/Mac)
launcher.bat [-p port] -h host -n num maplefile (Windows)
launcher.sh [-p port] -h host -n num maplefile (Unix/Linux/Mac)
Specify the optional -p port option if the port of the master node is different than the default port in the grid.properties file.
launcher.bat [-p port] maplefile nodefile
launcher.sh [-p port] maplefile nodefile
The servers that will be nodes for this configuration should be set up without AutoDiscovery.
You must start servers on all nodes of the grid. Launching does not start the nodes.
Set the broadcast port parameter to zero. This will turn off AutoDiscovery.
All these servers should be configured in a similar manner using the same port numbers. These servers can be individually started by logging on to each remote host and running the supplied script
gridserver -b 0 start [port]
where the option -b 0 turns off AutoDiscovery and port is optional. The value for port specifies the TCP/IP port that the nodes communicate on. If the port is omitted then the port is determined from the grid.properties file found in the <maple>/toolbox/Grid/conf directory. All the ports must be the same and must match the port in the grid.properties file on the machine from which the launcher is run.
This gridserver command should be run on each individual server.
The job-scheduler now controls which nodes to run a job on.
For this form of the launcher invocation, the options are as follows:
port - port that the node at host is listening on. This will override the grid.port value specified in the grid.properties file
maplefile - specify the path and filename of a file containing Maple code to be run on each of the nodes
nodefile - specify the path to a file containing a list of grid servers. See below.
Create a text file listing the names of the servers that you need to use for this Grid calculation. For example, if you have 3 remote servers named grid1.somewhere.com, grid2.somewhere.com and grid3.somewhere.com, you would create a text file, say nodes.txt, that contained the lines
grid1.somewhere.com
grid2.somewhere.com
grid3.somewhere.com
Note: the first server listed will be considered to be the master node.
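As a minimal sketch of preparing such a node file from the shell, the snippet below writes the three example server names, one per line, and reports which one would be treated as the master node. The temporary working directory and the variable names are illustrative choices, not something the toolbox requires.

```shell
# Illustrative only: write the node file for the three example servers
# and show which one the launcher would treat as the master node.
workdir=$(mktemp -d)             # scratch directory for the demonstration
nodefile="$workdir/nodes.txt"    # file name taken from the example in the text

cat > "$nodefile" <<'EOF'
grid1.somewhere.com
grid2.somewhere.com
grid3.somewhere.com
EOF

master=$(head -n 1 "$nodefile")                  # first line is the master node
node_count=$(wc -l < "$nodefile" | tr -d ' ')    # one server per line
echo "master: $master, nodes: $node_count"
```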
To submit a Maple file, Simple.mpl, on a Linux system to these nodes for execution, you would issue the command
launcher.sh ../samples/Simple.mpl nodes.txt
The Grid launcher script will send the Maple code contained in the file Simple.mpl to the nodes listed in the nodes.txt file. The server listed as the first node, grid1.somewhere.com in this example, will be considered the master node for controlling the computations. Output from the grid computations will be printed to the console's standard output.
The Grid Computing Toolbox (GCT) library code can utilize operating system implementations of the MPI standard (see http://www.mcs.anl.gov/research/projects/mpi/). This document outlines the basics of setting up and using MPI.
The current library supports Microsoft's MPI implementation, msmpi, found on the Windows Server 2008 HPC clusters. Other MPI implementations may work but have not been tested.
The default location of the Grid toolbox server installation is <maple>\toolbox\Grid where <maple> represents where the main Maple program is installed. Additionally, you should create the directory \shared\mpi plus subdirectories. The directory \shared\mpi\bin will contain additional scripts that are used by client workstations submitting Grid code to the cluster. The file maple.ini contains generic settings to enable the Grid MPI procedures.
The Maple command Grid[Setup]("mpi", mpidll="msmpi"); instructs the Grid toolbox to use the "mpi" interface instead of the default local implementation. The second argument, "msmpi" refers to the operating system basename of the MPI implementation dll or shared library. This dll or shared library should be found in one of the directories named in the PATH environment variable.
The \shared\mpi directory on the head node should be shared with all the other compute nodes in the cluster. This is because any Maple GCT code submitted will be stored in the \\headnode\shared\mpi\grid subdirectory. Also any output generated will be stored there.
This directory must allow read/write permission for Grid users as their temporary files will be placed there. User files will be placed in \shared\mpi\grid\<username> where <username> will be the user who submitted the job.
Also, the compute nodes should be configured so that the PATH environment variable contains the directory for the cmaple.exe program.
Running the MPIlauncher Scripts
This script can be used to submit a file containing Maple code directly to a Grid running MPI from an Operating System command line.
Before running the mpilauncher script, make sure that the Grid cluster is set up as per the instructions in the previous section.
The mpi launcher can be invoked using the following script command and a combination of options and parameters:
<maple>\toolbox\Grid\bin\mpilauncher.bat -s hh nn ff
<maple>\toolbox\Grid\bin\mpilauncher.bat -q|-r|-c hh jj
<maple>\toolbox\Grid\bin\mpilauncher.bat -a hh
<maple>\toolbox\Grid\bin\mpilauncher.bat -d ss
where the script command parameters are defined as:
hh - hostname of the head node of the HPC cluster
nn - number of nodes requested for the job
ff - absolute file path to Maple code
jj - job id of submitted job
ss - seconds for delay
and the script command options are defined as:
-s - submit Maple file ff to nn nodes
-q - query status of running job jj
-r - retrieve results of completed job jj
-c - cleanup/remove temp files and output of completed job jj
-d - sleep for ss secs
-a - get number of total and available nodes
To get the total number of available nodes use the command:
mpilauncher -a hh
where hh is the headnode.
To set a pause in the execution of the batch file, use the command
mpilauncher -d ss
where ss is the length of the pause
To submit the Maple script file containing your Maple code use the command:
mpilauncher -s hh nn ff
When you run the mpilauncher.bat script, the Maple script you specified with parameter ff is submitted to the headnode, hh. The headnode then hands a copy of this script to each of the nn nodes specified, and each node runs the Maple script.
The Maple script is copied to a directory on the head node. The directory will be of the form \shared\mpi\grid\<username> where <username> is the name of the user running the mpilauncher script.
The filename will be of the form maple_jj.code.mpl where jj is the job id for this submission. This job id is then echoed to the console.
Maple output is saved in maple_jj.stdout.txt and maple_jj.stderr.txt, where jj is the job id.
To check the status of the submitted job use the command:
mpilauncher -q hh jj
where hh is the name of the headnode and jj is the job id of the submission.
The command returns one of the following status messages:
the job is complete
the job was submitted but failed
the job was aborted (usually done by a System Administrator)
the job is running on the grid
the job was submitted but is not running yet
To retrieve the results of a submitted job use the command:
mpilauncher -r hh jj
The contents of the file will be returned to the console.
To clean up results and temporary files from the head node use the command:
mpilauncher -c hh jj
This will delete the file associated with the job id.
On Linux, the current library supports the MPICH2 implementation of MPI. Before you start setting up the Grid toolbox, you must ensure that MPICH2 is installed in the same location on each machine that you will be using for Grid computing.
You will need to have passwordless ssh logins for all computational nodes or else the mpiexec program used for job submission will prompt you for a password for every node when a job is submitted.
Before you can run the mpilauncher scripts, you must set up the machines you will be using for Grid computing, by performing the following steps.
Create an NFS network share on one machine in the network that all users can read from and write to. For example, /var/mpi.
> exportfs -o rw,root_squash,sync,no_subtree_check :/var/mpi
Other machines must mount this share at the same location, and it must be readable and writable by all users. For example,
mount -t <type> grid01.myserver.com:/var/mpi /var/mpi
The Grid toolbox creates user directories in this shared location to store intermediate work files, specifically maple code and output.
Create a hostfile in the shared mount location, for example /var/mpi/hostfile. This hostfile should contain the hostname of each host that MPICH2 can use for Grid nodes, one per line.
Optionally, after each hostname you can add a colon and a number to indicate how many nodes can be run on that host.
If you create this file in the shared mount location, for example, /var/mpi/hostfile, then all machines can share the same hostfile.
grid01.myserver.com # Can handle 1 node
grid02.myserver.com:2 # Can handle 2 nodes
grid03.myserver.com:4 # Can handle 4 nodes, maybe a quadcore
grid04.myserver.com:2 # Can handle 2 nodes
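To see how many nodes a hostfile like this provides in total, the entries can be summed. The following is a minimal sketch, assuming the hostname[:count] format with optional # comments shown above and assuming that a host without a count provides one node. The function name count_nodes and the temporary file are illustrative.

```shell
# Illustrative only: sum the node counts in a hostfile of the form
#   hostname[:count]   # optional comment
# A host without ":count" is assumed to provide one node.
count_nodes() {
    awk '
        { sub(/#.*/, "") }            # strip trailing comments
        NF == 0 { next }              # skip blank lines
        {
            n = split($1, parts, ":") # parts[1]=host, parts[2]=count (if any)
            total += (n > 1 ? parts[2] : 1)
        }
        END { print total + 0 }
    ' "$1"
}

hostfile=$(mktemp)   # stand-in for /var/mpi/hostfile
cat > "$hostfile" <<'EOF'
grid01.myserver.com      # Can handle 1 node
grid02.myserver.com:2    # Can handle 2 nodes
grid03.myserver.com:4    # Can handle 4 nodes, maybe a quadcore
grid04.myserver.com:2    # Can handle 2 nodes
EOF

count_nodes "$hostfile"   # prints 9 for the example hostfile
```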
Install the Grid toolbox on each machine. The default location of the Grid toolbox server installation is <maple>/toolbox/Grid where <maple> represents where the main Maple program is installed.
Also, the compute nodes should be configured so that the PATH environment variable contains the directory for the xmaple program.
The Maple command Grid[Setup]("mpi", mpidll="mpich"); instructs the Grid toolbox to use the "mpi" interface instead of the default local implementation. The second argument, "mpich" refers to the operating system basename of the MPI implementation dll or shared library. This dll or shared library should be found in one of the directories named in the PATH environment variable.
<maple>/toolbox/Grid/bin/mpilauncher -s hh nn ff
<maple>/toolbox/Grid/bin/mpilauncher -q|-r|-c hh jj
<maple>/toolbox/Grid/bin/mpilauncher -a hh
<maple>/toolbox/Grid/bin/mpilauncher -d ss
where hh is localhost and ss is the length of the pause.
When you run the mpilauncher script, the Maple script you specified with parameter ff is submitted to the mpi program and then delegated to the appropriate nodes.
The Maple script is copied to a file in the shared directory that you created in the previous section. The name of the file will be of the form jj.mpl where jj is the job id for this submission.
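To check the status of a submitted job, retrieve its results, or clean up its files, use the following commands, where hh is localhost and jj is the job id of the submission:
mpilauncher -q hh jj
mpilauncher -r hh jj
mpilauncher -c hh jj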
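The mpilauncher options are normally used in a fixed order over the life of one job. As a rough sketch of that order, the helper below only prints the commands and does not run them. The function name job_commands and the sample values (4 nodes, /home/user/example.mpl, job id 17) are hypothetical; the real job id is the one echoed to the console by the submit step.

```shell
# Illustrative only: print the mpilauncher commands for one job in the
# order they would normally be issued. Nothing is executed here.
job_commands() {
    host=$1      # hh: localhost on Linux, per the text
    nodes=$2     # nn: number of nodes requested
    file=$3      # ff: absolute path to the Maple code
    job=$4       # jj: job id echoed by the submit step
    echo "mpilauncher -a $host"               # how many nodes are available?
    echo "mpilauncher -s $host $nodes $file"  # submit the job
    echo "mpilauncher -q $host $job"          # query its status
    echo "mpilauncher -r $host $job"          # retrieve the results
    echo "mpilauncher -c $host $job"          # clean up temporary files
}

job_commands localhost 4 /home/user/example.mpl 17
```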
The pbslauncher.sh and pbslauncher.bat scripts are intended to be run in a PBS(tm) environment. Jobs submitted to PBS have an environment variable PBS_NODEFILE that points to a file containing the names of the servers running this job, one server name per line. This is the same format as the nodes.txt file shown in the Launcher section above.
Note: all servers in the PBS environment must be configured for the same ports via the grid.properties file as the port cannot be overridden from the pbslauncher script.
To submit a maple file, example.mpl, on a Linux system to the PBS nodes for execution you would issue the command
The pbslauncher script will determine the node file from the PBS_NODEFILE variable and then use this to invoke the equivalent of
launcher.sh example.mpl $PBS_NODEFILE
to send the maple code contained in the file example.mpl to the PBS nodes. Output from the grid computations will be printed to the console's standard output.
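The delegation described above can be pictured with a short sketch. It is not the actual pbslauncher script: the function name pbs_launch_command is hypothetical, and the command is printed rather than run. Outside a PBS job the PBS_NODEFILE variable is not set, so the demonstration creates a stand-in file.

```shell
# Illustrative only: show what the pbslauncher script is described as
# doing, i.e. passing the PBS node file to the launcher. The command is
# printed rather than run.
pbs_launch_command() {
    maplefile=$1
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE is not set or not readable" >&2
        return 1
    fi
    echo "launcher.sh $maplefile $PBS_NODEFILE"
}

# Outside PBS the variable is unset, so fake one for demonstration.
PBS_NODEFILE=$(mktemp)
printf '%s\n' grid1.somewhere.com grid2.somewhere.com > "$PBS_NODEFILE"
export PBS_NODEFILE
pbs_launch_command example.mpl
```

Checking that the variable is set before building the command gives a clear error when the script is run outside a PBS job.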