Python Job Runner

To best understand the information below, you should already have an understanding of:

  • Using the command line to: navigate within directories, create/copy/move/delete files and directories, and run their intended programs (aka 'executables').

CHTC provides several copies of Python that can be used to run Python code in jobs. See our list of supported versions here: CHTC Supported Python

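This guide details the steps needed to:

  1. Prepare any Python packages your code needs
  2. Write a script that sets up Python and runs your code
  3. Submit jobs that use that script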

If you want to build your own copy of base Python, see this archived page: Building a Python installation

If you want to use conda to manage your Python package dependencies, read this guide as background material, then read our guide on using conda.

Python version    Name of Python installation file
Python 2.7        python27.tar.gz
Python 3.6        python36.tar.gz
Python 3.7        python37.tar.gz
Python 3.8        python38.tar.gz

If your code uses specific Python packages (like numpy, matplotlib, scikit-learn, etc.), follow the directions below to download and prepare the packages you need for job submission. If your job does not require any extra Python packages, skip to parts 2 and 3.

You are going to start an interactive job that runs on the HTC build servers and that downloads a copy of Python. You will then install your packages to a folder and zip those files to return to the submit server.

These instructions are primarily about adding packages to a fresh install of Python; if you want to add packages to a pre-existing package folder, there will be notes below in boxes like this one.

A. Submit an Interactive Job

Create the following special submit file on the submit server, calling it something like build.sub.
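A minimal sketch of what such a build submit file might contain, written from the shell with a here-document. The +IsBuildJob flag, the log name, and the resource requests are assumptions for illustration rather than CHTC's official values, and python##.tar.gz is a placeholder: the real file name comes from the table above, and its real location (a path or URL) should be taken from CHTC's documentation.

```shell
# Write a hypothetical build.sub; every value below is an illustrative assumption.
cat > build.sub <<'EOF'
# Interactive build job (sketch)
universe = vanilla
log = interactive.log

# Replace ## with the digits of the Python version you chose
transfer_input_files = python##.tar.gz

# Assumed CHTC-specific flag that routes the job to a build server
+IsBuildJob = true

request_cpus = 1
request_memory = 4GB
request_disk = 2GB

queue
EOF
```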

The only thing you should need to change in the above file is the name of the python##.tar.gz file in the 'transfer_input_files' line. Several versions of Python are available to build from; see the table above.

If you want to add packages to a pre-existing package directory, add the tar.gz file containing those packages to the transfer_input_files line, after the Python tarball and separated from it by a comma.

Once this submit file is created, you will start the interactive job by running the following command:
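As a sketch, assuming the submit file is named build.sub as suggested above: condor_submit accepts an -i option that requests an interactive session. The existence check around it is an addition for illustration, so that the snippet fails gently on a machine without HTCondor.

```shell
# Name chosen for the submit file in the previous step (assumed).
submit_file=build.sub

if command -v condor_submit >/dev/null 2>&1; then
    # -i asks HTCondor for an interactive session on the matched machine.
    condor_submit -i "$submit_file"
else
    echo "condor_submit is not on PATH; run this on the submit server" >&2
fi
```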

It may take a few minutes for the build job to start.

B. Install the Packages

Once the interactive build job starts, you should see the Python tarball that you specified inside the working directory; running ls will list it.

We'll now unzip the copy of Python and set the PATH variable to reference that version of Python:
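A sketch of these two steps. The tarball name python38.tar.gz is only an example, and the assumption that the tarball unpacks into a directory named python should be checked against what you actually see after extraction.

```shell
# Hypothetical tarball name; use the one you listed in transfer_input_files.
python_tarball=python38.tar.gz

# Unpack the installation if the tarball is present in the working directory.
if [ -f "$python_tarball" ]; then
    tar -xzf "$python_tarball"
fi

# Assumption: the tarball unpacks into a directory named "python".
# Put its bin directory first so this copy is found before any system Python.
export PATH="$PWD/python/bin:$PATH"
```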

To make sure that your setup worked, try running:

You can also try running this command to make sure the copy of Python that is now active is the one you just installed:
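The two checks described above might look like the following sketch. It uses the shell built-in command -v to report the interpreter's path; the variable name active_python is only illustrative.

```shell
# Print the version of whichever python3 is first on PATH.
python3 --version || echo "python3 not found on PATH" >&2

# Print the full path of that interpreter (empty if none is found).
active_python=$(command -v python3 || true)
echo "$active_python"
```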

The command above should return a path that includes the prefix /var/lib/condor/, indicating that it is installed in your job's working directory.

If you're using Python 2, use python2 instead of python3 above (and in what follows). The output should match the version number that you want to be using!

If you brought along your own package directory, un-tar it here and skip the directory creation step below.

First, create a directory to put your packages into:

You can choose what name to use for this directory. If you have different sets of packages that you use for different jobs, you could use a more descriptive name than 'packages'.

To install the Python packages run the following command:

Replace package1 package2 with the names of packages you want to install. pip should download all dependent packages and install them. Certain packages may take longer than others.
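A sketch of the directory creation and install steps, assuming the directory is called packages. The helper function install_packages is hypothetical and exists only to keep the snippet self-contained; it wraps pip's --target option, which installs into a named directory instead of the interpreter's own site-packages.

```shell
# Directory that will hold the installed packages (the name is up to you).
mkdir -p packages

# Hypothetical helper: install the named packages into ./packages.
# pip's --target option installs into the given directory.
install_packages() {
    python3 -m pip install --target="$PWD/packages" "$@"
}
```

Inside the build job, calling install_packages package1 package2 would then run the pip command with those package names.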

C. Finish Up

Right now, if we exit the interactive job, nothing will be transferred back because we haven't created any new files in the working directory, just sub-directories. In order to transfer back our installation, we will need to compress it into a tarball file. Not only will HTCondor then transfer the file back, but it is also generally easier to transfer a single, compressed tarball file than an uncompressed set of directories.

Run the following command to create your own tarball of your packages:
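A sketch, assuming the package directory is named packages and the tarball should be called packages.tar.gz:

```shell
# Only needed if the directory does not exist yet; harmless otherwise.
mkdir -p packages

# -c create an archive, -z gzip-compress it, -f write it to the named file.
tar -czf packages.tar.gz packages
```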

Again, you can use a different name for the tar.gz file, if you want.

We now have our packages bundled and ready for CHTC! You can now exit the interactive job and the tar.gz file with your Python packages will return to the submit server with you (this sometimes takes a few extra seconds after exiting).

In order to use CHTC's copy of Python and the packages you have prepared in an HTCondor job, we will need to write a script that unpacks both Python and the packages and then runs our Python code. We will use this script as the executable of our HTCondor submit file.

A sample script appears below. After the first line, the lines starting with hash marks are comments. You should replace 'my_script.py' with the name of the script you would like to run, and modify the Python version numbers to be the same as what you used above to install your packages.

If you have additional commands you would like to be run within the job, you can add them to this base script. Once your script does what you would like, give it executable permissions by running:
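A sketch of such a wrapper script, written to a file with a here-document and then made executable. The file name run_python.sh is hypothetical, python##.tar.gz is a placeholder for your chosen version, and the layout (Python unpacking into ./python, packages into ./packages) is an assumption to verify against your own tarballs.

```shell
# Write a hypothetical wrapper script named run_python.sh.
cat > run_python.sh <<'EOF'
#!/bin/bash

# Unpack the Python installation; replace ## with your version digits.
tar -xzf python##.tar.gz
# Unpack your packages, if you built a package tarball.
tar -xzf packages.tar.gz

# Assumption: Python unpacks into ./python and packages into ./packages.
export PATH="$PWD/python/bin:$PATH"
export PYTHONPATH="$PWD/packages"
# Keep anything Python writes to its home directory inside the job directory.
export HOME="$PWD"

# Run your code.
python3 my_script.py
EOF

# Give the script executable permissions.
chmod +x run_python.sh
```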

Arguments in Python

To pass arguments to a Python script within a job, you'll need to use the following syntax in your main executable script, in place of the generic command above:
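A small sketch of the pattern. Both files are hypothetical: my_script.py only prints the arguments it receives, and run_with_args.sh stands in for your executable script, forwarding its first two arguments to Python.

```shell
# Hypothetical Python script that just reports the arguments it receives.
cat > my_script.py <<'EOF'
import sys

# sys.argv[0] is the script name; the rest are the command-line arguments.
print("arguments:", sys.argv[1:])
EOF

# Hypothetical wrapper: forward its first two arguments to the Python script.
cat > run_with_args.sh <<'EOF'
#!/bin/bash
python3 my_script.py "$1" "$2"
EOF
chmod +x run_with_args.sh
```

With python3 available, running ./run_with_args.sh 10 20 prints arguments: ['10', '20'].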

Here, $1 and $2 are the first and second arguments passed to the bash script from the submit file (see below), which are then sent on to the Python script. For more (or fewer) arguments, simply add more (or fewer) arguments and numbers.

In addition, your Python script will need to be able to accept arguments from the command line. There is an explanation of how to do this in this Software Carpentry lesson.

A sample submit file can be found in our helloworld example page. You should make the following changes in order to run Python jobs:

  • Your executable should be the script that you wrote above.

  • Modify the CPU/memory request lines. Test a few jobs for disk space/memory usage in order to make sure your requests for a large batch are accurate!
    Disk space and memory usage can be found in the log file after the job completes.
  • Change transfer_input_files to include the Python tarball, your package tarball (if you made one), and your Python script.
  • If your script takes arguments (see the box from the previous section), include those in the arguments line.
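The changes listed above could be sketched as the following submit file, again written with a here-document. The file names (python_job.sub, run_python.sh, my_script.py), the argument values, and the resource requests are all illustrative assumptions; python##.tar.gz remains a placeholder for your chosen version.

```shell
# Write a hypothetical submit file; the quoted EOF keeps $(Cluster) literal.
cat > python_job.sub <<'EOF'
# Sketch of a submit file for a Python job; all names and values are illustrative.
universe = vanilla
log = job_$(Cluster).log
output = job_$(Cluster)_$(Process).out
error = job_$(Cluster)_$(Process).err

# The wrapper script that unpacks Python and runs your code
executable = run_python.sh
# Passed to the wrapper as $1 and $2
arguments = 10 20

transfer_input_files = python##.tar.gz, packages.tar.gz, my_script.py

request_cpus = 1
request_memory = 2GB
request_disk = 2GB

queue 1
EOF
```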

How big is your package tarball?

If your package tarball is larger than 100 MB, you should NOT transfer the tarball using transfer_input_files. Instead, you should use CHTC's web proxy, squid. In order to request space on squid, email the research computing facilitators at chtc@cs.wisc.edu.