Beginning With Python
At this point we have covered the very basics. Every program we write from here on will produce some kind of output that will be analyzed or at least used in some fashion. This may be the direct result of a computation, the length of the runtime, or some other result. In this example we show how to perform some interesting computation; in the next section we show how to manage these results effectively.
Almost any project has a substantial number of dependencies installed from some source; in Python this is typically pip. As has become standard practice in Python project management, all packages must be installed in a virtual environment. Note that not every package, or every package version, available through pip is available on compute nodes. The Alliance uses a limited mirror of PyPI with specifically chosen packages optimized for HPC use. The full list is available here.
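Because the packages and versions on the compute nodes may not match what pip offers elsewhere, it can be useful for a job to report what it actually has installed. The sketch below uses the standard-library importlib.metadata module (Python 3.8 or later); the helper installed_versions and the package names in its usage are hypothetical placeholders, not part of this guide's project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the active environment (e.g. not provided by the mirror).
            found[name] = None
    return found


if __name__ == "__main__":
    # Hypothetical names: substitute the entries from your own requirements.txt.
    for name, ver in installed_versions(["numpy", "pip"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Printing this near the start of a job makes a missing or unexpected package version visible in the job's output before any long computation begins.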
To begin, we assume that you already have some sort of Python project with a requirements.txt. If you do not, you can follow along with the example project found here. Our default script will begin as follows and will evolve throughout this guide.
Loading Python
Python is not loaded by default on compute nodes, so we must explicitly load the particular version of Python we wish to use. This also gives us access to pip, virtualenv, and related tools.
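Once a Python module is loaded, a job can confirm which interpreter it is actually running before doing any real work. This is a minimal sketch; the helper check_python and its default minimum of 3.8 are illustrative assumptions, not a requirement of the cluster.

```python
import sys


def check_python(minimum=(3, 8)):
    """Return the running interpreter's (major, minor) version, failing early if it is too old."""
    current = tuple(sys.version_info[:2])
    if current < minimum:
        raise RuntimeError(
            f"Python {current[0]}.{current[1]} is running, "
            f"but at least {minimum[0]}.{minimum[1]} is required"
        )
    return current


if __name__ == "__main__":
    major, minor = check_python()
    # sys.executable shows which installation (e.g. the loaded module) is in use.
    print(f"Running Python {major}.{minor} from {sys.executable}")
```

Failing immediately with a clear message is cheaper than discovering a version mismatch after the job has waited in the queue and started computing.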
Virtual Environments
In order to install packages and run Python scripts, we must set up a virtual environment. This is where our Python executable and packages will be installed. Using the system's default Python executable directly can cause problems, especially when installing external packages. For more information, refer to the virtualenv documentation.
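One way to see what a virtual environment changes is to compare interpreter prefixes from inside Python: inside an environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base Python installation. The helper in_virtualenv below is a hypothetical sketch built on that check; the extra real_prefix test only matters for environments created by old (pre-20) releases of virtualenv.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter is running inside a virtual environment."""
    # Environments created by venv or by virtualenv 20+ set sys.prefix to the
    # environment directory, while sys.base_prefix keeps pointing at the base
    # Python installation (e.g. the one provided by the loaded module).
    if sys.prefix != sys.base_prefix:
        return True
    # Legacy virtualenv releases (before 20) recorded the base under sys.real_prefix.
    return getattr(sys, "real_prefix", None) is not None


if __name__ == "__main__":
    print(f"inside a virtual environment: {in_virtualenv()}")
    print(f"sys.prefix      = {sys.prefix}")
    print(f"sys.base_prefix = {sys.base_prefix}")
```

If this prints False inside a job, the environment was most likely not activated before the script ran.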
If you like, you can test this by running it the same way as the original helloworld.sh script. If you want to jump straight to running your own Python file, that is covered in the "Running the Python Script" subsection.
Note that the virtual environment is created in a temporary directory on the compute node itself. This is useful because everything the job needs is kept on storage local to the compute node, which improves run time. Installing the environment on the login node, for example, will cause significant performance issues.
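A Python program can locate the same temporary directory at run time. This sketch assumes the scheduler exports it as the SLURM_TMPDIR environment variable, as Alliance clusters do; check your cluster's documentation if that variable is not set. The helper node_local_dir is hypothetical and falls back to the system temporary directory when run outside a job.

```python
import os
import tempfile
from pathlib import Path


def node_local_dir():
    """Return the job's node-local temporary directory as a Path.

    Assumption: the scheduler exports SLURM_TMPDIR (as on Alliance clusters).
    Outside a job we fall back to the system temporary directory.
    """
    return Path(os.environ.get("SLURM_TMPDIR") or tempfile.gettempdir())


if __name__ == "__main__":
    scratch = node_local_dir()
    print(f"node-local temporary directory: {scratch}")
    # The virtual environment is created under this directory; the 'env' name is arbitrary.
    print(f"environment location: {scratch / 'env'}")
```

Because this directory is local to the node, anything written there should be treated as temporary and copied elsewhere if it needs to outlive the job.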
Copying Files
Install requirements.txt
With the environment active, the required packages can be installed almost as usual. Because the script executes on the compute node in a different location, rather than in the project folder from which it was submitted with Slurm, we need to reference the original requirements.txt by its absolute path. You can find this path by running pwd in the directory that contains requirements.txt.
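Instead of copying the path printed by pwd by hand, the absolute path can also be derived at run time. This minimal sketch assumes the job was submitted from the project folder, in which case Slurm's SLURM_SUBMIT_DIR variable holds that folder; outside Slurm it falls back to the current working directory, which is what pwd prints. The helper requirements_path is hypothetical.

```python
import os
from pathlib import Path


def requirements_path(filename="requirements.txt"):
    """Return an absolute path to `filename` in the job's submission directory.

    Assumption: the job was submitted (with sbatch) from the project folder,
    so Slurm's SLURM_SUBMIT_DIR points there. Outside Slurm we fall back to
    the current working directory, i.e. what `pwd` prints.
    """
    base = os.environ.get("SLURM_SUBMIT_DIR") or os.getcwd()
    return Path(base).resolve() / filename


if __name__ == "__main__":
    path = requirements_path()
    print(f"absolute path: {path}")
    print(f"exists: {path.exists()}")
```

The printed path is the one to pass to pip install -r; if exists reports False, the job was probably submitted from a different folder.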
Running the Python Script
Now that the environment and dependencies are set up, we can run the Python script. Below is an example Python script with its expected output.
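As an illustration only (this is not the guide's own example), a hypothetical script of this kind might estimate pi by Monte Carlo sampling and print both the result and its runtime, the two kinds of output mentioned at the start of this section.

```python
import random
import time


def estimate_pi(samples, seed=0):
    """Estimate pi from the fraction of random points in the unit square inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed so reruns are reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / samples


if __name__ == "__main__":
    start = time.perf_counter()
    pi_hat = estimate_pi(1_000_000)
    elapsed = time.perf_counter() - start
    print(f"pi estimate: {pi_hat:.5f}")
    print(f"runtime: {elapsed:.2f} s")
```

Because the seed is fixed, reruns produce the same estimate, while the reported runtime will vary from node to node.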
The following is some example output that might be returned by the above program. Storing results as plain text in a log file like this is not ideal for later parsing and analysis. The "Managing Results" section is dedicated to best practices for managing these results more effectively.