Streamline the user experience: bash wrappers and modules
Overview
Teaching: 15 min
Exercises: 15 min
Questions
Objectives
Simplify container usage by means of bash wrappers
Discuss how to deploy containers and their wrappers using aliases
Can we standardise the use of containers, to simplify the required syntax?
To answer this question, let’s grab the BLAST container we used in the demo on sharing files with the host (the image will be quickly pulled from cache if you ran that demo):
$ cd $TUTO/demos/wrap_blast
$ singularity pull docker://quay.io/biocontainers/blast:2.9.0--pl526he19e7b1_7
Now, let’s think about the typical usage of a containerised application. Once the container image is available on the local disk, in the vast majority of cases you’ll use it to execute some command in this way:
singularity exec ./blast_2.9.0--pl526he19e7b1_7.sif <CMD> <ARGS>
As a plain, useful example, let’s suppose we want to get the help output from the blastp command:
$ singularity exec ./blast_2.9.0--pl526he19e7b1_7.sif blastp -help
We can break this into logical parts; let’s write a script called blastp.1 for convenience:
#!/bin/bash
image_dir="."
image_name="blast_2.9.0--pl526he19e7b1_7.sif"
cmd="blastp"
# Keep the arguments in an array so that values containing spaces stay intact
args=("$@")
singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"
Look at how general the expression in the last line of this script is!
We’re also using shell variables to express tool- and command-specific information. Of these, the image location image_dir and name image_name are set at the time we pull the image. The command name, cmd, might change from command to command. So, for instance, we might write a script for the command makeblastdb by only changing that line:
#!/bin/bash
image_dir="."
image_name="blast_2.9.0--pl526he19e7b1_7.sif"
cmd="makeblastdb"
# Keep the arguments in an array so that values containing spaces stay intact
args=("$@")
singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"
How about the value we assigned to the command arguments variable, args? Well, that’s bash syntax. When you execute this script, bash assigns to $@ the full list of arguments that you append to the script on the command line.
To see a practical example, let’s make the blastp.1 script executable (using chmod) and run it with the -help argument:
$ chmod +x blastp.1
$ ./blastp.1 -help
USAGE
blastp [-h] [-help] [-import_search_strategy filename]
[..]
-use_sw_tback
Compute locally optimal Smith-Waterman alignments?
From the output, you can see that the blastp command actually got the -help flag right, and this was thanks to the usage of $@ in the script.
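One caveat about $@ is worth a quick experiment. The sketch below assumes an argument that contains a space (the file name is made up for illustration) and compares storing the arguments in a plain string with storing them in a bash array.

```shell
#!/bin/bash
# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Simulate calling a script with two arguments, one containing a space.
# "my proteins.fasta" is a hypothetical file name.
set -- -query "my proteins.fasta"

# A plain string flattens the list, so the space splits the file name.
flat="$@"
count_args $flat            # prints 3

# An array keeps every argument intact.
args=("$@")
count_args "${args[@]}"     # prints 2
```

If your commands never take arguments with spaces, both forms behave the same; the array form is simply the safer default.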
So to summarise this section, we’ve written a simple bash script that wraps around the Singularity exec approach, so that to run blastp from a container you simply type:
$ ./blastp.1 <ARGS>
Why the .1 extension? Well, this is just because the story is not over…
A (quite) general bash wrapper for containerised applications
In the first iteration of a bash wrapper for containerised commands, we need to provide 3 pieces of information in the script: image location, image name and command name. Can we further simplify and generalise this?
Yes. With a couple of extra bash commands and assumptions, we can make it so that the only required information will be the container image name.
First, let’s get rid of the command name.
Let’s assume that we’re calling the wrapper with the same name as the command we want it to execute. Then, we’re going to use the bash variable $0; used inside a script, it contains the path with which the script was invoked. We’re also using the bash command basename, which extracts a file or directory name from its full path. The cmd variable becomes:
cmd="$(basename "$0")"
And now, let’s generalise the image location.
Let’s assume that we’re storing the wrappers in the same directory where the image is located. Then, we can use the bash command dirname to extract the location of a file or directory out of its full path. The image_dir variable becomes:
image_dir="$(dirname "$0")"
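To see what these two commands return, here is a minimal sketch that applies them to a made-up path instead of $0. The location /opt/blast is purely hypothetical.

```shell
# Hypothetical location of a wrapper script.
script_path="/opt/blast/blastp"

# basename keeps the last path component, dirname keeps everything before it.
cmd="$(basename "$script_path")"
image_dir="$(dirname "$script_path")"

echo "$cmd"        # prints blastp
echo "$image_dir"  # prints /opt/blast
```

When a script is invoked as ./blastp, the same logic gives blastp and . (the current directory).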
So we can now have a general bash wrapper for BLAST commands from the container image blast_2.9.0--pl526he19e7b1_7.sif:
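#!/bin/bash
# The image is assumed to sit in the same directory as this script
image_dir="$(dirname "$0")"
image_name="blast_2.9.0--pl526he19e7b1_7.sif"
# The command to run is the name this script was called with
cmd="$(basename "$0")"
# Keep the arguments in an array so that values containing spaces stay intact
args=("$@")
singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"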
To create a wrapper for blastp, all we have to do is to create a script named blastp with that content. Then, we can do the same for makeblastdb, blastn, blastx and so on.
To limit the number of files, we might even just have a single copy of this script, e.g. named blastp, and then create symbolic links for the other commands, for instance:
$ ln -s blastp makeblastdb
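As a sketch of how this scales to several commands, the loop below creates one link per command name. The list of commands and the throwaway directory are illustrative assumptions; in practice you would run the loop in the directory that holds the image and the blastp wrapper.

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch real files (assumption).
wrap_dir="$(mktemp -d)"
cd "$wrap_dir" || exit 1

# Stand-in for the general wrapper described above.
cat > blastp << 'EOF'
#!/bin/bash
image_dir="$(dirname "$0")"
image_name="blast_2.9.0--pl526he19e7b1_7.sif"
cmd="$(basename "$0")"
singularity exec "$image_dir/$image_name" "$cmd" "$@"
EOF
chmod +x blastp

# One symbolic link per additional command; every link reuses the same script.
for cmd in makeblastdb blastn blastx; do
    ln -s blastp "$cmd"
done

ls -l
```

Note that the links need to live in the same directory as the image, because dirname applied to $0 returns the location of the link, not of the script it points to.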
What if we need bash wrappers for the Trinity assembler from the pulled image?
Well, just make a new script with a different image_name, named according to the required command:
image_name="trinityrnaseq_2.8.6.sif"
How general is this approach?
Quite general, in fact. It can be used every time you would use containers with this Singularity syntax:
singularity exec <IMAGE> <CMD> <ARGS>
This will also work with MPI containers and Slurm, because the launcher is simply prepended and the form above stays unchanged (here -n sets the number of processes or tasks):
mpirun -n <NTASKS> singularity exec <IMAGE> <CMD> <ARGS>
srun -n <NTASKS> singularity exec <IMAGE> <CMD> <ARGS>
Of course there are some corner cases.
For instance, for GPU enabled containers, after exec in the wrapper you will need to add --nv (Nvidia) or --rocm (AMD).
Using overlays requires adding --overlay <OVERLAY FILEPATH>, with the file path possibly specified using a shell variable that you can define prior to executing the wrapper.
Wrappers to launch GUI sessions will also require some tweaking.
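One way to handle the GPU and overlay cases without editing the wrapper each time is to let it read optional settings from shell variables. The sketch below is only an illustration: WRAP_GPU_FLAG and WRAP_OVERLAY are hypothetical variable names (not Singularity settings), the image and command names are placeholders, and the command is printed rather than executed so that it can be tried without a container.

```shell
#!/bin/bash
# Assemble the singularity command line, adding extras only when requested.
# WRAP_GPU_FLAG and WRAP_OVERLAY are hypothetical variable names.
build_singularity_cmd() {
    local image="$1" cmd="$2"
    shift 2
    sing_cmd=(singularity exec)
    # Add --nv or --rocm only when the user asked for it.
    if [ -n "${WRAP_GPU_FLAG:-}" ]; then
        sing_cmd+=("$WRAP_GPU_FLAG")
    fi
    # Add an overlay only when a file path was provided.
    if [ -n "${WRAP_OVERLAY:-}" ]; then
        sing_cmd+=(--overlay "$WRAP_OVERLAY")
    fi
    sing_cmd+=("$image" "$cmd" "$@")
}

# Example: request Nvidia support and an overlay (file names are placeholders).
WRAP_GPU_FLAG="--nv"
WRAP_OVERLAY="./my_overlay"
build_singularity_cmd ./tool.sif tool_cmd -help

# Print the assembled command; a real wrapper would run "${sing_cmd[@]}" instead.
printf '%s\n' "${sing_cmd[*]}"
```

With both variables unset, the assembled command falls back to the plain singularity exec form used so far.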
What if we need to bind mount some host directories?
This case is worth commenting on in this context.
Specifying the paths to be bind mounted as additional flags in the wrappers is neither general nor portable.
So what you want to do here is to use $SINGULARITY_BINDPATH, defining the required paths prior to execution of the application.
If you have a standard setup on your system, where all the data go under the same parent directory (e.g. /data), you might even want to define the variable in the startup scripts (~/.bashrc, …). This can be quite a good practice in simplifying your production environment, and making it more robust.
In this respect, in Pawsey HPC systems the singularity module adds /group and /scratch to the bind path, so you don’t have to worry about bind mounting data directories at all.
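As a minimal sketch of the bind path approach, the lines below define the variable before any wrapper is called. The directories are just the examples mentioned above; use whatever parent directories hold your data.

```shell
# Bind a single parent directory (the /data example from above).
export SINGULARITY_BINDPATH="/data"

# Several directories can be given as a comma-separated list.
export SINGULARITY_BINDPATH="/data,/scratch"

# The wrapper is then called as usual, with no bind flags of its own, e.g.:
# ./blastp -help
echo "$SINGULARITY_BINDPATH"
```

Placing the export line in a startup script such as ~/.bashrc makes the setting available in every new shell.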
Using Aliases
Another way to shorten the singularity command line is to use the alias feature of the bash shell.
We create an alias with the command:
alias alias_name="command_to_run"
So for our blastp example:
$ alias blastp="singularity exec /home/ubuntu/singularity-containers/demos/wrap_blast/blast_2.9.0--pl526he19e7b1_7.sif blastp"
$ blastp -h
USAGE
blastp [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
...
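If several commands from the same image are needed, the aliases can be defined in a loop. The sketch below assumes a hypothetical image location under $HOME/containers and an illustrative list of commands; it also enables alias expansion explicitly, because bash only expands aliases in interactive shells by default.

```shell
#!/bin/bash
# Aliases are only expanded in interactive shells unless this option is set.
shopt -s expand_aliases

# Hypothetical image location; adjust to where the image was pulled.
blast_image="$HOME/containers/blast_2.9.0--pl526he19e7b1_7.sif"

# Define one alias per BLAST command (the command list is illustrative).
for cmd in blastp blastn makeblastdb; do
    alias "$cmd=singularity exec $blast_image $cmd"
done

# Show the definitions that were created.
alias blastp blastn makeblastdb
```

Adding these lines to ~/.bashrc would make the aliases available in every new interactive session. Unlike the wrappers, aliases are not visible to other scripts or job submissions unless those define them too.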
Key Points
It is possible to devise a quite general wrapper template for containerised applications
The key information needed to set up the wrappers is the container image and the commands one needs to run from that image