Software

Hydra

1. Where can I find installed software?

Most end-user software on Hydra is available via modules. To obtain a full list of installed software, type:

module av

To get a list of available versions of a given software package (for example, Python), type:

module spider Python

To get a list of extensions included in a given software package (for example, Python 3.6), type:

module show Python/3.6.6-foss-2018b

Modules can be loaded as follows (in this example Python 3.6 is loaded):

module load Python/3.6.6-foss-2018b

You can check which modules are currently loaded with:

module list

Unloading all currently loaded modules can be done like this:

module purge

If you need software that is not yet installed, please contact us at hpc@vub.ac.be.

More information on module usage can be found in the HPC tutorial and in the HPC training slides.

2. The toolchain of software packages

The name of a software package such as Python/3.6.6-foss-2018b contains not only the name of the package (Python) and its version (3.6.6), but also the term foss-2018b, which is known as the toolchain. The toolchain is the set of tools used to build that package, and it has its own version (2018b). Any given software can be made available with different toolchains. The most common ones are:

  • GCCcore: The toolchain based on the GNU Compiler Collection, which includes front ends for C, C++, Objective-C, Fortran, Java, and Ada.
  • GCC: The GCCcore toolchain including libraries provided by binutils (libstdc++, libgcj,…).
  • foss: The Free and open-source software (FOSS) toolchain is based on GCCcore and also includes support for OpenMPI, OpenBLAS, FFTW and ScaLAPACK.
  • intel: The Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).
  • iomkl: The Intel C/C++ and Fortran compilers, Intel MKL & OpenMPI.
  • fosscuda: The foss toolchain with support for CUDA.

Some toolchains offer improved performance on specialised hardware: for instance, certain software may be faster with the intel toolchain when run on compute nodes with Intel hardware. Other toolchains offer additional features, such as fosscuda, which is needed to execute code on GPUs.

After the toolchain specification, the name of a software package may contain additional information about other dependencies of the package, such as required interpreters. For instance, the package R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 corresponds to Bioconductor version 3.7, built with the foss toolchain 2018b and requiring R version 3.5.1.
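
The naming scheme can be taken apart with plain shell parameter expansion. The following is a minimal sketch, not a tool provided on Hydra: the function name parse_module_name is hypothetical, and it assumes that the version, the toolchain name and the toolchain version contain no hyphens themselves.

```shell
# Split NAME/VERSION-TOOLCHAIN-TCVERSION[-SUFFIX] into its parts.
# Assumption: VERSION, TOOLCHAIN and TCVERSION contain no hyphens.
parse_module_name() {
    full=$1
    name=${full%%/*}                        # text before the first slash
    rest=${full#*/}                         # text after the first slash
    version=${rest%%-*};    rest=${rest#*-} # first hyphen-separated field
    tc_name=${rest%%-*};    rest=${rest#*-} # second field: toolchain name
    tc_version=${rest%%-*}                  # third field: toolchain version
    case $rest in
        *-*) suffix=${rest#*-} ;;           # anything left is the suffix
        *)   suffix=none ;;
    esac
    echo "name=$name version=$version toolchain=$tc_name-$tc_version suffix=$suffix"
}

parse_module_name "Python/3.6.6-foss-2018b"
parse_module_name "R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1"
```

For the second call the sketch reports the toolchain foss-2018b and the suffix R-3.5.1, matching the description above.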

It is important to load only modules built with a common toolchain (including its version); otherwise conflicts may occur. Hence, R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 can be loaded along with other packages built with the foss-2018b toolchain, but not with any other. The only exception to this rule is packages built with the GCCcore toolchain, which is compatible with both the foss and intel toolchains. Note that GCCcore is not the same as GCC, which is incompatible with the intel toolchain. The compatibility between toolchains depends on their versions. The most recent combinations are:

  • GCCcore-8.2.0 is compatible with foss-2019a and intel-2019a
  • GCCcore-7.3.0 is compatible with foss-2018b and intel-2018b
  • GCCcore-6.4.0 is compatible with foss-2018a, foss-2017b, intel-2018a and intel-2017b
  • GCCcore-6.3.0 is compatible with foss-2017a and intel-2017a
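
The compatibility list above can be encoded as a small lookup to check a combination before loading modules. This is only a sketch: the function names gcccore_for and gcccore_compatible are hypothetical, and the table covers only the toolchain versions listed here.

```shell
# GCCcore version that matches a given foss or intel toolchain,
# transcribed from the compatibility list above.
gcccore_for() {
    case $1 in
        foss-2019a|intel-2019a) echo 8.2.0 ;;
        foss-2018b|intel-2018b) echo 7.3.0 ;;
        foss-2018a|foss-2017b|intel-2018a|intel-2017b) echo 6.4.0 ;;
        foss-2017a|intel-2017a) echo 6.3.0 ;;
        *) return 1 ;;   # toolchain not covered by the list
    esac
}

# Succeeds if a GCCcore-<version> module can be combined with the toolchain.
# Usage: gcccore_compatible <GCCcore version> <toolchain>
gcccore_compatible() {
    [ "$(gcccore_for "$2")" = "$1" ]
}

if gcccore_compatible 7.3.0 foss-2018b; then
    echo "GCCcore-7.3.0 modules can be loaded with foss-2018b"
fi
if ! gcccore_compatible 8.2.0 foss-2018b; then
    echo "GCCcore-8.2.0 modules conflict with foss-2018b"
fi
```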

If you cannot find a compatible set of modules that provides the software required for your work, please contact us at hpc@vub.ac.be.

3. How can I install additional software/packages?

You should first check whether the software or package you need is already available on Hydra, either with its own module or as part of another module. If it is not available, the HPC team recommends requesting its installation at hpc@vub.ac.be. This has several advantages:

  1. The HPC team will optimize the compilation for each CPU architecture present in Hydra, guaranteeing that your software/package runs efficiently on all nodes (and usually much faster than installations made by the users).
  2. Free software will be available to all users of Hydra and licensed software can be made available to specific groups of users.
  3. The package will be built in a reproducible way with EasyBuild: important for scientific reproducibility.
  4. Different versions of the software can be installed alongside each other.

If you still want to install additional software/packages yourself, there are several resources available online:

  • Compiling and testing your software on the HPC

    Users compiling their own software should be aware that software compiled on the login nodes may fail on older compute nodes if full hardware optimization is used. The CPU microarchitecture of the login nodes (Skylake) has some instruction sets that are not available on Hydra’s older compute nodes (Ivy Bridge and Broadwell). Therefore, there are two options to compile your own software:

    • Best performance: compile on the login node (with -march=native). The resulting binaries can only run on Skylake nodes, but they offer the best performance on those nodes. Jobs can be restricted to run on Skylake nodes with -l feature=skylake.
    • Best compatibility: compile on any Ivy Bridge node. Log in to an Ivy Bridge node with qsub -I -l feature=ivybridge and compile your code on it. The resulting binaries can run on any node of Hydra with good performance. Alternatively, users who know how to set up the compilation can compile on the login node with -march=ivybridge -mtune=skylake.

    See the advanced section of the HPC tutorial for more information. Please contact us in case of problems or questions.

  • How to install additional Python packages

    See the VSC docs on Python package management

  • How to install additional Perl packages

    See the VSC docs on Perl package management

  • How to install additional R packages

    See our documentation below or the VSC docs on R package management
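
The two compilation options described above for self-compiled software can be written down as compiler flags. The following is a sketch under assumptions: the helper compile_flags, the optimization level -O2, the use of gcc and the file name mycode.c are illustrative and not prescribed by the HPC team. Only the -march and -mtune values come from the text above.

```shell
# Hypothetical helper: print GCC flags for one of the two strategies.
compile_flags() {
    case $1 in
        # Build on the login node; the binary only runs on Skylake nodes.
        performance)   echo "-O2 -march=native" ;;
        # Build on the login node, but restrict instructions to Ivy Bridge.
        compatibility) echo "-O2 -march=ivybridge -mtune=skylake" ;;
        *) echo "usage: compile_flags performance|compatibility" >&2
           return 1 ;;
    esac
}

# Print the compile command instead of running it; mycode.c is a placeholder.
echo "gcc $(compile_flags compatibility) -o mycode mycode.c"
```

Binaries built with the performance flags should be submitted with -l feature=skylake, as noted above.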

5. How can I run MATLAB?

MATLAB is available as a module. However, running intensive MATLAB calculations on Hydra is not recommended: its performance is not optimal and parallel execution is not fully supported.

First check which MATLAB versions are available:

module av matlab

Next load a suitable version, for example (take the most recent version):

module load MATLAB/2019a

It is possible to run MATLAB in console mode for quick tests. For example, with a MATLAB script called ‘testmatlab.m’, type:

matlab -nodisplay -r testmatlab

For anything beyond quick tests, MATLAB scripts must be run on compute nodes. In this case, the HPC team highly recommends first compiling your script with the MATLAB compiler mcc:

mcc -m testmatlab.m

This will generate a testmatlab binary file, as well as a ‘run_testmatlab.sh’ shell script (and a few other files). You can ignore the ‘run_testmatlab.sh’ file.

Now you can submit your MATLAB calculation as a batch job. Your job script should look like this:

#!/bin/bash -l
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=1gb

module load MCR/R2019a

cd $PBS_O_WORKDIR
./testmatlab >testmatlab.out 2>&1

The advantage of running a compiled MATLAB binary is that it does not require a license. Only a limited number of MATLAB licenses can be used at the same time, so in this way you can run your simulation even if all licenses are in use.
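
When several compiled binaries have to be submitted, the job script can be generated instead of written by hand. The following is a minimal sketch: the function make_matlab_job and its arguments are hypothetical, and the defaults simply repeat the values of the example job script.

```shell
# Hypothetical generator for a job script that runs a compiled MATLAB binary.
# Arguments: binary name, walltime (default 01:00:00), memory (default 1gb).
make_matlab_job() {
    binary=$1
    walltime=${2:-01:00:00}
    mem=${3:-1gb}
    # \$PBS_O_WORKDIR is escaped so that it is expanded when the job runs.
    cat <<EOF
#!/bin/bash -l
#PBS -l walltime=$walltime
#PBS -l nodes=1:ppn=1
#PBS -l mem=$mem

module load MCR/R2019a

cd \$PBS_O_WORKDIR
./$binary > $binary.out 2>&1
EOF
}

# Print a job script for the testmatlab binary with 2 hours and 2 GB.
make_matlab_job testmatlab 02:00:00 2gb
```

Redirect the output to a file, for instance testmatlab.job, and submit that file with qsub.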

More information on using the MATLAB compiler can be found here:

https://nl.mathworks.com/help/mps/ml_code/mcc.html

6. How can I use GaussView?

GaussView is a graphical interface used with the computational chemistry program Gaussian. If your ssh session is configured with X11 forwarding, you can use GaussView directly on Hydra after loading the module:

ml GaussView/6.0.16

However, using a graphical interface on Hydra is slow, so for regular use the HPC team recommends installing GaussView locally. Binary packages of GaussView are available for Linux, Mac, and Windows users and are provided upon request.

Installation of GaussView on Mac:

  1. Untar G16 and GaussView to /Applications (two new directories, g16 and gv, will be created)

  2. Create a file ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist and paste the following content into it:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    <key>Label</key>
    <string>env.GAUSS_EXEDIR</string>
    <key>ProgramArguments</key>
    <array>
    <string>launchctl</string>
    <string>setenv</string>
    <string>GAUSS_EXEDIR</string>
    <string>/Applications/g16/bsd:/Applications/g16</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    </dict>
    </plist>
    
  3. Run the following command once (or, alternatively, restart the machine):

    launchctl load ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist
    

7. How can I use R?

Depending on your needs, there are different ways to use R on Hydra:

  • Interactive sessions for light workloads can be performed in the login node

    1. Log in to Hydra

    2. Load the following R module

      module load R/3.5.1-foss-2018b
      
    3. Start R

      R
      

    Note

    If you need a different version of R, make sure to load one with foss in the name. Those versions are based on the GNU open source toolchain and can be used in the login node.

  • Interactive sessions for heavy workloads must be performed in the compute nodes

    1. Log in to Hydra

    2. Start an interactive session in a compute node with the following command

      qsub -I
      
    3. Load your R module of choice (preferably a recent version)

      module load R/3.5.1-intel-2018b
      
    4. Start R

      R
      
  • Scripts written in R can be executed with the command Rscript. A minimal job script for R only needs to load the R module and run your script with Rscript:

    #!/bin/bash -l
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=1
    #PBS -l mem=1gb
    
    module load R/3.5.1-intel-2018b
    
    cd $PBS_O_WORKDIR
    Rscript <path-to-script.R>
    

The quality of the graphics generated by R can be improved by changing the graphical backend to Cairo. Add the following lines to the file ~/.Rprofile to make this change permanent for your user (create the file ~/.Rprofile if it does not exist):

# Use cairo backend for graphics device
setHook(packageEvent("grDevices", "onLoad"),
    function(...) grDevices::X11.options(type='cairo'))

# Use cairo backend for bitmaps
options(bitmapType='cairo')
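
The change to ~/.Rprofile can also be applied from the shell. This is a sketch only: the function enable_cairo is hypothetical, and it appends the lines above unless the bitmapType setting is already present. The demonstration at the end works on a temporary file, so your real ~/.Rprofile is not touched until you call enable_cairo without arguments.

```shell
# Append the Cairo settings to an R profile unless they are already there.
# The target defaults to ~/.Rprofile; pass another path to try it out safely.
enable_cairo() {
    profile=${1:-$HOME/.Rprofile}
    if [ -f "$profile" ] && grep -q "bitmapType='cairo'" "$profile"; then
        echo "Cairo settings already present in $profile"
        return 0
    fi
    # >> creates the file if it does not exist yet
    cat >> "$profile" <<'EOF'
# Use cairo backend for graphics device
setHook(packageEvent("grDevices", "onLoad"),
    function(...) grDevices::X11.options(type='cairo'))

# Use cairo backend for bitmaps
options(bitmapType='cairo')
EOF
    echo "Cairo settings added to $profile"
}

# Demonstration on a throwaway file: the second call changes nothing.
demo_profile=$(mktemp)
enable_cairo "$demo_profile"
enable_cairo "$demo_profile"
```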

8. Packages included in the R library in Hydra

Many packages are already included in the R library on Hydra. The complete list can be looked up from the shell on Hydra with the following steps:

  1. Log in to Hydra

  2. List the contents of any R module, for instance R/3.5.1-foss-2018b

    module show R/3.5.1-foss-2018b
    

R packages missing in the library may be provided with their own module. In that case use the module command to search in the repository of Hydra. Please see 1. Where can I find installed software?

Unavailable R packages in Hydra can be requested for installation at our support service. Please send an email to hpc@vub.ac.be and the HPC team will proceed with the installation. See 3. How can I install additional software/packages? for more details.

Developers can compile and install R packages in the local R library of their home directory. However, it is important to note that the microarchitecture differs between Hydra’s nodes, which must be taken into account when testing self-compiled R packages on the nodes.

Note

The packages of a local R library can potentially cause errors due to conflicts with the global R library or due to a version change of R after the installation of local R packages. If you experience errors running R scripts that are related to a failed load of a package, it is helpful to check your script in a clean R environment without a local R library.

  1. Remove all modules and load R

    module purge
    module load R/3.5.1-foss-2018b
    
  2. Enter into a clean R environment (not loading previous workspace)

    R --no-restore
    
  3. Inside the shell of R or at the beginning of your R script

    .libPaths('')
    <your R code ...>
    

9. How can I use CESM?

The dependencies required to run CESM in Hydra are provided by the module CESM-deps. This module also contains the XML configuration files for CESM with the specification of machines, compiler and batch system of Hydra. Once CESM-deps is loaded, the configuration files can be found in ${EBROOTCESMMINDEPS}/machines.

The file structure of your CESM simulation should be placed in your $VSC_SCRATCH. Users needing data located elsewhere (e.g. in /projects) can create symlinks in their $VSC_SCRATCH to the corresponding locations. The following steps show an example setup of a CESM case:

  1. Log in to Hydra

  2. Load the module CESM-deps

    module load CESM-deps/2-foss-2019a
    
  3. Create the required folders in your $VSC_SCRATCH. The settings for Hydra can be found in the file ${EBROOTCESMMINDEPS}/machines/config_machines.xml. The following example will be created inside the folder $VSC_SCRATCH/cime_case, with its output located in $VSC_SCRATCH/cime_output and any data needed by CESM in $VSC_SCRATCH/cesm, which in this case is a link to a location in a project directory.

    cd $VSC_SCRATCH
    ln -s /projects/our_project/cesm
    mkdir cime_output
    mkdir cime_case
    
  4. The creation of a case follows the usual procedure for CESM. Just remember to always copy the configuration files for Hydra found in ${EBROOTCESMMINDEPS}/machines to the source code of CESM and create your case with ./create_newcase --machine hydra

    cd ${VSC_SCRATCH}/cime_case
    git clone -b release-cesm2.0.1 https://github.com/ESCOMP/cesm.git cesm
    cd cesm
    ./manage_externals/checkout_externals
    cp ${EBROOTCESMMINDEPS}/machines/config_{machines,compilers,batch}.xml cime/config/cesm/machines/
    ./create_newcase --machine hydra --case $VSC_SCRATCH/cime_case/cases/control --res f19_g17 --compset I2000Clm50BgcCro
    
  5. The CESM case can now be set up, built and executed. This is done in Hydra with a job script available in ${EBROOTCESMMINDEPS}/scripts/case.job. The job script case.job sets the CESM environment, compiles the case and runs the simulation in one go, minimizing wait times in the queue. Copy case.job to the directory of your case. If needed, you can add commands to the script and/or configure your case with additional xmlchange commands. Once the script is configured to your needs, submit it to the queue with qsub as usual.

    cd ${VSC_SCRATCH}/cime_case/cases/control
    cp ${EBROOTCESMMINDEPS}/scripts/case.job ./
    # Edit case.job if needed
    qsub -l nodes=2:ppn=20 -l walltime=24:00:00 case.job
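
Before submitting, it can be useful to verify that the folders of the example setup exist. The following pre-flight check is a sketch: the function check_cesm_layout is hypothetical and only knows the three example folder names used in the steps above (cime_case, cime_output and cesm).

```shell
# Hypothetical pre-flight check for the example layout described above.
# Argument: the directory that holds the layout, normally $VSC_SCRATCH.
check_cesm_layout() {
    scratch=$1
    status=0
    for dir in cime_case cime_output cesm; do
        # -d follows symlinks, so a link into a project directory also passes
        if [ -d "$scratch/$dir" ]; then
            echo "ok: $scratch/$dir"
        else
            echo "missing: $scratch/$dir"
            status=1
        fi
    done
    return $status
}

# Demonstration in a throwaway directory standing in for $VSC_SCRATCH.
demo_scratch=$(mktemp -d)
mkdir "$demo_scratch/cime_case" "$demo_scratch/cime_output"
check_cesm_layout "$demo_scratch" || echo "layout incomplete"
```

On Hydra the check would be called as check_cesm_layout "$VSC_SCRATCH".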
    

The module CESM-tools/2-foss-2019a provides a set of tools commonly used to analyse and visualize CESM data. However, CESM-tools cannot be loaded at the same time as CESM-deps because their packages have incompatible dependencies. Once you obtain the results of your case, unload all modules with module purge before loading CESM-tools to post-process the data.

10. How can I use matplotlib with a graphical interface?

The HPC environment is optimized for the execution of non-interactive applications in job scripts. Therefore, matplotlib is configured with a non-GUI backend (Agg) that can save the resulting plots in a variety of image file formats. The generated image files can be copied to your own computer for visualization or further editing.

If you need to work interactively with matplotlib and visualize its output from within Hydra, you can do so with the following steps:

  1. Log in to Hydra with X11 forwarding enabled. Linux and macOS users can type one of the following in their terminal

    ssh -Y username@hydra.vub.ac.be
    ssh -Y username@hydra.ulb.ac.be
    
  2. Enable the TkAgg backend at the very beginning of your Python script

    import matplotlib
    matplotlib.use('TkAgg')
    

Note

The function matplotlib.use() must be called before importing matplotlib.pyplot. Changing the backend parameter in your matplotlibrc file will not have any effect, as the system-wide configuration takes precedence over it.
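
An interactive backend can only open windows if X11 forwarding is active in your session. A quick way to check this from the shell is to look at the DISPLAY variable, which ssh sets when forwarding works. The helper has_x11 below is a hypothetical name used for illustration.

```shell
# Succeeds if an X11 display is advertised to this session.
# ssh -Y (or -X) sets DISPLAY on the remote side when forwarding works.
has_x11() {
    [ -n "${DISPLAY:-}" ]
}

if has_x11; then
    echo "DISPLAY=$DISPLAY: interactive backends such as TkAgg can open windows"
else
    echo "DISPLAY is not set: log in again with ssh -Y, or keep the default Agg backend"
fi
```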

Vega

(TODO)

VSC Tier-1

(TODO)