Compiling and Building C++ Applications with HDF5

As I found it surprisingly difficult and confusing to build and compile C++ applications with the HDF5 library, I decided to post a small guide on how I achieved it.

Building HDF5

HDF5 provides a number of download options. The one I used is the CMake version, CMake-hdf5-1.10.5.tar.gz.

After extracting the archive, you will find a shell script named build-unix.sh in the main folder.
On Ubuntu, running that one script was all the build process required.
It creates a build directory inside the main folder, and that build directory in turn contains several subfolders.
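
The build step can be sketched as a small shell function. This is only a sketch: it assumes the archive sits in the current directory and unpacks into a folder named after the archive, as CMake-hdf5-1.10.5.tar.gz does, and the function name build_hdf5 is purely illustrative.

```shell
# Sketch: unpack the HDF5 CMake archive and run its bundled build script.
# build_hdf5 is a hypothetical helper name, not part of HDF5.
build_hdf5() {
    local archive="$1"
    local src_dir
    # Assumption: the archive unpacks into a folder named like the archive.
    src_dir="$(basename "$archive" .tar.gz)"

    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

    tar -xzf "$archive" || return 1
    # Run the build in a subshell so the caller's directory is unchanged.
    ( cd "$src_dir" && bash ./build-unix.sh ) || return 1
    echo "build output is under $src_dir/build"
}
```

It would be called as build_hdf5 CMake-hdf5-1.10.5.tar.gz from the folder that holds the download.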

Using HDF5

To use the CMake version we need a CMakeLists.txt file.

A template of such a file is included here. The template and more information can also be found in the Building HDF5 with CMake guide.

Starting from that template, and after spending many hours searching, I modified the CMake file as follows to make it work for C++:

cmake_minimum_required (VERSION 3.10.1)
project (myFirstHdf5 C CXX)

# Choose which flavour of the HDF5 libraries to use.
set (LIB_TYPE STATIC) # or SHARED
string (TOLOWER ${LIB_TYPE} SEARCH_TYPE)

# Alternative find_package calls, left commented out:
#find_package (HDF5 NAMES hdf5 COMPONENTS C CXX ${SEARCH_TYPE})
#find_package (HDF5 COMPONENTS CXX HL REQUIRED)
find_package (HDF5 COMPONENTS C CXX HL REQUIRED)

link_directories (${HDF5_LIBRARY_DIRS})

include_directories (${HDF5_INCLUDE_DIR})
set (LINK_LIBS ${LINK_LIBS} ${HDF5_C_${LIB_TYPE}_LIBRARY})

#set (example hdfcompile)

add_executable (myFirstHdf5 myFirstHdf5.cpp)

# Link the executable against the HDF5 C++ libraries.
target_link_libraries (myFirstHdf5 ${HDF5_CXX_LIBRARIES})

Put the above CMakeLists.txt file in the same folder as the *.cpp file, which in this example is named myFirstHdf5.cpp.

Next, to run cmake you need to pass, at a minimum, the -G option, which is the easy part, and HDF5_DIR, which was quite hard to find. A small hint: HDF5_DIR must point to a directory that contains the file hdf5-config.cmake. In my case, I found this file in a few places, picked one of them, and luckily it worked.
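
The search for hdf5-config.cmake can be done with find. The following is a minimal sketch; find_hdf5_dir is a hypothetical helper name, and the root folder passed to it is whatever directory you extracted and built HDF5 in.

```shell
# Sketch: list candidate values for HDF5_DIR, i.e. every directory below
# a given root that contains hdf5-config.cmake.
# find_hdf5_dir is a hypothetical helper name.
find_hdf5_dir() {
    local root="$1"
    find "$root" -type f -name 'hdf5-config.cmake' -exec dirname {} \; | sort
}
```

For example, find_hdf5_dir "$HOME/Downloads/CMake-hdf5-1.10.5" prints one candidate directory per line. If several are listed, as happened to me, you still have to pick one and try it.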

Here is the full cmake command to build a debug version:

cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Debug \
-DHDF5_DIR=${HOME}/Downloads/CMake-hdf5-1.10.5/build/_CPack_Packages/Linux/TGZ/HDF5-1.10.5-Linux/HDF_Group/HDF5/1.10.5/share/cmake/hdf5 .

I hope this saves you a bit of time if you ever stumble on the same problem.

 

Run C2VSim on a Cluster

Here I want to show how to run C2VSim on the cluster at UC Davis. The cluster runs Ubuntu and uses the SLURM workload manager for job submission.

To submit a job we have to create a shell script, for example the file Run_c2vsim_job.sbatch.

The first part of the script contains options for SLURM. Below is a short list of the options I use to run the jobs:

#!/bin/bash
#
# job name:
#SBATCH --job-name='C2VSIM'
#
# Number of tasks (cores):
#
#SBATCH --ntasks=1
#
#SBATCH --array=1-100
#
#SBATCH --output=outp%j.log
#SBATCH --error=outp%j.err
#

The first option, #SBATCH --job-name='C2VSIM', defines a name for the job to be submitted.

The second option, #SBATCH --ntasks=1, defines the number of tasks, which by default is also the number of cores requested for the job. IWFM does not support multi-core or multi-threaded simulations, so this option should be 1. If, however, the simulation requires more memory than is allocated to one core, we may ask for more cores to increase the available memory.

The next option, #SBATCH --array=1-100, is used for array jobs, which run repeated simulations with different inputs. This is ideal for Monte Carlo simulations. With this setting the simulation runs 100 times.
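
Inside each array task SLURM sets the environment variable SLURM_ARRAY_TASK_ID to the index of that task, and this is what lets every run pick a different input. A minimal sketch, assuming a hypothetical text file with one parameter set per line (the file and its format are illustrative only and are not part of C2VSim or IWFM):

```shell
# Sketch: pick the input for this array task from a parameter file.
# Line N of the file belongs to array task N.
# select_params and the parameter file are hypothetical.
select_params() {
    local param_file="$1"
    # Fall back to task 1 when run outside SLURM, e.g. for a local test.
    local task_id="${SLURM_ARRAY_TASK_ID:-1}"
    sed -n "${task_id}p" "$param_file"
}
```

With --array=1-100 the file would need 100 lines, and each task would read only its own line.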

The options #SBATCH --output=outp%j.log and #SBATCH --error=outp%j.err define the names of the log and error files. For each simulation there will be two files, outpXXXXXXX.log and outpXXXXXXX.err. The console output is written to the *.log file and any errors are written to the *.err file. The XXXXXXX is the job ID that SLURM substitutes for %j; in an array job every task has its own job ID, so the file names do not clash.

# Load your modules
#
#
# Set up your environment
cd /home/giorgk/C2VSIM/Finegrid

Next we load any modules that are required. IWFM does not need any, so we simply change into the working directory.

As mentioned in a previous post, under Linux all input and output files need to be in the same directory. This causes a problem when we try to run the simulation multiple times simultaneously, because the runs would overwrite each other's files. To avoid that, each simulation has to run in a separate folder. Doing this manually is quite tedious, so I use the following workflow: I have created a folder that contains all the original C2VSim input directories and files, and I copy it to a folder that is unique for each simulation.

# Unique working folder for this array task
wrksp_folder="temp_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
#
echo "------Simulation started $(date)"
#
# Copy the clean inputs into the working folder
cp -r Clean_Input_files "$wrksp_folder"

The above snippet defines a variable wrksp_folder as temp_id1_id2, where temp can be any user-defined string, id1 is the job ID assigned by SLURM, and id2 runs from 1 to the number of array tasks. The last command copies the input files from the folder Clean_Input_files to the folder that is unique for each simulation.
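
The same step can be wrapped in a function that also runs outside SLURM, which is handy for checking the folder layout before submitting. This is only a sketch: make_workspace is a hypothetical helper, and the fallback value 0 for both IDs is my own assumption for local testing.

```shell
# Sketch: create a unique working folder for one array task.
# make_workspace is a hypothetical helper name.
make_workspace() {
    local clean_dir="$1"
    # Defaults let the function run outside SLURM for a quick local check.
    local job_id="${SLURM_ARRAY_JOB_ID:-0}"
    local task_id="${SLURM_ARRAY_TASK_ID:-0}"
    local wrksp_folder="temp_${job_id}_${task_id}"

    # Refuse to reuse a folder left over from an earlier run.
    if [ -e "$wrksp_folder" ]; then
        echo "workspace already exists: $wrksp_folder" >&2
        return 1
    fi
    cp -r "$clean_dir" "$wrksp_folder" || return 1
    # Print the folder name so the caller can cd into it.
    echo "$wrksp_folder"
}
```

In the job script it could be used as wrksp_folder="$(make_workspace Clean_Input_files)".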

Next we enter this folder and run the simulation. (Refer to this post, where the content of the Run_C2VSim.sh file is explained.)

cd $wrksp_folder
./Run_C2VSim.sh

After the simulation has finished we have to copy the results to a safe place and clean up the temporary files.

# Unique results folder for this array task
res_folder="results_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
cp -r Results/ "../Results/$res_folder"
cd ..
#
# Remove the temporary working folder
rm -r "$wrksp_folder"
echo "------Simulation Finished $(date)"

As with the working folder, we define a unique name for the results folder. Next we copy the content of the $wrksp_folder/Results folder to a location outside $wrksp_folder/. (It may seem confusing, but in the cp command Results/ and ../Results/ are two totally different directories: the first is inside the working folder and the second is next to it.)

After the results are copied we delete the $wrksp_folder and print the time when the simulation is finished.
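
One risk in this last step is deleting the working folder when the copy has failed, for example because the disk is full. A minimal sketch of a more defensive variant follows; collect_results is a hypothetical helper, and creating the results root with mkdir -p is my own addition.

```shell
# Sketch: save the results of one run, then remove its working folder.
# collect_results is a hypothetical helper name.
collect_results() {
    local wrksp_folder="$1"   # e.g. temp_<jobid>_<taskid>
    local res_root="$2"       # e.g. Results, next to the working folders
    local res_folder="$3"     # e.g. results_<jobid>_<taskid>

    mkdir -p "$res_root" || return 1
    # Keep the working folder if the copy fails, so nothing is lost.
    if cp -r "$wrksp_folder/Results" "$res_root/$res_folder"; then
        rm -r "$wrksp_folder"
    else
        echo "copy failed, keeping $wrksp_folder" >&2
        return 1
    fi
}
```

It is meant to be called from the directory that holds the working folders, for example collect_results "$wrksp_folder" Results "$res_folder" after the cd .. step.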

To submit this job on the cluster, execute:

sbatch Run_c2vsim_job.sbatch

On the Farm cluster at UC Davis, however, this is not going to work. You need to select the partition with the -p option:

sbatch -p serial Run_c2vsim_job.sbatch