Reading custom ASCII files with R

While R provides a ton of efficient functions for reading formatted data, you often have to read files that do not follow a single pattern throughout, e.g. the file may contain sections with different formats, or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim you know what I’m talking about.

The following is a trick I have found to make reading a bit easier.
I don’t consider myself an experienced R user, so this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.

First I read the entire file:

allfile <- readLines("path/to/myfile.fhb")


where allfile is a character vector in which each element is a line of the file.

To get the data from a line, I first extract a safe number of characters from it and then split it on spaces. This can be done in one line. (The substr call can be omitted in most cases; however, I have come across files where the end-of-line character ‘\n’ sits many thousands of characters past the actual end of the data, which can cause crashes or slowdowns.)

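maxChar <- 2000  # upper bound on the characters kept from a line
# Keep the first maxChar characters of line 5, then split on single spaces
temp <- strsplit(substr(allfile[5], 1, maxChar), split = " ")[[1]]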

The temp variable is a vector of strings, where some of the elements are empty because consecutive spaces produce empty strings.
For example:

> temp
[1] ""    ""    "625" ""    "1"   ""    "2"   ""    "0"

Finally, it can be converted to a numeric vector as:

temp <- as.numeric(temp[temp != ""])  # drop the empty strings, then convert
> temp
[1] 625   1   2   0


I hope that helps.

If there is a better way to do this, please leave a comment.




Compiling & Building a C++ application with HDF5

As I found it surprisingly difficult and confusing to build and compile C++ applications with the HDF5 library, I decided to post a small guide on how I achieved it.

Building hdf5

HDF5 provides a number of download options. The one I used is the CMake version.

After extracting, there is a shell script in the main folder.
The building process under Ubuntu required just that one line.
This creates a build directory inside the main folder, which itself contains several folders.

Using hdf5

To use the cmake version we need a CMakeLists.txt file.

A template of such a file is included here. This file and more information can also be found in the Building HDF5 with CMake guide.

Starting from that template, and after spending many hours searching, I modified it as follows to make it work for C++:

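cmake_minimum_required (VERSION 3.10.1)
project (myFirstHdf5 C CXX)

# Assumption: the listing shows no find_package call, yet the HDF5_* variables
# used below are only defined after one, so a standard call is added here.
find_package (HDF5 REQUIRED COMPONENTS C CXX)

link_directories (${HDF5_LIBRARY_DIRS})

include_directories (${HDF5_INCLUDE_DIR})

#set (example hdfcompile)

add_executable (myFirstHdf5 myFirstHdf5.cpp)

target_link_libraries (myFirstHdf5 ${HDF5_CXX_LIBRARIES})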

Put the above CMakeLists.txt file in the same folder as the *.cpp file, which in this example is named myFirstHdf5.cpp.

Next, to run cmake you need to pass at a minimum the -G option, which is the easy part, and HDF5_DIR, which was quite hard to find. A small hint: HDF5_DIR should point to the folder that contains the file hdf5-config.cmake. In my case, I found this file in a few places, picked one, and luckily it worked.
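
Since hdf5-config.cmake can turn up in several places, listing every folder that contains it gives you the full set of HDF5_DIR candidates to try. The following is only a sketch: find_hdf5_dir is a made-up helper name, not something shipped with HDF5 or CMake, and the search root is whatever folder you extracted or built HDF5 in.

```shell
# Hypothetical helper (not part of HDF5 or CMake): print every folder under
# a search root that contains hdf5-config.cmake, i.e. every HDF5_DIR candidate.
# $1 is the folder to search; it defaults to the current folder.
find_hdf5_dir() {
    root="${1:-.}"
    find "$root" -type f -name 'hdf5-config.cmake' -exec dirname {} \; | sort
}
```

For example, find_hdf5_dir "$HOME/Downloads/CMake-hdf5-1.10.5" prints one candidate folder per line, and any of them can be passed to cmake as -DHDF5_DIR.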

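Here is the full cmake command to configure a debug version (the -DCMAKE_BUILD_TYPE=Debug flag is what selects the debug configuration for the Makefile generator):

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Debug \
-DHDF5_DIR=${HOME}/Downloads/CMake-hdf5-1.10.5/build/_CPack_Packages/Linux/TGZ/HDF5-1.10.5-Linux/HDF_Group/HDF5/1.10.5/share/cmake/hdf5 .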
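
The cmake command above only generates the Makefiles; compiling is a separate step. A rough sketch of doing both in one go follows. The build_debug function name is hypothetical, the -DCMAKE_BUILD_TYPE=Debug flag is my assumption about how to request a debug configuration with the Makefile generator, and cmake --build . simply runs make for this generator.

```shell
# Hypothetical wrapper: configure a Debug build in the current folder, then compile.
# $1 is the HDF5_DIR candidate, i.e. a folder that contains hdf5-config.cmake.
build_debug() {
    hdf5_dir="$1"
    # Stop early if the folder cannot be a valid HDF5_DIR.
    if [ ! -f "$hdf5_dir/hdf5-config.cmake" ]; then
        echo "hdf5-config.cmake not found in: $hdf5_dir" >&2
        return 1
    fi
    # Generate the Makefiles, then build only if the configure step succeeded.
    cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Debug -DHDF5_DIR="$hdf5_dir" . &&
    cmake --build .
}
```

Run it from the folder that holds CMakeLists.txt and myFirstHdf5.cpp. The early check is there because a wrong HDF5_DIR was the part that cost me the most time.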

I hope this will save you a bit of time if you ever stumble on this.


Customize Matlab color order

It’s very common to have to plot a number of individual lines on the same figure. In Matlab this can be as easy as plot(t, Data), which results in the following figure:

As you can see, the default color order includes 7 colors, which are repeated. You can get information about the default colors using get(gca,'ColorOrder'), which returns a 7×3 matrix with the RGB channels of the above colors.

There are multiple ways to define a custom color order. One that I have found works well for me is the following. Instead of just plotting, get a handle for the plot:

h = plot(t, Data);

Then use the plot handle to change the color order

set(h, {'color'}, num2cell(colormat,2));

where colormat is an N×3 matrix and N is the number of unique colors, which must match the number of lines in h.

Of course, selecting the colors is important and not an easy task. The approach I suggest below makes it quite trivial.

Navigate to the website, then choose the number of classes and the nature of your data. In my case I chose qualitative and 11 classes. Then from the dropdown menu choose RGB to display the RGB values of the colors. Note that these are scaled between 0 and 255, while Matlab requires values between 0 and 1.

A handy feature of this website is that you can copy/paste the values into an editor and convert them to a Matlab variable with minimal effort. The result will look like the following:

colormat = [ ...
166,206,227; ...
31,120,180; ...
178,223,138; ...
51,160,44; ...
251,154,153; ...
227,26,28; ...
253,191,111; ...
255,127,0; ...
202,178,214; ...
106,61,154] / 255;  % close the matrix and rescale from 0-255 to the 0-1 range Matlab expects

Using the above color order matrix makes the plot use a different color for each line.

The website has some limits on the number of colors that you can define for each category. If you think you need more colors, then it’s better to question the type of plot you are trying to make.