Reading custom ASCII files with R

While R provides a ton of efficient functions for reading formatted data, you often have to read files that do not follow a single pattern throughout, e.g., the file may contain sections with different formats or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim, you know what I’m talking about.

The following is a trick I have found to make reading such files a bit easier.
I don’t consider myself an experienced R user, so this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.

First, I read the entire file:

 

where allfile is a character vector in which each element is one line of the file.
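As an illustration of the same idea outside R, here is a minimal Python sketch (not the R code used in this post) that holds the whole file in memory as one string per line. The helper name read_all_lines is hypothetical, and the input is assumed to be a plain text file.

```python
from pathlib import Path


def read_all_lines(path):
    """Read a whole text file at once and return one string per line.

    Holding every line in memory makes it easy to jump between sections
    that use different formats (hypothetical helper, for illustration).
    """
    # splitlines() removes the end-of-line characters from each line
    return Path(path).read_text().splitlines()
```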

To get the data from each line, I first extract a safe number of characters from it and then split it. This can be done in one line. (The substring step can be omitted in most cases; however, I have come across files where the end-of-line character ‘\n’ is located many thousands of characters after the last value, which can cause crashes or slowdowns.)
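The same "truncate, then split" step can be sketched in Python as follows. This is a sketch under assumptions: split_line is a hypothetical name, the values are assumed to be separated by spaces, and max_chars=200 is an arbitrary "safe" width that must exceed the longest stretch of real data on any line (otherwise a number could be cut in half).

```python
def split_line(line, max_chars=200):
    """Keep only the first max_chars characters of a line, then split on single spaces.

    max_chars is an assumed safe width; it guards against lines whose
    end-of-line character sits far beyond the last value.
    """
    prefix = line[:max_chars]
    # Splitting on a single space leaves an empty string for every
    # leading or repeated space, just like the temp vector in the post.
    return prefix.split(" ")
```

For example, split_line(" 1  2.5   3") keeps the empty strings produced by the leading and repeated spaces.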

The temp variable is a vector of strings in which some of the elements are empty, because the values on a line are usually separated (and often preceded) by runs of several spaces.

Finally, it can be converted to a numeric vector as follows:
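In Python terms, the final step amounts to dropping the empty strings and converting the remaining tokens to numbers. A minimal sketch, with to_numeric as a hypothetical helper name:

```python
def to_numeric(tokens):
    """Drop the empty strings left by splitting and convert the rest to floats."""
    return [float(tok) for tok in tokens if tok != ""]
```

For instance, to_numeric(" 1  2.5   3".split(" ")) returns [1.0, 2.5, 3.0].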

 

I hope that helps.

If there is a better way to do this, please leave a comment.

 

 

 

Compiling & building a C++ application with HDF5

Because I found it surprisingly difficult and confusing to build and compile C++ applications with the HDF5 library, I decided to post a small guide on how I achieved that.

Building HDF5

HDF5 provides a number of download options. The one I used is the CMake version,
CMake-hdf5-1.10.5.tar.gz.

After extracting the archive, you will find a shell script, build-unix.sh, in the main folder.
Under Ubuntu, the build process required just that one command.
It creates a build directory inside the main folder, which in turn contains several folders.

Using HDF5

To use the CMake version, we need a CMakeLists.txt file.

A template of such a file is included here. This file, along with more information, can also be found in the Building HDF5 with CMake guide.

Starting from that template, and after spending many hours searching, I modified the CMakeLists.txt as follows to make it work for C++:

Put the above CMakeLists.txt file in the same folder as the *.cpp file, which in this example is named myFirstHdf5.cpp.

Next, to run cmake you need to pass, at a minimum, the -G option (which selects the generator and is the easy part) and HDF5_DIR, which was quite hard to find. A small hint: the HDF5_DIR folder should contain the file hdf5-config.cmake. In my case, I found this file in a few places, just picked one, and luckily it worked.
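Since the hint above boils down to "find a folder that contains hdf5-config.cmake", a small Python sketch can list every candidate under a search root, for example the extracted CMake-hdf5-1.10.5 folder. The function name find_hdf5_dir_candidates is hypothetical, and which candidate is the right one for your build is still up to you.

```python
import os


def find_hdf5_dir_candidates(root):
    """Return every directory under root that contains hdf5-config.cmake.

    Each returned folder is a possible value for -DHDF5_DIR=...
    (hypothetical helper; it only searches, it does not validate).
    """
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "hdf5-config.cmake" in filenames:
            candidates.append(dirpath)
    return sorted(candidates)
```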

Here is the full cmake command to build a debug version:
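For readers who script their builds, the pieces described above (the -G generator, a Debug build type and HDF5_DIR) can be assembled into an argument list in Python. This is a sketch, not the exact command from this post: it assumes the "Unix Makefiles" generator and an out-of-source build folder, and the function name cmake_debug_command is hypothetical.

```python
def cmake_debug_command(source_dir, hdf5_dir, generator="Unix Makefiles"):
    """Build the cmake argument list for a Debug configuration against HDF5.

    Assumptions: run from a separate build folder; hdf5_dir is the folder
    that contains hdf5-config.cmake.
    """
    return [
        "cmake",
        "-G", generator,             # -G selects the build-system generator
        "-DCMAKE_BUILD_TYPE=Debug",  # debug version
        f"-DHDF5_DIR={hdf5_dir}",    # folder holding hdf5-config.cmake
        source_dir,                  # folder with CMakeLists.txt and myFirstHdf5.cpp
    ]
```

The resulting list could then be passed to subprocess.run(cmd, check=True) from inside the build folder.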

I hope this saves you some time if you ever stumble on the same problem.

 

Customize Matlab color order

It’s very common to have to plot a number of individual lines on the same figure. In Matlab this can be as easy as plot(t, Data), which results in the following figure:

As you can see, the default color order includes 7 colors, which are repeated. You can get the default colors using get(gca,'Colororder'), which returns a 7×3 matrix with the RGB channels of the above colors.
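The repetition can be modeled as a modulo lookup: line k gets row k mod N of the color order, where N is the number of rows. The Python sketch below is a simplified model of that behavior (assuming a single line style), not Matlab's own implementation; color_for_line is a hypothetical name.

```python
def color_for_line(line_index, color_order):
    """Return the color used by the line_index-th plotted line (0-based).

    With more lines than colors, the lookup wraps around, which is why
    the 7 default colors repeat.
    """
    return color_order[line_index % len(color_order)]
```

With 7 colors, the 8th line (index 7) reuses the first color.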

There are multiple ways to define a custom color order. One that works well for me is the following: instead of just plotting, get a handle to the plot

Then use the plot handle to change the color order:

where colormat is an N×3 matrix and N is the number of unique colors.

Of course, selecting the colors is important and not an easy task. The approach I suggest, which makes it quite trivial, is the following.

Navigate to http://colorbrewer2.org/. Choose the number of classes and the nature of your data. In my case, I chose qualitative and 11 classes. Then, from the dropdown menu, choose RGB to display the RGB values of the colors. Note that these are scaled between 0 and 255, while Matlab requires values between 0 and 1.
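The rescaling itself is a division of each channel by 255. A minimal Python sketch (scale_rgb is a hypothetical name; the input is assumed to be a list of (R, G, B) integer triplets):

```python
def scale_rgb(colors_255):
    """Convert RGB triplets in the 0-255 range to the 0-1 range Matlab expects."""
    return [tuple(channel / 255 for channel in rgb) for rgb in colors_255]
```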

A handy feature of this website is that you can copy/paste the values into an editor and turn them into a Matlab variable with minimal effort. The result will look like the following:
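If you would rather not edit the pasted values by hand, a short Python sketch can turn them into a Matlab assignment. It assumes each pasted line contains exactly three integers (e.g. rgb(141,211,199) or 141 211 199) and leaves the division by 255 to Matlab; to_matlab_colormat and the sample values in the usage note are illustrative only.

```python
import re


def to_matlab_colormat(pasted_text, name="colormat"):
    """Turn pasted RGB lines into a Matlab assignment for an N-by-3 matrix.

    Lines that do not contain exactly three integers are skipped; the
    trailing /255 rescales the values to 0-1 inside Matlab.
    """
    rows = []
    for line in pasted_text.splitlines():
        numbers = re.findall(r"\d+", line)
        if len(numbers) == 3:
            rows.append(" ".join(numbers))
    return f"{name} = [{'; '.join(rows)}]/255;"
```

For example, the two sample lines "rgb(141,211,199)" and "rgb(255,255,179)" become colormat = [141 211 199; 255 255 179]/255;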

Using the above color order matrix makes the plot use a different color for each line.

http://colorbrewer2.org/ limits the number of colors you can define for each category. If you think you need more colors, it’s probably better to question the type of plot you are trying to make.