GIS projections in Matlab and R

In spatial analysis it is very common to gather GIS layers from a variety of sources and more than often the layers may have different coordinate systems. Every GIS software know how to project each layer that comes with different projection information (The projection information is saved in a separate file with suffix *.prj) and can perform spatial analysis tasks on layers with different projection system. 

In Matlab and R however the spatial analysis operations work only if the layers have the same coordinate system. Luckily both languages provide functions to convert between projections. Matlab introduced some functions that help with the transformation in the latest update 2020b. So the following will not work in any previous Matlab version not even in 2020a.

Matlab

First let’s bring a couple of data into Matlab workspace. The first layer is a point layer with well locations and the second layer contains the Subregions of the C2Vsim model groundwater basins. For the first layer the *.prj file is missing therefore the coordinate is unknown. For the second file we know from the report that the coordinate system  is the EPSG:26910 – NAD83 / UTM zone 10N – Projected.
Mode info about EPSG see here 

The wells_info.CoordinateReferenceSystem returns an empty field because the projection information is missing. On the other hand  C2Vsim_basins_info.CoordinateReferenceSystem returns the following structure

Unfortunately I’m not aware of any automated process to identify the projection system but we can make guesses by examining the coordinate values. If we plot the well data set we get the following image

We can see that the range of X and Y coordinates is between -124 – -114 and 33 – 42. If you work with California data this range corresponds to the WGS 84 system with EPSG:4326. This is the coordinate system that many web applications use such as google maps, earth engine etc. use. We can easily verify our assumptions in matlab using the webmap tool. If the coordinate system is indeed 4326 then they should be projected in the right place.

We can see that the well locations fall in the right place. Now we can transform this to any coordinate system using. Note that you may have to switch X and Y

Next we have to convert the basin coordinates from EPSG:26910 to EPSG:3310. I haven’t found yet and way to do this transformation using one line so I first convert from EPSG:26910 to EPSG:4326 using projinv and then convert to EPSG:3310. Also because the projinv and projfwd accept only X,Y coordinates not structures we have to loop through the polygons

Now that the two layers have their coordinates on the same system we can overlay them

 

Now that the data sets share the same coordinate system we can do spatial operations. For example we can isolate the wells that are located in the southern most sub-basin of the Central Valley.

 

R

In R the spatial transformations are even easier. First load the library library("rgdal")

The transformation between coordinate system is just one line of code

 

 

Reading custom ASCII files with R

While R provides a ton of efficient functions to read formatted data, you often have to read files that do not follow a given pattern throughout the file, e.g the file may contain sections with different formats, or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim you know what I’m talking about.

The following, is a trick I have found to make reading a bit easier.
I don’t consider myself experienced R user so I this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.

First I read the entire file

where allfile is a character vector where each column is a line of the file.

To get the data from each line, first I’m extracting a safe number of characters out of it and split it. This can be done in one line. (The substring can be omitted in most cases, however I have come across files where the end line character ‘\n’ is located many thousand characters away from the actual end which can cause crashes or slowdowns)

The temp variable is a vector of strings, where some of the elements are empty.
For example

Finally this it can be converted to numeric vector as:

I hope that helps.

If there is a better way to do so please leave a comment.

Compiling & Building c++ application with Hdf5

As I found surprisingly difficult and confusing to build and compile C++  applications with the hdf5 library, I decided to post a small guide on how I achieve that.

Building hdf5

Hdf5 provides a number of download options. The one I used is the cmake version
CMake-hdf5-1.10.5.tar.gz.

After extracting there is a shell script under the main folder build-unix.sh .
The building process under Ubuntu required just that one line.
This will create a build directory inside the main folder, whereas the build directory has several folders.

Using hdf5

To use the cmake version we need a CMakeLists.txt file.

A template of such file is included in here. This file and more information can be also found under the Building HDF5 with CMake guide.

Starting from that template and after spending many hours searching I have modified the cmake as follows: to make it work for C++

Put the above CMakeLists.txt file in the same folder along with the *.cpp file, which in the example is named myFirstHdf5.cpp.

Next, to run cmake you need to pass at the minimum the -G option which is the easiest and the HDF5_DIR, which was quite hard to find it. A small hint on how to find that is that the HDF5_DIR should contain the file hdf5-config.cmake. In my case, I found this file in a few places and just picked one and luckily worked.

Here is the full cmake command to build a debug version

I hope this will save a bit of your time if you ever stumble on this