GIS projections in Matlab and R

Posted on December 2, 2020 by Giorgos Kourakos

In spatial analysis it is very common to gather GIS layers from a variety of sources, and more often than not the layers have different coordinate systems. GIS software knows how to project each layer that comes with projection information (the projection information is saved in a separate file with the suffix *.prj) and can perform spatial analysis tasks on layers with different projection systems.

In Matlab and R, however, the spatial analysis operations work only if the layers have the same coordinate system. Luckily, both languages provide functions to convert between projections. Matlab introduced some functions that help with the transformation in the latest release, R2020b, so the following will not work in any previous Matlab version, not even in R2020a.

Matlab

First let’s load a couple of data sets into the Matlab workspace. The first layer is a point layer with well locations and the second layer contains the subregions of the C2Vsim model groundwater basins. For the first layer the *.prj file is missing, therefore the coordinate system is unknown. For the second layer we know from the report that the coordinate system is EPSG:26910 (NAD83 / UTM zone 10N, projected).
For more info about EPSG see here.

% For the well layer
% Read the shapefile information
wells_info = shapeinfo('CVwelldata');
% Read the shapefile geometry
wells = shaperead('CVwelldata');

% For the Subregion layer
% Read the shapefile information
C2Vsim_basins_info = shapeinfo('C2Vsim_subBasins');
% Read the shapefile geometry
C2Vsim_basins = shaperead('C2Vsim_subBasins');

wells_info.CoordinateReferenceSystem returns an empty field because the projection information is missing. On the other hand, C2Vsim_basins_info.CoordinateReferenceSystem returns the following structure:

ans = 

  projcrs with properties:

                    Name: "NAD83 / UTM zone 10N"
           GeographicCRS: [1×1 geocrs]
        ProjectionMethod: "Transverse Mercator"
              LengthUnit: "meter"
    ProjectionParameters: [1×1 map.crs.ProjectionParameters]

Unfortunately, I’m not aware of any automated process to identify the projection system, but we can make an educated guess by examining the coordinate values. As a first step we plot the well data set:

plot([wells.X]', [wells.Y]','.')
axis equal

We can see that X ranges from about -124 to -114 and Y from about 33 to 42. If you work with California data, this range corresponds to longitude and latitude in degrees, most likely the WGS 84 system with EPSG:4326. This is the coordinate system that many web applications, such as Google Maps and Earth Engine, use. We can easily verify our assumption in Matlab using the webmap tool. If the coordinate system is indeed EPSG:4326, the points should appear in the right place.

webmap % starts the webmap
wmmarker([wells.Y]', [wells.X]','Alpha',1, 'Color','green', 'IconScale', 0.2, ...
  'Icon','http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png');

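We can see that the well locations fall in the right place. Now we can project them to any projected coordinate system using projfwd, here EPSG:3310 (NAD83 / California Albers). Note that projfwd expects latitude first and longitude second, so you may have to switch X and Y.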

[Wx,Wy] = projfwd(projcrs(3310),[wells.Y]', [wells.X]');

Next we have to convert the basin coordinates from EPSG:26910 to EPSG:3310. I haven’t yet found a way to do this transformation in one line, so I first convert from EPSG:26910 to geographic coordinates (latitude and longitude) using projinv and then project those to EPSG:3310 using projfwd. Also, because projinv and projfwd accept only coordinate arrays and not structures, we have to loop through the polygons:

for ii = 1:length(C2Vsim_basins)
    [lat,lon] = projinv(projcrs(26910),C2Vsim_basins(ii,1).X, C2Vsim_basins(ii,1).Y);
    [C2Vsim_basins(ii,1).X3310, C2Vsim_basins(ii,1).Y3310] = projfwd(projcrs(3310),lat, lon);
end

Now that the two layers have their coordinates in the same system, we can overlay them:

plot(Wx,Wy,'.')
hold on
for ii = 1:length(C2Vsim_basins)
    plot(C2Vsim_basins(ii,1).X3310, C2Vsim_basins(ii,1).Y3310,'r')
end

Now that the data sets share the same coordinate system, we can do spatial operations. For example, we can isolate the wells that are located in the southernmost sub-basin of the Central Valley:

in = inpolygon(Wx, Wy, C2Vsim_basins(21,1).X3310, C2Vsim_basins(21,1).Y3310);
plot(Wx(in), Wy(in),'.')
hold on
plot(C2Vsim_basins(21,1).X3310, C2Vsim_basins(21,1).Y3310,'r')

R

In R the spatial transformations are even easier. First load the rgdal library with library("rgdal").

The transformation between coordinate systems is just one line of code:

# Read the data sets
wells <- readOGR(dsn = '.', layer = 'CVwelldata')
C2Vsim_basins <- readOGR(dsn = '.', layer = 'C2Vsim_subBasins')

# Change the coordinate system to 3310
C2Vsim_basins_3310 <- spTransform(C2Vsim_basins, CRS("+init=epsg:3310"))
# Because the wells have no coordinate information, first we have to assign the correct one.
# We know that this is EPSG:4326.
# proj4string() comes from the sp package, which is loaded together with rgdal
# (crs() would additionally require the raster package)
proj4string(wells) <- CRS("+init=epsg:4326")
# Now we can make the transformation
wells_3310 <- spTransform(wells, CRS("+init=epsg:3310"))
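
The rgdal package was retired from CRAN in 2023, so on a recent R installation the code above may not run. Below is a minimal sketch of the same two steps (check that the coordinates look like degrees, then assign EPSG:4326 and transform to EPSG:3310) using the sf package. The helper name looks_like_lonlat and the three well locations are made up for illustration, and the sf part is skipped when the package is not installed.

```r
# Hypothetical helper (base R): TRUE when every coordinate pair fits inside
# the valid longitude/latitude ranges, i.e. the values are plausibly degrees.
looks_like_lonlat <- function(x, y) {
  all(x >= -180 & x <= 180) && all(y >= -90 & y <= 90)
}

# Made-up well locations in California (longitude, latitude in degrees)
wells_df <- data.frame(lon = c(-121.5, -120.0, -119.2),
                       lat = c(38.6, 38.0, 35.4))

# Sanity check before declaring the layer to be EPSG:4326
looks_like_lonlat(wells_df$lon, wells_df$lat)

# The sf part only runs when the package is available
if (requireNamespace("sf", quietly = TRUE)) {
  # Assign the assumed coordinate system while building the point layer
  wells_sf <- sf::st_as_sf(wells_df, coords = c("lon", "lat"), crs = 4326)
  # Transform to EPSG:3310 (NAD83 / California Albers)
  wells_sf_3310 <- sf::st_transform(wells_sf, 3310)
  print(sf::st_coordinates(wells_sf_3310))
}
```

A layer read from a shapefile with sf::st_read() can be handled the same way: sf::st_set_crs() assigns the missing coordinate system and sf::st_transform() does the conversion.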

Reading custom ASCII files with R

Posted on May 20, 2020 (updated November 17, 2020) by Giorgos Kourakos

While R provides a ton of efficient functions to read formatted data, you often have to read files that do not follow a given pattern throughout the file, e.g. the file may contain sections with different formats, or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim you know what I’m talking about.

The following is a trick I have found to make reading a bit easier.
I don’t consider myself an experienced R user, so this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.

First I read the entire file:

allfile <- readLines("path/to/myfile.fhb")

where allfile is a character vector in which each element is one line of the file.
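
Having the whole file in memory as a character vector means that lines can be picked by index or by pattern before any parsing happens. The sketch below illustrates this on a made-up file layout (comment lines starting with "C", a count line, then data rows); the layout and the variable names is_comment and data_lines are assumptions for the example, not a real Modflow or C2Vsim format.

```r
# Write a small hypothetical file to a temporary location so the example
# runs without any external data
tmp <- tempfile(fileext = ".dat")
writeLines(c("C comment line",
             "  3                 / number of rows that follow",
             "  1   10.5",
             "  2   11.0",
             "  3   12.25",
             "C end of block"), tmp)

# Read the entire file, one element per line
allfile <- readLines(tmp)

# In this made-up layout, lines starting with "C" are comments
is_comment <- grepl("^C", allfile)
data_lines <- allfile[!is_comment]

length(allfile)   # number of lines in the file
data_lines[1]     # the count line that opens the block
```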

To get the data from each line, I first extract a safe number of characters out of it and then split it. This can be done in one line. (The substr call can be omitted in most cases; however, I have come across files where the end-of-line character ‘\n’ is located many thousands of characters away from the actual end of the data, which can cause crashes or slowdowns.)

maxChar <- 2000
temp <- strsplit(substr(allfile[5], 1, maxChar)[[1]], split = " ")[[1]]

The temp variable is a vector of strings, where some of the elements are empty.
For example:

> temp
[1] "" "" "625" "" "1" "" "2" "" "0"

Finally, this can be converted to a numeric vector as:

temp <- as.numeric((temp[which(temp != "")]))
> temp
[1] 625 1 2 0
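
The empty strings appear because strsplit treats every single space as a separator. One possible alternative, sketched below with base R only, is to trim the line and split on runs of whitespace, so that no empty elements are produced in the first place. The function name parse_numeric_line is made up for this example; it assumes the line contains only numbers.

```r
# Hypothetical helper: split one line on runs of whitespace and convert the
# pieces to numbers. maxChar guards against lines whose end-of-line character
# is located far beyond the actual data.
parse_numeric_line <- function(line, maxChar = 2000) {
  line <- trimws(substr(line, 1, maxChar))
  if (!nzchar(line)) return(numeric(0))   # blank line: nothing to convert
  as.numeric(strsplit(line, "\\s+")[[1]])
}

parse_numeric_line("  625  1  2  0")

# For a purely numeric line, scan() does the splitting and the conversion
# in one call
scan(text = "  625  1  2  0", quiet = TRUE)
```

Both calls return the four numbers 625, 1, 2 and 0. A line that mixes numbers with text would produce NA values from as.numeric, so such lines need their own handling.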

I hope that helps.

If there is a better way to do so please leave a comment.

Compiling & Building c++ application with Hdf5

Posted on April 6, 2019 by Giorgos Kourakos

As I found it surprisingly difficult and confusing to build and compile C++ applications with the hdf5 library, I decided to post a small guide on how I achieved that.

Building hdf5

Hdf5 provides a number of download options. The one I used is the cmake version
CMake-hdf5-1.10.5.tar.gz.

After extracting, there is a shell script named build-unix.sh under the main folder.
Building under Ubuntu required just running that one script.
This creates a build directory inside the main folder, which in turn contains several folders.

Using hdf5

To use the cmake version we need a CMakeLists.txt file.

A template of such a file is included in here. This file and more information can also be found in the Building HDF5 with CMake guide.

Starting from that template, and after spending many hours searching, I have modified the CMakeLists.txt as follows to make it work for C++:

cmake_minimum_required (VERSION 3.10.1)
project( myFirstHdf5 C CXX )

set (LIB_TYPE STATIC) # or SHARED
string(TOLOWER ${LIB_TYPE} SEARCH_TYPE)

#find_package (HDF5 NAMES hdf5 COMPONENTS C CXX ${SEARCH_TYPE})
#find_package(HDF5 COMPONENTS CXX HL REQUIRED)
find_package(HDF5 COMPONENTS C CXX HL REQUIRED)

link_directories( ${HDF5_LIBRARY_DIRS} )

include_directories (${HDF5_INCLUDE_DIR})
set (LINK_LIBS ${LINK_LIBS} ${HDF5_C_${LIB_TYPE}_LIBRARY})

#set (example hdfcompile)

add_executable (myFirstHdf5 myFirstHdf5.cpp)

target_link_libraries (myFirstHdf5 ${HDF5_CXX_LIBRARIES})

Put the above CMakeLists.txt file in the same folder as the *.cpp file, which in the example is named myFirstHdf5.cpp.

Next, to run cmake you need to pass at a minimum the -G option, which is the easy one, and HDF5_DIR, which was quite hard to find. A small hint: HDF5_DIR should be the folder that contains the file hdf5-config.cmake. In my case, I found this file in a few places, just picked one, and luckily it worked.

Here is the full cmake command to build a debug version:

cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Debug \
-DHDF5_DIR=${HOME}/Downloads/CMake-hdf5-1.10.5/build/_CPack_Packages/Linux/TGZ/HDF5-1.10.5-Linux/HDF_Group/HDF5/1.10.5/share/cmake/hdf5 .

I hope this will save you a bit of time if you ever stumble on this.
