Reading IWFM/C2VSim Output files

The latest version of IWFM uses hdf output file format. This makes the reading output files a bit more difficult since the outputs are not in ASCII format but way more efficient. Luckily Matlab, R and many other languages can handle hdf formats quite easily.

Groundwater Zone Budget file

The first step to read any hdf file in matlab is to read the information that is contained in the file

GB_bud_info = h5info(fullfile(c2vsim_path, 'Results','C2VSimFG_GW_ZBudget.hdf'));

The GB_bud_info is a structure with many fields that contain plenty of useful information. After examining the structure in my version the GB_bud_info contained the field Groups, which has the following structure

If we expand the dataset for the Layer1 we see the following structure

From this table we can find the row where each property is located . To read the data for a specific property we use the following. The Storage for example is the row 22. The following snippet loops through the layers and adds the entire storage from all layers

CVstorage = zeros(504, 32537);
for ii = 1:4
    CVstorage = CVstorage + h5read(GB_bud_info.Filename,...
    [GB_bud_info.Groups(1+ii).Name GB_bud_info.Name GB_bud_info.Groups(1+ii).Datasets(22).Name])';

The units of the hdf file are in feet. Since this is storage the units are cubic feet. To generate the famous storage depletion plot of Central valley for example after we have calculated the above we can do the following

% Create the time variable for the plot
start_date = datetime(1973,10,1);
tm = start_date + calmonths(0:503);
% Calculate the storage from all elements
CVcumulativeStorage = sum(CVstorage,2);
% convert from feet to acre feet
CVcumulativeStorage = CVcumulativeStorage * 2.29569e-5
% Convert to million acrefeet
CVcumulativeStorage = CVcumulativeStorage /1000000
% and do the plot
plot(tm,CVcumulativeStorage - CVcumulativeStorage(1),'linewidth',2) 
ylabel('Storage depletion [MAF]')
grid on


In the case of storage the data have dimensions as we see in the ChunkSize field 32537 and we can somewhat safely assume that the 1st row correspond to 1st element etc. However the ChunkSize for the Streams (rows 23,24) is 9885.

    [GB_bud_info.Groups(2).Name GB_bud_info.Name GB_bud_info.Groups(2).Datasets(23).Name])';

The size of the StreamBud  variable is 504×9885. Therefore from this matrix we cannot tell which element corresponds to the values. This information exists under Group(1).Datasets

The 6th row contains the correspondence between data and elements. To read it use

ids = h5read(GB_bud_info.Filename,[GB_bud_info.Groups(1).Name GB_bud_info.Name GB_bud_info.Groups(1).Datasets(6).Name])';

which returns a matrix 24×32537 (note that the above code transpose the output). Obviously the columns are related to the elements but how about the rows? This information exists in the 5th row of the above variable which we can read it as

Colnames = h5read(GB_bud_info.Filename,[GB_bud_info.Groups(1).Name GB_bud_info.Name GB_bud_info.Groups(1).Datasets(5).Name]);

The Colnames variable consist of 24 rows! The 3rd row is the Streams_Inflow (+). The elements that are associated with the stream budget are the nonzero ones in the 3rd row of the ids matrix which can be extracted easily

stream_elem_ids = find(ids(3,:) ~=0);

To be continued....









GIS projections in Matlab and R

In spatial analysis it is very common to gather GIS layers from a variety of sources and more than often the layers may have different coordinate systems. Every GIS software know how to project each layer that comes with different projection information (The projection information is saved in a separate file with suffix *.prj) and can perform spatial analysis tasks on layers with different projection system. 

In Matlab and R however the spatial analysis operations work only if the layers have the same coordinate system. Luckily both languages provide functions to convert between projections. Matlab introduced some functions that help with the transformation in the latest update 2020b. So the following will not work in any previous Matlab version not even in 2020a.


First let’s bring a couple of data into Matlab workspace. The first layer is a point layer with well locations and the second layer contains the Subregions of the C2Vsim model groundwater basins. For the first layer the *.prj file is missing therefore the coordinate is unknown. For the second file we know from the report that the coordinate system  is the EPSG:26910 – NAD83 / UTM zone 10N – Projected.
Mode info about EPSG see here 

% For the well layer
% Read the shapefile information
wells_info = shapeinfo('CVwelldata');
% Read the shapefile geometry
wells = shaperead('CVwelldata');

% For the Subregion layer
% Read the shapefile information
C2Vsim_basins_info = shapeinfo('C2Vsim_subBasins');
% Read the shapefile geometry
C2Vsim_basins = shaperead('C2Vsim_subBasins');

The wells_info.CoordinateReferenceSystem returns an empty field because the projection information is missing. On the other hand  C2Vsim_basins_info.CoordinateReferenceSystem returns the following structure

ans = 

  projcrs with properties:

                    Name: "NAD83 / UTM zone 10N"
           GeographicCRS: [1×1 geocrs]
        ProjectionMethod: "Transverse Mercator"
              LengthUnit: "meter"
    ProjectionParameters: [1×1]

Unfortunately I’m not aware of any automated process to identify the projection system but we can make guesses by examining the coordinate values. If we plot the well data set we get the following image

plot([wells.X]', [wells.Y]','.')
axis equal

We can see that the range of X and Y coordinates is between -124 – -114 and 33 – 42. If you work with California data this range corresponds to the WGS 84 system with EPSG:4326. This is the coordinate system that many web applications use such as google maps, earth engine etc. use. We can easily verify our assumptions in matlab using the webmap tool. If the coordinate system is indeed 4326 then they should be projected in the right place.

webmap % starts the webmap
wmmarker([wells.Y]', [wells.X]','Alpha',1, 'Color','green', 'IconScale', 0.2, ...

We can see that the well locations fall in the right place. Now we can transform this to any coordinate system using. Note that you may have to switch X and Y

[Wx,Wy] = projfwd(projcrs(3310),[wells.Y]', [wells.X]');

Next we have to convert the basin coordinates from EPSG:26910 to EPSG:3310. I haven’t found yet and way to do this transformation using one line so I first convert from EPSG:26910 to EPSG:4326 using projinv and then convert to EPSG:3310. Also because the projinv and projfwd accept only X,Y coordinates not structures we have to loop through the polygons

for ii = 1:length(C2Vsim_basins)
    [lat,lon] = projinv(projcrs(26910),C2Vsim_basins(ii,1).X, C2Vsim_basins(ii,1).Y);
    [C2Vsim_basins(ii,1).X3310, C2Vsim_basins(ii,1).Y3310] = projfwd(projcrs(3310),lat, lon);

Now that the two layers have their coordinates on the same system we can overlay them

hold on
for ii = 1:length(C2Vsim_basins)
    plot(C2Vsim_basins(ii,1).X3310, C2Vsim_basins(ii,1).Y3310,'r')


Now that the data sets share the same coordinate system we can do spatial operations. For example we can isolate the wells that are located in the southern most sub-basin of the Central Valley.

in = inpolygon(Wx, Wy, C2Vsim_basins(21,1).X3310, C2Vsim_basins(21,1).Y3310);
plot(Wx(in), Wy(in),'.')
hold on
plot(C2Vsim_basins(21,1).X3310, C2Vsim_basins(21,1).Y3310,'r')



In R the spatial transformations are even easier. First load the library library("rgdal")

The transformation between coordinate system is just one line of code

# Read the data sets
wells <- readOGR(dsn = '.', layer = 'CVwelldata')
C2Vsim_basins <- readOGR(dsn = '.', layer = 'C2Vsim_subBasins')

# Change the coordinate system to 3310
C2Vsim_basins_3310 <- spTransform(C2Vsim_basins, CRS("+init=epsg:3310"))
# Because the wells have no coordinate information first we have to assign the correct one
# we know that this is the 4326
crs(wells) <- CRS('+init=EPSG:4326')
# Now we can make the transformation
wells_3310 <- spTransform(wells, CRS("+init=epsg:3310"))



Reading custom ASCII files with R

While R provides a ton of efficient functions to read formatted data, you often have to read files that do not follow a given pattern throughout the file, e.g the file may contain sections with different formats, or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim you know what I’m talking about.

The following, is a trick I have found to make reading a bit easier.
I don’t consider myself experienced R user so I this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.

First I read the entire file

allfile <- readLines("path/to/myfile.fhb")

where allfile is a character vector where each column is a line of the file.

To get the data from each line, first I’m extracting a safe number of characters out of it and split it. This can be done in one line. (The substring can be omitted in most cases, however I have come across files where the end line character ‘\n’ is located many thousand characters away from the actual end which can cause crashes or slowdowns)

maxChar <- 2000 
temp <- strsplit(substr(allfile[5], 1, maxChar)[[1]], split = " ")[[1]]

The temp variable is a vector of strings, where some of the elements are empty.
For example

> temp 
[1] "" "" "625" "" "1" "" "2" "" "0"

Finally this it can be converted to numeric vector as:

temp <- as.numeric((temp[which(temp != "")])) 
> temp [1] 625 1 2 0

I hope that helps.

If there is a better way to do so please leave a comment.