While R provides a ton of efficient functions to read formatted data, you often have to read files that do not follow a single pattern throughout, e.g. the file may contain sections with different formats, or some weird alternating formatting pattern.
If you have worked with Modflow or C2Vsim you know what I’m talking about.
The following is a trick I have found to make reading such files a bit easier.
I don't consider myself an experienced R user, so this might not be the most efficient way of doing it, but it has worked very well for me so far, even with very large files.
First I read the entire file:
allfile <- readLines("path/to/myfile.fhb")
where allfile is a character vector in which each element is one line of the file.
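As a small illustration of what readLines returns, the sketch below writes a made-up three-line file to a temporary location and reads it back. The file contents and the name tmp are invented for this example and are not a real Modflow or C2Vsim input.

```r
# Write a tiny made-up file (contents are illustrative only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("# header comment",
             "  625  1  2  0",
             "3.5 4.25 some text"), tmp)

# Read every line of the file into a character vector
allfile <- readLines(tmp)

n_lines <- length(allfile)   # one element per line of the file
second  <- allfile[2]        # the second line, still a single string
```

Each line stays an unparsed string at this point, so sections with different layouts can be handled separately by indexing into allfile.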
To get the data from a line, I first extract a safe number of characters from it and then split the result on spaces. Both steps fit in a single expression. (The substr call can be omitted in most cases; however, I have come across files where the end-of-line character '\n' sits many thousands of characters past the end of the actual data, which can cause crashes or slowdowns.)
maxChar <- 2000
temp <- strsplit(substr(allfile[5], 1, maxChar), split = " ")[[1]]
The temp variable is a vector of strings, some of which are empty.
For example:
> temp
[1] "" "" "625" "" "1" "" "2" "" "0"
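The empty strings appear because splitting on a single space produces an empty element wherever two spaces sit next to each other (or at the start of the line). As a sketch of one possible alternative, assuming the fields are separated by whitespace only, the line can be trimmed and split on runs of whitespace instead. The variable names and the sample line below are made up for illustration.

```r
sample_line <- "  625  1  2  0"   # made-up sample line

# Splitting on a single space leaves empty strings behind
by_space <- strsplit(sample_line, split = " ")[[1]]

# Trimming first and splitting on runs of whitespace avoids them
by_runs <- strsplit(trimws(sample_line), split = "\\s+")[[1]]
```

Here by_space still contains the empty elements, while by_runs holds only the four fields as strings.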
Finally, temp can be converted to a numeric vector after dropping the empty strings:
temp <- as.numeric(temp[temp != ""])
> temp
[1] 625   1   2   0
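If the same steps are needed for many lines, they could be wrapped in a small helper. The sketch below is only one way to do it: parse_numeric_line is a hypothetical name, the sample lines are made up, and it assumes every field on the line is numeric. It also shows base R's scan() with its text argument as a possible alternative, under the same assumption.

```r
# Hypothetical helper: truncate a line, split on spaces, drop empties, convert
parse_numeric_line <- function(line, maxChar = 2000) {
  pieces <- strsplit(substr(line, 1, maxChar), split = " ")[[1]]
  as.numeric(pieces[pieces != ""])
}

sample_lines <- c("  625  1  2  0", "10 20 30")   # made-up sample lines

# Apply the helper to each line; the result is a list of numeric vectors
values <- lapply(sample_lines, parse_numeric_line)

# Alternative: scan() treats any run of whitespace as one separator
values_scan <- scan(text = substr(sample_lines[1], 1, 2000), quiet = TRUE)
```

Note that as.numeric returns NA (with a warning) for any field that is not a number, and scan() stops with an error in that case, so lines that mix text and numbers need separate handling.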
I hope that helps.
If there is a better way to do this, please leave a comment.