Introduction to GeoST#

This quick introduction will cover some of the key concepts and basic features of GeoST to help you get started. GeoST depends heavily on popular data science libraries Pandas and GeoPandas but GeoST provides readily available, frequently used selections on data held in DataFrame or GeoDataFrame objects. This makes GeoST an easy to use option for less experienced Python users while more experienced users can easily access the underlying DataFrames and develop their own functionalities.

GeoST is designed to work with many different kinds of subsurface data that is available in The Netherlands but, even though still under construction, it will be handle any kind of subsurface data. Below is a list of different data sources which are currently supported or will be supported by GeoST in the future:

Tabular
- Dino geological boreholes (supported)
- BRO CPT data (supported)
File Formats/
- GEF CPT’s (supported)
- Dino XML geological boreholes (planned)
- BRO XML geotechnical boreholes (planned)
- BRO XML soil boreholes (planned)
- GEF boreholes (planned)
- BRO XML CPT’s (planned)
- BRO geopackage CPT’s (planned)
- Well log LAS files (planned)
- Well log ASCII files (planned)
Accessible from the BRO (REST API) (all planned)
- CPT
- BHR-P
- BHR-GT
- BHR-G
BRO Geological models
- GeoTOP (supported)
- REGIS II (planned)
- Soilmap of the Netherlands (planned)

GeoST also plans support for several Geophysical data sources such as Seismic, ERT, EM and others.

Concept#

At the core, GeoST handles data in a so-called Collection object which holds all the spatial information of any kind of data source in a “header” attribute, and the corresponding data in a “data” attribute. So for example, a set of 100 boreholes is held in a BoreholeCollection where the “header” contains one row per data entry and provides information about the id, location, surface level and depths and the “data” has the information of each described layer. When working with these Collections, GeoST automatically keeps track of the alignment and thus makes sure each data entry occurs in both the “header” and “data” attributes. For example, when a user deletes an individual borehole entry from the “header”, the Collection ensures it is deleted from the “data” as well.

User guide

For a more detailed explanation of the types of GeoST objects for different sources of data, check the Data structures in the user guide.

The basics#

BoreholeCollection#

Data is usually loaded through various reader functions (see API reference). For this tutorial, GeoST provides a set of readily available boreholes in the area of the Utrecht Science Park which can be directly loaded as a BoreholeCollection. Let’s read the data, print the result to see what it says and also plot the locations to get an idea where we are:

import geost

usp_boreholes = geost.data.boreholes_usp()
print(usp_boreholes)
usp_boreholes.header.gdf.explore() # Interactive plot of the borehole locations.

Downloading file 'boreholes_usp.parquet' from 'https://github.com/Deltares-research/geost/raw/main/data/boreholes_usp.parquet' to '/home/runner/.cache/geost'.

BoreholeCollection:
# header = 67

Make this Notebook Trusted to load map: File -> Trust Notebook

As you can see it says that ‘usp_boreholes’ is of the type BoreholeCollection. Additionally, it says # header = 67. This means that the collection in total consists of 67 boreholes but it also shows the first key attribute of a collection: the “header” attribute.

As said in the previous section, the “header” attribute in a BoreholeCollection contains all the information about each borehole such as the ID, x- and y-coordinates and further metadata. Additionally, it contains geometry objects for each borehole which allows for spatial selections and exports to GIS-supported formats etc. that are provided by GeoST. In the case of a BoreholeCollection, the header attribute is a PointHeader instance (another key GeoST object). Note, for other types of data (e.g. 2D line data), other objects are used. Let’s see what the attribute looks like by printing it:

print(usp_boreholes.header)

PointHeader instance containing 67 objects
          nr       x       y  surface      end               geometry
 B31H0541  139585  456000    1.200   -9.900  POINT (139585 456000)
 B31H0611  139600  455060    1.200  -23.000  POINT (139600 455060)
 B31H0718  139950  455200    1.300 -271.200  POINT (139950 455200)
 B31H0803  139675  455087    2.160   -4.840  POINT (139675 455087)
 B31H0806  139684  455384    1.000  -49.500  POINT (139684 455384)
..       ...     ...     ...      ...      ...                    ...
B32C1889  140919  456000    1.728   -2.772  POINT (140919 456000)
B32C1890  140938  455878    3.948    0.248  POINT (140938 455878)
B32C1891  141025  455998    2.112   -1.888  POINT (141025 455998)
B32C1892  141091  455966    1.769   -2.731  POINT (141091 455966)
B32C1893  141266  455989    2.328   -2.172  POINT (141266 455989)

[67 rows x 6 columns]

Note, that the printed “header” looks just like a Geopandas GeoDataFrame. This is because PointHeader is basically a wrapper around a GeoDataFrame which provides easily accessible selection and export methods. Therefore, the above interactive plot of the borehole locations was easily created using the “gdf” attribute. More experienced Python users can therefore access the header’s “gdf” attribute and do any customized operation with geodataframes they would normally do.

The other key attribute of a collection is the “data” attribute which is an instance of another key object of GeoST: a LayeredData object. This contains the actual logged data (i.e. layer descriptions) of the boreholes. In this case, the “data” attribute is a LayeredData object because boreholes are usually described in terms of “layers” (i.e. depth intervals over which properties are the same) with respective “top” and “bottom” depths. Let’s see what it looks like:

print(usp_boreholes.data)

LayeredData instance:
            nr       x       y  surface    end   top  bottom lith  zm   zmk  \
   B31H0541  139585  456000    1.200 -9.900  0.00    0.20    K NaN  None   
   B31H0541  139585  456000    1.200 -9.900  0.20    0.60    K NaN  None   
   B31H0541  139585  456000    1.200 -9.900  0.60    0.95    V NaN  None   
   B31H0541  139585  456000    1.200 -9.900  0.95    2.80    Z NaN  ZMFO   
   B31H0541  139585  456000    1.200 -9.900  2.80    4.20    Z NaN   ZFC   
...        ...     ...     ...      ...    ...   ...     ...  ...  ..   ...   
B32C1893  141266  455989    2.328 -2.172  0.00    0.50    Z NaN   ZZF   
B32C1893  141266  455989    2.328 -2.172  0.50    1.10    Z NaN   ZZF   
B32C1893  141266  455989    2.328 -2.172  1.10    1.40    K NaN  None   
B32C1893  141266  455989    2.328 -2.172  1.40    2.10    Z NaN   ZZF   
B32C1893  141266  455989    2.328 -2.172  2.10    4.50    Z NaN   ZZF   

      ...  cons color lutum_pct plants shells  kleibrokjes strat_1975  \
   ...  None    ON       NaN      0      0            0       None   
   ...  None    BR       NaN      0      0            0       None   
   ...  None    BR       NaN      0      0            0       None   
   ...  None    GR       NaN      0      0            0       None   
   ...  None    BR       NaN      0      0            0       None   
...   ...   ...   ...       ...    ...    ...          ...        ...   
...  None    BR       NaN      0      0            0       None   
...  None    GR       NaN      0      0            0       None   
...  CMST    GR       NaN      0      0            0       None   
...  None    GR       NaN      0      0            0       None   
...  None    GR       NaN      0      0            0       None   

     strat_2003 strat_inter                                               desc  
          EC         NaN  [TEELAARDE#***#****#*] ..........................  
          EC         NaN                       [KLEI#***#****#*] grysbruin.  
          NI         NaN                     [VEEN#***#****#*] donkerbruin.  
          EC         NaN  [ZAND#***#****#*] FYN TOT matig fyn# iets slib...  
        BXWI         NaN                  [ZAND#***#****#*] fyn# grysbruin.  
...         ...         ...                                                ...  
     None         NaN       BRON:GEF-BESTAND;0.00;0.50;Zs3h2 PU2;ZZF;BR;  
     None         NaN     BRON:GEF-BESTAND;0.50;1.10;Zs3 KLE2;ZZF;GR DO;  
     None         NaN      BRON:GEF-BESTAND;1.10;1.40;Ks2h1;;GR TBR;KMST  
     None         NaN     BRON:GEF-BESTAND;1.40;2.10;Zs2 SLI1;ZZF;GR DO;  
     None         NaN             BRON:GEF-BESTAND;2.10;4.50;Zs2;ZZF;GR;  

[1398 rows x 32 columns]

Similar to the “header”, the printed LayeredData object is wrapper around a Pandas DataFrame providing easy to use selection and export methods. Also here, more experienced users can access the underlying DataFrame by accessing the data’s “df” attribute. The “data” attribute of this collection contains 32 different columns that hold the relevant borehole data and describes characteristics such as lithology, sand grain size, plant remains and others.

Positional reference#

As said, a collection contains all spatial information about the data both horizontally and vertically. These attributes can be accessed through the “vertical_reference” and “horizontal_reference” attributes:

print(usp_boreholes.vertical_reference)
print(usp_boreholes.horizontal_reference)

EPSG:5709
EPSG:28992

These attributes can be used to reproject the data. For example, changing Dutch “Rijksdriehoekstelsel” coordinates to WGS 84 coordinates or change the vertical reference from Dutch “NAP” to a “Mean Sea Level” plane. Any reprojection automatically updates the coordinates in the data. Let’s change the horizontal reference in “usp_boreholes” and checkout the “header” again to see this:

usp_boreholes.change_horizontal_reference(4326) # Change from RD to WGS 84
print(usp_boreholes.header, usp_boreholes.horizontal_reference, sep="\n")

PointHeader instance containing 67 objects
          nr         x          y  surface      end                  geometry
0   B31H0541  5.162268  52.092042    1.200   -9.900  POINT (5.16227 52.09204)
1   B31H0611  5.162530  52.083594    1.200  -23.000  POINT (5.16253 52.08359)
2   B31H0718  5.167630  52.084862    1.300 -271.200  POINT (5.16763 52.08486)
3   B31H0803  5.163622  52.083839    2.160   -4.840  POINT (5.16362 52.08384)
4   B31H0806  5.163740  52.086508    1.000  -49.500  POINT (5.16374 52.08651)
..       ...       ...        ...      ...      ...                       ...
62  B32C1889  5.181734  52.092078    1.728   -2.772  POINT (5.18173 52.09208)
63  B32C1890  5.182016  52.090982    3.948    0.248  POINT (5.18202 52.09098)
64  B32C1891  5.183281  52.092062    2.112   -1.888  POINT (5.18328 52.09206)
65  B32C1892  5.184245  52.091776    1.769   -2.731  POINT (5.18424 52.09178)
66  B32C1893  5.186798  52.091987    2.328   -2.172   POINT (5.1868 52.09199)

[67 rows x 6 columns]
EPSG:4326

Note that the coordinates in the “x” and “y” columns have indeed been changed to latitude, longitude coordinates.

Selections and slices#

There are several ways to make subsets of a collection, such as:

Spatial selections

select_within_bbox - Select data points in the collection within a bounding box
select_with_points - Select data points in the collection within distance to other point geometries
select_with_lines - Select data points in the collection within distance from line geometries
select_within_polygons - Select data points in the collection within polygon geometries

Conditional selections

select_by_values - Select data points in the collection based on the presence of certain values in one or more of the data columns
select_by_length - Select data points in the collection based on length requirements
select_by_depth - Select data points in the collection based on depth constraints

Slicing

slice_depth_interval - Slice boreholes in the collection down to the specified depth interval
slice_by_values - Slice boreholes in the collection based on value (e.g. only sand layers, remove others).

We will not go through each of these methods. See the API Reference.