Survey data#

GeoST offers various functions to read and parse subsurface data into GeoST objects. In general, survey data can either be loaded from (local) files or requested from a service such as the BRO REST-API. Either way, data from multiple sources and file formats is supported.

Reading survey data#

By survey data we refer to measurements (i.e. raw data) of the subsurface. These can comprise boreholes, CPTs, well logs, seismic or EM lines and others (see Data structures for a more detailed description). In any case, the data is parsed into a geost.Collection. The table below lists the currently supported data sources, the associated reader functions and the resulting GeoST objects.

| File format/data service | Read function | Returned GeoST object | Description |
|---|---|---|---|
| BHR-G | read_bhrg | Collection | (BRO) Geological boreholes from xml |
| BHR-GT | read_bhrgt | Collection | (BRO) Geotechnical boreholes from xml |
| BHR-GT-samples | read_bhrgt_samples | Collection | (BRO) Geotechnical boreholes - grainsize samples from xml |
| BHR-P | read_bhrp | Collection | (BRO) Pedological boreholes from xml |
| CPT | read_cpt, read_gef_cpts | Collection | (BRO) Cone Penetration Tests from xml or gef |
| SFR | read_sfr | Collection | (BRO) Pedological soil profile descriptions from xml |
| BRO REST-API | bro_api_read | Collection | BRO BHR-G, BHR-GT, BHR-GT-samples, BHR-P, CPT or SFR objects |
| Parquet or csv | read_table, read_borehole_table, read_cpt_table | Collection or DataFrame | Survey data stored as a table; result of the to_parquet or to_csv export methods |
| NLOG excel export | read_nlog_cores | Collection or DataFrame | Reader for NLOG deep cores, see here |
| UU LLG cores | read_uullg_tables | Collection or DataFrame | Reader for the csv distribution of Utrecht University student boreholes |
| BORIS XML | read_xml_boris | Collection or DataFrame | Reader for XML exports of the BORIS borehole description software |

Reading data from the BRO REST-API#

Subsurface data is widely available in the Netherlands via the portal of the Basisregistratie Ondergrond (BRO). GeoST can directly load this data for an area of interest.

import geost

# Read a few BRO pedological soil cores in a small area 250 m x 500 m
boreholes = geost.bro_api_read("BHR-P", bbox=(142_000, 455_000, 142_250, 455_500))
boreholes
Collection
  header (rows, columns) : (8, 10)
  data (rows, columns)   : (40, 8)
crs: Amersfoort / RD New
vertical datum: NAP height

You can see that this loads the soil cores as a geost.Collection. This is also supported for geological (BHR-G) and geotechnical (BHR-GT) boreholes, cone penetration test (CPT) data and pedological soil profile descriptions (SFR). This facilitates the direct use of BRO data within any application.

Reading from local files#

A common workflow is to use GeoST to load survey data stored in a tabular format such as Parquet or csv and then apply the available selection and analysis methods. For example, suppose you have survey data for multiple boreholes stored in a local Parquet file. Using geost.read_table you can easily load it into a Collection or, if preferred, a pandas.DataFrame and use the data for further analysis:

borehole_file = geost.data.boreholes_usp(
    return_filepath=True
)  # Use the filepath instead of directly reading the borehole data
boreholes = geost.read_table(borehole_file)
print(boreholes)
Collection
  header (rows, columns) : (67, 5)
  data (rows, columns)   : (1398, 32)
crs: None
vertical datum: None

Now you can easily select the boreholes that contain peat (“V”) for example:

peat_boreholes = boreholes.select_by_values("lith", "V")
peat_boreholes
Collection
  header (rows, columns) : (32, 5)
  data (rows, columns)   : (670, 32)
crs: None
vertical datum: None

The same is possible with pandas DataFrames. Note that with DataFrames, any GeoST method needs to be accessed through the .gst accessor, as shown in the Data structures section. This way we can select the boreholes that contain peat just like before. Let’s load the same borehole data, but this time as a pandas.DataFrame:

borehole_df = geost.read_table(borehole_file, as_collection=False)
borehole_df.head()
nr x y surface end top bottom lith zm zmk ... cons color lutum_pct plants shells kleibrokjes strat_1975 strat_2003 strat_inter desc
0 B31H0541 139585 456000 1.2 -9.9 0.00 0.20 K NaN NaN ... NaN ON NaN 0 0 0 NaN EC NaN [TEELAARDE#***#****#*] ..........................
1 B31H0541 139585 456000 1.2 -9.9 0.20 0.60 K NaN NaN ... NaN BR NaN 0 0 0 NaN EC NaN [KLEI#***#****#*] grysbruin.
2 B31H0541 139585 456000 1.2 -9.9 0.60 0.95 V NaN NaN ... NaN BR NaN 0 0 0 NaN NI NaN [VEEN#***#****#*] donkerbruin.
3 B31H0541 139585 456000 1.2 -9.9 0.95 2.80 Z NaN ZMFO ... NaN GR NaN 0 0 0 NaN EC NaN [ZAND#***#****#*] FYN TOT matig fyn# iets slib...
4 B31H0541 139585 456000 1.2 -9.9 2.80 4.20 Z NaN ZFC ... NaN BR NaN 0 0 0 NaN BXWI NaN [ZAND#***#****#*] fyn# grysbruin.

5 rows × 32 columns

Now we can make the same selection using the .gst accessor:

peat_df = borehole_df.gst.select_by_values("lith", "V")
peat_df
nr x y surface end top bottom lith zm zmk ... cons color lutum_pct plants shells kleibrokjes strat_1975 strat_2003 strat_inter desc
0 B31H0541 139585 456000 1.20 -9.90 0.00 0.20 K NaN NaN ... NaN ON NaN 0 0 0 NaN EC NaN [TEELAARDE#***#****#*] ..........................
1 B31H0541 139585 456000 1.20 -9.90 0.20 0.60 K NaN NaN ... NaN BR NaN 0 0 0 NaN EC NaN [KLEI#***#****#*] grysbruin.
2 B31H0541 139585 456000 1.20 -9.90 0.60 0.95 V NaN NaN ... NaN BR NaN 0 0 0 NaN NI NaN [VEEN#***#****#*] donkerbruin.
3 B31H0541 139585 456000 1.20 -9.90 0.95 2.80 Z NaN ZMFO ... NaN GR NaN 0 0 0 NaN EC NaN [ZAND#***#****#*] FYN TOT matig fyn# iets slib...
4 B31H0541 139585 456000 1.20 -9.90 2.80 4.20 Z NaN ZFC ... NaN BR NaN 0 0 0 NaN BXWI NaN [ZAND#***#****#*] fyn# grysbruin.
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1352 B32C1800 140571 455934 2.08 -2.92 0.70 1.20 Z NaN ZMF ... NaN GR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:70.0000;TOT:120.0000;H:Z;...
1353 B32C1800 140571 455934 2.08 -2.92 1.20 1.50 K NaN NaN ... NaN GR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:120.0000;TOT:150.0000;H:K...
1354 B32C1800 140571 455934 2.08 -2.92 1.50 1.70 V NaN NaN ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:150.0000;TOT:170.0000;H:V...
1355 B32C1800 140571 455934 2.08 -2.92 1.70 2.00 Z NaN ZUF ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:170.0000;TOT:200.0000;H:Z...
1356 B32C1800 140571 455934 2.08 -2.92 2.00 5.00 Z NaN ZUF ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:200.0000;TOT:500.0000;H:Z...

670 rows × 32 columns

Notice that the shape of peat_df is the same as that of the data table of the peat_boreholes Collection: 670 rows × 32 columns. The selection result is also identical:

peat_boreholes.data.equals(peat_df)  # Check equality of the two dataframes
True

As you can see, GeoST functionality can be used interchangeably on geost.Collection objects or DataFrame objects.

Positional columns#

GeoST requires that several data columns are present to ensure that the methods of a Collection work. These are called “positional columns”. Which positional columns are required differs per method. For example, Collection.slice_depth_interval requires the surface level of surveys and the depths of layers, while Collection.select_with_points requires a valid geometry. However, each requirement only applies to its own method: slice_depth_interval does not need a geometry and select_with_points does not need depth information. A positional column therefore only needs to be present when you want to use a method that depends on it. This design was chosen to give users with different needs maximum flexibility.

The only column that is always mandatory is the positional column which identifies each individual survey (e.g. “nr”), in both the header and data table. The table below lists all positional columns and indicates when each is required.

| Name | dtype | Description | Mandatory |
|---|---|---|---|
| nr | int, float, string | Identification name/number/code of the point survey | Yes |
| x | int, float | X-, Easting- or lon-coordinate | No |
| y | int, float | Y-, Northing- or lat-coordinate | No |
| surface | int, float | Surface elevation of the point survey in m +NAP | In methods involving depth |
| end | int, float | End depth of a point survey in m +NAP | No |
| geometry | shapely.geometry.Point in case of point data | Geometry object of the survey location | In spatial methods |
| depth/bottom | int, float | Depth of a measurement or bottom depth of a layer with respect to the surface level | In methods involving depth |
| top | int, float | Top depth of a layer with respect to the surface level | No; used in methods involving depth when the survey data contains layered information |
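The role of the mandatory identifier column can be sketched with plain pandas; the header and data tables below are hypothetical examples, not GeoST output, and only illustrate why the identifier (here “nr”) must appear in both tables:

```python
import pandas as pd

# Hypothetical header (one row per survey) and data (one row per layer) tables;
# both carry the mandatory identifier column "nr"
header = pd.DataFrame({"nr": ["b1", "b2"], "x": [1.0, 2.0], "y": [1.0, 2.0]})
data = pd.DataFrame({"nr": ["b1", "b1", "b2"], "lith": ["K", "V", "Z"]})

# The shared identifier links every layer in the data table to its survey location
linked = data.merge(header, on="nr")
```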

Note

The names for the positional columns in the table above were chosen because they describe well the type of information they contain. However, the positional columns do not have to be named exactly as in the table above.

GeoST automatically determines which columns to use as positional columns. For any DataFrame, these can be checked with:

borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'surface',
 'end': 'end',
 'x': 'x',
 'y': 'y',
 'top': 'top',
 'depth': 'bottom'}

The table below shows which columns are automatically recognized as positional columns by GeoST.

| Name | Recognized names |
|---|---|
| nr | “nr”, “bro_id”, “nitg_nr”, “nitg”, “boorp” |
| x | “x”, “x-coord”, “longitude”, “lon”, “easting”, “x_bottom_rd”, “x_rd_crd”, “x_calc_crd” |
| y | “y”, “y-coord”, “latitude”, “lat”, “northing”, “y_bottom_rd”, “y_rd_crd”, “y_calc_crd” |
| surface | “surface”, “maaiveld”, “mv”, “height_nap”, “surface_nap” |
| end | “end”, “einddiepte”, “einddiepte_nap”, “end_depth”, “end_depth_nap” |
| top | “top”, “tv_top_nap”, “top_diepte”, “top_depth”, “upperboundary” |
| depth | “depth”, “bottom”, “tv_bottom_nap”, “basis_diepte”, “bottom_depth”, “lowerboundary” |

Note

geometry is not included in the positional columns, as GeoST uses the “active geometry column” attribute of a geopandas.GeoDataFrame for this.

Note

The recognized names are case-insensitive as each name is checked in lowercase form.
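This case-insensitive matching can be sketched as follows. The alias set is copied from the “x” row in the table above; the helper function itself is illustrative, not GeoST’s actual implementation:

```python
# Recognized aliases for the "x" positional column (copied from the table above)
X_ALIASES = {"x", "x-coord", "longitude", "lon", "easting",
             "x_bottom_rd", "x_rd_crd", "x_calc_crd"}

def find_positional_column(columns, aliases):
    """Return the first column whose lowercase name is a recognized alias, else None."""
    for col in columns:
        if col.lower() in aliases:
            return col
    return None

find_positional_column(["Longitude", "Latitude"], X_ALIASES)  # -> "Longitude"
find_positional_column(["unknown-x-name"], X_ALIASES)         # -> None
```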

To see how this works, let’s rename the “x” and “y” columns in borehole_df and check the positional columns again:

borehole_df.rename(columns={"x": "Longitude", "y": "Latitude"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'surface',
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

You can see that the new names are automatically picked up as positional columns. If one of the positional columns is not picked up, None is returned. We rename the “surface” column to some unknown name to show this:

borehole_df.rename(columns={"surface": "unknown-surface-name"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': None,
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

Since “surface” is not recognized anymore, trying an analysis method which needs depth would now raise a KeyError:

try:
    borehole_df.gst.slice_depth_interval(0, 10)
    print("Slicing successful")  # Only prints if no error is raised
except KeyError as e:
    print(e)  # Print the error message instead of actually raising the error
"Method 'slice_depth_interval' requires depth information in the DataFrame. Please ensure that the DataFrame contains one of the following the required combinations of depth columns: - 'surface', 'top' and 'bottom' - 'surface' and 'bottom' - 'surface' and 'depth'"

To solve this problem, you can rename your columns to recognized names. Read functions such as geost.read_table have a column_mapper parameter which takes a dictionary of columns to rename and return the data with the renamed columns. Read functions raise a UserWarning if any of the optional positional columns cannot be found, and a KeyError if the column identifying surveys cannot be found.

Alternatively, GeoST provides a simple way to register any column name as a recognized positional column name, so that all functionality works for any kind of data:

geost.add_positional_columns({"surface": "unknown-surface-name"})
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'unknown-surface-name',
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

Now, the previously unrecognized name works again.

Note

Use the persist=True keyword argument to store the provided column aliases in a user-specific configuration file so they are automatically recognized in future sessions.

Use with generic Geopandas/Pandas#

The .gst accessor works on any GeoDataFrame or DataFrame instance, as long as it contains columns that can be recognized as positional columns (see the previous section). Therefore, any data that has been loaded or created without GeoST can also use the provided functionality via the accessor.

Let’s demonstrate this with a simple geopandas.GeoDataFrame containing two points:

import geopandas as gpd

gdf = gpd.GeoDataFrame(
    {"nr": [1, 2]}, geometry=gpd.points_from_xy([1, 10], [1, 20]), crs=28992
)
print(gdf)
print("\nSelection result:")
print(gdf.gst.select_within_bbox(0, 0, 2, 2))
   nr       geometry
0   1    POINT (1 1)
1   2  POINT (10 20)

Selection result:
   nr     geometry
0   1  POINT (1 1)

This also works for any pandas.DataFrame:

import pandas as pd

df = pd.DataFrame(
    {"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2], "lith": ["clay", "sand"]}
)
print(df)
print("\nSelection result:")
print(df.gst.slice_by_values("lith", "clay"))
  nr  top  bottom  lith
0  a    0       1  clay
1  a    1       2  sand

Selection result:
  nr  top  bottom  lith
0  a    0       1  clay