Survey data#

GeoST offers various functions to read and parse subsurface data into GeoST objects. In general, survey data can either be loaded from (local) files or requested from a service such as the BRO REST-API. Either way, data from multiple sources and file formats is supported.

Reading survey data#

By survey data we refer to measurements (i.e. raw data) of the subsurface. These can comprise boreholes, CPTs, well logs, seismic or EM lines and others (see Data structures for a more detailed description). In any case, the data is parsed into a geost.Collection. The table below lists the currently supported data sources, the associated reader functions and the resulting GeoST objects.

| File format/data service | Read function | Returned GeoST object | Description |
|---|---|---|---|
| BHR-G | read_bhrg | Collection | (BRO) Geological boreholes from xml |
| BHR-GT | read_bhrgt | Collection | (BRO) Geotechnical boreholes from xml |
| BHR-GT-samples | read_bhrgt_samples | Collection | (BRO) Geotechnical boreholes - grainsize samples from xml |
| BHR-P | read_bhrp | Collection | (BRO) Pedological boreholes from xml |
| CPT | read_cpt, read_gef_cpts | Collection | (BRO) Cone Penetration Tests from xml or gef |
| SFR | read_sfr | Collection | (BRO) Pedological soil profile descriptions from xml |
| BRO REST-API | bro_api_read | Collection | BRO BHR-G, BHR-GT, BHR-GT-samples, BHR-P, CPT or SFR objects |
| Parquet or csv | read_table, read_borehole_table, read_cpt_table | Collection or DataFrame | Survey data stored as a table; result of the to_parquet or to_csv export methods |
| NLOG excel export | read_nlog_cores | Collection or DataFrame | Reader for NLOG deep cores, see here |
| UU LLG cores | read_uullg_tables | Collection or DataFrame | Reader for the csv distribution of Utrecht University student boreholes |
| BORIS XML | read_xml_boris | Collection or DataFrame | Reader for XML exports of the BORIS borehole description software |

Reading data from the BRO REST-API#

Subsurface data is widely available in the Netherlands via the portal of the Basisregistratie Ondergrond (BRO). GeoST can directly load this data for an area of interest.

import geost

# Read a few BRO pedological soil cores in a small area 250 m x 500 m
boreholes = geost.bro_api_read("BHR-P", bbox=(142_000, 455_000, 142_250, 455_500))
boreholes
Collection
  header (rows, columns) : (8, 10)
  data (rows, columns)   : (40, 8)
crs: Amersfoort / RD New
vertical datum: NAP height

You can see that this loads the soil cores as a geost.Collection. This is also supported for geological (BHR-G) and geotechnical (BHR-GT) boreholes, cone penetration test (CPT) data and pedological soil profile descriptions (SFR). This facilitates the direct use of BRO data within any application.

Reading from local files#

A common workflow is to use GeoST to load survey data stored in a tabular format such as Parquet or csv and then apply the available selection and analysis methods. For example, suppose you have survey data for multiple boreholes stored in a local Parquet file. Using geost.read_table you can easily load it into a Collection or, if preferred, a pandas.DataFrame and use the data for further analysis:

borehole_file = geost.data.boreholes_usp(
    return_filepath=True
)  # Use the filepath instead of directly reading the borehole data
boreholes = geost.read_table(borehole_file)
print(boreholes)
Collection
  header (rows, columns) : (67, 5)
  data (rows, columns)   : (1398, 32)
crs: None
vertical datum: None

Now you can easily select the boreholes that contain peat (“V”) for example:

peat_boreholes = boreholes.select_by_values("lith", "V")
peat_boreholes
Collection
  header (rows, columns) : (32, 5)
  data (rows, columns)   : (670, 32)
crs: None
vertical datum: None

The same is possible with pandas DataFrames. Note that with DataFrames, any GeoST method needs to be accessed through the .gst accessor, as shown in the Data structures section. This way we can select the boreholes that contain peat just like before. Let’s load the same borehole data, but this time as a pandas.DataFrame:

borehole_df = geost.read_table(borehole_file, as_collection=False)
borehole_df.head()
nr x y surface end top bottom lith zm zmk ... cons color lutum_pct plants shells kleibrokjes strat_1975 strat_2003 strat_inter desc
0 B31H0541 139585 456000 1.2 -9.9 0.00 0.20 K NaN NaN ... NaN ON NaN 0 0 0 NaN EC NaN [TEELAARDE#***#****#*] ..........................
1 B31H0541 139585 456000 1.2 -9.9 0.20 0.60 K NaN NaN ... NaN BR NaN 0 0 0 NaN EC NaN [KLEI#***#****#*] grysbruin.
2 B31H0541 139585 456000 1.2 -9.9 0.60 0.95 V NaN NaN ... NaN BR NaN 0 0 0 NaN NI NaN [VEEN#***#****#*] donkerbruin.
3 B31H0541 139585 456000 1.2 -9.9 0.95 2.80 Z NaN ZMFO ... NaN GR NaN 0 0 0 NaN EC NaN [ZAND#***#****#*] FYN TOT matig fyn# iets slib...
4 B31H0541 139585 456000 1.2 -9.9 2.80 4.20 Z NaN ZFC ... NaN BR NaN 0 0 0 NaN BXWI NaN [ZAND#***#****#*] fyn# grysbruin.

5 rows × 32 columns

Now we can make the same selection using the .gst accessor:

peat_df = borehole_df.gst.select_by_values("lith", "V")
peat_df
nr x y surface end top bottom lith zm zmk ... cons color lutum_pct plants shells kleibrokjes strat_1975 strat_2003 strat_inter desc
0 B31H0541 139585 456000 1.20 -9.90 0.00 0.20 K NaN NaN ... NaN ON NaN 0 0 0 NaN EC NaN [TEELAARDE#***#****#*] ..........................
1 B31H0541 139585 456000 1.20 -9.90 0.20 0.60 K NaN NaN ... NaN BR NaN 0 0 0 NaN EC NaN [KLEI#***#****#*] grysbruin.
2 B31H0541 139585 456000 1.20 -9.90 0.60 0.95 V NaN NaN ... NaN BR NaN 0 0 0 NaN NI NaN [VEEN#***#****#*] donkerbruin.
3 B31H0541 139585 456000 1.20 -9.90 0.95 2.80 Z NaN ZMFO ... NaN GR NaN 0 0 0 NaN EC NaN [ZAND#***#****#*] FYN TOT matig fyn# iets slib...
4 B31H0541 139585 456000 1.20 -9.90 2.80 4.20 Z NaN ZFC ... NaN BR NaN 0 0 0 NaN BXWI NaN [ZAND#***#****#*] fyn# grysbruin.
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1352 B32C1800 140571 455934 2.08 -2.92 0.70 1.20 Z NaN ZMF ... NaN GR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:70.0000;TOT:120.0000;H:Z;...
1353 B32C1800 140571 455934 2.08 -2.92 1.20 1.50 K NaN NaN ... NaN GR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:120.0000;TOT:150.0000;H:K...
1354 B32C1800 140571 455934 2.08 -2.92 1.50 1.70 V NaN NaN ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:150.0000;TOT:170.0000;H:V...
1355 B32C1800 140571 455934 2.08 -2.92 1.70 2.00 Z NaN ZUF ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:170.0000;TOT:200.0000;H:Z...
1356 B32C1800 140571 455934 2.08 -2.92 2.00 5.00 Z NaN ZUF ... NaN BR NaN 0 0 0 NaN NaN NaN BRON:Boormanager;VAN:200.0000;TOT:500.0000;H:Z...

670 rows × 32 columns

Notice that the shape of peat_df is the same as that of the data table of the peat_boreholes Collection: 670 rows × 32 columns. The selection result is also identical:

peat_boreholes.data.equals(peat_df)  # Check equality of the two dataframes
True

As you can see, GeoST functionality can be used interchangeably on geost.Collection objects or DataFrame objects.

Positional columns#

GeoST requires that several data columns are present to ensure that the methods of a Collection work. These are called “positional columns”. Which positional columns are required differs per method. For example, Collection.slice_depth_interval requires the surface level of surveys and the depths of layers, while Collection.select_with_points requires a valid geometry. However, each requirement only applies to its own method: slice_depth_interval does not need a geometry and select_with_points does not need depth information. A positional column therefore only needs to be present when you want to use a method that depends on it. This design was chosen to give users with different needs maximum flexibility.

The only column that is always mandatory is the positional column which identifies each individual survey (e.g. “nr”), in both the header and data table. The table below lists all positional columns and indicates when each is required.

| Name | dtype | Description | Mandatory |
|---|---|---|---|
| nr | int, float, string | Identification name/number/code of the point survey | Yes |
| x | int, float | X-, Easting- or lon-coordinate | No |
| y | int, float | Y-, Northing- or lat-coordinate | No |
| surface | int, float | Surface elevation of the point survey in m +NAP | In methods involving depth |
| end | int, float | End depth of a point survey in m +NAP | No |
| geometry | shapely.geometry.Point in case of point data | Geometry object of the survey location | In spatial methods |
| depth/bottom | int, float | Depth of a measurement or bottom depth of a layer with respect to the surface level | In methods involving depth |
| top | int, float | Top depth of a layer with respect to the surface level | No; used in methods involving depth when the survey data contains layered information |
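The role of the mandatory identifier column can be sketched with plain pandas; the header and data tables below are hypothetical examples, not GeoST output, and only illustrate why the identifier (here “nr”) must appear in both tables:

```python
import pandas as pd

# Hypothetical header (one row per survey) and data (one row per layer) tables;
# both carry the mandatory identifier column "nr"
header = pd.DataFrame({"nr": ["b1", "b2"], "x": [1.0, 2.0], "y": [1.0, 2.0]})
data = pd.DataFrame({"nr": ["b1", "b1", "b2"], "lith": ["K", "V", "Z"]})

# The shared identifier links every layer in the data table to its survey location
linked = data.merge(header, on="nr")
```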

Note

The names for the positional columns in the table above were chosen because they describe well the type of information they contain. However, the positional columns do not have to be named exactly as in the table above.

GeoST automatically determines which columns to use as positional columns. For any DataFrame, these can be checked with:

borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'surface',
 'end': 'end',
 'x': 'x',
 'y': 'y',
 'top': 'top',
 'depth': 'bottom'}

The table below shows which columns are automatically recognized as positional columns by GeoST.

| Name | Recognized names |
|---|---|
| nr | “nr”, “bro_id”, “nitg_nr”, “nitg”, “boorp” |
| x | “x”, “x-coord”, “longitude”, “lon”, “easting”, “x_bottom_rd”, “x_rd_crd”, “x_calc_crd” |
| y | “y”, “y-coord”, “latitude”, “lat”, “northing”, “y_bottom_rd”, “y_rd_crd”, “y_calc_crd” |
| surface | “surface”, “maaiveld”, “mv”, “height_nap”, “surface_nap” |
| end | “end”, “einddiepte”, “einddiepte_nap”, “end_depth”, “end_depth_nap” |
| top | “top”, “tv_top_nap”, “top_diepte”, “top_depth”, “upperboundary” |
| depth | “depth”, “bottom”, “tv_bottom_nap”, “basis_diepte”, “bottom_depth”, “lowerboundary” |

Note

geometry is not included in the positional columns, as GeoST uses the “active geometry column” attribute of a geopandas.GeoDataFrame for this.

Note

The recognized names are case-insensitive as each name is checked in lowercase form.
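This case-insensitive matching can be sketched as follows. The alias set is copied from the “x” row in the table above; the helper function itself is illustrative, not GeoST’s actual implementation:

```python
# Recognized aliases for the "x" positional column (copied from the table above)
X_ALIASES = {"x", "x-coord", "longitude", "lon", "easting",
             "x_bottom_rd", "x_rd_crd", "x_calc_crd"}

def find_positional_column(columns, aliases):
    """Return the first column whose lowercase name is a recognized alias, else None."""
    for col in columns:
        if col.lower() in aliases:
            return col
    return None

find_positional_column(["Longitude", "Latitude"], X_ALIASES)  # -> "Longitude"
find_positional_column(["unknown-x-name"], X_ALIASES)         # -> None
```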

To see how this works, let’s rename the “x” and “y” columns in borehole_df and check the positional columns again:

borehole_df.rename(columns={"x": "Longitude", "y": "Latitude"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'surface',
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

You can see that the new names are automatically picked up as positional columns. If one of the positional columns is not picked up, None is returned. We rename the “surface” column to some unknown name to show this:

borehole_df.rename(columns={"surface": "unknown-surface-name"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': None,
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

Since “surface” is not recognized anymore, trying an analysis method which needs depth would now raise a KeyError:

try:
    borehole_df.gst.slice_depth_interval(0, 10)
    print("Slicing successful")  # Only prints if no error is raised
except KeyError as e:
    print(e)  # Print the error message instead of actually raising the error
"Method 'slice_depth_interval' requires depth information in the DataFrame. Please ensure that the DataFrame contains one of the following the required combinations of depth columns: - 'surface', 'top' and 'bottom' - 'surface' and 'bottom' - 'surface' and 'depth'"

To solve this problem, you can rename your columns to recognized names. Read functions such as geost.read_table have a column_mapper parameter which takes a dictionary of columns to rename and return the data with the renamed columns. Read functions raise a UserWarning if any of the optional positional columns cannot be found, and a KeyError if the column identifying surveys cannot be found.

Alternatively, GeoST provides a simple way to register any column name as a recognized positional column name, so that all functionality works for any kind of data:

geost.add_positional_columns({"surface": "unknown-surface-name"})
borehole_df.gst.positional_columns
{'nr': 'nr',
 'surface': 'unknown-surface-name',
 'end': 'end',
 'x': 'Longitude',
 'y': 'Latitude',
 'top': 'top',
 'depth': 'bottom'}

Now, the previously unrecognized name works again.

Note

Use the persist=True keyword argument to store the provided column aliases in a user-specific configuration file so they are automatically recognized in future sessions.

Use with generic Geopandas/Pandas#

The .gst accessor works on any GeoDataFrame or DataFrame instance, as long as it contains columns that can be recognized as positional columns (see the previous section). Therefore, any data that has been loaded or created without GeoST can also use the provided functionality via the accessor.

Let’s demonstrate this with a simple geopandas.GeoDataFrame containing two points:

import geopandas as gpd

gdf = gpd.GeoDataFrame(
    {"nr": [1, 2]}, geometry=gpd.points_from_xy([1, 10], [1, 20]), crs=28992
)
print(gdf)
print("\nSelection result:")
print(gdf.gst.select_within_bbox(0, 0, 2, 2))
   nr       geometry
0   1    POINT (1 1)
1   2  POINT (10 20)

Selection result:
   nr     geometry
0   1  POINT (1 1)

This also works for any pandas.DataFrame:

import pandas as pd

df = pd.DataFrame(
    {"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2], "lith": ["clay", "sand"]}
)
print(df)
print("\nSelection result:")
print(df.gst.slice_by_values("lith", "clay"))
  nr  top  bottom  lith
0  a    0       1  clay
1  a    1       2  sand

Selection result:
  nr  top  bottom  lith
0  a    0       1  clay