Survey data#
GeoST offers various functions to read and parse subsurface data into GeoST objects. In general, survey data can either be loaded from (local) files or requested from a service like the BRO REST-API. Either way, data from multiple sources and file formats are supported.
Reading survey data#
By survey data we refer to measurements (i.e. raw data) of the subsurface. These can comprise
boreholes, CPTs, well logs, seismic or EM lines and others (see Data structures for a more detailed description). In any case, the data is parsed to a geost.Collection. The table below lists the currently supported data sources, associated reader functions and resulting GeoST objects.
| File format/data service | Read function | Returned GeoST object | Description |
|---|---|---|---|
| BHR-G | | | (BRO) Geological boreholes from xml |
| BHR-GT | | | (BRO) Geotechnical boreholes from xml |
| BHR-GT-samples | | | (BRO) Geotechnical boreholes - grainsize samples from xml |
| BHR-P | | | (BRO) Pedological boreholes from xml |
| CPT | | | (BRO) Cone Penetration Tests from xml or gef |
| SFR | | | (BRO) Pedological soilprofile descriptions from xml |
| BRO REST-API | | | BRO BHR-G, BHR-GT, BHR-GT-samples, BHR-P, CPT or SFR objects |
| Parquet or csv | | | Survey data stored as a table. Result of |
| NLOG excel export | | | Reader for NLOG deep cores, see here |
| UU LLG cores | | | Reader for csv distribution of Utrecht University student boreholes |
| BORIS XML | | | Reader for XML exports of the BORIS borehole description software |
Reading data from the BRO REST-API#
Subsurface data for the Netherlands is widely available via the portal of the Basisregistratie Ondergrond (BRO). GeoST can load this data directly for an area of interest.
import geost
# Read a few BRO pedological soil cores in a small area 250 m x 500 m
boreholes = geost.bro_api_read("BHR-P", bbox=(142_000, 455_000, 142_250, 455_500))
boreholes
Collection
header (rows, columns) : (8, 10)
data (rows, columns) : (40, 8)
crs: Amersfoort / RD New
vertical datum: NAP height
You can see that this loads the soil cores as a geost.Collection. This is also supported for geological (BHR-G) and geotechnical (BHR-GT) boreholes, cone penetration test (CPT) data and pedological soil profile descriptions (SFR). This facilitates the
direct use of BRO data within any application.
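The bounding box in the example above appears to follow the common (xmin, ymin, xmax, ymax) order in RD New coordinates (metres), since 142_250 - 142_000 = 250 and 455_500 - 455_000 = 500. As a minimal sketch under that assumption, a small helper can build such a tuple from a lower-left corner and a size. The function `bbox_from_corner` is hypothetical and not part of GeoST:

```python
def bbox_from_corner(xmin, ymin, width, height):
    """Return (xmin, ymin, xmax, ymax) for a box anchored at its lower-left corner.

    Hypothetical helper, not part of GeoST. Assumes projected coordinates in
    metres (e.g. RD New) and the (xmin, ymin, xmax, ymax) ordering.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (xmin, ymin, xmin + width, ymin + height)


# Rebuild the 250 m x 500 m area used in the BRO example
bbox = bbox_from_corner(142_000, 455_000, 250, 500)
print(bbox)
```

The resulting tuple could then be passed as the `bbox` argument shown in the example.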
Reading from local files#
A common use case is to load survey data stored in a tabular format such as
Parquet or csv and then use the available selection and analysis methods. For example, suppose you
have survey data for multiple boreholes stored in a local Parquet file. Using
geost.read_table you can
load it into a Collection or, if preferred, a pandas.DataFrame and use the data
for further analysis:
borehole_file = geost.data.boreholes_usp(
    return_filepath=True
)  # Use the filepath instead of directly reading the borehole data
boreholes = geost.read_table(borehole_file)
print(boreholes)
Collection
header (rows, columns) : (67, 5)
data (rows, columns) : (1398, 32)
crs: None
vertical datum: None
Now you can easily select the boreholes that contain peat (“V”) for example:
peat_boreholes = boreholes.select_by_values("lith", "V")
peat_boreholes
Collection
header (rows, columns) : (32, 5)
data (rows, columns) : (670, 32)
crs: None
vertical datum: None
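Note that the selection works per borehole rather than per layer: the result holds 32 complete boreholes with all 670 of their layers, not only the peat layers. The sketch below illustrates that behaviour in plain Python on a tiny made-up table. It is an illustration of the idea only, not GeoST's implementation, and `select_by_value` is a hypothetical name:

```python
# Made-up layer table: borehole "A" contains peat ("V"), borehole "B" does not
layers = [
    {"nr": "A", "top": 0.0, "bottom": 0.5, "lith": "K"},
    {"nr": "A", "top": 0.5, "bottom": 1.0, "lith": "V"},
    {"nr": "B", "top": 0.0, "bottom": 2.0, "lith": "Z"},
]


def select_by_value(rows, column, value):
    """Keep all rows of every survey that has at least one row matching value."""
    matching_ids = {row["nr"] for row in rows if row[column] == value}
    return [row for row in rows if row["nr"] in matching_ids]


selected = select_by_value(layers, "lith", "V")
# Both layers of borehole "A" are kept, including the clay ("K") layer
print([(row["nr"], row["lith"]) for row in selected])
```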
The same is possible with pandas DataFrames. Note that with DataFrames, GeoST
methods need to be called through the .gst accessor, as shown in the Data structures
section. This way we can select the boreholes that contain peat just like before. Let’s load
the same borehole data, but this time as a pandas.DataFrame:
borehole_df = geost.read_table(borehole_file, as_collection=False)
borehole_df.head()
| | nr | x | y | surface | end | top | bottom | lith | zm | zmk | ... | cons | color | lutum_pct | plants | shells | kleibrokjes | strat_1975 | strat_2003 | strat_inter | desc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B31H0541 | 139585 | 456000 | 1.2 | -9.9 | 0.00 | 0.20 | K | NaN | NaN | ... | NaN | ON | NaN | 0 | 0 | 0 | NaN | EC | NaN | [TEELAARDE#***#****#*] .......................... |
| 1 | B31H0541 | 139585 | 456000 | 1.2 | -9.9 | 0.20 | 0.60 | K | NaN | NaN | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | EC | NaN | [KLEI#***#****#*] grysbruin. |
| 2 | B31H0541 | 139585 | 456000 | 1.2 | -9.9 | 0.60 | 0.95 | V | NaN | NaN | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | NI | NaN | [VEEN#***#****#*] donkerbruin. |
| 3 | B31H0541 | 139585 | 456000 | 1.2 | -9.9 | 0.95 | 2.80 | Z | NaN | ZMFO | ... | NaN | GR | NaN | 0 | 0 | 0 | NaN | EC | NaN | [ZAND#***#****#*] FYN TOT matig fyn# iets slib... |
| 4 | B31H0541 | 139585 | 456000 | 1.2 | -9.9 | 2.80 | 4.20 | Z | NaN | ZFC | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | BXWI | NaN | [ZAND#***#****#*] fyn# grysbruin. |
5 rows × 32 columns
Now we can make the same selection using the .gst accessor:
peat_df = borehole_df.gst.select_by_values("lith", "V")
peat_df
| | nr | x | y | surface | end | top | bottom | lith | zm | zmk | ... | cons | color | lutum_pct | plants | shells | kleibrokjes | strat_1975 | strat_2003 | strat_inter | desc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B31H0541 | 139585 | 456000 | 1.20 | -9.90 | 0.00 | 0.20 | K | NaN | NaN | ... | NaN | ON | NaN | 0 | 0 | 0 | NaN | EC | NaN | [TEELAARDE#***#****#*] .......................... |
| 1 | B31H0541 | 139585 | 456000 | 1.20 | -9.90 | 0.20 | 0.60 | K | NaN | NaN | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | EC | NaN | [KLEI#***#****#*] grysbruin. |
| 2 | B31H0541 | 139585 | 456000 | 1.20 | -9.90 | 0.60 | 0.95 | V | NaN | NaN | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | NI | NaN | [VEEN#***#****#*] donkerbruin. |
| 3 | B31H0541 | 139585 | 456000 | 1.20 | -9.90 | 0.95 | 2.80 | Z | NaN | ZMFO | ... | NaN | GR | NaN | 0 | 0 | 0 | NaN | EC | NaN | [ZAND#***#****#*] FYN TOT matig fyn# iets slib... |
| 4 | B31H0541 | 139585 | 456000 | 1.20 | -9.90 | 2.80 | 4.20 | Z | NaN | ZFC | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | BXWI | NaN | [ZAND#***#****#*] fyn# grysbruin. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1352 | B32C1800 | 140571 | 455934 | 2.08 | -2.92 | 0.70 | 1.20 | Z | NaN | ZMF | ... | NaN | GR | NaN | 0 | 0 | 0 | NaN | NaN | NaN | BRON:Boormanager;VAN:70.0000;TOT:120.0000;H:Z;... |
| 1353 | B32C1800 | 140571 | 455934 | 2.08 | -2.92 | 1.20 | 1.50 | K | NaN | NaN | ... | NaN | GR | NaN | 0 | 0 | 0 | NaN | NaN | NaN | BRON:Boormanager;VAN:120.0000;TOT:150.0000;H:K... |
| 1354 | B32C1800 | 140571 | 455934 | 2.08 | -2.92 | 1.50 | 1.70 | V | NaN | NaN | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | NaN | NaN | BRON:Boormanager;VAN:150.0000;TOT:170.0000;H:V... |
| 1355 | B32C1800 | 140571 | 455934 | 2.08 | -2.92 | 1.70 | 2.00 | Z | NaN | ZUF | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | NaN | NaN | BRON:Boormanager;VAN:170.0000;TOT:200.0000;H:Z... |
| 1356 | B32C1800 | 140571 | 455934 | 2.08 | -2.92 | 2.00 | 5.00 | Z | NaN | ZUF | ... | NaN | BR | NaN | 0 | 0 | 0 | NaN | NaN | NaN | BRON:Boormanager;VAN:200.0000;TOT:500.0000;H:Z... |
670 rows × 32 columns
Notice that the shape of peat_df is the same as that of the data table of the peat_boreholes Collection: 670 rows x 32 columns.
The selection result is also the same:
peat_boreholes.data.equals(peat_df) # Check equality of the two dataframes
True
As you can see, GeoST functionality can be used interchangeably on geost.Collection objects
or DataFrame objects.
Positional columns#
GeoST requires certain data columns to be present for the methods of a Collection to work.
These are called “positional columns”. Which positional columns are required differs per type of method. For example,
Collection.slice_depth_interval
requires depth information (the surface level of surveys and the depth of layers), while
Collection.select_with_points
requires a valid geometry. Neither is required in general: slice_depth_interval
does not need a geometry and select_with_points does not need depth information.
Depth information or a geometry is therefore only required when you want to use a method that depends on it. This
design was chosen to give users with different needs the most flexibility.
The only mandatory positional column is the one that identifies each individual survey (e.g. “nr”), which must be present in both the header and data table. The table below shows the positional columns required for all methods to work.
| Name | dtype | Description | Mandatory |
|---|---|---|---|
| nr | int, float, string | Identification name/number/code of the point survey | Yes |
| x | int, float | X-, Easting- or lon-coordinate | No |
| y | int, float | Y-, Northing- or lat-coordinate | No |
| surface | int, float | Surface elevation of the point survey in m +NAP | In methods involving depth |
| end | int, float | End depth of a point survey in m +NAP | No |
| geometry | | Geometry object of the survey location | In spatial methods |
| depth/bottom | int, float | Depth of a measurement or bottom depth of a layer with respect to the surface level | In methods involving depth |
| top | int, float | Top depth of a layer with respect to the surface level | No, but used in methods involving depth when the survey data contains layered information |
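The table mixes two vertical references: "surface" and "end" are elevations in m +NAP, while "top" and "depth/bottom" are measured from the surface level. The sketch below shows how the two relate, assuming that depths are positive downward, as the sample borehole data in this guide suggests (top 0.00, bottom 0.20 for the first layer). The helper `layer_elevations` is hypothetical and not part of GeoST:

```python
def layer_elevations(surface, top, bottom):
    """Convert layer depths below the surface to elevations.

    Assumes depths are positive downward, so elevation = surface - depth.
    Hypothetical helper for illustration, not part of GeoST.
    """
    return surface - top, surface - bottom


# First two layers of borehole B31H0541 from the example data (surface at 1.2 m)
for top, bottom in [(0.00, 0.20), (0.20, 0.60)]:
    top_nap, bottom_nap = layer_elevations(1.2, top, bottom)
    print(f"{top:.2f}-{bottom:.2f} m below surface -> {top_nap:.2f} to {bottom_nap:.2f} m")
```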
Note
The names of the positional columns in the table above were chosen because they describe the information they hold well. However, the positional columns do not have to be named exactly as in the table above.
GeoST automatically determines which columns to use as positional columns. These can be
checked for any DataFrame with:
borehole_df.gst.positional_columns
{'nr': 'nr',
'surface': 'surface',
'end': 'end',
'x': 'x',
'y': 'y',
'top': 'top',
'depth': 'bottom'}
The table below shows which columns are automatically recognized as positional columns by GeoST.
| Name | Recognized names |
|---|---|
| nr | “nr”, “bro_id”, “nitg_nr”, “nitg”, “boorp” |
| x | “x”, “x-coord”, “longitude”, “lon”, “easting”, “x_bottom_rd”, “x_rd_crd”, “x_calc_crd” |
| y | “y”, “y-coord”, “latitude”, “lat”, “northing”, “y_bottom_rd”, “y_rd_crd”, “y_calc_crd” |
| surface | “surface”, “maaiveld”, “mv”, “height_nap”, “surface_nap” |
| end | “end”, “einddiepte”, “einddiepte_nap”, “end_depth”, “end_depth_nap” |
| top | “top”, “tv_top_nap”, “top_diepte”, “top_depth”, “upperboundary” |
| depth | “depth”, “bottom”, “tv_bottom_nap”, “basis_diepte”, “bottom_depth”, “lowerboundary” |
Note
geometry is not included in the positional columns because it uses the “active geometry column”
attribute of a geopandas.GeoDataFrame.
Note
The recognized names are case-insensitive as each name is checked in lowercase form.
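To make the lookup concrete, the following sketch mimics a case-insensitive alias search using the recognized names from the table above. It is a simplified illustration, not GeoST's actual implementation: the function `find_positional_columns` is hypothetical, and the rule that the first alias in each list wins when several match is an assumption.

```python
# Recognized names copied from the table above
RECOGNIZED_NAMES = {
    "nr": ["nr", "bro_id", "nitg_nr", "nitg", "boorp"],
    "surface": ["surface", "maaiveld", "mv", "height_nap", "surface_nap"],
    "end": ["end", "einddiepte", "einddiepte_nap", "end_depth", "end_depth_nap"],
    "x": ["x", "x-coord", "longitude", "lon", "easting",
          "x_bottom_rd", "x_rd_crd", "x_calc_crd"],
    "y": ["y", "y-coord", "latitude", "lat", "northing",
          "y_bottom_rd", "y_rd_crd", "y_calc_crd"],
    "top": ["top", "tv_top_nap", "top_diepte", "top_depth", "upperboundary"],
    "depth": ["depth", "bottom", "tv_bottom_nap", "basis_diepte",
              "bottom_depth", "lowerboundary"],
}


def find_positional_columns(columns):
    """Map each positional name to the first matching column, or None."""
    # Compare in lowercase, but report the column name as it appears in the data
    lowercase_lookup = {column.lower(): column for column in columns}
    result = {}
    for name, aliases in RECOGNIZED_NAMES.items():
        result[name] = next(
            (lowercase_lookup[alias] for alias in aliases if alias in lowercase_lookup),
            None,  # No recognized alias found for this positional column
        )
    return result


# "ground_level" is not a recognized alias, so "surface" resolves to None
columns = ["nr", "Longitude", "Latitude", "ground_level", "end", "top", "bottom"]
positional = find_positional_columns(columns)
print(positional)
```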
To see how this works, let’s rename the “x” and “y” columns in borehole_df and check the
positional columns again:
borehole_df.rename(columns={"x": "Longitude", "y": "Latitude"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
'surface': 'surface',
'end': 'end',
'x': 'Longitude',
'y': 'Latitude',
'top': 'top',
'depth': 'bottom'}
You can see that the new names are automatically picked up as positional columns. If a
positional column cannot be found, its entry is None. To show this, we rename the “surface” column
to an unknown name:
borehole_df.rename(columns={"surface": "unknown-surface-name"}, inplace=True)
borehole_df.gst.positional_columns
{'nr': 'nr',
'surface': None,
'end': 'end',
'x': 'Longitude',
'y': 'Latitude',
'top': 'top',
'depth': 'bottom'}
Since “surface” is not recognized anymore, trying an analysis method which needs depth
would now raise a KeyError:
try:
    borehole_df.gst.slice_depth_interval(0, 10)
    print("Slicing successful")  # Only prints if no error is raised
except KeyError as e:
    print(e)  # Print the error message instead of actually raising the error
"Method 'slice_depth_interval' requires depth information in the DataFrame. Please ensure that the DataFrame contains one of the following the required combinations of depth columns: - 'surface', 'top' and 'bottom' - 'surface' and 'bottom' - 'surface' and 'depth'"
To solve this problem, you can choose to rename your columns to recognized names. Read functions such as geost.read_table have a column_mapper parameter which takes a dictionary with
columns to rename and returns the data with the renamed columns. Read functions raise a UserWarning if any of the optional positional columns cannot be found and a KeyError if the column identifying surveys cannot be found.
Alternatively, GeoST provides a simple way to add any column name to be recognized as a positional column name so you can make all functionality work for any kind of data:
geost.add_positional_columns({"surface": "unknown-surface-name"})
borehole_df.gst.positional_columns
{'nr': 'nr',
'surface': 'unknown-surface-name',
'end': 'end',
'x': 'Longitude',
'y': 'Latitude',
'top': 'top',
'depth': 'bottom'}
Now the previously unrecognized name is picked up as a positional column again.
Note
Use the persist=True keyword argument to store the provided column aliases in a user-specific
configuration file so they are automatically recognized in future sessions.
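The depth requirement behind the KeyError shown earlier in this section can be pictured as a simple check on the positional-column mapping. The error message lists three accepted combinations, and all of them need "surface" plus a bottom or depth column. The sketch below encodes just that rule. The function `require_depth_columns` and its message are hypothetical, and GeoST's own check may differ:

```python
def require_depth_columns(positional, method_name):
    """Raise KeyError unless a surface column and a depth/bottom column are mapped.

    Hypothetical check for illustration, not GeoST's implementation.
    """
    missing = [name for name in ("surface", "depth") if positional.get(name) is None]
    if missing:
        raise KeyError(
            f"Method '{method_name}' requires depth information; "
            f"no column found for: {', '.join(missing)}"
        )


# Mapping as it looks after "surface" was renamed to an unrecognized name
positional = {"nr": "nr", "surface": None, "top": "top", "depth": "bottom"}
try:
    require_depth_columns(positional, "slice_depth_interval")
except KeyError as error:
    print(error)

# Once the alias is registered, the mapping is complete and the check passes
positional["surface"] = "unknown-surface-name"
require_depth_columns(positional, "slice_depth_interval")
```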
Use with generic Geopandas/Pandas#
The .gst accessor also works on any GeoDataFrame or any DataFrame instance as long as it
contains columns which can be recognized as positional columns, see the previous
section. Therefore, any data that has been loaded or created without GeoST can also use
the provided functionality via the accessor.
Let’s demonstrate this with a simple geopandas.GeoDataFrame containing two points:
import geopandas as gpd
gdf = gpd.GeoDataFrame(
    {"nr": [1, 2]}, geometry=gpd.points_from_xy([1, 10], [1, 20]), crs=28992
)
print(gdf)
print("\nSelection result:")
print(gdf.gst.select_within_bbox(0, 0, 2, 2))
nr geometry
0 1 POINT (1 1)
1 2 POINT (10 20)
Selection result:
nr geometry
0 1 POINT (1 1)
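The selection above keeps only the point that falls inside the box. A plain-Python sketch of that test is shown below, assuming the arguments are ordered (xmin, ymin, xmax, ymax) and that points on the boundary count as inside. Both are assumptions; `within_bbox` is a hypothetical helper and not GeoST's implementation:

```python
# Same two points as in the GeoDataFrame example: survey nr -> (x, y)
points = {1: (1, 1), 2: (10, 20)}


def within_bbox(x, y, xmin, ymin, xmax, ymax):
    """Return True if (x, y) lies inside the box, boundaries included."""
    return xmin <= x <= xmax and ymin <= y <= ymax


selected = [nr for nr, (x, y) in points.items() if within_bbox(x, y, 0, 0, 2, 2)]
print(selected)  # Only survey 1 lies inside the box (0, 0, 2, 2)
```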
This also works for any pandas.DataFrame:
import pandas as pd
df = pd.DataFrame(
    {"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2], "lith": ["clay", "sand"]}
)
print(df)
print("\nSelection result:")
print(df.gst.slice_by_values("lith", "clay"))
nr top bottom lith
0 a 0 1 clay
1 a 1 2 sand
Selection result:
nr top bottom lith
0 a 0 1 clay
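Unlike select_by_values earlier in this guide, which returned complete boreholes, the slice above keeps only the matching layer even though both layers belong to survey "a". As a rough illustration of that row-level filtering in plain Python (a hypothetical `slice_by_value` function, not GeoST's implementation):

```python
# Same two layers as in the DataFrame example
layers = [
    {"nr": "a", "top": 0, "bottom": 1, "lith": "clay"},
    {"nr": "a", "top": 1, "bottom": 2, "lith": "sand"},
]


def slice_by_value(rows, column, value):
    """Keep only the rows whose column equals value."""
    return [row for row in rows if row[column] == value]


sliced = slice_by_value(layers, "lith", "clay")
print(sliced)  # The sand layer of survey "a" is dropped
```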