geost.read_borehole_table

geost.read_borehole_table(file: str | Path, as_collection: bool = True, crs: str | int | CRS = None, vertical_datum: str | int | CRS = None, has_inclined: bool = False, coordinate_names: tuple[str, str] = None, include_in_header: str | Iterable[str] | None = None, column_mapper: dict = None, **kwargs) → Collection | DataFrame

Read tabular borehole information from a file (parquet, csv or Excel) in which each row describes one (borehole) layer.

Parameters:
  • file (str | Path) – Path to the file to be read. The file extension (.parquet, .csv or .xlsx) determines which pandas read function is used. Keyword arguments accepted by that pandas read function can be passed via kwargs.

  • as_collection (bool, optional) – If True, the borehole table will be read as a Collection. If False, a pd.DataFrame is returned. The default is True.

  • crs (str | int | CRS, optional) – EPSG of the data’s horizontal reference. Takes anything that can be interpreted by pyproj.crs.CRS.from_user_input(). The default is None, which means no CRS will be assigned to the resulting Collection. Only used if as_collection=True.

  • vertical_datum (str | int | CRS, optional) – Vertical datum for the collection. The default is None. Only used if as_collection=True.

  • has_inclined (bool, optional) – Indicates whether the collection has inclined data. The default is False.

  • coordinate_names (tuple[str, str], optional) – Tuple specifying the names of the columns to be used as coordinates for the geometry column. The default is None, which means that GeoST tries to find the names of the x and y columns automatically (see POSSIBLE_COLUMN_NAMING in column_names). If they are not found, no geometry column is created.

  • include_in_header (str | Iterable[str] | None, optional) – Columns to additionally include in the header. The default is None, which means that only the default columns ‘nr’, ‘surface’, ‘x’ and ‘y’ or their aliases are included.

  • column_mapper (dict, optional) – Mapping from column names in the input file to GeoST positional column names. Use this when your file uses non-standard names that cannot be recognized automatically as positional columns (e.g. {'ID': 'nr', 'X_RD': 'x', 'Y_RD': 'y', 'maaiveld': 'surface', 'van': 'top', 'tot': 'bottom'}). See geost.validation.column_names.POSITIONAL_COLUMN_NAMES for the accepted column names for each positional column type. If no valid survey-id column (e.g. ‘nr’) is found after mapping, a KeyError is raised. Missing x/y or depth columns trigger warnings and may limit functionality.

  • **kwargs – Optional keyword arguments for pandas.read_parquet, pandas.read_csv or pandas.read_excel, depending on the file extension.
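
The extension-based reader selection and the column_mapper checks described above can be pictured with a small standalone sketch. This is not GeoST's implementation: the helpers pick_reader and map_columns, the READERS table, the case-insensitive suffix match and the ValueError for unknown extensions are all assumptions made for illustration. It also checks exact column names only, whereas GeoST recognizes aliases as well.

```python
import warnings
from pathlib import Path

# Hypothetical lookup table: file extension -> name of the pandas read function.
READERS = {".parquet": "read_parquet", ".csv": "read_csv", ".xlsx": "read_excel"}


def pick_reader(file):
    """Return the name of the pandas reader that matches the file extension."""
    suffix = Path(file).suffix.lower()  # case-insensitive match is an assumption
    try:
        return READERS[suffix]
    except KeyError:
        # Assumed behaviour; the real function may fail differently.
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


def map_columns(columns, column_mapper=None):
    """Rename columns and check which positional columns are present."""
    mapper = column_mapper or {}
    renamed = [mapper.get(name, name) for name in columns]
    if "nr" not in renamed:
        # Mirrors the documented KeyError when no survey-id column is found.
        raise KeyError("no survey-id column ('nr') found after mapping")
    missing = [n for n in ("x", "y", "top", "bottom") if n not in renamed]
    if missing:
        # Mirrors the documented warning for missing x/y or depth columns.
        warnings.warn(f"missing positional columns: {missing}")
    return renamed
```

With the mapper from the column_mapper description, map_columns(["ID", "X_RD", "Y_RD", "van", "tot"], {...}) yields the GeoST names, while a table without any column mapped to 'nr' raises the KeyError.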

Returns:

Instance of Collection when as_collection=True or pd.DataFrame otherwise.

Return type:

Collection or pd.DataFrame

Examples

Suppose we have a file of boreholes with columns ‘nr’, ‘x’, ‘y’, ‘maaiveld’, ‘end’, ‘top’ and ‘bottom’. The column ‘maaiveld’ corresponds to the required column ‘surface’ in GeoST and thus needs to be remapped. The x- and y-coordinates are given in WGS 84 / UTM zone 31N (EPSG:32631) and elevations refer to the Belgian Ostend height vertical datum (EPSG:5710), so we pass these as crs and vertical_datum together with the other collection settings:

>>> from geost import read_borehole_table
>>> file = "boreholes.parquet"  # hypothetical path to the borehole table
>>> collection_kwargs = {
...     "crs": 32631,  # WGS 84 / UTM zone 31N
...     "vertical_datum": 5710,  # Ostend height
...     "include_in_header": ["nr", "x", "y", "surface", "end"],
...     "has_inclined": False,
... }
>>> collection = read_borehole_table(
...     file, column_mapper={"maaiveld": "surface"}, **collection_kwargs
... )
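
The include_in_header argument used in the example accepts either a single column name or an iterable of names. A rough sketch of how such input might be merged with the default header columns is shown below. The helper header_columns, the DEFAULT_HEADER tuple, the column order and the de-duplication are assumptions for illustration, not GeoST's actual logic.

```python
# Default header columns as listed in the parameter description (order assumed).
DEFAULT_HEADER = ("nr", "surface", "x", "y")


def header_columns(include_in_header=None):
    """Combine the default header columns with any requested extras."""
    if include_in_header is None:
        extras = []
    elif isinstance(include_in_header, str):
        extras = [include_in_header]  # a single name is treated as a one-item list
    else:
        extras = list(include_in_header)
    columns = list(DEFAULT_HEADER)
    for name in extras:
        if name not in columns:  # skip names that are already defaults
            columns.append(name)
    return columns
```

Under these assumptions, the list passed in the example only adds ‘end’ to the header, because the other four names are already default header columns.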