# GeoST Accessors
This page goes into more detail about the header and data accessors available for
Geopandas GeoDataFrames and Pandas DataFrames, as shown on the [previous page](./data_structures.ipynb#geost-accessors). For header GeoDataFrames the [`.gsthd`](../api_reference/header_accessors.rst)
accessor is available and for data DataFrames the [`.gstda`](../api_reference/data_accessors.rst)
accessor is available. For these accessors to work, the GeoDataFrame or DataFrame must meet some criteria,
which are explained below. Additionally, we provide some details about how the accessors
work with several data sources to give a better understanding of their use and functionality.

We will use the GeoST borehole and CPT sample data for this explanation.


In [None]:
import geost

# Load sample data
cores = geost.data.boreholes_usp()
cpts = geost.data.cpts_usp()

# Separate header and data tables
cores_header, cores_data = cores.header, cores.data
cpts_header, cpts_data = cpts.header, cpts.data

## Header `.gsthd` accessor
The [`.gsthd`](../api_reference/header_accessors.rst) accessor **only works on Geopandas GeoDataFrames**.
This is related to the different types of subsurface data that exist. Typically available types
of subsurface data comprise point-like data such as boreholes, CPTs and well-logs, and line-like
data such as seismics, GPR and EM. This means that the location of a survey is represented by a
point in some cases and by a line in others. GeoST aims to be as consistent as possible in the
methods it provides (e.g. `select_within_bbox`, `select_with_points`), but under the hood some
methods may need to work slightly differently depending on the type of geometry.

The `.gsthd` accessor handles this automatically by choosing the correct method based on the
geometry type in the "geometry" column of a GeoDataFrame. It determines the geometry type of the
first row in the GeoDataFrame and selects the corresponding backend: a
[`PointHeader`](../api_reference/header_accessors.rst#pointheader) for point data and a
[`LineHeader`](../api_reference/header_accessors.rst#lineheader) for line data. To illustrate
this, we will show step by step how this works for the loaded `cores_header`.

Let's first check the geometry type in the header table and see what happens when we simply
access the `.gsthd` attribute.

In [None]:
print(cores_header.geom_type.head()) # Check geometry type and show the first rows
cores_header.gsthd

As you can see, the geometry type is "Point" and accessing the `.gsthd` attribute returns a
`geost.accessors.accessor.Header` instance. This is a generic class that chooses the correct
backend (i.e. `PointHeader` or `LineHeader`) based on the geometry type. Since the geometry type
is "Point", we expect the chosen backend to be a `PointHeader`. We can verify this by inspecting
the `_backend` attribute.

In [None]:
cores_header.gsthd._backend

We see that the backend is indeed a `PointHeader` instance. This ensures that every method
called through the accessor is dispatched to the `PointHeader` instance.
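
To make the dispatching idea more concrete, here is a minimal, hypothetical sketch of the pattern in plain Python. It is not GeoST's actual implementation: the class names `HeaderSketch`, `PointHeaderSketch` and `LineHeaderSketch`, the `geometry_kind` method and the mapping of `"Point"` and `"LineString"` to backends are assumptions made for illustration. With a real GeoDataFrame, the geometry type of the first row could be obtained with `gdf.geom_type.iloc[0]`.

```python
class PointHeaderSketch:
    """Hypothetical stand-in for a point-specific header backend."""

    def __init__(self, obj):
        self._obj = obj

    def geometry_kind(self):
        return "point"


class LineHeaderSketch:
    """Hypothetical stand-in for a line-specific header backend."""

    def __init__(self, obj):
        self._obj = obj

    def geometry_kind(self):
        return "line"


# Assumed mapping from Shapely geometry type names to backend classes
HEADER_BACKENDS = {"Point": PointHeaderSketch, "LineString": LineHeaderSketch}


class HeaderSketch:
    """Generic front-end that picks a backend from the first geometry type."""

    def __init__(self, obj, first_geom_type):
        try:
            backend_cls = HEADER_BACKENDS[first_geom_type]
        except KeyError:
            raise TypeError(
                f"Unsupported geometry type: {first_geom_type!r}"
            ) from None
        self._backend = backend_cls(obj)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails: forward to the backend.
        # Guard against infinite recursion if _backend was never set.
        if name == "_backend":
            raise AttributeError(name)
        return getattr(self._backend, name)


header = HeaderSketch(obj=["borehole geometries"], first_geom_type="Point")
print(type(header._backend).__name__)  # PointHeaderSketch
print(header.geometry_kind())  # point
```

Forwarding through `__getattr__` keeps the generic front-end small: geometry-specific methods only need to be defined on the backends.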

As mentioned before, every GeoST method available on `Collection` instances is also available
in either the `.gsthd` or the `.gstda` accessor. The code snippet below shows how a method is
normally called on a Collection instance and how the same can be done using an accessor.

In [None]:
# Selection with the Collection instance
collection_select = cores.select_within_bbox(139_500, 455_000, 140_000, 455_500)

# Selection with the header accessor
header_select = cores_header.gsthd.select_within_bbox(
    139_500, 455_000, 140_000, 455_500
)

# Show the selection results
print(collection_select)
header_select.head()
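
Conceptually, selecting point data within a bounding box comes down to comparing the x and y coordinates against the box limits. The sketch below shows this with plain Pandas. The function name `points_within_bbox`, the use of `"x"` and `"y"` columns and the inclusive treatment of the box edges are assumptions for illustration; GeoST's own implementation may differ, for instance by working on the geometries directly.

```python
import pandas as pd


def points_within_bbox(header, xmin, ymin, xmax, ymax):
    """Hypothetical helper: keep rows whose x/y lie inside the box (edges included)."""
    inside = header["x"].between(xmin, xmax) & header["y"].between(ymin, ymax)
    return header[inside]


header = pd.DataFrame(
    {
        "nr": ["B1", "B2", "B3"],
        "x": [139_600, 139_900, 141_000],
        "y": [455_100, 455_400, 455_200],
    }
)
print(points_within_bbox(header, 139_500, 455_000, 140_000, 455_500))
```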

## Data `.gstda` accessor
As described in [Data structures](./data_structures.ipynb#data-table), GeoST mainly distinguishes
between "layered" and "discrete" data. Therefore, some methods that operate on the data table need
to work differently for each type of data. Similar to the header accessor, the
[`.gstda`](../api_reference/data_accessors.rst) accessor automatically figures out the correct method.
This is determined by the presence of the columns **"top"** and **"bottom"** (i.e. layered data) or the presence of the column **"depth"** (i.e. discrete data). Note that the `.gstda` accessor only works if one of these is present.
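
The column-based choice can be sketched with Pandas' own accessor registration mechanism, `pandas.api.extensions.register_dataframe_accessor`. This is a hypothetical illustration, not GeoST's implementation: the accessor name `sketchda`, the classes `LayeredSketch` and `DiscreteSketch`, and the rule that layered columns take precedence when both column sets are present are assumptions made for this example.

```python
import pandas as pd
from pandas.api.extensions import register_dataframe_accessor


class LayeredSketch:
    """Hypothetical backend for layered data ("top" and "bottom" columns)."""

    kind = "layered"

    def __init__(self, df):
        self._df = df


class DiscreteSketch:
    """Hypothetical backend for discrete data ("depth" column)."""

    kind = "discrete"

    def __init__(self, df):
        self._df = df


@register_dataframe_accessor("sketchda")
class DataAccessorSketch:
    def __init__(self, df):
        columns = set(df.columns)
        # Assumption: layered data takes precedence if both column sets exist
        if {"top", "bottom"} <= columns:
            self._backend = LayeredSketch(df)
        elif "depth" in columns:
            self._backend = DiscreteSketch(df)
        else:
            raise ValueError(
                "DataFrame needs 'top' and 'bottom' columns or a 'depth' column"
            )

    def __getattr__(self, name):
        # Forward attributes not found on the accessor to the chosen backend
        if name == "_backend":
            raise AttributeError(name)
        return getattr(self._backend, name)


layered = pd.DataFrame({"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2]})
discrete = pd.DataFrame({"nr": ["b", "b"], "depth": [0.5, 1.0]})
print(layered.sketchda.kind)  # layered
print(discrete.sketchda.kind)  # discrete
```

Raising in the accessor's constructor means the error surfaces as soon as the accessor is accessed on a DataFrame without the expected columns.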

Similar to the header, the `.gstda` accessor refers to a generic `geost.accessors.accessor.Data`
instance that chooses the correct backend. If we check the backends for `cores_data` (layered
data) and `cpts_data` (discrete data), we see that this indeed results in different backends.

In [None]:
print(cores_data.gstda._backend)
print(cpts_data.gstda._backend)

As shown before, using methods with the `.gstda` accessor differs only slightly from using
Collection instances.

In [None]:
# Selection with the Collection instance
cores_selected = cores.slice_depth_interval(0.5, 1.5) # Between 0.5m and 1.5m depth

# Selection with the data accessor
cores_data_selected = cores_data.gstda.slice_depth_interval(0.5, 1.5)
cpts_data_selected = cpts_data.gstda.slice_depth_interval(0.5, 1.5)

print(cores_selected)
cores_data_selected.head(), cpts_data_selected.head()
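
The reason the two backends need different implementations becomes clear when slicing by depth: layers have to be compared by their interval, whereas discrete measurements only need a range check on a single column. The sketch below is hypothetical: the function names `slice_layered` and `slice_discrete`, the assumption that depth increases downward, and the choice to clip layer boundaries to the interval are illustrative and may not match what GeoST's `slice_depth_interval` does.

```python
import pandas as pd


def slice_layered(df, upper, lower):
    """Hypothetical: keep layers overlapping [upper, lower] and clip their boundaries."""
    overlaps = (df["bottom"] > upper) & (df["top"] < lower)
    out = df[overlaps].copy()
    out["top"] = out["top"].clip(lower=upper)
    out["bottom"] = out["bottom"].clip(upper=lower)
    return out


def slice_discrete(df, upper, lower):
    """Hypothetical: keep measurements whose depth lies within [upper, lower]."""
    return df[df["depth"].between(upper, lower)]


layers = pd.DataFrame({"top": [0.0, 1.0, 2.0], "bottom": [1.0, 2.0, 3.0]})
measurements = pd.DataFrame({"depth": [0.2, 0.8, 1.4, 2.0]})
print(slice_layered(layers, 0.5, 1.5))
print(slice_discrete(measurements, 0.5, 1.5))
```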

## Use with generic Geopandas/Pandas
The `.gsthd` and `.gstda` accessors also work on any GeoDataFrame or DataFrame instance,
as long as it has the columns required by the specific method. Therefore, data that has
been loaded or created without GeoST can also use the accessors.
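
Because which methods work depends on the available columns, it can help to check them before calling a method on data loaded outside GeoST. Below is a minimal sketch with a hypothetical helper `missing_columns`. The column names `"nr"`, `"x"` and `"y"` are placeholders only; consult the Data structures page for the columns each method actually needs.

```python
import pandas as pd


def missing_columns(df, required):
    """Hypothetical helper: return the required column names absent from df, sorted."""
    return sorted(set(required) - set(df.columns))


df = pd.DataFrame({"nr": ["a"], "x": [1.0]})
print(missing_columns(df, ["nr", "x", "y"]))  # ['y']
```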

We will demonstrate this for the header by creating a simple GeoDataFrame with two point
geometries and using a GeoST selection method.

In [None]:
import geopandas as gpd

gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy([1, 10], [1, 20]), crs=28992)
print(gdf)
print("\nSelection result:")
print(gdf.gsthd.select_within_bbox(0, 0, 2, 2))

Note that the above GeoDataFrame only contains a "geometry" column and therefore not all
methods of the `.gsthd` accessor will work. See the [Data structures](./data_structures.ipynb#header-table) section for the columns required for all methods to work.

For the data table, we will likewise create a simple example DataFrame to show that the accessor works.

In [None]:
import pandas as pd

df = pd.DataFrame(
    {"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2], "lith": ["clay", "sand"]}
)
print(df)
print("\nSelection result:")
print(df.gstda.slice_by_values("lith", "clay"))
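
For comparison, a similar row selection on this small table can be written with a plain Pandas boolean mask. This assumes that `slice_by_values` keeps the rows whose value in the given column is in the selection; see the [`.gstda`](../api_reference/data_accessors.rst) API reference for its exact behaviour and options.

```python
import pandas as pd

df = pd.DataFrame(
    {"nr": ["a", "a"], "top": [0, 1], "bottom": [1, 2], "lith": ["clay", "sand"]}
)

# Keep only the rows whose "lith" value is in the selection
selection = ["clay"]
clay_rows = df[df["lith"].isin(selection)]
print(clay_rows)
```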

Note again that for the `.gstda` accessor to work, either the columns **"top"** and **"bottom"** (i.e. layered data) or the column **"depth"** (i.e. discrete data) must be present. Otherwise, an error
will be raised. As with the header accessor, several columns need to be present for all methods to work.
See the [Data structures](./data_structures.ipynb#data-table) section for all the required columns.