Modules#
kenmerkendewaarden.data_retrieve#
Retrieve data from ddlpy and write to netcdf files including all metadata
- kenmerkendewaarden.data_retrieve.read_measurements(dir_output: str, station: str, extremes: bool, return_xarray: bool = False, nap_correction: bool = False, drop_duplicates: bool = False)[source]#
Read the measurements netcdf as a dataframe.
Parameters#
- dir_outputstr
Path where the measurements are stored.
- stationstr
station name, for instance “HOEKVHLD”.
- extremesbool
Whether to read measurements for waterlevel timeseries or extremes.
- return_xarraybool, optional
Whether to return the raw xarray.Dataset instead of a DataFrame. In that case nap_correction and drop_duplicates are not supported. The default is False.
- nap_correctionbool, optional
Whether to correct for NAP2005. The default is False.
- drop_duplicatesbool, optional
Whether to drop duplicated timesteps. The default is False.
Returns#
- df_measpd.DataFrame
DataFrame with the measurements or extremes timeseries.
- kenmerkendewaarden.data_retrieve.read_measurements_amount(dir_output: str, extremes: bool)[source]#
Read the measurements amount csv into a dataframe.
Parameters#
- dir_outputstr
Path where the measurements are stored.
- extremesbool
Whether to read measurements amount for waterlevel timeseries or extremes.
Returns#
- df_amountpd.DataFrame
DataFrame with the amount of measurements per year.
- kenmerkendewaarden.data_retrieve.retrieve_measurements(dir_output: str, station: str, extremes: bool, start_date: Timestamp, end_date: Timestamp, drop_if_constant: list | None = None)[source]#
Retrieve timeseries with measurements or extremes for a single station from the DDL with ddlpy.
Parameters#
- dir_outputstr
Path where the measurement netcdf file will be stored.
- stationstr
station name, for instance “HOEKVHLD”.
- extremesbool
Whether to retrieve measurements for waterlevel timeseries or extremes.
- start_datepd.Timestamp (or anything understood by pd.Timestamp)
start date of the measurements to be retrieved.
- end_datepd.Timestamp (or anything understood by pd.Timestamp)
end date of the measurements to be retrieved.
- drop_if_constantlist, optional
A list of columns to drop if the row values are constant, to save disk space. The default is None.
Returns#
None
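The idea behind drop_if_constant can be illustrated with a minimal sketch. This is not the library's implementation: the helper name, the dict-of-lists table and the column names are hypothetical, and only the rule "drop a listed column when all its row values are identical" is taken from the description above.

```python
def drop_constant_columns(table, drop_if_constant=None):
    """Drop the listed columns whose values are identical in every row."""
    if not drop_if_constant:
        return dict(table)
    kept = {}
    for name, values in table.items():
        is_constant = len(set(values)) <= 1
        if name in drop_if_constant and is_constant:
            continue  # a constant column carries no per-row information
        kept[name] = values
    return kept


# Hypothetical column names and values, for illustration only
table = {
    "value": [101, 103, 98],
    "unit": ["cm", "cm", "cm"],
    "quality": [0, 0, 25],
}
slim = drop_constant_columns(table, drop_if_constant=["unit", "quality"])
print(list(slim))
```

Only "unit" is dropped here, since "quality" is listed but not constant.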
- kenmerkendewaarden.data_retrieve.retrieve_measurements_amount(dir_output: str, station_list: list, extremes: bool, start_date: Timestamp, end_date: Timestamp)[source]#
Retrieve the amount of measurements or extremes for each station in the list from the DDL with ddlpy.
Parameters#
- dir_outputstr
Path where the measurements amount csv file will be stored.
- station_listlist
List of station names, for instance [“HOEKVHLD”].
- extremesbool
Whether to retrieve the amount of measurements for waterlevel timeseries or extremes.
- start_datepd.Timestamp (or anything understood by pd.Timestamp)
start date of the measurements to be retrieved.
- end_datepd.Timestamp (or anything understood by pd.Timestamp)
end date of the measurements to be retrieved.
Returns#
None
kenmerkendewaarden.data_analysis#
Data analysis such as missing values, duplicates, outliers and several other statistics
- kenmerkendewaarden.data_analysis.derive_statistics(dir_output: str, station_list: list, extremes: bool)[source]#
Derive several statistics for the measurements of each station in the list.
Parameters#
- dir_outputstr
Path where the measurement netcdf files are stored.
- station_listlist
list of station names to derive statistics for, for instance [“HOEKVHLD”].
- extremesbool
Whether to derive statistics from waterlevel timeseries or extremes.
Returns#
- data_summarypd.DataFrame
A dataframe with several statistics for each station from the provided list.
- kenmerkendewaarden.data_analysis.plot_measurements(df_meas: DataFrame, df_ext: DataFrame | None = None)[source]#
Generate a timeseries figure for the measurement timeseries (and extremes) of this station.
Parameters#
- df_measpd.DataFrame
Dataframe with the measurement timeseries for a particular station.
- df_extpd.DataFrame, optional
Dataframe with the measurement extremes for a particular station.
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
- kenmerkendewaarden.data_analysis.plot_measurements_amount(df: DataFrame, relative: bool = False)[source]#
Generate a pcolormesh figure of the measurements amount for all years and stations. The colors indicate the absolute or relative number of measurements per year.
Parameters#
- dfpd.DataFrame
Dataframe with the amount of measurements for several years per station.
- relativebool, optional
Whether to scale the amount of measurements with the median of all measurement amounts for the same year. The default is False.
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
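The relative option can be illustrated as follows. This is a sketch under assumptions, not the library's code: the nested dict layout, the station labels and the numbers are made up, and the median is taken over all stations for the same year as described for the relative parameter.

```python
from statistics import median

# amounts[station][year] = number of measurements (made-up values)
amounts = {
    "A": {2019: 52560, 2020: 52704},
    "B": {2019: 26280, 2020: 52704},
    "C": {2019: 52560, 2020: 0},
}


def relative_amounts(amounts):
    """Scale each amount with the median of all amounts for the same year."""
    years = sorted({year for per_year in amounts.values() for year in per_year})
    medians = {
        year: median(per_year[year] for per_year in amounts.values())
        for year in years
    }
    return {
        station: {year: per_year[year] / medians[year] for year in years}
        for station, per_year in amounts.items()
    }


relative = relative_amounts(amounts)
print(relative["B"][2019], relative["C"][2020])
```

A value of 1 then means "as many measurements as the typical station in that year", which makes gaps at a single station stand out.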
- kenmerkendewaarden.data_analysis.plot_stations(station_list: list, crs: int | None = None, add_labels: bool = False)[source]#
Plot the stations by subsetting a ddlpy catalog with the provided list of stations.
Parameters#
- station_listlist
List of stations to plot the locations from.
- crsint, optional
Coordinate reference system, for instance 28992. The coordinates retrieved from the DDL will be converted to this EPSG. The default is None.
- add_labelsbool, optional
Whether to add station code labels in the figure, useful for debugging. The default is False.
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.tidalindicators#
Computation of tidal indicators from waterlevel extremes or timeseries
- kenmerkendewaarden.tidalindicators.calc_HWLWtidalindicators(df_ext: DataFrame, min_coverage: float | None = None)[source]#
Compute several tidal extreme indicators from a dataset of tidal extremes.
Parameters#
- df_extpd.DataFrame
Dataframe with extremes timeseries.
- min_coveragefloat, optional
The minimal required coverage (between 0 and 1) of the df_ext timeseries to consider the statistics to be valid. It is the ratio between the actual and the expected number of high waters in the series. Note that the expected number is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.
Returns#
- dict_tidalindicatorsdict
Dictionary with several tidal indicators like yearly/monthly means.
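The note on min_coverage=1 can be made concrete with a small sketch. It assumes that the expected number of high waters follows from an average tidal period of 12h25min. That period and the helper names are assumptions for illustration, since the library may derive the expected number differently.

```python
from datetime import timedelta

# Assumption: high waters recur every 12h25min on average
TIDAL_PERIOD = timedelta(hours=12, minutes=25)


def expected_high_waters(duration: timedelta) -> float:
    """Approximate number of high waters expected within a duration."""
    return duration / TIDAL_PERIOD


def coverage(actual_count: int, duration: timedelta) -> float:
    """Ratio between the actual and the expected number of high waters."""
    return actual_count / expected_high_waters(duration)


year = timedelta(days=365)
expected = expected_high_waters(year)
cov = coverage(705, year)
print(expected, cov)
```

Because the expected number is generally not a whole number, a complete series of 705 high waters still yields a coverage slightly below 1, so a threshold a little below 1 is the safer choice.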
- kenmerkendewaarden.tidalindicators.calc_HWLWtidalrange(df_ext: DataFrame)[source]#
Compute the difference between a high water and the following low water. This tidal range is added as a column to the df_ext dataframe.
Parameters#
- df_extpd.DataFrame
Dataframe with extremes timeseries.
Returns#
- df_extpd.DataFrame
Input dataframe enriched with a tidal range column and a ‘HWLWno’ column.
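A minimal sketch of the tidal range definition above, using plain tuples instead of a DataFrame. The codes 1 (high water) and 2 (low water) and all values are assumptions for illustration, and the library's own pairing logic may differ.

```python
# Each extreme is (time label, water level in m, kind): kind 1 = high water,
# kind 2 = low water. All values are made up.
extremes = [
    ("2020-01-01 02:10", 1.20, 1),
    ("2020-01-01 08:30", -0.70, 2),
    ("2020-01-01 14:35", 1.05, 1),
    ("2020-01-01 20:50", -0.65, 2),
]


def tidal_ranges(extremes):
    """Return the range between each high water and the low water that follows it."""
    ranges = []
    for current, following in zip(extremes, extremes[1:]):
        if current[2] == 1 and following[2] == 2:
            ranges.append(round(current[1] - following[1], 2))
    return ranges


ranges = tidal_ranges(extremes)
print(ranges)
```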
- kenmerkendewaarden.tidalindicators.calc_wltidalindicators(df_meas: DataFrame, min_coverage: float | None = None)[source]#
Computes monthly and yearly means from waterlevel timeseries.
Parameters#
- df_measpd.DataFrame
Dataframe with waterlevel timeseries.
- min_coveragefloat, optional
The minimum fraction (from 0 to 1) of timeseries coverage required to consider the statistics to be valid. The default is None.
Returns#
- dict_tidalindicatorsdict
Dictionary with several tidal indicators like yearly/monthly means.
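How min_coverage affects yearly means can be sketched as below. This is an illustration only: the function name, the (year, value) sample layout and the fixed expected_per_year are assumptions, not the library's API.

```python
import math
from collections import defaultdict


def yearly_means(samples, expected_per_year, min_coverage=None):
    """Mean per year, set to NaN for years with insufficient coverage."""
    by_year = defaultdict(list)
    for year, value in samples:
        by_year[year].append(value)
    means = {}
    for year, values in sorted(by_year.items()):
        cov = len(values) / expected_per_year
        if min_coverage is not None and cov < min_coverage:
            means[year] = math.nan  # too few samples to trust the mean
        else:
            means[year] = sum(values) / len(values)
    return means


# Made-up samples: 2019 is complete (4 of 4), 2020 has only 1 of 4 samples
samples = [(2019, 0.10), (2019, 0.20), (2019, 0.30), (2019, 0.40), (2020, 0.50)]
means = yearly_means(samples, expected_per_year=4, min_coverage=0.9)
print(means)
```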
- kenmerkendewaarden.tidalindicators.plot_tidalindicators(dict_indicators: dict)[source]#
Plot tidalindicators.
Parameters#
- dict_indicatorsdict
Dictionary as returned from kw.calc_wltidalindicators() and/or kw.calc_HWLWtidalindicators().
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.tidalextremes#
Created on Mon Jun 9 20:15:54 2025
@author: veenstra
- kenmerkendewaarden.tidalextremes.calc_highest_lowest_astronomical_tide(df_meas: DataFrame) → tuple[source]#
Compute HAT and LAT, the highest and lowest astronomical tides, from a measurement timeseries. This method derives the SA and SM components from 19 years of measurements (at once) and the other components from the most recent 4 years of measurements (per year, then vector averaged). The mean is overwritten with the slotgemiddelde, derived from the entire timeseries. The resulting component set is used to make a 19-year prediction, computed year by year. The min and max of the resulting prediction timeseries are the LAT and HAT values.
The slowly varying SA and SM can only be derived from long timeseries covering an entire nodal cycle. These components are sensitive to timeseries length, so it is important to supply a sufficiently long timeseries. The other components vary more quickly, and for those only the last four years are used to represent the tidal dynamics at the end of the period instead of the average over the last 19 years. This also goes for the average, which is overwritten by the slotgemiddelde corresponding to the end of the period. This results in LAT/HAT values that are representative for the end of the supplied period.
Several alternative methods were considered, details are available in Deltares-research/kenmerkendewaarden#73.
Parameters#
- df_measpd.DataFrame
Dataframe with waterlevel timeseries.
Returns#
- tuple
hat and lat values.
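The "per year, then vector averaged" step can be sketched for a single component. This illustrates vector averaging in general and is not the library's code: the function name and the amplitude/phase values are made up. Averaging the complex vectors avoids the wrap-around problem of averaging phases such as 350 and 10 degrees arithmetically.

```python
import cmath
import math


def vector_average(components):
    """Average (amplitude, phase in degrees) pairs as complex vectors."""
    vectors = [cmath.rect(amp, math.radians(phase)) for amp, phase in components]
    mean = sum(vectors) / len(vectors)
    return abs(mean), math.degrees(cmath.phase(mean)) % 360


# One made-up (amplitude, phase) pair per year for a single component
per_year = [(1.0, 350.0), (1.0, 10.0), (1.0, 355.0), (1.0, 5.0)]
amplitude, phase = vector_average(per_year)
print(amplitude, phase)
```

The averaged phase ends up at the 0/360 degree boundary, whereas an arithmetic mean of the phases would give 180 degrees.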
kenmerkendewaarden.slotgemiddelden#
Computation of slotgemiddelden of waterlevels and extremes
- kenmerkendewaarden.slotgemiddelden.calc_slotgemiddelden(df_meas: DataFrame, df_ext: DataFrame | None = None, min_coverage: float | None = None, clip_physical_break: bool = False)[source]#
Compute slotgemiddelden from measurement timeseries and optionally also from extremes timeseries. A simple linear trend is used to avoid a false sense of accuracy. However, when a linear trend is fitted to a limited amount of data, the nodal cycle and wind effects will cause the model fit to be inaccurate. It is wise to use at least 30 years of data for a valid fit, which is more than 1.5 times the nodal cycle.
Parameters#
- df_measpd.DataFrame
the timeseries of measured waterlevels.
- df_extpd.DataFrame, optional
the timeseries of extremes (high and low waters). The default is None.
- min_coveragefloat, optional
Set yearly means to nans for years that do not have sufficient data coverage. The default is None.
- clip_physical_breakbool, optional
Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.
Returns#
- slotgemiddelden_dictdict
dictionary with yearly means and model fits, optionally also for extremes and corresponding tidal range.
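The linear trend described above can be sketched with an ordinary least-squares fit on yearly means. This is a sketch under assumptions: the data are synthetic, and evaluating the fit at the last year of the series is only one plausible way to obtain an end-of-period value, not necessarily what the library does.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of values = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = list(range(1991, 2021))  # 30 years of yearly means
yearly_means = [0.002 * (year - 1991) for year in years]  # synthetic 2 mm/year rise
slope, intercept = fit_linear_trend(years, yearly_means)
trend_at_end = intercept + slope * years[-1]
print(slope, trend_at_end)
```

With real data the yearly means scatter around the trend because of the nodal cycle and wind effects, which is why a short series gives an unreliable slope.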
- kenmerkendewaarden.slotgemiddelden.plot_slotgemiddelden(slotgemiddelden_dict: dict, slotgemiddelden_dict_all: dict | None = None)[source]#
Plot timeseries of yearly mean waterlevels and corresponding model fits.
Parameters#
- slotgemiddelden_dictdict
Output from kw.calc_slotgemiddelden containing timeseries of yearly mean waterlevels and corresponding model fits.
- slotgemiddelden_dict_alldict, optional
Optionally provide another dictionary with unfiltered mean waterlevels. Only used to plot the mean waterlevels (in grey). The default is None.
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.havengetallen#
Computation of havengetallen
- kenmerkendewaarden.havengetallen.calc_HWLW_springneap(df_ext: DataFrame, min_coverage=None, moonculm_offset: int = 4)[source]#
- kenmerkendewaarden.havengetallen.calc_havengetallen(df_ext: DataFrame, return_df_ext=False, min_coverage=None, moonculm_offset: int = 4)[source]#
Havengetallen consist of the median values of the extremes (high and low waters) and the median time delays of the extremes with respect to the moonculmination. In addition, the tidal difference for each cycle and the tidal period are computed. All these indicators are derived by dividing the extremes into hour-classes with respect to the moonculmination.
Parameters#
- df_extpd.DataFrame
DataFrame with extremes (highs and lows, no aggers). The last 10 years of this timeseries are used to compute the havengetallen.
- return_df_extbool, optional
Whether to return the enriched input dataframe. The default is False.
- min_coveragefloat, optional
The minimal required coverage (between 0 and 1) of the df_ext timeseries to consider the statistics to be valid. It is the ratio between the actual and the expected number of high waters in the series. Note that the expected number is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.
- moonculm_offsetint, optional
Offset between moonculmination and extremes. Passed on to calc_HWLW_moonculm_combi. The default is 4, which corresponds to a 2-day offset, which is applicable to the Dutch coast.
Returns#
- df_havengetallenpd.DataFrame
DataFrame with havengetallen for all hour-classes. 0 corresponds to spring, 6 corresponds to neap, and ‘mean’ holds the mean values.
- df_ext_culmpd.DataFrame
An enriched copy of the input DataFrame including a ‘culm_hr’ column. Only returned if return_df_ext is True.
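The grouping into hour-classes can be sketched as follows. The sketch assumes each extreme already carries its hour-class (comparable to the ‘culm_hr’ column) and does not show how that class is derived from the moonculmination. The function name and the values are made up.

```python
from collections import defaultdict
from statistics import median


def median_per_class(extremes):
    """Group (hour_class, water level) pairs and take the median per class."""
    grouped = defaultdict(list)
    for hour_class, level in extremes:
        grouped[hour_class].append(level)
    return {hour_class: median(levels) for hour_class, levels in sorted(grouped.items())}


# Made-up high waters: (hour-class, level in m)
high_waters = [(0, 1.30), (0, 1.20), (0, 1.25), (6, 0.90), (6, 1.00)]
medians = median_per_class(high_waters)
print(medians)
```

In this made-up set the class-0 (spring) median is higher than the class-6 (neap) median, as expected for high waters.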
- kenmerkendewaarden.havengetallen.plot_HWLW_pertimeclass(df_ext: DataFrame, df_havengetallen: DataFrame)[source]#
Plot the extremes for each hour-class, including a median line.
Parameters#
- df_extpd.DataFrame
DataFrame with measurement extremes, as provided by kw.calc_havengetallen().
- df_havengetallenpd.DataFrame
DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
- kenmerkendewaarden.havengetallen.plot_aardappelgrafiek(df_havengetallen: DataFrame)[source]#
Plot the median values of each hour-class in an aardappelgrafiek.
Parameters#
- df_havengetallenpd.DataFrame
DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.gemiddeldgetij#
Computation of gemiddelde getijkromme
- kenmerkendewaarden.gemiddeldgetij.calc_gemiddeldgetij(df_meas: DataFrame, df_ext: DataFrame | None = None, min_coverage: float | None = None, freq: str = '60sec', nb: int = 0, nf: int = 0, scale_extremes: bool = False, scale_period: bool = False)[source]#
Generate an average tidal signal for average/spring/neap tide by doing a tidal analysis on a timeseries of measurements. The resulting tidal components (subsetted or adjusted) are then used to make a raw prediction for average/spring/neap tide. These raw predictions can optionally be scaled in height (with havengetallen) and in time (to a fixed period of 12h25min). A number of backward (nb) and forward (nf) repeats are added before the timeseries are returned, resulting in nb+nf+1 tidal periods.
Parameters#
- df_measpd.DataFrame
Timeseries of waterlevel measurements. The last 10 years of this timeseries are used to compute the getijkrommes.
- df_extpd.DataFrame, optional
Timeseries of waterlevel extremes (high and low waters only, codes 1 and 2). The last 10 years of this timeseries are used to compute the getijkrommes. The default is None.
- min_coveragefloat, optional
The minimal required coverage of the df_ext timeseries. Passed on to calc_havengetallen(). The default is None.
- freqstr, optional
Frequency of the prediction, a value of 60 seconds or lower is advisable for decent results. The default is “60sec”.
- nbint, optional
Amount of periods to repeat backward. The default is 0.
- nfint, optional
Amount of periods to repeat forward. The default is 0.
- scale_extremesbool, optional
Whether to scale extremes with havengetallen. The default is False.
- scale_periodbool, optional
Whether to scale to 12h25min (for boi). The default is False.
Returns#
- gemgetij_dictdict
dictionary with Dataframes with gemiddeld getij for mean, spring and neap tide.
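The effect of nb and nf can be sketched with a single-period curve that is shifted by whole periods. This is an illustration only: the function name, the coarse five-point curve and its values are made up, and the fixed period of 12h25min corresponds to the scale_period option.

```python
from datetime import timedelta


def repeat_curve(curve, period, nb=0, nf=0):
    """Repeat one tidal period nb times backward and nf times forward."""
    repeated = []
    for k in range(-nb, nf + 1):
        repeated.extend((time + k * period, level) for time, level in curve)
    return repeated


period = timedelta(hours=12, minutes=25)
step = period / 5
# One made-up tidal period as (time since start, level in m)
one_period = [(i * step, level) for i, level in enumerate([0.0, 1.0, 0.3, -0.9, -0.4])]
curve = repeat_curve(one_period, period, nb=1, nf=2)
print(len(curve), curve[0][0], curve[-1][0])
```

With nb=1 and nf=2 the result spans nb+nf+1 = 4 tidal periods, starting one period before the original curve.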
- kenmerkendewaarden.gemiddeldgetij.plot_gemiddeldgetij(gemgetij_dict: dict, gemgetij_dict_raw: dict | None = None, tick_hours: int | None = None)[source]#
Default plotting function for gemiddeldgetij dictionaries.
Parameters#
- gemgetij_dictdict
dictionary as returned from kw.calc_gemiddeldgetij().
- gemgetij_dict_rawdict, optional
dictionary as returned from kw.calc_gemiddeldgetij() e.g. with uncorrected values. The default is None.
- tick_hoursint, optional
Spacing of the x-axis ticks in hours, for instance 12. The default is None, which results in automatic but less nice tick values.
Returns#
- figmatplotlib.figure.Figure
Figure handle.
- axmatplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.overschrijding#
Computation of probabilities (overschrijdingsfrequenties) of extreme waterlevels
- kenmerkendewaarden.overschrijding.calc_highest_extremes(df_ext: DataFrame, ascending: bool = False, num_extremes: int = 5)[source]#
Calculate the n highest or lowest extremes by sorting the input dataframe with extremes from high to low (ascending=False) or low to high (ascending=True) and returning the first n times and values.
Parameters#
- df_extpd.DataFrame
The timeseries of extremes (high and low waters).
- ascendingbool, optional
Whether to sort from high to low (ascending=False) or low to high (ascending=True). The default is False.
- num_extremesint, optional
The number of extremes to return. The default is 5.
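The sorting logic can be sketched with plain (time, value) tuples instead of a DataFrame. The function name and the values are made up for illustration.

```python
def highest_extremes(extremes, ascending=False, num_extremes=5):
    """Sort (time, value) pairs by value and return the first num_extremes."""
    ordered = sorted(extremes, key=lambda item: item[1], reverse=not ascending)
    return ordered[:num_extremes]


# Made-up extremes: (time label, level in m)
extremes = [("t1", 2.1), ("t2", 3.4), ("t3", -1.2), ("t4", 2.9), ("t5", -0.8)]
highest = highest_extremes(extremes, num_extremes=2)
lowest = highest_extremes(extremes, ascending=True, num_extremes=2)
print(highest, lowest)
```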
- kenmerkendewaarden.overschrijding.calc_overschrijding(df_ext: DataFrame, dist: dict = None, inverse: bool = False, clip_physical_break: bool = False, rule_type: str = None, rule_value: (Timestamp, float) = None, interp_freqs: list = None)[source]#
Compute exceedance/deceedance frequencies based on measured extreme waterlevels.
Parameters#
- df_extpd.DataFrame
The timeseries of extremes (high and low waters).
- distdict, optional
A pre-filled dictionary with a Hydra-NL and/or validation distribution. The default is None.
- inversebool, optional
Whether to compute deceedance instead of exceedance frequencies. The default is False.
- clip_physical_breakbool, optional
Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.
- rule_typestr, optional
break/linear/None, passed on to apply_trendanalysis(). The default is None.
- rule_value(pd.Timestamp, float), optional
Value corresponding to rule_type, pd.Timestamp (or anything understood by pd.Timestamp) in case of rule_type=‘break’, float in case of rule_type=‘linear’. The default is None.
- interp_freqslist, optional
The frequencies to interpolate to, providing this will result in a “geinterpoleerd” key in the returned dictionary. The default is None.
Returns#
- distdict
A dictionary with several distributions.
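The notion of an exceedance or deceedance frequency can be illustrated empirically by counting extremes per year beyond a level. This is a deliberately simplified sketch and not the method used by calc_overschrijding, whose distributions and trend handling are not described here. The function name and the values are made up.

```python
def exceedance_frequency(levels, n_years, threshold, inverse=False):
    """Average number of extremes per year above (or, if inverse, below) the threshold."""
    if inverse:
        count = sum(1 for level in levels if level < threshold)
    else:
        count = sum(1 for level in levels if level > threshold)
    return count / n_years


# Made-up high waters in m, observed over 4 years
high_waters = [2.1, 2.4, 2.0, 3.1, 2.6, 2.2, 2.9, 2.3]
freq_exceed = exceedance_frequency(high_waters, n_years=4, threshold=2.5)
freq_deceed = exceedance_frequency(high_waters, n_years=4, threshold=2.5, inverse=True)
print(freq_exceed, freq_deceed)
```

The inverse flag mirrors the inverse parameter above: it switches from counting values above the level to counting values below it.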