Modules#

kenmerkendewaarden.data_retrieve#

Retrieve data from ddlpy and write to netcdf files including all metadata

kenmerkendewaarden.data_retrieve.read_measurements(dir_output: str, station: str, extremes: bool, return_xarray: bool = False, nap_correction: bool = False, drop_duplicates: bool = False)[source]#

Read the measurements netcdf as a dataframe.

Parameters#

dir_outputstr

Path where the measurements are stored.

stationstr

Station name, for instance “HOEKVHLD”.

extremesbool

Whether to read measurements for waterlevel timeseries or extremes.

return_xarraybool, optional

Whether to return the raw xarray.Dataset instead of a DataFrame. In that case, nap_correction and drop_duplicates are not supported. The default is False.

nap_correctionbool, optional

Whether to correct for NAP2005. The default is False.

drop_duplicatesbool, optional

Whether to drop duplicated timesteps. The default is False.

Returns#

df_measpd.DataFrame

DataFrame with the measurements or extremes timeseries.

kenmerkendewaarden.data_retrieve.read_measurements_amount(dir_output: str, extremes: bool)[source]#

Read the measurements amount csv into a dataframe.

Parameters#

dir_outputstr

Path where the measurements are stored.

extremesbool

Whether to read measurements amount for waterlevel timeseries or extremes.

Returns#

df_amountpd.DataFrame

DataFrame with the amount of measurements per year.

kenmerkendewaarden.data_retrieve.retrieve_measurements(dir_output: str, station: str, extremes: bool, start_date: Timestamp, end_date: Timestamp, drop_if_constant: list = None)[source]#

Retrieve timeseries with measurements or extremes for a single station from the DDL with ddlpy.

Parameters#

dir_outputstr

Path where the measurement netcdf file will be stored.

stationstr

Station name, for instance “HOEKVHLD”.

extremesbool

Whether to retrieve measurements for waterlevel timeseries or extremes.

start_datepd.Timestamp (or anything understood by pd.Timestamp)

start date of the measurements to be retrieved.

end_datepd.Timestamp (or anything understood by pd.Timestamp)

end date of the measurements to be retrieved.

drop_if_constantlist, optional

A list of columns to drop if the row values are constant, to save disk space. The default is None.

Returns#

None
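The retrieve and read functions above are meant to be used as a pair. A minimal sketch, assuming the package is installed and the DDL is reachable; the wrapper name retrieve_and_read and the date range are illustrative choices, not part of the library:

```python
def retrieve_and_read(dir_output, station="HOEKVHLD"):
    """Retrieve one year of waterlevels for a station and read them back."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from kenmerkendewaarden import data_retrieve

    # Writes the measurements netcdf file to dir_output, returns None.
    data_retrieve.retrieve_measurements(
        dir_output=dir_output,
        station=station,
        extremes=False,
        start_date="2020-01-01",  # anything understood by pd.Timestamp
        end_date="2021-01-01",
    )
    # Reads the netcdf file written above as a DataFrame.
    return data_retrieve.read_measurements(
        dir_output=dir_output,
        station=station,
        extremes=False,
        drop_duplicates=True,
    )
```

Calling the function requires network access, since retrieve_measurements queries the DDL.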

kenmerkendewaarden.data_retrieve.retrieve_measurements_amount(dir_output: str, station_list: list, extremes: bool, start_date: Timestamp, end_date: Timestamp)[source]#

Retrieve the amount of measurements or extremes for each station in the list from the DDL with ddlpy.

Parameters#

dir_outputstr

Path where the measurements amount csv file will be stored.

station_listlist

List of station names, for instance [“HOEKVHLD”].

extremesbool

Whether to retrieve the measurements amount for waterlevel timeseries or extremes.

start_datepd.Timestamp (or anything understood by pd.Timestamp)

start date of the measurements to be retrieved.

end_datepd.Timestamp (or anything understood by pd.Timestamp)

end date of the measurements to be retrieved.

Returns#

None

kenmerkendewaarden.data_analysis#

Data analysis like missings, duplicates, outliers and several other statistics

kenmerkendewaarden.data_analysis.derive_statistics(dir_output: str, station_list: list, extremes: bool)[source]#

Derive several statistics for the measurements of each station in the list.

Parameters#

dir_outputstr

Path where the measurement netcdf files are stored.

station_listlist

List of station names to derive statistics for, for instance [“HOEKVHLD”].

extremesbool

Whether to derive statistics from waterlevel timeseries or extremes.

Returns#

data_summarypd.DataFrame

A dataframe with several statistics for each station from the provided list.

kenmerkendewaarden.data_analysis.plot_measurements(df_meas: DataFrame, df_ext: DataFrame = None)[source]#

Generate a timeseries figure for the measurement timeseries (and extremes) of this station.

Parameters#

df_measpd.DataFrame

Dataframe with the measurement timeseries for a particular station.

df_extpd.DataFrame, optional

Dataframe with the measurement extremes for a particular station.

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.data_analysis.plot_measurements_amount(df: DataFrame, relative: bool = False)[source]#

Generate a pcolormesh figure of the amount of measurements for all years and stations. The colors indicate the absolute or relative number of measurements per year.

Parameters#

dfpd.DataFrame

Dataframe with the amount of measurements for several years per station.

relativebool, optional

Whether to scale the amount of measurements with the median of all measurement amounts for the same year. The default is False.

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.
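The relative option scales each amount with the median over all stations for the same year. A minimal sketch of that scaling in plain Python, assuming a nested dictionary instead of the DataFrame the function expects; the station names and counts are made up, and this is not the library's implementation:

```python
from statistics import median

# Hypothetical number of measurements per year and station.
amounts = {
    2019: {"STATION_A": 52560, "STATION_B": 52000, "STATION_C": 26280},
    2020: {"STATION_A": 52704, "STATION_B": 52704, "STATION_C": 50000},
}

def relative_amounts(amounts):
    """Divide each amount by the median of all stations for that year."""
    relative = {}
    for year, per_station in amounts.items():
        year_median = median(per_station.values())
        relative[year] = {s: n / year_median for s, n in per_station.items()}
    return relative

relative = relative_amounts(amounts)
for year, per_station in relative.items():
    print(year, {s: round(r, 2) for s, r in per_station.items()})
```

A station with roughly half a year of data, like STATION_C in 2019, ends up near 0.5, which makes gaps easier to spot than with absolute counts.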

kenmerkendewaarden.data_analysis.plot_stations(station_list: list, crs: int = None, add_labels: bool = False)[source]#

Plot the stations by subsetting a ddlpy catalog with the provided list of stations.

Parameters#

station_listlist

List of stations to plot the locations from.

crsint, optional

Coordinate reference system, for instance 28992. The coordinates retrieved from the DDL will be converted to this EPSG. The default is None.

add_labelsbool, optional

Whether to add station code labels in the figure, useful for debugging. The default is False.

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.tidalindicators#

Computation of tidal indicators from waterlevel extremes or timeseries

kenmerkendewaarden.tidalindicators.calc_HWLWtidalindicators(df_ext: DataFrame, min_coverage: float = None)[source]#

Computes several tidal extreme indicators from tidal extreme dataset.

Parameters#

df_extpd.DataFrame

Dataframe with extremes timeseries.

min_coveragefloat, optional

The minimal required coverage (between 0 and 1) of the df_ext timeseries to consider the statistics to be valid. It is the ratio between the actual amount and the expected amount of high waters in the series. Note that the expected amount is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.

Returns#

dict_tidalindicatorsdict

Dictionary with several tidal indicators like yearly/monthly means.
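The note on min_coverage=1 can be illustrated with a small sketch. It assumes that the expected number of high waters is estimated from a mean tidal period of 12h25min, which is an assumption made for this example and not necessarily how the library estimates it; the function name hw_coverage is hypothetical:

```python
from datetime import datetime, timedelta

# Assumption: the expected number of high waters follows from the mean
# tidal period of 12h25min.
TIDAL_PERIOD = timedelta(hours=12, minutes=25)

def hw_coverage(hw_times, start, end):
    """Ratio of the actual to the expected number of high waters."""
    expected = (end - start) / TIDAL_PERIOD
    return len(hw_times) / expected

start = datetime(2020, 1, 1)
end = datetime(2020, 1, 11)

# Synthetic, gap-free series: first high water 10 hours after the start,
# then one high water every 12h25min.
hw_times = []
t = start + timedelta(hours=10)
while t < end:
    hw_times.append(t)
    t += TIDAL_PERIOD

cov = hw_coverage(hw_times, start, end)
print(f"{len(hw_times)} high waters, coverage = {cov:.3f}")
```

Although no high water is missing, the coverage ends up slightly below 1, so a threshold of exactly 1 would reject this series.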

kenmerkendewaarden.tidalindicators.calc_HWLWtidalrange(df_ext: DataFrame)[source]#

Compute the difference between a high water and the following low water. This tidal range is added as a column to the df_ext dataframe.

Parameters#

df_extpd.DataFrame

Dataframe with extremes timeseries.

Returns#

df_extpd.DataFrame

Input dataframe enriched with a tidal range column and a ‘HWLWno’ column.
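The tidal range as described above is the high water level minus the level of the following low water. A minimal sketch on a plain list of extremes, assuming alternating high and low waters; the tuple layout and the function name are illustrative and differ from the DataFrame the library works on:

```python
# Hypothetical extremes in chronological order: (kind, water level [m]).
extremes = [
    ("HW", 1.10),
    ("LW", -0.60),
    ("HW", 1.25),
    ("LW", -0.70),
    ("HW", 1.05),
]

def tidal_ranges(extremes):
    """Difference between each high water and the following low water."""
    ranges = []
    for (kind, level), (next_kind, next_level) in zip(extremes, extremes[1:]):
        if kind == "HW" and next_kind == "LW":
            ranges.append(level - next_level)
    return ranges

ranges = tidal_ranges(extremes)
print([round(r, 2) for r in ranges])
```

The last high water has no following low water in this series, so it does not get a tidal range.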

kenmerkendewaarden.tidalindicators.calc_getijcomponenten(df_meas, const_list=None)[source]#

kenmerkendewaarden.tidalindicators.calc_hat_lat_fromcomponents(comp: DataFrame) → tuple[source]#

Derive highest and lowest astronomical tide (HAT/LAT) from a component set. The component set is used to make a tidal prediction for an arbitrary period of 19 years with a 10 minute interval. The max/min values of the predictions of all years are the HAT/LAT values. The HAT/LAT is very dependent on the A0 of the component set. Therefore, the HAT/LAT values are relevant for the same year as the slotgemiddelde that is used to replace A0 in the component set. For instance, if the slotgemiddelde is valid for 2021.0, HAT and LAT are also relevant for that year. It is important to use the same tidal prediction settings as used to derive the tidal components.

Parameters#

comppd.DataFrame

DataFrame with amplitudes and phases for a list of components.

Returns#

tuple

hat and lat values.
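The dependence of HAT/LAT on A0 can be seen in a small sketch: a prediction is the sum of A0 and a cosine per component, and HAT/LAT are its maximum and minimum. This example assumes a hypothetical set of only two components (M2 and S2) and a 30-day period instead of 19 years to keep it fast; it ignores nodal corrections and is not the library's prediction method:

```python
import math

# Hypothetical component set: name -> (amplitude [m], phase [deg], speed [deg/hour]).
COMPONENTS = {
    "M2": (1.00, 0.0, 28.9841042),
    "S2": (0.25, 0.0, 30.0),
}

def predict(t_hours, a0):
    """Water level at t_hours as A0 plus the sum of the harmonic components."""
    return a0 + sum(
        amp * math.cos(math.radians(speed * t_hours - phase))
        for amp, phase, speed in COMPONENTS.values()
    )

def hat_lat(a0, n_days=30, step_minutes=10):
    """Max and min of a prediction with a fixed interval (10 minutes by default)."""
    n_steps = n_days * 24 * 60 // step_minutes
    levels = [predict(i * step_minutes / 60, a0) for i in range(n_steps)]
    return max(levels), min(levels)

hat, lat = hat_lat(a0=0.05)
hat_shifted, lat_shifted = hat_lat(a0=0.15)
print(f"A0=0.05: HAT={hat:.2f}, LAT={lat:.2f}")
print(f"A0=0.15: HAT={hat_shifted:.2f}, LAT={lat_shifted:.2f}")
```

Raising A0 by 0.10 m raises both HAT and LAT by the same amount, which is why the values are only relevant for the year that A0 (the slotgemiddelde) is valid for.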

kenmerkendewaarden.tidalindicators.calc_hat_lat_frommeasurements(df_meas: DataFrame) → tuple[source]#

Derive highest and lowest astronomical tide (HAT/LAT) from a measurement timeseries of 19 years. Tidal components are derived for each year of the measurement timeseries. The resulting component sets are used to make a tidal prediction for each year of the measurement timeseries with a 10 minute interval. The max/min values of the predictions of all years are the HAT/LAT values. The HAT/LAT is very dependent on the A0 of the component sets. Therefore, the HAT/LAT values are relevant for the same period as the measurement timeseries.

Parameters#

df_measpd.DataFrame

Measurements timeseries. The last 19 years of this timeseries are used to compute hat and lat.

Returns#

tuple

hat and lat values.

kenmerkendewaarden.tidalindicators.calc_wltidalindicators(df_meas: DataFrame, min_coverage: float = None)[source]#

Computes monthly and yearly means from waterlevel timeseries.

Parameters#

df_measpd.DataFrame

Dataframe with waterlevel timeseries.

min_coveragefloat, optional

The minimum fraction (from 0 to 1) of timeseries coverage to consider the statistics to be valid. The default is None.

Returns#

dict_tidalindicatorsdict

Dictionary with several tidal indicators like yearly/monthly means.

kenmerkendewaarden.tidalindicators.plot_tidalindicators(dict_indicators: dict)[source]#

Plot tidalindicators.

Parameters#

dict_indicatorsdict

Dictionary as returned from kw.calc_wltidalindicators() and/or kw.calc_HWLWtidalindicators().

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.slotgemiddelden#

Computation of slotgemiddelden of waterlevels and extremes

kenmerkendewaarden.slotgemiddelden.calc_slotgemiddelden(df_meas: DataFrame, df_ext: DataFrame = None, min_coverage: float = None, clip_physical_break: bool = False)[source]#

Compute slotgemiddelden from measurement timeseries and optionally also from extremes timeseries. A simple linear trend is used to avoid suggesting more accuracy than the data supports. However, when a linear trend is fitted to a limited amount of data, the nodal cycle and wind effects will make the model fit inaccurate. It is wise to use at least 30 years of data for a valid fit, which is more than 1.5 times the nodal cycle.

Parameters#

df_measpd.DataFrame

The timeseries of measured waterlevels.

df_extpd.DataFrame, optional

The timeseries of extremes (high and low waters). The default is None.

min_coveragefloat, optional

Set yearly means to nans for years that do not have sufficient data coverage. The default is None.

clip_physical_breakbool, optional

Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.

Returns#

slotgemiddelden_dictdict

Dictionary with yearly means and model fits, optionally also for extremes and the corresponding tidal range.
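The linear trend through yearly means can be sketched with an ordinary least-squares fit. The example below assumes synthetic yearly means (a rise of 2 mm per year plus a wobble with the nodal period of 18.6 years) and skips years that were set to nan for insufficient coverage; the function name and the evaluation year are illustrative, and the library's own model fit may differ:

```python
import math

def fit_linear_trend(years, means):
    """Least-squares line through the yearly means, skipping nan values."""
    pairs = [(y, m) for y, m in zip(years, means) if not math.isnan(m)]
    n = len(pairs)
    mean_x = sum(y for y, _ in pairs) / n
    mean_y = sum(m for _, m in pairs) / n
    sxx = sum((y - mean_x) ** 2 for y, _ in pairs)
    sxy = sum((y - mean_x) * (m - mean_y) for y, m in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# 30 years of synthetic yearly mean waterlevels [m].
years = list(range(1991, 2021))
means = [
    0.002 * (y - 1991) + 0.01 * math.sin(2 * math.pi * (y - 1991) / 18.6)
    for y in years
]
means[5] = float("nan")  # a year with insufficient data coverage

slope, intercept = fit_linear_trend(years, means)
value_2021 = slope * 2021 + intercept  # trend evaluated for 2021.0
print(f"trend = {slope * 1000:.2f} mm/year, value for 2021.0 = {value_2021:.3f} m")
```

With 30 years of data the fitted slope stays close to the imposed 2 mm per year despite the nodal wobble; shortening the series makes the fit drift away from it.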

kenmerkendewaarden.slotgemiddelden.plot_slotgemiddelden(slotgemiddelden_dict: dict, slotgemiddelden_dict_all: dict = None)[source]#

Plot timeseries of yearly mean waterlevels and corresponding model fits.

Parameters#

slotgemiddelden_dictdict

Output from kw.calc_slotgemiddelden containing timeseries of yearly mean waterlevels and corresponding model fits.

slotgemiddelden_dict_alldict, optional

Optionally provide another dictionary with unfiltered mean waterlevels. Only used to plot the mean waterlevels (in grey). The default is None.

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.havengetallen#

Computation of havengetallen

kenmerkendewaarden.havengetallen.calc_HWLW_springneap(df_ext: DataFrame, min_coverage=None, moonculm_offset: int = 4)[source]#

kenmerkendewaarden.havengetallen.calc_havengetallen(df_ext: DataFrame, return_df_ext=False, min_coverage=None, moonculm_offset: int = 4)[source]#

Havengetallen consist of the median values of the extremes (high and low waters) and the median time delays of the extremes with respect to the moonculmination. In addition, the tidal difference for each cycle and the tidal period are computed. All these indicators are derived by dividing the extremes into hour-classes with respect to the moonculmination.

Parameters#

df_extpd.DataFrame

DataFrame with extremes (highs and lows, no aggers). The last 10 years of this timeseries are used to compute the havengetallen.

return_df_extbool, optional

Whether to return the enriched input dataframe. The default is False.

min_coveragefloat, optional

The minimal required coverage (between 0 and 1) of the df_ext timeseries to consider the statistics to be valid. It is the ratio between the actual amount and the expected amount of high waters in the series. Note that the expected amount is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.

moonculm_offsetint, optional

Offset between moonculmination and extremes. Passed on to calc_HWLW_moonculm_combi. The default is 4, which corresponds to a 2-day offset, which is applicable to the Dutch coast.

Returns#

df_havengetallenpd.DataFrame

DataFrame with havengetallen for all hour-classes. Hour-class 0 corresponds to spring tide, hour-class 6 corresponds to neap tide, and ‘mean’ is the mean.

df_ext_culmpd.DataFrame

An enriched copy of the input DataFrame including a ‘culm_hr’ column. Only returned if return_df_ext is True.
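The grouping into hour-classes can be sketched as a median per class. The example assumes that every high water already carries its hour-class with respect to the moonculmination; the values are made up and the function name is hypothetical, so this only illustrates the idea and not the library's implementation:

```python
from collections import defaultdict
from statistics import median

# Hypothetical high waters: (hour-class, water level [m]).
high_waters = [
    (0, 1.45), (0, 1.55), (0, 1.50),  # around spring tide
    (3, 1.20), (3, 1.30),
    (6, 0.95), (6, 1.05), (6, 1.00),  # around neap tide
]

def median_per_hourclass(extremes):
    """Median water level for each hour-class."""
    per_class = defaultdict(list)
    for hour_class, level in extremes:
        per_class[hour_class].append(level)
    return {hc: median(levels) for hc, levels in sorted(per_class.items())}

hw_median = median_per_hourclass(high_waters)
for hour_class, level in hw_median.items():
    print(f"hour-class {hour_class}: median HW = {level:.2f} m")
```

The same grouping applied to low waters and to the time delays would give the other columns of the havengetallen.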

kenmerkendewaarden.havengetallen.plot_HWLW_pertimeclass(df_ext: DataFrame, df_havengetallen: DataFrame)[source]#

Plot the extremes for each hour-class, including a median line.

Parameters#

df_extpd.DataFrame

DataFrame with measurement extremes, as provided by kw.calc_havengetallen().

df_havengetallenpd.DataFrame

DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.havengetallen.plot_aardappelgrafiek(df_havengetallen: DataFrame)[source]#

Plot the median values of each hour-class in an aardappelgrafiek.

Parameters#

df_havengetallenpd.DataFrame

DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.gemiddeldgetij#

Computation of gemiddelde getijkromme

kenmerkendewaarden.gemiddeldgetij.calc_gemiddeldgetij(df_meas: DataFrame, df_ext: DataFrame = None, min_coverage: float = None, freq: str = '60sec', nb: int = 0, nf: int = 0, scale_extremes: bool = False, scale_period: bool = False)[source]#

Generate an average tidal signal for average/spring/neap tide by doing a tidal analysis on a timeseries of measurements. The resulting tidal components (subsetted and/or adjusted) are then used to make a raw prediction for average/spring/neap tide. These raw predictions can optionally be scaled in height (with havengetallen) and in time (to a fixed period of 12h25min). A number of backward (nb) and forward (nf) repeats are added before the timeseries are returned, resulting in nb+nf+1 tidal periods.

Parameters#

df_measpd.DataFrame

Timeseries of waterlevel measurements. The last 10 years of this timeseries are used to compute the getijkrommes.

df_extpd.DataFrame, optional

Timeseries of waterlevel extremes (1/2 only). The last 10 years of this timeseries are used to compute the getijkrommes. The default is None.

min_coveragefloat, optional

The minimal required coverage of the df_ext timeseries. Passed on to calc_havengetallen(). The default is None.

freqstr, optional

Frequency of the prediction. A value of 60 seconds or lower is advisable for decent results. The default is “60sec”.

nbint, optional

Amount of periods to repeat backward. The default is 0.

nfint, optional

Amount of periods to repeat forward. The default is 0.

scale_extremesbool, optional

Whether to scale extremes with havengetallen. The default is False.

scale_periodbool, optional

Whether to scale to 12h25min (for boi). The default is False.

Returns#

gemgetij_dictdict

Dictionary with DataFrames with gemiddeld getij for mean, spring and neap tide.
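The effect of nb and nf can be sketched by repeating a single tidal period backward and forward in time. The example assumes a curve that is already scaled to the fixed period of 12h25min and uses a plain cosine as a stand-in for the real getijkromme; the function name repeat_periods is hypothetical:

```python
import math

PERIOD_MIN = 745  # fixed tidal period of 12h25min, in minutes

def repeat_periods(times_min, values, nb=0, nf=0):
    """Repeat one tidal period nb times backward and nf times forward."""
    rep_times, rep_values = [], []
    for k in range(-nb, nf + 1):
        rep_times.extend(t + k * PERIOD_MIN for t in times_min)
        rep_values.extend(values)
    return rep_times, rep_values

# One period sampled every 5 minutes, with a cosine as placeholder curve.
times_min = list(range(0, PERIOD_MIN, 5))
values = [math.cos(2 * math.pi * t / PERIOD_MIN) for t in times_min]

rep_times, rep_values = repeat_periods(times_min, values, nb=1, nf=2)
n_periods = len(rep_times) // len(times_min)
print(f"{n_periods} periods from {rep_times[0]} to {rep_times[-1]} minutes")
```

With nb=1 and nf=2 the result spans nb+nf+1 = 4 tidal periods, starting one period before the original curve.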

kenmerkendewaarden.gemiddeldgetij.plot_gemiddeldgetij(gemgetij_dict: dict, gemgetij_dict_raw: dict = None, tick_hours: int = None)[source]#

Default plotting function for gemiddeldgetij dictionaries.

Parameters#

gemgetij_dictdict

Dictionary as returned from kw.calc_gemiddeldgetij().

gemgetij_dict_rawdict, optional

Dictionary as returned from kw.calc_gemiddeldgetij(), e.g. with uncorrected values. The default is None.

tick_hoursint, optional

Interval in hours for the x-axis ticks, for instance 12. If None, automatic but less nice tick values are used. The default is None.

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.

kenmerkendewaarden.overschrijding#

Computation of probabilities (overschrijdingsfrequenties) of extreme waterlevels

kenmerkendewaarden.overschrijding.calc_overschrijding(df_ext: DataFrame, dist: dict = None, inverse: bool = False, clip_physical_break: bool = False, rule_type: str = None, rule_value: (Timestamp, float) = None, interp_freqs: list = None)[source]#

Compute exceedance/deceedance frequencies based on measured extreme waterlevels.

Parameters#

df_extpd.DataFrame

The timeseries of extremes (high and low waters).

distdict, optional

A pre-filled dictionary with a Hydra-NL and/or validation distribution. The default is None.

inversebool, optional

Whether to compute deceedance instead of exceedance frequencies. The default is False.

clip_physical_breakbool, optional

Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.

rule_typestr, optional

break/linear/None, passed on to apply_trendanalysis(). The default is None.

rule_value(pd.Timestamp, float), optional

Value corresponding to rule_type, pd.Timestamp (or anything understood by pd.Timestamp) in case of rule_type=’break’, float in case of rule_type=’linear’. The default is None.

interp_freqslist, optional

The frequencies to interpolate to, providing this will result in a “geinterpoleerd” key in the returned dictionary. The default is None.

Returns#

distdict

A dictionary with several distributions.
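The notion of exceedance and deceedance frequencies can be illustrated with a plain empirical estimate: sort the extremes and divide each rank by the length of the series in years. This is a generic textbook sketch with made-up values and a hypothetical function name; it is not the distribution fitting that calc_overschrijding performs:

```python
def empirical_frequencies(levels, n_years, inverse=False):
    """Empirical exceedance (or deceedance) frequency per year for each level."""
    # Exceedance: highest level first; deceedance: lowest level first.
    ordered = sorted(levels, reverse=not inverse)
    return [(level, rank / n_years) for rank, level in enumerate(ordered, start=1)]

# Hypothetical high waters [m] observed in a series of 2 years.
high_waters = [2.1, 3.0, 2.5, 2.8]

exceedance = empirical_frequencies(high_waters, n_years=2)
deceedance = empirical_frequencies(high_waters, n_years=2, inverse=True)
for level, freq in exceedance:
    print(f"{level:.1f} m is reached or exceeded {freq:.1f} times per year")
```

The highest level in a series of 2 years gets a frequency of 0.5 per year; with inverse=True the ranking starts at the lowest level instead.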

kenmerkendewaarden.overschrijding.plot_overschrijding(dist: dict)[source]#

Plot exceedance/deceedance frequencies (overschrijding/onderschrijding).

Parameters#

distdict

Dictionary as returned from kw.calc_overschrijding().

Returns#

figmatplotlib.figure.Figure

Figure handle.

axmatplotlib.axes._axes.Axes

Figure axis handle.