Modules#
kenmerkendewaarden.data_retrieve#
Retrieve data from ddlpy and write to netcdf files including all metadata
- kenmerkendewaarden.data_retrieve.read_measurements(dir_output: str, station: str, extremes: bool, return_xarray: bool = False, nap_correction: bool = False, drop_duplicates: bool = False)[source]#
Read the measurements netcdf as a dataframe.
Parameters#
- dir_output : str
Path where the measurements are stored.
- station : str
Station name, for instance “HOEKVHLD”.
- extremes : bool
Whether to read measurements for waterlevel timeseries or extremes.
- return_xarray : bool, optional
Whether to return the raw xarray.Dataset instead of a DataFrame. In that case, nap_correction and drop_duplicates are not supported. The default is False.
- nap_correction : bool, optional
Whether to correct for NAP2005. The default is False.
- drop_duplicates : bool, optional
Whether to drop duplicated timesteps. The default is False.
Returns#
- df_meas : pd.DataFrame
DataFrame with the measurements or extremes timeseries.
- kenmerkendewaarden.data_retrieve.read_measurements_amount(dir_output: str, extremes: bool)[source]#
Read the measurements amount csv into a dataframe.
Parameters#
- dir_output : str
Path where the measurements are stored.
- extremes : bool
Whether to read measurements amount for waterlevel timeseries or extremes.
Returns#
- df_amount : pd.DataFrame
DataFrame with the amount of measurements per year.
- kenmerkendewaarden.data_retrieve.retrieve_measurements(dir_output: str, station: str, extremes: bool, start_date: Timestamp, end_date: Timestamp, drop_if_constant: list = None)[source]#
Retrieve timeseries with measurements or extremes for a single station from the DDL with ddlpy.
Parameters#
- dir_output : str
Path where the measurement netcdf file will be stored.
- station : str
Station name, for instance “HOEKVHLD”.
- extremes : bool
Whether to retrieve measurements for waterlevel timeseries or extremes.
- start_date : pd.Timestamp (or anything understood by pd.Timestamp)
Start date of the measurements to be retrieved.
- end_date : pd.Timestamp (or anything understood by pd.Timestamp)
End date of the measurements to be retrieved.
- drop_if_constant : list, optional
A list of columns to drop if their values are constant across all rows, to save disk space. The default is None.
Returns#
None
- kenmerkendewaarden.data_retrieve.retrieve_measurements_amount(dir_output: str, station_list: list, extremes: bool, start_date: Timestamp, end_date: Timestamp)[source]#
Retrieve the amount of measurements or extremes for each station in the list from the DDL with ddlpy.
Parameters#
- dir_output : str
Path where the measurements amount csv file will be stored.
- station_list : list
List of station names, for instance [“HOEKVHLD”].
- extremes : bool
Whether to retrieve the measurements amount for waterlevel timeseries or extremes.
- start_date : pd.Timestamp (or anything understood by pd.Timestamp)
Start date of the period for which the measurements amount is retrieved.
- end_date : pd.Timestamp (or anything understood by pd.Timestamp)
End date of the period for which the measurements amount is retrieved.
Returns#
None
kenmerkendewaarden.data_analysis#
Data analysis like missings, duplicates, outliers and several other statistics
- kenmerkendewaarden.data_analysis.derive_statistics(dir_output: str, station_list: list, extremes: bool)[source]#
Derive several statistics for the measurements of each station in the list.
Parameters#
- dir_output : str
Path where the measurement netcdf files are stored.
- station_list : list
List of station names to derive statistics for, for instance [“HOEKVHLD”].
- extremes : bool
Whether to derive statistics from waterlevel timeseries or extremes.
Returns#
- data_summary : pd.DataFrame
A dataframe with several statistics for each station from the provided list.
- kenmerkendewaarden.data_analysis.plot_measurements(df_meas: DataFrame, df_ext: DataFrame = None)[source]#
Generate a timeseries figure for the measurement timeseries (and extremes) of this station.
Parameters#
- df_meas : pd.DataFrame
Dataframe with the measurement timeseries for a particular station.
- df_ext : pd.DataFrame, optional
Dataframe with the measurement extremes for a particular station. The default is None.
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
- kenmerkendewaarden.data_analysis.plot_measurements_amount(df: DataFrame, relative: bool = False)[source]#
Read the measurements amount csv and generate a pcolormesh figure of all years and stations. The colors indicate the absolute or relative number of measurements per year.
Parameters#
- df : pd.DataFrame
Dataframe with the amount of measurements for several years per station.
- relative : bool, optional
Whether to scale the amount of measurements with the median of all measurement amounts for the same year. The default is False.
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
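The relative option can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: it assumes that "scaling with the median" means dividing each station's yearly count by the median count of all stations in that year. The helper name relative_amounts and the stations STATION_B and STATION_C are hypothetical.

```python
from statistics import median


def relative_amounts(amounts):
    """Divide each station's yearly count by the median count of that year.

    `amounts` maps year -> {station: number of measurements}.
    Assumption: a year with a median of zero does not occur in the input.
    """
    relative = {}
    for year, per_station in amounts.items():
        year_median = median(per_station.values())
        relative[year] = {
            station: count / year_median for station, count in per_station.items()
        }
    return relative


# Synthetic counts; only HOEKVHLD is a station name taken from the documentation.
amounts = {
    2019: {"HOEKVHLD": 52560, "STATION_B": 52560, "STATION_C": 26280},
    2020: {"HOEKVHLD": 52704, "STATION_B": 40000, "STATION_C": 52704},
}
relative = relative_amounts(amounts)
```

With this scaling, a station that delivered half of the typical number of measurements in a year gets a value of 0.5, regardless of the sampling interval used in that year.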
- kenmerkendewaarden.data_analysis.plot_stations(station_list: list, crs: int = None, add_labels: bool = False)[source]#
Plot the stations by subsetting a ddlpy catalog with the provided list of stations.
Parameters#
- station_list : list
List of stations to plot the locations of.
- crs : int, optional
Coordinate reference system, for instance 28992. The coordinates retrieved from the DDL will be converted to this EPSG. The default is None.
- add_labels : bool, optional
Whether to add station code labels in the figure, useful for debugging. The default is False.
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.tidalindicators#
Computation of tidal indicators from waterlevel extremes or timeseries
- kenmerkendewaarden.tidalindicators.calc_HWLWtidalindicators(df_ext: DataFrame, min_coverage: float = None)[source]#
Computes several tidal extreme indicators from tidal extreme dataset.
Parameters#
- df_ext : pd.DataFrame
Dataframe with extremes timeseries.
- min_coverage : float, optional
The minimum required coverage (between 0 and 1) of the df_ext timeseries for the statistics to be considered valid. It is the ratio between the actual number and the expected number of high waters in the series. Note that the expected number is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.
Returns#
- dict_tidalindicators : dict
Dictionary with several tidal indicators like yearly/monthly means.
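The coverage check described for min_coverage can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the library's code: the expected number of high waters is estimated here by dividing the length of the period by an average tidal period of 12h25min, and the helper names highwater_coverage and is_valid are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed average tidal period used to estimate the expected number of high waters.
TIDAL_PERIOD = timedelta(hours=12, minutes=25)


def highwater_coverage(n_highwaters, start, end):
    """Ratio between the actual and the expected number of high waters."""
    expected = (end - start) / TIDAL_PERIOD  # timedelta / timedelta gives a float
    return n_highwaters / expected


def is_valid(n_highwaters, start, end, min_coverage):
    """True if the coverage reaches the required minimum."""
    return highwater_coverage(n_highwaters, start, end) >= min_coverage


start, end = datetime(2020, 1, 1), datetime(2021, 1, 1)
coverage = highwater_coverage(700, start, end)
```

Because the expected number is only an estimate, a nearly complete year can still end up slightly below 1, which is why a threshold just under 1 is usually more practical than min_coverage=1.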
- kenmerkendewaarden.tidalindicators.calc_HWLWtidalrange(df_ext: DataFrame)[source]#
Compute the difference between a high water and the following low water. This tidal range is added as a column to the df_ext dataframe.
Parameters#
- df_ext : pd.DataFrame
Dataframe with extremes timeseries.
Returns#
- df_ext : pd.DataFrame
Input dataframe enriched with ‘tidalindicators’ and ‘HWLWno’ columns.
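The tidal range computation can be illustrated with a small plain-Python sketch. It is not the library's implementation: the input is assumed to be a time-ordered list of dictionaries, and the helper name add_tidal_range and the keys "kind", "value" and "tidalrange" are hypothetical.

```python
def add_tidal_range(extremes):
    """Attach to each high water the difference with the following low water.

    `extremes` is a time-ordered list of dicts with the keys "kind"
    ("HW" or "LW") and "value" (waterlevel in metres).
    A new list is returned; the input is left untouched.
    """
    result = [dict(row) for row in extremes]
    for current, following in zip(result, result[1:]):
        if current["kind"] == "HW" and following["kind"] == "LW":
            current["tidalrange"] = current["value"] - following["value"]
    return result


# Synthetic extremes for two tidal cycles.
extremes = [
    {"kind": "HW", "value": 1.2},
    {"kind": "LW", "value": -0.7},
    {"kind": "HW", "value": 1.1},
    {"kind": "LW", "value": -0.6},
]
with_range = add_tidal_range(extremes)
```

Only high waters that are directly followed by a low water receive a tidal range in this sketch; low waters and a trailing high water are left without one.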
- kenmerkendewaarden.tidalindicators.calc_hat_lat_fromcomponents(comp: DataFrame) tuple [source]#
Derive highest and lowest astronomical tide (HAT/LAT) from a component set. The component set is used to make a tidal prediction for an arbitrary period of 19 years with a 10 minute interval. The max/min values of the predictions of all years are the HAT/LAT values. The HAT/LAT is very dependent on the A0 of the component set. Therefore, the HAT/LAT values are relevant for the same year as the slotgemiddelde that is used to replace A0 in the component set. For instance, if the slotgemiddelde is valid for 2021.0, HAT and LAT are also relevant for that year. It is important to use the same tidal prediction settings as used to derive the tidal components.
Parameters#
- comp : pd.DataFrame
DataFrame with amplitudes and phases for a list of components.
Returns#
- tuple
hat and lat values.
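The idea of deriving HAT and LAT from a component set can be sketched in plain Python. This is a strongly simplified illustration, not the library's method: it uses only two constituents (M2 and S2 with their standard angular speeds), ignores nodal factors, and treats A0 as a separate argument. The helper names predict and hat_lat are hypothetical.

```python
import math
from datetime import timedelta

# Angular speeds in degrees per hour for the two constituents used here.
SPEEDS_DEG_PER_HOUR = {"M2": 28.9841042, "S2": 30.0}


def predict(components, a0, hours):
    """Waterlevel at `hours` after the reference time as a sum of cosines."""
    level = a0
    for name, (amplitude, phase_deg) in components.items():
        angle = math.radians(SPEEDS_DEG_PER_HOUR[name] * hours - phase_deg)
        level += amplitude * math.cos(angle)
    return level


def hat_lat(components, a0, duration=timedelta(days=19 * 365.25),
            step=timedelta(minutes=10)):
    """Maximum and minimum of the prediction over `duration`, sampled every `step`."""
    n_steps = int(duration / step)
    step_hours = step / timedelta(hours=1)
    hat, lat = -math.inf, math.inf
    for i in range(n_steps + 1):
        level = predict(components, a0, i * step_hours)
        hat = max(hat, level)
        lat = min(lat, level)
    return hat, lat


# Synthetic component set: name -> (amplitude in m, phase in degrees).
components = {"M2": (1.0, 0.0), "S2": (0.3, 0.0)}
# A short period keeps the example fast; the documented approach uses 19 years.
hat, lat = hat_lat(components, a0=0.1, duration=timedelta(days=60))
```

Since A0 is added to every predicted value, shifting A0 shifts both HAT and LAT by the same amount, which is why the results are tied to the year of the slotgemiddelde that replaces A0.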
- kenmerkendewaarden.tidalindicators.calc_hat_lat_frommeasurements(df_meas: DataFrame) tuple [source]#
Derive highest and lowest astronomical tide (HAT/LAT) from a measurement timeseries of 19 years. Tidal components are derived for each year of the measurement timeseries. The resulting component sets are used to make a tidal prediction for each year of the measurement timeseries with a 10 minute interval. The max/min values of the predictions of all years are the HAT/LAT values. The HAT/LAT is very dependent on the A0 of the component sets. Therefore, the HAT/LAT values are relevant for the same period as the measurement timeseries.
Parameters#
- df_meas : pd.DataFrame
Measurements timeseries. The last 19 years of this timeseries are used to compute hat and lat.
Returns#
- tuple
hat and lat values.
- kenmerkendewaarden.tidalindicators.calc_wltidalindicators(df_meas: DataFrame, min_coverage: float = None)[source]#
Computes monthly and yearly means from waterlevel timeseries.
Parameters#
- df_meas : pd.DataFrame
Dataframe with waterlevel timeseries.
- min_coverage : float, optional
The minimum fraction (from 0 to 1) of timeseries coverage for the statistics to be considered valid. The default is None.
Returns#
- dict_tidalindicators : dict
Dictionary with several tidal indicators like yearly/monthly means.
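How min_coverage affects yearly means can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: coverage is taken as the number of available values divided by an expected number per year, and years below the threshold are set to nan. The helper name yearly_means and the argument expected_per_year are hypothetical.

```python
import math


def yearly_means(values_per_year, expected_per_year, min_coverage=None):
    """Mean per year, or nan when the year is empty or its coverage is too low."""
    means = {}
    for year, values in values_per_year.items():
        coverage = len(values) / expected_per_year
        too_sparse = min_coverage is not None and coverage < min_coverage
        if not values or too_sparse:
            means[year] = math.nan
        else:
            means[year] = sum(values) / len(values)
    return means


# Synthetic data with four expected values per year to keep the example small.
values_per_year = {2019: [0.1, 0.2, 0.3, 0.2], 2020: [0.5]}
means = yearly_means(values_per_year, expected_per_year=4, min_coverage=0.9)
```

Without min_coverage, the single value in 2020 would be reported as the yearly mean, which is exactly the kind of unrepresentative statistic the threshold is meant to filter out.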
- kenmerkendewaarden.tidalindicators.plot_tidalindicators(dict_indicators: dict)[source]#
Plot tidalindicators.
Parameters#
- dict_indicators : dict
Dictionary as returned from kw.calc_wltidalindicators() and/or kw.calc_HWLWtidalindicators().
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.slotgemiddelden#
Computation of slotgemiddelden of waterlevels and extremes
- kenmerkendewaarden.slotgemiddelden.calc_slotgemiddelden(df_meas: DataFrame, df_ext: DataFrame = None, min_coverage: float = None, clip_physical_break: bool = False)[source]#
Compute slotgemiddelden from measurement timeseries and optionally also from extremes timeseries. A simple linear trend is used to avoid any false sense of accuracy. However, when fitting a linear trend on a limited amount of data, the nodal cycle and wind effects will cause the model fit to be inaccurate. It is wise to use at least 30 years of data for a valid fit, which is more than 1.5 times the nodal cycle.
Parameters#
- df_meas : pd.DataFrame
The timeseries of measured waterlevels.
- df_ext : pd.DataFrame, optional
The timeseries of extremes (high and low waters). The default is None.
- min_coverage : float, optional
Set yearly means to nans for years that do not have sufficient data coverage. The default is None.
- clip_physical_break : bool, optional
Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.
Returns#
- slotgemiddelden_dict : dict
Dictionary with yearly means and model fits, optionally also for extremes and corresponding tidal range.
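The linear trend fit on yearly means can be illustrated with a minimal plain-Python sketch. It is not the library's model: it assumes an ordinary least-squares fit of a straight line and evaluates it at 2021.0 as an example reference year. The helper name fit_linear_trend and the synthetic yearly means are hypothetical.

```python
def fit_linear_trend(years, yearly_means):
    """Ordinary least-squares fit of mean = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yearly_means) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yearly_means))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Thirty years of synthetic yearly means (in metres) rising by 2 mm per year.
years = list(range(1991, 2021))
yearly_means = [0.02 + 0.002 * (year - 1991) for year in years]

slope, intercept = fit_linear_trend(years, yearly_means)
# Evaluate the fitted line at the assumed reference year.
slotgemiddelde_2021 = intercept + slope * 2021
```

Real yearly means contain the nodal cycle and wind effects on top of the trend, so with a short series the fitted slope can change considerably depending on which years are included.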
- kenmerkendewaarden.slotgemiddelden.plot_slotgemiddelden(slotgemiddelden_dict: dict, slotgemiddelden_dict_all: dict = None)[source]#
Plot timeseries of yearly mean waterlevels and corresponding model fits.
Parameters#
- slotgemiddelden_dict : dict
Output from kw.calc_slotgemiddelden containing timeseries of yearly mean waterlevels and corresponding model fits.
- slotgemiddelden_dict_all : dict, optional
Optionally provide another dictionary with unfiltered mean waterlevels. Only used to plot the mean waterlevels (in grey). The default is None.
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.havengetallen#
Computation of havengetallen
- kenmerkendewaarden.havengetallen.calc_HWLW_springneap(df_ext: DataFrame, min_coverage=None, moonculm_offset: int = 4)[source]#
- kenmerkendewaarden.havengetallen.calc_havengetallen(df_ext: DataFrame, return_df_ext=False, min_coverage=None, moonculm_offset: int = 4)[source]#
Havengetallen consist of the median values of the extremes (high and low waters) and the median time delays of the extremes with respect to the moonculmination. In addition, the tide difference for each cycle and the tidal period are computed. All these indicators are derived by dividing the extremes into hour-classes with respect to the moonculmination.
Parameters#
- df_ext : pd.DataFrame
DataFrame with extremes (highs and lows, no aggers). The last 10 years of this timeseries are used to compute the havengetallen.
- return_df_ext : bool, optional
Whether to return the enriched input dataframe. The default is False.
- min_coverage : float, optional
The minimum required coverage (between 0 and 1) of the df_ext timeseries for the statistics to be considered valid. It is the ratio between the actual number and the expected number of high waters in the series. Note that the expected number is not an exact estimate, so min_coverage=1 will probably result in nans even though all extremes are present. The default is None.
- moonculm_offset : int, optional
Offset between moonculmination and extremes. Passed on to calc_HWLW_moonculm_combi. The default is 4, which corresponds to a 2-day offset and is applicable to the Dutch coast.
Returns#
- df_havengetallen : pd.DataFrame
DataFrame with havengetallen for all hour-classes. Hour-class 0 corresponds to spring tide, 6 corresponds to neap tide, and mean is the mean.
- df_ext_culm : pd.DataFrame
An enriched copy of the input DataFrame including a ‘culm_hr’ column. Only returned if return_df_ext is True.
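The grouping into hour-classes can be sketched in plain Python. This only illustrates the median-per-class step and is not the library's implementation: it assumes each high water has already been assigned an hour-class relative to the moonculmination (the ‘culm_hr’ value). The helper name median_per_hourclass and the synthetic values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_per_hourclass(highwaters):
    """Group values by hour-class and return the median value per class.

    `highwaters` is a list of (culm_hr, value) pairs, with value in metres.
    """
    per_class = defaultdict(list)
    for culm_hr, value in highwaters:
        per_class[culm_hr].append(value)
    return {culm_hr: median(values) for culm_hr, values in sorted(per_class.items())}


# Synthetic high waters for hour-class 0 (spring) and hour-class 6 (neap).
highwaters = [(0, 1.30), (0, 1.40), (0, 1.35), (6, 0.90), (6, 1.00), (6, 0.95)]
medians = median_per_hourclass(highwaters)
```

In this synthetic example the median high water for hour-class 0 is higher than for hour-class 6, reflecting the difference between spring and neap tide.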
- kenmerkendewaarden.havengetallen.plot_HWLW_pertimeclass(df_ext: DataFrame, df_havengetallen: DataFrame)[source]#
Plot the extremes for each hour-class, including a median line.
Parameters#
- df_ext : pd.DataFrame
DataFrame with measurement extremes, as provided by kw.calc_havengetallen().
- df_havengetallen : pd.DataFrame
DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
- kenmerkendewaarden.havengetallen.plot_aardappelgrafiek(df_havengetallen: DataFrame)[source]#
Plot the median values of each hour-class in an aardappelgrafiek.
Parameters#
- df_havengetallen : pd.DataFrame
DataFrame with havengetallen for all hour-classes, as provided by kw.calc_havengetallen().
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.gemiddeldgetij#
Computation of gemiddelde getijkromme
- kenmerkendewaarden.gemiddeldgetij.calc_gemiddeldgetij(df_meas: DataFrame, df_ext: DataFrame = None, min_coverage: float = None, freq: str = '60sec', nb: int = 0, nf: int = 0, scale_extremes: bool = False, scale_period: bool = False)[source]#
Generate an average tidal signal for average/spring/neap tide by doing a tidal analysis on a timeseries of measurements. The resulting tidal components (subsets or adjusted versions) are then used to make a raw prediction for average/spring/neap tide. These raw predictions can optionally be scaled in height (with havengetallen) and in time (to a fixed period of 12h25min). Finally, nb backward and nf forward repeats are added before the timeseries are returned, resulting in nb+nf+1 tidal periods.
Parameters#
- df_meas : pd.DataFrame
Timeseries of waterlevel measurements. The last 10 years of this timeseries are used to compute the getijkrommes.
- df_ext : pd.DataFrame, optional
Timeseries of waterlevel extremes (1/2 only). The last 10 years of this timeseries are used to compute the getijkrommes. The default is None.
- min_coverage : float, optional
The minimum required coverage of the df_ext timeseries. Passed on to calc_havengetallen(). The default is None.
- freq : str, optional
Frequency of the prediction, a value of 60 seconds or lower is advisable for decent results. The default is “60sec”.
- nb : int, optional
Number of periods to repeat backward. The default is 0.
- nf : int, optional
Number of periods to repeat forward. The default is 0.
- scale_extremes : bool, optional
Whether to scale extremes with havengetallen. The default is False.
- scale_period : bool, optional
Whether to scale to 12h25min (for boi). The default is False.
Returns#
- gemgetij_dict : dict
Dictionary with DataFrames with gemiddeld getij for mean, spring and neap tide.
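The time scaling and the repeats can be illustrated with a plain-Python sketch. It is not the library's implementation: it assumes a single tidal cycle given as time offsets that start at zero, a linear stretch of the time axis to 12h25min, and a closed cycle in which the first and last values coincide. The helper names scale_period and repeat_cycle are hypothetical.

```python
from datetime import timedelta

TARGET_PERIOD = timedelta(hours=12, minutes=25)


def scale_period(times, target=TARGET_PERIOD):
    """Linearly stretch the time offsets so that the last one equals `target`."""
    factor = target / times[-1]  # timedelta / timedelta gives a float
    return [t * factor for t in times]


def repeat_cycle(times, values, nb=0, nf=0):
    """Repeat one tidal cycle nb times backward and nf times forward."""
    period = times[-1]
    out_times, out_values = [], []
    for k in range(-nb, nf + 1):
        # Skip the first sample of every later copy, since it coincides
        # with the last sample of the previous copy.
        first = 0 if k == -nb else 1
        out_times += [t + k * period for t in times[first:]]
        out_values += values[first:]
    return out_times, out_values


# Synthetic raw cycle of 12 hours with a coarse 3-hour interval.
raw_times = [timedelta(hours=h) for h in (0, 3, 6, 9, 12)]
raw_values = [0.0, 1.0, 0.0, -1.0, 0.0]

scaled_times = scale_period(raw_times)
rep_times, rep_values = repeat_cycle(scaled_times, raw_values, nb=1, nf=1)
```

With nb=1 and nf=1 the result spans three tidal periods, from one period before the start of the original cycle to two periods after it, in line with the nb+nf+1 periods mentioned in the description.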
- kenmerkendewaarden.gemiddeldgetij.plot_gemiddeldgetij(gemgetij_dict: dict, gemgetij_dict_raw: dict = None, tick_hours: int = None)[source]#
Default plotting function for gemiddeldgetij dictionaries.
Parameters#
- gemgetij_dict : dict
Dictionary as returned from kw.calc_gemiddeldgetij().
- gemgetij_dict_raw : dict, optional
Dictionary as returned from kw.calc_gemiddeldgetij(), e.g. with uncorrected values. The default is None.
- tick_hours : int, optional
Interval in hours for the x-axis ticks, for instance 12. If not provided, automatic but less nice values are used. The default is None.
Returns#
- fig : matplotlib.figure.Figure
Figure handle.
- ax : matplotlib.axes._axes.Axes
Figure axis handle.
kenmerkendewaarden.overschrijding#
Computation of probabilities (overschrijdingsfrequenties) of extreme waterlevels
- kenmerkendewaarden.overschrijding.calc_overschrijding(df_ext: DataFrame, dist: dict = None, inverse: bool = False, clip_physical_break: bool = False, rule_type: str = None, rule_value: (Timestamp, float) = None, interp_freqs: list = None)[source]#
Compute exceedance/deceedance frequencies based on measured extreme waterlevels.
Parameters#
- df_ext : pd.DataFrame
The timeseries of extremes (high and low waters).
- dist : dict, optional
A pre-filled dictionary with a Hydra-NL and/or validation distribution. The default is None.
- inverse : bool, optional
Whether to compute deceedance instead of exceedance frequencies. The default is False.
- clip_physical_break : bool, optional
Whether to exclude the part of the timeseries before physical breaks like estuary closures. The default is False.
- rule_type : str, optional
break/linear/None, passed on to apply_trendanalysis(). The default is None.
- rule_value : (pd.Timestamp, float), optional
Value corresponding to rule_type: pd.Timestamp (or anything understood by pd.Timestamp) in case of rule_type=‘break’, float in case of rule_type=‘linear’. The default is None.
- interp_freqs : list, optional
The frequencies to interpolate to. Providing this will result in a “geinterpoleerd” key in the returned dictionary. The default is None.
Returns#
- dist : dict
A dictionary with several distributions.
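The difference between exceedance and deceedance frequencies can be illustrated with a simple empirical sketch in plain Python. It is not the library's implementation, which also involves fitted distributions: it assumes a basic rank-based estimate in which the frequency of a value is its rank divided by the number of years covered. The helper name empirical_frequencies and the synthetic values are hypothetical.

```python
def empirical_frequencies(values, n_years, inverse=False):
    """Return (value, frequency per year) pairs based on ranks.

    For exceedance (inverse=False) the highest value gets rank 1,
    for deceedance (inverse=True) the lowest value gets rank 1.
    """
    ordered = sorted(values, reverse=not inverse)
    return [(value, rank / n_years) for rank, value in enumerate(ordered, start=1)]


# Synthetic high waters (in metres) observed during two years.
highwaters = [3.1, 2.5, 2.9, 2.7]
exceedance = empirical_frequencies(highwaters, n_years=2)
deceedance = empirical_frequencies(highwaters, n_years=2, inverse=True)
```

In this sketch the highest value of 3.1 m is reached or exceeded once in two years, giving a frequency of 0.5 per year, while with inverse=True the lowest value receives that frequency instead.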