seismometer.data.pandas_helpers.merge_windowed_event

seismometer.data.pandas_helpers.merge_windowed_event(predictions, predtime_col, events, event_label, pks, *, min_leadtime_hrs=0, window_hrs=None, event_base_val_col='Value', event_base_time_col='Time', event_base_val_dtype='float', sort=True, merge_strategy='forward', impute_val_with_time=1, impute_val_no_time=0)

Merges a single windowed event into a predictions dataframe.

Adds two new event columns: a _Value column with the event value and a _Time column with the event time. Ground-truth labeling for a model is considered an event and can have a time associated with it.

Joins on a set of keys and associates the first event occurring after the prediction time. The following special cases are also applied:

  • Invalidate late predictions - if a prediction occurs after all recorded events of the type, the prediction is considered invalid with respect to the event and the _Value is set to -1.

  • Early predictions drop timing - if a prediction occurs before all recorded events of the type, the label is kept for analyses but the time is removed.

  • Imputation of no event to negative label - if no row in the events frame is present for the prediction keys, it is assumed to be a Negative label (default 0) but will not have an event time.
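The forward association, the late-prediction rule, and the no-event imputation can be illustrated for a single prediction in plain Python. This is a minimal sketch of the rules as described above, not the library's implementation. The helper name label_prediction is hypothetical, treating an event at exactly the prediction time as a match is an assumption, and returning None as the time for a late prediction is an assumption (the description only specifies the value). The early-prediction rule is not modeled.

```python
from datetime import datetime


def label_prediction(pred_time, events, impute_val_no_time=0):
    """Return (value, time) for one prediction.

    `events` is a list of (event_time, event_value) tuples that share the
    prediction's keys. Hypothetical helper, for illustration only.
    """
    if not events:
        # No event row for these keys: negative label, no event time.
        return impute_val_no_time, None

    # Assumption: an event at exactly pred_time counts as "after".
    later = [e for e in events if e[0] >= pred_time]
    if not later:
        # Prediction occurs after all recorded events: invalidate it.
        # Assumption: no time is reported for an invalidated prediction.
        return -1, None

    # Forward association: the first event at or after the prediction.
    event_time, event_value = min(later, key=lambda e: e[0])
    return event_value, event_time
```

With events at 03:00, 06:00, and 08:00, a prediction at 05:00 is paired with the 06:00 event, while a prediction at 09:00 gets the value -1.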

Parameters:
  • predictions (pd.DataFrame) – The predictions or features frame where each row represents a prediction.

  • predtime_col (str) – The column in the predictions frame indicating the timestamp when inference occurred.

  • events (pd.DataFrame) – The narrow events dataframe.

  • event_label (str) – The category name of the event to merge, expected to be a value in events.Type.

  • pks (list[str]) – A list of primary keys on which to perform the merge, keys are column names occurring in both predictions and events dataframes.

  • min_leadtime_hrs (Number, optional) – The minimum lead time, in hours, required between a prediction and its event, by default 0. If set to 1, a prediction made within the hour before the last associated event will be invalidated and set to -1 even though it occurred before the event time.

  • window_hrs (Optional[Number], optional) – The number of hours the window of predictions of interest should be limited to, by default None. If None, then all predictions occurring before a known event will be included. If used with min_leadtime_hrs, the entire window is shifted maintaining its size. The maximum lookback for a prediction is window_hrs + min_leadtime_hrs.

  • event_base_val_col (str, optional) – The name of the column in the events frame to merge as the _Value, by default ‘Value’.

  • event_base_val_dtype (str) – The data type to cast the event value column to, by default ‘float’.

  • event_base_time_col (str, optional) – The name of the column in the events frame to merge as the _Time, by default ‘Time’.

  • sort (bool) – Whether or not to sort the predictions/events dataframes, by default True.

  • merge_strategy (str) – The method to use when merging the event data, by default ‘forward’. Options are ‘forward’, ‘nearest’, ‘first’, ‘last’, and ‘count’. See seismometer.configuration.model for more information.

  • impute_val_with_time (Optional[Number|str], optional) – The value to impute for the label if timestamp exists, defaults to 1.

  • impute_val_no_time (Optional[Number|str], optional) – The value to impute for the label if no timestamp exists, defaults to 0.
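The interaction between window_hrs and min_leadtime_hrs can be worked through with plain datetime arithmetic. The sketch below is illustrative only: the helper name prediction_window is hypothetical, and it only computes the bounds of the prediction window for one event time. Whether the library treats those bounds as inclusive is not stated here.

```python
from datetime import datetime, timedelta


def prediction_window(event_time, window_hrs=None, min_leadtime_hrs=0):
    """Return (earliest, latest) prediction times of interest for an event.

    Hypothetical helper. `earliest` is None when window_hrs is None,
    meaning there is no limit on how far back a prediction may be.
    """
    # The lead time shifts the whole window away from the event.
    latest = event_time - timedelta(hours=min_leadtime_hrs)
    if window_hrs is None:
        return None, latest
    # The window keeps its size, so the maximum lookback from the event
    # is window_hrs + min_leadtime_hrs.
    earliest = latest - timedelta(hours=window_hrs)
    return earliest, latest
```

For an event at 12:00 with window_hrs=6 and min_leadtime_hrs=1, the window runs from 05:00 to 11:00: it is still 6 hours wide, and the maximum lookback from the event is 7 hours.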

Returns:

The predictions dataframe with the new time and value columns for the event specified.

Return type:

pd.DataFrame

Raises:

ValueError – At least one column in pks must be in both the predictions and events dataframes.
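The key requirement behind this error can be checked before calling the function. The following is a sketch under stated assumptions, not the library's own validation: the helper name shared_keys is hypothetical, and dropping keys that are missing from either frame (rather than failing on them) is an assumption inferred from the "at least one" wording.

```python
def shared_keys(pks, prediction_columns, event_columns):
    """Return the entries of `pks` present in both column collections.

    Hypothetical helper, for illustration only.
    """
    keys = [k for k in pks if k in prediction_columns and k in event_columns]
    if not keys:
        raise ValueError(
            "At least one column in pks must be in both the predictions "
            "and events dataframes."
        )
    return keys
```

With pandas frames, the column collections could be passed as predictions.columns and events.columns.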