seismometer.data.pandas_helpers.merge_windowed_event
- seismometer.data.pandas_helpers.merge_windowed_event(predictions, predtime_col, events, event_label, pks, *, min_leadtime_hrs=0, window_hrs=None, event_base_val_col='Value', event_base_time_col='Time', event_base_val_dtype='float', sort=True, merge_strategy='forward', impute_val_with_time=1, impute_val_no_time=0)
Merges a single windowed event into a predictions dataframe.
Adds two new event columns: a _Value column with the event value and a _Time column with the event time. Ground-truth labeling for a model is considered an event and can have a time associated with it.
Joins on a set of keys and associates the first event occurring after the prediction time. The following special cases are also applied:
Early predictions drop timing: if a prediction occurs before all recorded events of the type, the label is kept for analyses but the time is removed.
Imputation of no event to negative label: if no row in the events frame is present for the prediction keys, it is assumed to be a Negative label (default 0) but will not have an event time.
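The forward association and the negative-label imputation described above can be sketched with plain pandas. This is only an illustration built on `pandas.merge_asof`, not the function's actual implementation; the toy data, the `Id` and `PredictTime` columns, the `Sepsis` label, and the label-prefixed output column names are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
predictions = pd.DataFrame({
    "Id": [1, 1, 2],
    "PredictTime": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 14:00", "2024-01-01 09:00"]
    ),
})
events = pd.DataFrame({
    "Id": [1, 1],
    "Type": ["Sepsis", "Sepsis"],
    "Time": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 18:00"]),
    "Value": [1.0, 1.0],
})

event_label = "Sepsis"
pks = ["Id"]
time_col = f"{event_label}_Time"    # assumed naming for the new time column
value_col = f"{event_label}_Value"  # assumed naming for the new value column

# Keep only the requested event type and rename its columns.
one_event = (
    events.loc[events["Type"] == event_label, pks + ["Time", "Value"]]
    .rename(columns={"Time": time_col, "Value": value_col})
    .sort_values(time_col)
)

# For each prediction, take the first event at or after the prediction time
# that shares the same keys. merge_asof needs both frames sorted by time.
merged = pd.merge_asof(
    predictions.sort_values("PredictTime"),
    one_event,
    left_on="PredictTime",
    right_on=time_col,
    by=pks,
    direction="forward",
)

# Keys with no matching event get the negative label and keep a missing time.
merged[value_col] = merged[value_col].fillna(0)
print(merged)
```

In this sketch the two predictions for `Id` 1 pick up the 12:00 and 18:00 events respectively, while the prediction for `Id` 2 has no event row, so it receives a value of 0 and no event time.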
- Parameters:
predictions (pd.DataFrame) – The predictions or features frame where each row represents a prediction.
predtime_col (str) – The column in the predictions frame indicating the timestamp when inference occurred.
events (pd.DataFrame) – The narrow events dataframe.
event_label (str) – The category name of the event to merge, expected to be a value in events.Type.
pks (list[str]) – A list of primary keys on which to perform the merge, keys are column names occurring in both predictions and events dataframes.
min_leadtime_hrs (Number, optional) – The minimum lead time, in hours, required between a prediction and the event, by default 0.
window_hrs (Optional[Number], optional) – The number of hours the window of predictions of interest should be limited to, by default None. If None, then all predictions occurring before a known event will be included. If used with min_leadtime_hrs, the entire window is shifted maintaining its size. The maximum lookback for a prediction is window_hrs + min_leadtime_hrs.
event_base_val_col (str, optional) – The name of the column in the events frame to merge as the _Value, by default ‘Value’.
event_base_val_dtype (str) – The data type to cast the event value column to, by default ‘float’.
event_base_time_col (str, optional) – The name of the column in the events frame to merge as the _Time, by default ‘Time’.
sort (bool) – Whether or not to sort the predictions/events dataframes, by default True.
merge_strategy (str) – The method to use when merging the event data, by default ‘forward’. Options are ‘forward’, ‘nearest’, ‘first’, ‘last’, and ‘count’. See seismometer.configuration.model for more information.
impute_val_with_time (Optional[Number|str], optional) – The value to impute for the label if a timestamp exists, by default 1.
impute_val_no_time (Optional[Number|str], optional) – The value to impute for the label if no timestamp exists, by default 0.
- Returns:
The predictions dataframe with the new time and value columns for the event specified.
- Return type:
pd.DataFrame
- Raises:
ValueError – At least one column in pks must be in both the predictions and events dataframes.
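The interaction between min_leadtime_hrs and window_hrs can be made concrete with a small check. The helper below, `in_window`, is hypothetical and not part of seismometer; it encodes one reading of the parameter descriptions, in which the window ends min_leadtime_hrs before the event and extends window_hrs further back. Treating both window boundaries as inclusive is an assumption.

```python
import pandas as pd


def in_window(pred_time, event_time, min_leadtime_hrs=0, window_hrs=None):
    """Hypothetical helper: is pred_time inside the shifted window before event_time?"""
    # The window closes min_leadtime_hrs before the event.
    latest = event_time - pd.Timedelta(hours=min_leadtime_hrs)
    if pred_time > latest:
        return False
    # With no window, every sufficiently early prediction qualifies.
    if window_hrs is None:
        return True
    # Maximum lookback from the event is window_hrs + min_leadtime_hrs.
    earliest = latest - pd.Timedelta(hours=window_hrs)
    return pred_time >= earliest


event = pd.Timestamp("2024-01-01 12:00")
# With a 2 hour lead time and a 6 hour window, the window spans 04:00 to 10:00.
for hour in ("03:00", "09:00", "11:00"):
    pred = pd.Timestamp(f"2024-01-01 {hour}")
    print(hour, in_window(pred, event, min_leadtime_hrs=2, window_hrs=6))
```

Under these assumptions, only the 09:00 prediction falls in the window: 11:00 is closer to the event than the required lead time, and 03:00 lies beyond the 8 hour maximum lookback.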