seismometer.data.pandas_helpers.event_score

seismometer.data.pandas_helpers.event_score(merged_frame, pks, score, ref_time=None, ref_event=None, aggregation_method='max')

Reduces a dataframe of all predictions to a single significant row per entity, such as the row with the max or most recent value. Supports max/min for value-only scores, and last/first if a reference timestamp is provided.

Parameters:
  • merged_frame (pd.DataFrame) – The dataframe with score and event data, such as those having an event added via merge_windowed_event.

  • pks (list[str]) – A list of identifying keys on which to aggregate, such as Id.

  • score (str) – The column name containing the score value.

  • ref_time (Optional[str], optional) – The column name containing the time to consider, by default None. Required when aggregation_method needs a time reference (e.g., ‘first’, ‘last’). Rows where ref_time is NaT are dropped first, so the selected row both satisfies aggregation_method and corresponds to a positive case of the associated event.

  • ref_event (Optional[str], optional) – The column name containing the event to consider, by default None. Required when aggregation_method uses an event reference to prioritize positive cases (e.g., ‘max’, ‘min’). If any scores are associated with a positive case of ref_event, the row satisfying aggregation_method is picked from among those rows. If there are no positive cases, the row satisfying aggregation_method is picked without regard to the event.

  • aggregation_method (str, optional) – A string describing the method used to select a row, by default ‘max’. The methods described on this page are ‘max’, ‘min’, ‘first’, and ‘last’.
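
The positive-case priority described for ref_event can be illustrated with a minimal pandas-only sketch. This is not seismometer's implementation: the helper name sketch_max_with_event, the column names Id, Score, and Event, and the assumption that the event column holds 0/1 values are all hypothetical and chosen for illustration.

```python
import pandas as pd


def sketch_max_with_event(frame, pks, score, ref_event):
    """Hypothetical helper (not part of seismometer): keep one row per entity,
    preferring rows with a positive event, then the highest score."""
    # Assumes ref_event is numeric 0/1, so sorting descending puts positives first.
    ordered = frame.sort_values(by=[ref_event, score], ascending=[False, False])
    # The first row per entity is now the max score among positives, if any exist,
    # otherwise the max score overall.
    reduced = ordered.drop_duplicates(subset=pks, keep="first")
    return reduced.sort_values(pks).reset_index(drop=True)


frame = pd.DataFrame(
    {
        "Id": [1, 1, 1, 2, 2],
        "Score": [0.9, 0.4, 0.6, 0.3, 0.7],
        "Event": [0, 1, 1, 0, 0],
    }
)

reduced = sketch_max_with_event(frame, ["Id"], "Score", "Event")
print(reduced)
```

In this sketch, entity 1 keeps the row with score 0.6 rather than 0.9, because 0.9 belongs to a negative case while positive cases exist. Entity 2 has no positive cases, so its overall maximum is kept.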

Returns:

The reduced dataframe with one row per combination of pks.

Return type:

pd.DataFrame
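
The time-based selection described for ref_time can be sketched in the same way. Again, this is an illustration under stated assumptions rather than the library's code: the helper sketch_last_by_time and the columns Id, Score, and EventTime are hypothetical, and the sketch assumes that ‘last’ means the latest ref_time per entity. How the real function treats entities whose rows all have NaT is not stated above; in this sketch such entities simply drop out.

```python
import pandas as pd


def sketch_last_by_time(frame, pks, ref_time):
    """Hypothetical helper (not part of seismometer): keep the row with the
    latest ref_time per entity, after dropping rows without a timestamp."""
    # Drop NaT rows first, mirroring the behavior described for ref_time.
    timed = frame.dropna(subset=[ref_time])
    # Latest timestamp first, then keep the first row seen for each entity.
    ordered = timed.sort_values(by=ref_time, ascending=False)
    latest = ordered.drop_duplicates(subset=pks, keep="first")
    return latest.sort_values(pks).reset_index(drop=True)


frame = pd.DataFrame(
    {
        "Id": [1, 1, 1, 2, 2],
        "Score": [0.2, 0.8, 0.5, 0.6, 0.1],
        "EventTime": pd.to_datetime(
            ["2024-01-01", "2024-01-03", None, "2024-01-02", None]
        ),
    }
)

latest = sketch_last_by_time(frame, ["Id"], "EventTime")
print(latest)
```

Rows with a missing EventTime never compete for selection, which is why the result contains only rows tied to a timestamped event.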