seismometer.data.pandas_helpers.get_model_scores

seismometer.data.pandas_helpers.get_model_scores(dataframe, entity_keys, score_col, ref_time, aggregation_method='max', per_context_id=False)

Reduces a dataframe of all predictions to a single significant row per entity, such as the row with the max or most recent score. Supports max/min for value-only scores, and last/first if a reference timestamp is provided.

Parameters:
  • dataframe (pd.DataFrame) – The dataframe with score and event data, such as one that had an event added via merge_windowed_event.

  • entity_keys (list[str]) – A list of identifying keys on which to aggregate, such as Id.

  • score_col (str) – The column name containing the score value.

  • ref_time (Optional[str]) – The column name containing the time to consider. The signature gives no default, so pass it explicitly; it is needed for the last/first methods.

  • aggregation_method (str, optional) – A string describing the method to select a value: max or min on the score, or first or last on ref_time; by default 'max'.

  • per_context_id (bool, optional) – If True, limits data to one row per context_id, by default False.

Returns:

The reduced dataframe with one row per combination of entity_keys.

Return type:

pd.DataFrame
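
Example:

The reduction described above can be approximated in plain pandas. The sketch below is not seismometer's implementation; the helper reduce_scores_sketch, the sample data, and the column names Id, ScoreTime, and Score are all hypothetical and only illustrate the max and last behaviors. How ties and per_context_id are handled by the real function is not covered here.

```python
import pandas as pd

# Hypothetical prediction data; column names are illustrative only.
scores = pd.DataFrame(
    {
        "Id": [1, 1, 1, 2, 2],
        "ScoreTime": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01", "2024-01-02"]
        ),
        "Score": [0.2, 0.9, 0.4, 0.7, 0.1],
    }
)


def reduce_scores_sketch(df, keys, score_col, ref_time=None, method="max"):
    """Illustrative stand-in (an assumption, not the library code):
    keep one row per combination of keys."""
    if method in ("max", "min"):
        # Value-only selection: order rows by the score itself.
        sort_col, ascending = score_col, method == "min"
    elif method in ("first", "last"):
        # Time-based selection needs a reference time column.
        if ref_time is None:
            raise ValueError("first/last require a ref_time column")
        sort_col, ascending = ref_time, method == "first"
    else:
        raise ValueError(f"unknown method: {method}")

    # After sorting, the first row seen for each key is the one to keep.
    ordered = df.sort_values(sort_col, ascending=ascending, kind="stable")
    reduced = ordered.drop_duplicates(subset=keys, keep="first")
    return reduced.sort_values(keys).reset_index(drop=True)


max_rows = reduce_scores_sketch(scores, ["Id"], "Score", method="max")
last_rows = reduce_scores_sketch(scores, ["Id"], "Score", "ScoreTime", method="last")

# With seismometer installed, the corresponding call would presumably follow
# the documented signature, for example:
# get_model_scores(scores, ["Id"], "Score", "ScoreTime", aggregation_method="last")
```

Here max_rows keeps the highest-scoring row for each Id, while last_rows keeps the row with the latest ScoreTime for each Id.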