seismometer.data.cohorts.get_cohort_performance_data
- seismometer.data.cohorts.get_cohort_performance_data(df, cohort_feature, *, proba, true='TARGET', splits=None, censor_threshold=10)
Generates a dataframe of performance metrics (accuracy, sensitivity, specificity, PPV, NPV, and flag rate, i.e. the predicted positive rate) at particular threshold values, computed separately for each cohort.
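To make the six metrics concrete, the following is a minimal sketch of how they can be derived from a confusion matrix at a single threshold. It is not seismometer's implementation: the helper name `threshold_metrics`, the dictionary keys, and the choice to flag scores greater than or equal to the threshold are all assumptions made for illustration.

```python
def threshold_metrics(y_true, y_score, threshold):
    """Confusion-matrix metrics when scores >= threshold are flagged positive.

    Hypothetical helper for illustration only; whether the comparison is
    inclusive is an assumption, not documented behavior.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, n),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "flag_rate": ratio(tp + fp, n),      # predicted positive rate
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
metrics = threshold_metrics(y_true, y_score, threshold=0.5)
```

In this toy data four of the eight observations are flagged at a threshold of 0.5, three of them correctly, so the flag rate is 0.5 and the PPV is 0.75.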
- Parameters:
df (pd.DataFrame) – Dataframe of observations to use. Must contain the column specified in cohort_feature, and the columns specified by proba and true when those are given as strings rather than arrays.
cohort_feature (str) – Name of the dataframe column to split into cohorts. Currently supports numeric and categorical columns.
proba (Union[str, SeriesOrArray]) –
The predictions made by the model under review.
If string - must be a column in the dataframe.
If series or array - must be the same length as the dataframe.
true (Union[str, SeriesOrArray], default="TARGET") –
The true label being predicted.
If string - must be a column in the dataframe.
If series or array - must be the same length as the dataframe and contain int values.
splits (Optional[List], default=None) – Optional. For a numeric column, the numeric values at which to split cohorts; for a categorical column, the category values to include, each treated as its own split. If None, a numeric column is split into two cohorts at the mean.
censor_threshold (int, default=10) – Minimum number of observations a cohort must contain for performance metrics to be calculated.
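The interaction between splits and censor_threshold can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's code: `label_cohorts` and `censor_small_cohorts` are hypothetical helpers, the left-closed binning is a guess, and the inclusive comparison simply follows the word "minimum" in the description above.

```python
import pandas as pd


def label_cohorts(values, splits=None):
    """Bin numeric values into cohorts; split at the mean when splits is None."""
    values = pd.Series(values, dtype="float64")
    if splits is None:
        splits = [values.mean()]
    edges = [float("-inf"), *sorted(splits), float("inf")]
    # right=False puts a value equal to a split point in the upper bin
    # (an assumption; the library's edge handling may differ)
    return pd.cut(values, bins=edges, right=False)


def censor_small_cohorts(cohorts, censor_threshold=10):
    """Return the cohort labels with at least censor_threshold observations."""
    counts = cohorts.value_counts()
    return list(counts[counts >= censor_threshold].index)


ages = [20] * 12 + [70] * 3               # mean is 30.0
default_cohorts = label_cohorts(ages)     # two bins, split at the mean
kept = censor_small_cohorts(default_cohorts, censor_threshold=10)
explicit_cohorts = label_cohorts(ages, splits=[40, 65])  # three bins
```

Here the upper cohort holds only three observations, so with the default threshold of 10 only the lower cohort would have metrics calculated.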
- Returns:
Performance statistics for particular threshold values by cohort.
- Return type:
pd.DataFrame
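A call might look like the sketch below, which uses only the parameters listed in the signature above. The synthetic data, the column names `age` and `score`, and the helpers `make_example_frame` and `run_example` are assumptions for illustration; the structure of the returned dataframe is not shown because it is not specified here.

```python
import numpy as np
import pandas as pd


def make_example_frame(n=200, seed=0):
    """Build a synthetic dataframe with a cohort column, scores, and labels."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "age": rng.integers(18, 90, size=n),    # numeric cohort feature
            "score": rng.random(n),                 # model output in [0, 1)
            "TARGET": rng.integers(0, 2, size=n),   # int labels, 0 or 1
        }
    )


def run_example():
    """Call the documented function on the synthetic frame."""
    # Imported here so the rest of the sketch works without seismometer installed
    from seismometer.data.cohorts import get_cohort_performance_data

    df = make_example_frame()
    return get_cohort_performance_data(
        df,
        "age",
        proba="score",        # column name; a same-length array also works
        true="TARGET",
        splits=[40, 65],      # numeric split points for the age cohorts
        censor_threshold=10,
    )
```

Note that proba, true, splits, and censor_threshold are keyword-only, so they cannot be passed positionally.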