seismometer.data.performance.calculate_bin_stats
- seismometer.data.performance.calculate_bin_stats(y_true=None, y_pred=None, keep_score_values=False, not_point_thresholds=False, rho=None, threshold_precision=0)
Calculate summary statistics from y_true and y_pred arrays (for binary classification, y_pred is y_proba[:,1]). Supports y_true and y_pred as individual series-likes or as a dataframe with true and proba columns.
- Parameters:
y_true (Optional[pd.Series], optional) – Series-like of binary labels.
y_pred (Optional[pd.Series], optional) – Series-like of probabilities for the positive class.
keep_score_values (bool, optional) – If True, prevents attempts to convert scores to percentages (0-100), by default False.
not_point_thresholds (bool, optional) – If True, does not use point thresholds. By default False, which uses thresholds 0-100.
rho (float, optional) – The relative risk reduction for NNT calculation, by default DEFAULT_RHO.
threshold_precision (int, optional) – Number of decimal places to use when generating thresholds as percentages.
- E.g., threshold_precision=0 yields thresholds like 0, 1, …, 100 (coarse).
- threshold_precision=2 yields 0.00, 0.01, …, 100.00 (fine-grained).
- Higher values improve AUC approximation but increase computation cost.
By default 0.
- Return type:
pd.DataFrame of stats, with one row for each threshold value between 0 and 100 and columns for basic statistics.
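The following is a minimal, self-contained sketch of the idea described above: build a 0-100 threshold grid at a given decimal precision and compute a few per-threshold statistics. It does not call seismometer and is not the library's implementation. The helper names `percent_thresholds` and `sketch_bin_stats`, the `>=` flagging rule, and the output column names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def percent_thresholds(threshold_precision: int = 0) -> np.ndarray:
    """Hypothetical helper: thresholds from 0 to 100 at the given decimal precision."""
    n_steps = 100 * 10**threshold_precision
    # linspace avoids the floating-point drift of repeated addition
    return np.round(np.linspace(0, 100, n_steps + 1), threshold_precision)


def sketch_bin_stats(y_true, y_pred, threshold_precision: int = 0) -> pd.DataFrame:
    """Illustrative only: confusion-matrix counts and rates for each threshold."""
    y_true = np.asarray(y_true).astype(bool)
    # Assumes y_pred holds probabilities in [0, 1]; convert to percentages.
    scores = np.asarray(y_pred, dtype=float) * 100
    rows = []
    for t in percent_thresholds(threshold_precision):
        flagged = scores >= t  # assumed rule: flag when score meets the threshold
        tp = int(np.sum(flagged & y_true))
        fp = int(np.sum(flagged & ~y_true))
        fn = int(np.sum(~flagged & y_true))
        tn = int(np.sum(~flagged & ~y_true))
        rows.append({
            "Threshold": t, "TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "Sensitivity": tp / (tp + fn) if tp + fn else np.nan,
            "Specificity": tn / (tn + fp) if tn + fp else np.nan,
            "PPV": tp / (tp + fp) if tp + fp else np.nan,
        })
    return pd.DataFrame(rows)


# Toy data, invented for this example.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

stats = sketch_bin_stats(y_true, y_pred)
print(stats.loc[stats["Threshold"].isin([0, 50, 100])])
```

With precision 0 the grid has 101 rows; with precision 2 it has 10,001, which is where the extra computation cost of finer thresholds comes from.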