Internals

Plotting

Binary classification

calibration(truth, output, *[, highlight, axis])

Plots the calibration curve for the model.
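To show what a calibration curve compares, the sketch below puts predicted probabilities into equal-width bins. For each bin it pairs the mean prediction with the observed fraction of positives. The helper name `calibration_points` and its `n_bins` parameter are hypothetical. This is not the `calibration` implementation, which may bin and plot differently.

```python
def calibration_points(truth, output, n_bins=10):
    """Return (mean predicted, observed positive rate) pairs for non-empty bins.

    Hypothetical helper: equal-width bins over [0, 1]; not the library's code.
    """
    if len(truth) != len(output):
        raise ValueError("truth and output must have the same length")
    sums = [0.0] * n_bins     # sum of predicted probabilities per bin
    positives = [0] * n_bins  # number of positive labels (1) per bin
    counts = [0] * n_bins     # number of samples per bin
    for label, prob in zip(truth, output):
        # Clamp so that prob == 1.0 falls into the last bin.
        ix = min(int(prob * n_bins), n_bins - 1)
        sums[ix] += prob
        positives[ix] += label
        counts[ix] += 1
    return [
        (sums[i] / counts[i], positives[i] / counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


points = calibration_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_bins=2)
```

A well-calibrated model produces pairs that lie close to the diagonal, where mean prediction equals observed rate.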

compare_series(plotdata, cohort_col, ...[, ...])

Creates a line plot of the data using cohorts as hue.

cohort_evaluation_vs_threshold(stats, ...[, ...])

Creates a 2x3 grid of individual performance metrics across cohorts.

cohorts_overlay(data, plot_func[, axis, ...])

Uses a passed plotting function to plot a line per given split.

cohorts_vertical(df, plot_func[, gs, ...])

Uses a passed plotting function to plot a line per given split.

evaluation(stats, *, ci_data[, truth, ...])

Generates a 2x3 plot showing the performance of a model.

histogram_stacked(y_label, output, *[, ...])

Plots a stacked histogram of the model output by class.

leadtime_violin(data, x_col, y_col, *[, ...])

Violin plot of leadtime across cohorts.

metric_vs_threshold(stats, metric, *[, ...])

Plots a metric vs threshold curve.

performance_metrics(stats, *[, conf, ...])

Single plot of sensitivity, specificity, and PPV.

ppv_vs_sensitivity(ppv, sensitivity, ...[, ...])

Plots PPV vs. sensitivity (the precision-recall curve).

recall_condition(ppcr, recall, thresholds, ...)

Plots the recall of a model against the predicted condition rate.

singleROC(tpr, fpr, thresholds[, ...])

Creates an ROC plot.

Utility Functions

decorators.render_as_svg(plot_fn)

Given a plot function that returns a figure, renders it to SVG and closes the figure.

Data Manipulation

Cohorts

cohorts.find_bin_edges(series[, thresholds])

Creates a list of bin edges from a series of continuous numeric data and a list of inner thresholds.
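Here is a minimal sketch of the idea. It assumes the outer edges are the series minimum and maximum, that thresholds outside the data range are dropped, and that a missing threshold list defaults to a median split. The real function's defaults and edge handling may differ, and `find_bin_edges_sketch` is a hypothetical name.

```python
import statistics


def find_bin_edges_sketch(series, thresholds=None):
    """Return sorted bin edges: min, inner thresholds within range, max.

    Hypothetical sketch; assumes a median split when no thresholds are given.
    """
    values = list(series)
    if not values:
        raise ValueError("series must not be empty")
    lo, hi = min(values), max(values)
    if thresholds is None:
        thresholds = [statistics.median(values)]
    # Keep only thresholds strictly inside the data range, without duplicates.
    inner = sorted({t for t in thresholds if lo < t < hi})
    return [lo, *inner, hi]


edges = find_bin_edges_sketch([1, 5, 9, 20], thresholds=[10, 3, 50])
# edges == [1, 3, 10, 20]; the out-of-range threshold 50 is dropped
```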

cohorts.get_cohort_data(df, cohort_feature, ...)

Convenience function to create and format data for use in the cohort plots.

cohorts.get_cohort_performance_data(df, ...)

Generates a dataframe of performance metrics (accuracy, sensitivity, specificity, PPV, NPV, and flag rate, i.e., the predicted positive condition rate) for the specified threshold values and cohorts.

cohorts.has_good_binning(bin_ixs, bin_edges)

Verifies that the binning is sound by making sure the lists are of equal length.

cohorts.label_cohorts_categorical(series[, ...])

Bin a categorical series of data, reduced to a set of category values.

cohorts.label_cohorts_numeric(series[, splits])

Bin a continuous numeric series of data, based on thresholds of inner bin edges.
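One way to picture this binning is the sketch below, which uses `bisect` to place each value into an interval bounded by inner edges. Several details are assumptions for illustration, not the library's actual output: the `"[a, b)"` label format, the closed final bin that includes the maximum, and the name `label_numeric_sketch`.

```python
import bisect


def label_numeric_sketch(values, splits):
    """Label each value with the interval it falls in.

    Hypothetical sketch: edges are min(values), sorted inner splits, and
    max(values); bins are half-open [a, b) except the last, which is closed.
    """
    values = list(values)
    lo, hi = min(values), max(values)
    edges = [lo, *sorted(s for s in splits if lo < s < hi), hi]
    last = len(edges) - 2  # index of the final bin's left edge
    labels = []
    for v in values:
        # bisect_right - 1 is the index of the left edge of v's bin;
        # clamp so the maximum value lands in the final (closed) bin.
        ix = min(bisect.bisect_right(edges, v) - 1, last)
        closing = "]" if ix == last else ")"
        labels.append(f"[{edges[ix]}, {edges[ix + 1]}{closing}")
    return labels


labels = label_numeric_sketch([1, 4, 5, 9], splits=[5])
# labels == ["[1, 5)", "[1, 5)", "[5, 9]", "[5, 9]"]
```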

cohorts.resolve_cohorts(series[, splits])

Bin a series of data based on the defined splits, if any are defined.

cohorts.resolve_col_data(df, feature)

Resolves a feature that is either a series itself or the name of a series (column) in the dataframe.

Pandas Helpers

pandas_helpers.event_score(merged_frame, ...)

Reduces a dataframe of all predictions to a single row of significance, such as the max or most recent value for an entity.
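The reduction can be pictured with plain Python records. This sketch keeps one row per entity, choosing either the highest score or the most recent prediction. The record keys (`"entity"`, `"score"`, `"time"`) and the `how` parameter are hypothetical. The real function operates on a pandas DataFrame with its own signature.

```python
def event_score_sketch(rows, how="max"):
    """Reduce prediction rows to one row per entity.

    how="max" keeps the row with the highest score; how="last" keeps the
    row with the latest time. Ties keep the first row seen. Hypothetical
    sketch, not the library's API.
    """
    if how == "max":
        key = "score"
    elif how == "last":
        key = "time"
    else:
        raise ValueError(f"unsupported reduction: {how!r}")
    best = {}
    for row in rows:
        current = best.get(row["entity"])
        # Replace the stored row when this one ranks higher on the chosen key.
        if current is None or row[key] > current[key]:
            best[row["entity"]] = row
    return best


rows = [
    {"entity": "a", "score": 0.2, "time": 1},
    {"entity": "a", "score": 0.7, "time": 2},
    {"entity": "a", "score": 0.4, "time": 3},
    {"entity": "b", "score": 0.9, "time": 1},
]
by_max = event_score_sketch(rows, how="max")    # entity "a" -> score 0.7
by_last = event_score_sketch(rows, how="last")  # entity "a" -> time 3
```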

pandas_helpers.event_time(event)

Converts an event name into the time column name.

pandas_helpers.event_value(event)

Converts an event name into the value column name.
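The two name conversions above can be sketched as simple suffix rules. The `_Time` and `_Value` suffixes and the idempotent stripping of an existing suffix are assumptions for illustration, and the `_sketch` function names are hypothetical.

```python
TIME_SUFFIX = "_Time"    # assumed suffix for event time columns
VALUE_SUFFIX = "_Value"  # assumed suffix for event value columns


def _base_name(event):
    """Strip an existing time/value suffix so conversions are idempotent."""
    for suffix in (TIME_SUFFIX, VALUE_SUFFIX):
        if event.endswith(suffix):
            return event[: -len(suffix)]
    return event


def event_time_sketch(event):
    """Map an event name to its (assumed) time column name."""
    return _base_name(event) + TIME_SUFFIX


def event_value_sketch(event):
    """Map an event name to its (assumed) value column name."""
    return _base_name(event) + VALUE_SUFFIX
```

Stripping a known suffix first means passing a column name back in returns the matching sibling column instead of stacking suffixes.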

pandas_helpers.post_process_event(dataframe, ...)

Infers and casts events.

pandas_helpers.merge_windowed_event(...[, ...])

Merges a single windowed event into a predictions dataframe.

pandas_helpers.is_valid_event(dataframe, ...)

Creates a mask excluding rows (False) where the event occurs before the reference time.
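A minimal sketch of such a mask over parallel lists of timestamps follows. The function and parameter names are hypothetical. Two behaviors are also assumptions: rows with no recorded event (`None`) are kept, and an event exactly at the reference time counts as valid. The real helper works on DataFrame columns.

```python
from datetime import datetime


def valid_event_mask(event_times, reference_times):
    """Return False where the event occurs before the reference time.

    Rows whose event time is None are kept (True) in this hypothetical sketch.
    """
    if len(event_times) != len(reference_times):
        raise ValueError("inputs must have the same length")
    return [
        event is None or event >= ref
        for event, ref in zip(event_times, reference_times)
    ]


ref = datetime(2024, 1, 10)
mask = valid_event_mask(
    [datetime(2024, 1, 9), datetime(2024, 1, 12), None],
    [ref, ref, ref],
)
# mask == [False, True, True]
```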

pandas_helpers.try_casting(dataframe, ...)

Attempts to cast a column to a specified data type in place.

Performance

performance.as_percentages(proba)

Converts a probability in the 0-1 range to a percentage in the 0-100 range.

performance.as_probabilities(perc)

Converts a percentage in the 0-100 range to a probability in the 0-1 range.
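Both conversions are a scaling by 100. The sketch below uses hypothetical names and adds a range check as an assumption; the library functions may skip validation or accept arrays as well as scalars.

```python
def to_percentage(proba):
    """Scale a probability in [0, 1] to a percentage in [0, 100]."""
    if not 0 <= proba <= 1:  # assumed validation, for illustration
        raise ValueError("probability must be in [0, 1]")
    return proba * 100


def to_probability(perc):
    """Scale a percentage in [0, 100] to a probability in [0, 1]."""
    if not 0 <= perc <= 100:  # assumed validation, for illustration
        raise ValueError("percentage must be in [0, 100]")
    return perc / 100
```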

performance.assert_valid_performance_metrics_df(df)

Determines whether a passed dataframe has either all or a subset of columns that likely indicate it was generated by calculate_bin_stats.

performance.calculate_bin_stats([y_true, ...])

Calculate summary statistics from y_true and y_pred (y_proba[:,1] for binary classification) arrays.
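The core of such summary statistics is a confusion matrix at each threshold. This sketch computes sensitivity, specificity, PPV, NPV, and flag rate from binary labels and scores. Several choices are assumptions: the name `bin_stats_sketch`, the rule that a score at or above the threshold counts as a positive prediction, and the `None` returned for undefined ratios. The real function returns a DataFrame with its own columns.

```python
def bin_stats_sketch(y_true, y_proba, thresholds):
    """Compute confusion-matrix metrics at each threshold.

    Hypothetical sketch: a score >= threshold counts as a positive prediction.
    Undefined ratios (zero denominators) are reported as None.
    """
    def ratio(num, den):
        return num / den if den else None

    n = len(y_true)
    rows = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for label, score in zip(y_true, y_proba):
            predicted = score >= t
            if predicted and label:
                tp += 1
            elif predicted:
                fp += 1
            elif label:
                fn += 1
            else:
                tn += 1
        rows.append({
            "threshold": t,
            "sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp),
            "npv": ratio(tn, tn + fn),
            "flag_rate": ratio(tp + fp, n),  # predicted positive condition rate
        })
    return rows


stats = bin_stats_sketch([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9], [0.5, 0.95])
```

At the 0.95 threshold nothing is flagged, so PPV has a zero denominator and is reported as `None`.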

performance.calculate_eval_ci(stats, truth, ...)

Calculate confidence intervals for ROC, PR, and other performance metrics from a stats frame.

performance.calculate_nnt(arr[, rho])

Calculates NNT (Number Needed to Treat) for the relative risk reduction, rho, and a perfect ARR (absolute risk reduction), i.e., PPV.
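Treating PPV as the absolute risk reduction under a perfect intervention, the achievable ARR is rho × PPV. NNT is its reciprocal, 1 / (rho × PPV). The sketch below applies that formula element-wise. The name `nnt_sketch`, the restriction of rho to (0, 1], and returning infinity for a zero ARR are assumptions.

```python
import math


def nnt_sketch(ppv_values, rho):
    """Return NNT = 1 / (rho * PPV) for each PPV value.

    A zero ARR yields an infinite NNT. Restricting rho to (0, 1] is an
    assumption of this hypothetical sketch.
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must be in (0, 1]")
    result = []
    for ppv in ppv_values:
        arr = rho * ppv  # achievable absolute risk reduction
        result.append(1 / arr if arr > 0 else math.inf)
    return result


nnt = nnt_sketch([0.5, 0.25, 0.0], rho=0.5)
# nnt == [4.0, 8.0, inf]
```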

Seismogram Loaders

loader.ConfigOnlyHook

TypeAlias for a callable taking a ConfigProvider, which returns a DataFrame.

loader.ConfigFrameHook

TypeAlias for a callable taking a ConfigProvider and a DataFrame, which returns a DataFrame.

loader.MergeFramesHook

TypeAlias for a callable taking a ConfigProvider and two DataFrames, which returns a DataFrame.

loader.SeismogramLoader(config[, ...])

A data loading pipeline using three types of hooks: ConfigOnlyHook, ConfigFrameHook, and MergeFramesHook.
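The three hook shapes can be sketched with `typing.Callable`. In this sketch, `Config` stands in for ConfigProvider and lists of row dicts stand in for DataFrames. `run_pipeline` and the order in which it calls the hooks are hypothetical and are not the actual `SeismogramLoader.load_data` behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Frame = list  # stand-in for a DataFrame: a list of row dicts


@dataclass
class Config:
    """Stand-in for ConfigProvider with the single setting this sketch uses."""
    name: str


# Analogues of the three hook aliases.
ConfigOnlyHook = Callable[[Config], Frame]
ConfigFrameHook = Callable[[Config, Frame], Frame]
MergeFramesHook = Callable[[Config, Frame, Frame], Frame]


def run_pipeline(
    config: Config,
    load_predictions: ConfigOnlyHook,
    transform_predictions: ConfigFrameHook,
    load_events: Optional[ConfigOnlyHook] = None,
    merge_events: Optional[MergeFramesHook] = None,
) -> Frame:
    """Load and transform predictions, then optionally merge events."""
    frame = transform_predictions(config, load_predictions(config))
    if load_events is not None and merge_events is not None:
        frame = merge_events(config, frame, load_events(config))
    return frame


def merge_flags(config: Config, preds: Frame, events: Frame) -> Frame:
    """Example merge hook: flag predictions whose id has an event."""
    hit = {row["id"] for row in events}
    return [{**row, "event": row["id"] in hit} for row in preds]


result = run_pipeline(
    Config(name="demo"),
    lambda c: [{"id": 1}, {"id": 2}],
    lambda c, f: [{**row, "source": c.name} for row in f],
    lambda c: [{"id": 2}],
    merge_flags,
)
```

Keeping each stage a plain callable lets any stage be swapped without touching the others, which is the point of a hook-based loader.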

loader.SeismogramLoader.load_data([...])

Entry point for loading data for a Seismogram session.

loader.loader_factory(config[, post_load_fn])

Construct a SeismogramLoader from the provided configuration.

loader.event.parquet_loader(config)

Loads the events frame from a parquet file based on config.event_path.

loader.event.post_transform_fn(config, events)

Converts the Time column in events to a datetime64[ns] type, to be compatible with other operations.

loader.event.merge_onto_predictions(config, ...)

Merges each configured event onto the predictions dataframe.

loader.prediction.parquet_loader(config)

Load the predictions frame from a parquet file based on config.prediction_path.

loader.prediction.assumed_types(config, ...)

Convert the loaded predictions dataframe to the expected types.

loader.prediction.dictionary_types(config, ...)

Convert the loaded predictions dataframe to the expected types.

Summaries

summaries.default_cohort_summaries(...)

Generate a dataframe of summary counts from the input dataframe.

summaries.score_target_cohort_summaries(...)

Generate a dataframe of summary counts from the input dataframe.

Low-level patterns

Patterns

Singleton

Metaclass for implementing the single instance pattern.
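The common metaclass form of the single-instance pattern is shown below. It is a widely used idiom, not necessarily how this library implements `Singleton`. `SingletonMeta` and `Settings` are hypothetical names.

```python
class SingletonMeta(type):
    """Metaclass that returns the same instance for every call to a class."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance on the first call only; later calls reuse it
        # and their arguments are ignored.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Settings(metaclass=SingletonMeta):
    def __init__(self, name="default"):
        self.name = name


first = Settings("prod")
second = Settings("ignored")
# first is second, and second.name == "prod"
```

Because `__init__` runs only once, arguments passed on later calls are silently dropped. That is a known trade-off of this idiom.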

Decorators

DiskCachedFunction(cache_name, save_fn, load_fn)