seismometer.configuration.model.DataUsage

pydantic model seismometer.configuration.model.DataUsage

The definitions of data to use in a notebook run.

This structure defines what data to load and how to use it. The entity_id and context_id are the possible keys for joining events and predictions, and are also used to summarize predictions to a single entity. Primary output and target are the score and target used in default performance analysis.

The features and outputs lists, when defined, limit the data loaded from the predictions file to only those inputs and outputs (plus primary_output and the cohort attributes). The events list similarly limits the event types that are merged into the working dataframe and made available to analyses.
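
As a rough illustration of the column filtering described above, the sketch below computes which prediction columns would be kept. The helper `columns_to_load` and the plain-dict configuration are hypothetical and are not part of seismometer. Identifier and timestamp columns, which a real loader would also need, are left out to keep the sketch close to the rule as stated.

```python
from typing import Optional


def columns_to_load(usage: dict) -> Optional[list[str]]:
    """Return the columns to keep, or None to load every column.

    Hypothetical helper that mirrors the documented rule, not seismometer's code.
    """
    features = usage.get("features", [])
    if not features:
        # Assumption: an empty features list means "do not restrict loading".
        return None
    wanted = [
        *features,
        *usage.get("outputs", []),
        usage.get("primary_output", "Score"),
        *(cohort["source"] for cohort in usage.get("cohorts", [])),
    ]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(wanted))


usage = {
    "primary_output": "Risk",
    "features": ["age", "lactate"],
    "outputs": ["Risk", "RiskBand"],
    "cohorts": [{"source": "age"}, {"source": "unit"}],
}
print(columns_to_load(usage))  # ['age', 'lactate', 'Risk', 'RiskBand', 'unit']
```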

Entity Relationship Diagram (Graphviz source):
digraph "Entity Relationship Diagram created by erdantic" {
   graph [fontcolor=gray66,
      fontname="Times New Roman,Times,Liberation Serif,serif",
      fontsize=9,
      nodesep=0.5,
      rankdir=LR,
      ranksep=1.5
   ];
   node [fontname="Times New Roman,Times,Liberation Serif,serif",
      fontsize=14,
      label="\N",
      shape=plain
   ];
   edge [dir=both];
   "seismometer.configuration.model.Cohort"   [label=<<table border="0" cellborder="1" cellspacing="0"><tr><td port="_root" colspan="2"><b>Cohort</b></td></tr><tr><td>source</td><td port="source">str</td></tr><tr><td>display_name</td><td port="display_name">str</td></tr><tr><td>splits</td><td port="splits">Optional[list[Any]]</td></tr></table>>,
      tooltip="seismometer.configuration.model.Cohort&#xA;&#xA;The definition of an expected cohort attribute.&#xA;&#xA;This structure defines \
a cohort attribute that should be available for selection in a notebook.&#xA;For a categorical data, the splits should all be existing \
values and the list limits the selections available.&#xA;For numerical data, the splits should be the inner boundaries of bucketing; \
with a high and low being added&#xA;outside theses values.&#xA;"];
   "seismometer.configuration.model.DataUsage"   [label=<<table border="0" cellborder="1" cellspacing="0"><tr><td port="_root" colspan="2"><b>DataUsage</b></td></tr><tr><td>entity_id</td><td port="entity_id">str</td></tr><tr><td>context_id</td><td port="context_id">Optional[str]</td></tr><tr><td>primary_output</td><td port="primary_output">str</td></tr><tr><td>primary_target</td><td port="primary_target">str</td></tr><tr><td>predict_time</td><td port="predict_time">str</td></tr><tr><td>comparison_time</td><td port="comparison_time">str</td></tr><tr><td>event_table</td><td port="event_table">EventTableMap</td></tr><tr><td>outputs</td><td port="outputs">list[str]</td></tr><tr><td>cohorts</td><td port="cohorts">list[Cohort]</td></tr><tr><td>features</td><td port="features">list[str]</td></tr><tr><td>events</td><td port="events">list[Event]</td></tr><tr><td>censor_min_count</td><td port="censor_min_count">int</td></tr></table>>,
      tooltip="seismometer.configuration.model.DataUsage&#xA;&#xA;The definitions of data to use in a notebook run.&#xA;&#xA;This structure defines \
what data to load and how to use it.&#xA;The entity_id and context_id are the possible keys for joining events and predictions, \
and are also used to&#xA;summarize predictions to a single entity.&#xA;Primary output and target are the score and target used in \
default performance analysis.&#xA;&#xA;The features and scores list, when defined, limit the loading of data from the predictions \
file to only those&#xA;inputs and outputs (plus primary_score and cohort attributes).&#xA;The events similarly limits the event \
types that are merged into the working dataframe and available to analyses.&#xA;"];
   "seismometer.configuration.model.DataUsage":cohorts:e -> "seismometer.configuration.model.Cohort":_root:w   [arrowhead=crownone,
      arrowtail=nonenone];
   "seismometer.configuration.model.Event"   [label=<<table border="0" cellborder="1" cellspacing="0"><tr><td port="_root" colspan="2"><b>Event</b></td></tr><tr><td>source</td><td port="source">list[str]</td></tr><tr><td>display_name</td><td port="display_name">str</td></tr><tr><td>window_hr</td><td port="window_hr">Optional[float]</td></tr><tr><td>offset_hr</td><td port="offset_hr">float</td></tr><tr><td>impute_val</td><td port="impute_val">Optional[Any]</td></tr><tr><td>usage</td><td port="usage">Optional[str]</td></tr><tr><td>aggregation_method</td><td port="aggregation_method">Optional[Literal['min', 'max', 'first', 'last']]</td></tr><tr><td>merge_strategy</td><td port="merge_strategy">Optional[Literal['first', 'last', 'nearest', 'forward', 'count']]</td></tr></table>>,
      tooltip="seismometer.configuration.model.Event&#xA;&#xA;The definition of an event.&#xA;&#xA;This structure defines an event and which predictions \
are relevant to it.&#xA;If a window is specified:&#xA;&#xA;- the offset_hr defines the upper bound of the window relative to the \
event time,&#xA;  has default value of 0 (event time),&#xA;- the window_hr defines the size of the window looking backwards from \
the offset_hr.&#xA;&#xA;If an event is present but the prediction is not in the window, the predictions are ignored for the event \
type.&#xA;If multiple events are present then the closest one is used.&#xA;&#xA;The impute_val is used as the value for the event \
if no event is present.&#xA;&#xA;Usage is used for context when selecting events, such as analyzing performance of the model with \
respect to a&#xA;target or when comparing an expected intervention to a monitored outcome.&#xA;"];
   "seismometer.configuration.model.DataUsage":events:e -> "seismometer.configuration.model.Event":_root:w   [arrowhead=crownone,
      arrowtail=nonenone];
   "seismometer.configuration.model.EventTableMap"   [label=<<table border="0" cellborder="1" cellspacing="0"><tr><td port="_root" colspan="2"><b>EventTableMap</b></td></tr><tr><td>type</td><td port="type">str</td></tr><tr><td>time</td><td port="time">str</td></tr><tr><td>value</td><td port="value">str</td></tr></table>>,
      tooltip="seismometer.configuration.model.EventTableMap&#xA;&#xA;Override mapping of event table columns.&#xA;"];
   "seismometer.configuration.model.DataUsage":event_table:e -> "seismometer.configuration.model.EventTableMap":_root:w   [arrowhead=noneteetee,
      arrowtail=nonenone];
}
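
The Cohort description in the diagram states that, for numerical data, splits are the inner boundaries of the buckets, with a low and a high bucket added outside them. A minimal sketch of that bucketing follows. The function `bucket_label`, the label format, and the choice to treat each boundary as the inclusive lower edge of the next bucket are assumptions made for illustration.

```python
from bisect import bisect_right


def bucket_label(value: float, splits: list[float]) -> str:
    """Assign a value to a bucket defined by inner boundaries (hypothetical helper)."""
    edges = sorted(splits)
    if not edges:
        return "all"  # no boundaries: a single bucket
    index = bisect_right(edges, value)
    if index == 0:
        return f"<{edges[0]}"  # low bucket added below the first boundary
    if index == len(edges):
        return f">={edges[-1]}"  # high bucket added above the last boundary
    return f"{edges[index - 1]}-{edges[index]}"


for age in (10, 18, 64, 65):
    print(age, bucket_label(age, [18, 65]))
```

Two inner boundaries therefore produce three buckets.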

JSON schema:
{
   "title": "DataUsage",
   "description": "The definitions of data to use in a notebook run.\n\nThis structure defines what data to load and how to use it.\nThe entity_id and context_id are the possible keys for joining events and predictions, and are also used to\nsummarize predictions to a single entity.\nPrimary output and target are the score and target used in default performance analysis.\n\nThe features and scores list, when defined, limit the loading of data from the predictions file to only those\ninputs and outputs (plus primary_score and cohort attributes).\nThe events similarly limits the event types that are merged into the working dataframe and available to analyses.",
   "type": "object",
   "properties": {
      "entity_id": {
         "default": "Id",
         "title": "Entity Id",
         "type": "string"
      },
      "context_id": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Context Id"
      },
      "primary_output": {
         "default": "Score",
         "title": "Primary Output",
         "type": "string"
      },
      "primary_target": {
         "default": "Target",
         "title": "Primary Target",
         "type": "string"
      },
      "predict_time": {
         "default": "Time",
         "title": "Predict Time",
         "type": "string"
      },
      "comparison_time": {
         "default": "",
         "title": "Comparison Time",
         "type": "string"
      },
      "event_table": {
         "$ref": "#/$defs/EventTableMap",
         "default": {
            "type": "Type",
            "time": "EventTime",
            "value": "Value"
         }
      },
      "outputs": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Outputs",
         "type": "array"
      },
      "cohorts": {
         "default": [],
         "items": {
            "$ref": "#/$defs/Cohort"
         },
         "title": "Cohorts",
         "type": "array"
      },
      "features": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Features",
         "type": "array"
      },
      "events": {
         "default": [],
         "items": {
            "$ref": "#/$defs/Event"
         },
         "title": "Events",
         "type": "array"
      },
      "censor_min_count": {
         "default": 10,
         "minimum": 10,
         "title": "Censor Min Count",
         "type": "integer"
      }
   },
   "$defs": {
      "Cohort": {
         "description": "The definition of an expected cohort attribute.\n\nThis structure defines a cohort attribute that should be available for selection in a notebook.\nFor a categorical data, the splits should all be existing values and the list limits the selections available.\nFor numerical data, the splits should be the inner boundaries of bucketing; with a high and low being added\noutside theses values.",
         "properties": {
            "source": {
               "title": "Source",
               "type": "string"
            },
            "display_name": {
               "default": "",
               "title": "Display Name",
               "type": "string"
            },
            "splits": {
               "anyOf": [
                  {
                     "items": {},
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": [],
               "title": "Splits"
            }
         },
         "required": [
            "source"
         ],
         "title": "Cohort",
         "type": "object"
      },
      "Event": {
         "description": "The definition of an event.\n\nThis structure defines an event and which predictions are relevant to it.\nIf a window is specified:\n\n- the offset_hr defines the upper bound of the window relative to the event time,\n  has default value of 0 (event time),\n- the window_hr defines the size of the window looking backwards from the offset_hr.\n\nIf an event is present but the prediction is not in the window, the predictions are ignored for the event type.\nIf multiple events are present then the closest one is used.\n\nThe impute_val is used as the value for the event if no event is present.\n\nUsage is used for context when selecting events, such as analyzing performance of the model with respect to a\ntarget or when comparing an expected intervention to a monitored outcome.",
         "properties": {
            "source": {
               "items": {
                  "type": "string"
               },
               "title": "Source",
               "type": "array"
            },
            "display_name": {
               "default": "",
               "title": "Display Name",
               "type": "string"
            },
            "window_hr": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Window Hr"
            },
            "offset_hr": {
               "default": 0,
               "title": "Offset Hr",
               "type": "number"
            },
            "impute_val": {
               "anyOf": [
                  {},
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Impute Val"
            },
            "usage": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Usage"
            },
            "aggregation_method": {
               "anyOf": [
                  {
                     "enum": [
                        "min",
                        "max",
                        "first",
                        "last"
                     ],
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "max",
               "title": "Aggregation Method"
            },
            "merge_strategy": {
               "anyOf": [
                  {
                     "enum": [
                        "first",
                        "last",
                        "nearest",
                        "forward",
                        "count"
                     ],
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "forward",
               "title": "Merge Strategy"
            }
         },
         "required": [
            "source"
         ],
         "title": "Event",
         "type": "object"
      },
      "EventTableMap": {
         "description": "Override mapping of event table columns.",
         "properties": {
            "type": {
               "default": "Type",
               "title": "Type",
               "type": "string"
            },
            "time": {
               "default": "EventTime",
               "title": "Time",
               "type": "string"
            },
            "value": {
               "default": "Value",
               "title": "Value",
               "type": "string"
            }
         },
         "title": "EventTableMap",
         "type": "object"
      }
   }
}
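
The Event description in the schema defines a window by its upper bound (offset_hr, relative to the event time) and its size looking backwards (window_hr). The sketch below shows one reading of that rule. The function `prediction_in_window` is hypothetical. It assumes that a positive offset_hr moves the upper bound earlier than the event, that both bounds are inclusive, and that a missing window_hr means no restriction.

```python
from datetime import datetime, timedelta
from typing import Optional


def prediction_in_window(
    predict_time: datetime,
    event_time: datetime,
    window_hr: Optional[float] = None,
    offset_hr: float = 0.0,
) -> bool:
    """Check whether a prediction falls inside the event's lookback window."""
    if window_hr is None:
        return True  # assumption: no window specified means no restriction
    upper = event_time - timedelta(hours=offset_hr)  # assumed sign convention
    lower = upper - timedelta(hours=window_hr)
    return lower <= predict_time <= upper


event = datetime(2024, 1, 2, 12, 0)
# Window runs from 23:00 on Jan 1 to 11:00 on Jan 2.
print(prediction_in_window(datetime(2024, 1, 2, 6, 0), event, window_hr=12, offset_hr=1))    # True
print(prediction_in_window(datetime(2024, 1, 2, 11, 30), event, window_hr=12, offset_hr=1))  # False
```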

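Fields:
  • censor_min_count (int)
  • cohorts (list[Cohort])
  • comparison_time (str)
  • context_id (str | None)
  • entity_id (str)
  • event_table (EventTableMap)
  • events (list[Event])
  • features (list[str])
  • outputs (list[str])
  • predict_time (str)
  • primary_output (str)
  • primary_target (str)

Validators:
  • default_comparison  »  comparison_time
  • reduce_cohorts_to_unique_names  »  cohorts
  • reduce_events_to_unique_names  »  events
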
field censor_min_count: int = 10

The minimum size of a cohort to be considered displayable.

Constraints:
  • ge = 10
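
To make the threshold concrete, the sketch below filters cohort counts against censor_min_count and rejects thresholds below the documented minimum of 10. The function `displayable_cohorts` is hypothetical, and treating a cohort whose size equals the threshold as displayable is an assumption.

```python
def displayable_cohorts(counts: dict[str, int], censor_min_count: int = 10) -> dict[str, int]:
    """Keep only cohorts large enough to display (hypothetical helper)."""
    if censor_min_count < 10:
        # Mirrors the documented constraint ge = 10.
        raise ValueError("censor_min_count must be at least 10")
    return {name: size for name, size in counts.items() if size >= censor_min_count}


counts = {"ICU": 42, "ED": 9, "Ward": 10}
print(displayable_cohorts(counts))  # {'ICU': 42, 'Ward': 10}
```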

field cohorts: list[Cohort] = []

A list of all cohort attributes to make available in selections.

Validated by:
  • reduce_cohorts_to_unique_names

field comparison_time: str = ''

The timestamp to use as reference for comparison.

Validated by:
  • default_comparison

field context_id: str | None = None

A secondary identifier used to group an entity_id.

field entity_id: str = 'Id'

The identifier of the entity.

field event_table: EventTableMap = EventTableMap(type='Type', time='EventTime', value='Value')

Mapping of the non-id columns in events data.

field events: list[Event] = []

A list of all events to load.

Must have at least one target event.

Validated by:
  • reduce_events_to_unique_names

field features: list[str] = []

A list of all features to load into predictions.

Features that are specified elsewhere can be omitted. If the list is empty, all columns are loaded from the specified location.

field outputs: list[str] = []

A list of all columns to consider outputs; does not need to include primary_output.

field predict_time: str = 'Time'

Column name of the timestamp for each prediction row.

field primary_output: str = 'Score'

Column name of the primary output of the model.

field primary_target: str = 'Target'

The display_name of the primary target event.

validator default_comparison  »  comparison_time

Return the default comparison_time.

Parameters:
  • comparison_time (str)

  • values (dict)

Return type:

str

validator reduce_cohorts_to_unique_names  »  cohorts

Reduces the list of cohorts to unique names.

Parameters:
  • cohorts (list[Cohort]) – List of configured cohorts.

  • values (Any) – Values of the instance.

Returns:

The unique list of cohorts.

Return type:

list[Cohort]
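
A minimal sketch of reducing a list to unique names follows. It is not the library's implementation: the function `unique_by_name`, the use of plain dicts, the choice of display_name (falling back to source) as the name, and keeping the first occurrence are all assumptions.

```python
def unique_by_name(cohorts: list[dict]) -> list[dict]:
    """Keep the first cohort seen for each name, preserving order (hypothetical helper)."""
    seen = set()
    unique = []
    for cohort in cohorts:
        # Assumption: an empty display_name falls back to the source column.
        name = cohort.get("display_name") or cohort["source"]
        if name in seen:
            continue
        seen.add(name)
        unique.append(cohort)
    return unique


cohorts = [
    {"source": "age", "display_name": "Age"},
    {"source": "age_years", "display_name": "Age"},
    {"source": "unit"},
]
print([c["source"] for c in unique_by_name(cohorts)])  # ['age', 'unit']
```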

validator reduce_events_to_unique_names  »  events

Reduces the list of events to unique names.

Parameters:
  • events (list[Event]) – List of configured events.

  • values (Any) – Values of the instance.

Returns:

The unique list of events.

Return type:

list[Event]