NestedLogit.preprocess_model_data

NestedLogit.preprocess_model_data(choice_df, utility_equations)

Preprocess the model initialization inputs into a format that the PyMC model can use.

This method builds the 3D design matrix X, the fixed covariate matrix F (if applicable), and the integer-encoded response vector y. It also extracts and stores relevant metadata such as the alternatives, fixed covariate names, product index mappings, and nesting structure.
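As a rough illustration of how wide-format data can be turned into arrays with these shapes, here is a minimal sketch using pandas and NumPy. The column names (car_cost, bus_time, income, choice), the covariate dictionaries, and the encoding order are all hypothetical and chosen for the example; the library's actual preprocessing may differ in its details.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per observation, one column per
# (alternative, attribute) pair, plus a shared covariate and the chosen label.
choice_df = pd.DataFrame({
    "car_cost": [5.0, 6.0, 4.5],
    "car_time": [20.0, 25.0, 18.0],
    "bus_cost": [2.0, 2.5, 2.0],
    "bus_time": [40.0, 35.0, 45.0],
    "income": [50.0, 30.0, 70.0],
    "choice": ["car", "bus", "car"],
})

# Alternative-specific covariates, listed in the same attribute order for each alternative.
alt_covariates = {
    "car": ["car_cost", "car_time"],
    "bus": ["bus_cost", "bus_time"],
}
fixed_covariates = ["income"]
alternatives = list(alt_covariates)

# X: stack each alternative's (n_obs, n_covariates) block along axis 1
# to get shape (n_obs, n_alternatives, n_covariates).
X = np.stack([choice_df[cols].to_numpy() for cols in alt_covariates.values()], axis=1)

# F: covariates shared across alternatives, or None when there are none.
F = choice_df[fixed_covariates].to_numpy() if fixed_covariates else None

# y: integer code of the chosen alternative, following the order of `alternatives`.
y = choice_df["choice"].map({alt: i for i, alt in enumerate(alternatives)}).to_numpy()
```

Stacking along axis 1 only works if every alternative contributes the same number of covariates in the same order, which is why the attribute lists above are aligned.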

Parameters:

choice_df (pd.DataFrame): A pandas DataFrame containing the observed choices and the covariates for each alternative. Each row represents one choice observation.
utility_equations (list[str]): A list of model formulas, one per alternative. Each formula has the form "alt_name ~ alt_covariates | fixed_covariates". The left-hand side names the alternative; the right-hand side lists the covariates that explain that alternative's utility, with alternative-specific covariates before the "|" and covariates shared across alternatives after it.
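A minimal sketch of how a formula string of this form could be split into its three parts. The helper parse_utility_equation and the column names used in the example equations are hypothetical; the library's own parser may treat whitespace, intercepts, or extra formula syntax differently.

```python
def parse_utility_equation(equation: str) -> tuple[str, list[str], list[str]]:
    """Split a hypothetical 'alt ~ a + b | c + d' string into its parts."""
    lhs, rhs = equation.split("~", 1)
    alt_name = lhs.strip()
    # The "|" separator is optional: without it there are no fixed covariates.
    if "|" in rhs:
        alt_part, fixed_part = rhs.split("|", 1)
    else:
        alt_part, fixed_part = rhs, ""
    alt_covs = [term.strip() for term in alt_part.split("+") if term.strip()]
    fixed_covs = [term.strip() for term in fixed_part.split("+") if term.strip()]
    return alt_name, alt_covs, fixed_covs


# One formula per alternative (illustrative column names).
equations = [
    "car ~ car_cost + car_time | income",
    "bus ~ bus_cost + bus_time | income",
    "train ~ train_cost + train_time | income",
]
parsed = [parse_utility_equation(eq) for eq in equations]
```

Each alternative lists its own cost and time columns, while the shared covariate income appears after the "|" in every formula.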

Returns:

X (np.ndarray): A 3D NumPy array of shape (n_observations, n_alternatives, n_covariates) holding the alternative-specific covariates.
F (np.ndarray | None): A 2D NumPy array of shape (n_observations, n_fixed_covariates) holding the covariates shared across alternatives, or None if no such covariates are used.
y (np.ndarray): A 1D NumPy array of integer-encoded labels, where each entry identifies the alternative chosen in that observation.
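To show how these three shapes fit together, the sketch below computes a linear utility for every (observation, alternative) pair. It assumes, as is common in discrete choice models but not stated here, one coefficient per alternative-specific covariate (beta) and alternative-specific coefficients for the shared covariates (gamma, with the first alternative's row held at zero as a reference). The coefficient values and random inputs are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_alts, n_covs, n_fixed = 4, 3, 2, 1

# Arrays with the documented shapes (random values for illustration only).
X = rng.normal(size=(n_obs, n_alts, n_covs))
F = rng.normal(size=(n_obs, n_fixed))
y = np.array([0, 2, 1, 0])

# Hypothetical coefficients: beta is shared across alternatives,
# gamma has one row per alternative (first row fixed at zero as a reference).
beta = np.array([-0.5, 0.2])
gamma = np.array([[0.0], [0.3], [-0.1]])

# U[n, j] = sum_k X[n, j, k] * beta[k] + sum_m F[n, m] * gamma[j, m]
U = np.einsum("njk,k->nj", X, beta) + F @ gamma.T

# Utility of the alternative actually chosen in each observation, indexed with y.
chosen_utility = U[np.arange(n_obs), y]
```

Because y holds integer positions rather than labels, it can index the alternative axis of U directly.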

Notes

- Updates internal state: assigns X_data, F, alternatives, fixed_covar, y, prod_indices, nest_indices, all_nests, lambda_lkup, and coords.
- Handles multi-level nesting structures if provided in self.nesting_structure.
- Assumes the existence of instance attributes depvar, covariates, and nesting_structure.
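A minimal sketch of how a multi-level nesting structure could be translated into per-nest alternative indices and a lookup of one lambda slot per nest. The dictionary format of nesting_structure, the "parent/child" naming of sub-nests, and the helper collect_nest_indices are all hypothetical; they borrow the names nest_indices and lambda_lkup from the attributes listed above, but the library's real attributes may have a different layout.

```python
from __future__ import annotations


def collect_nest_indices(nesting: dict, alternatives: list[str], prefix: str = "") -> dict[str, list[int]]:
    """Map every nest (at any depth) to the sorted indices of its alternatives."""
    alt_index = {alt: i for i, alt in enumerate(alternatives)}
    nest_indices: dict[str, list[int]] = {}
    for name, members in nesting.items():
        path = f"{prefix}{name}"
        if isinstance(members, dict):
            # Multi-level case: recurse into sub-nests, then give the parent
            # the union of all alternatives found beneath it.
            sub = collect_nest_indices(members, alternatives, prefix=f"{path}/")
            nest_indices[path] = sorted(set().union(*sub.values()))
            nest_indices.update(sub)
        else:
            # Leaf nest: translate alternative names into positions.
            nest_indices[path] = sorted(alt_index[alt] for alt in members)
    return nest_indices


alternatives = ["car", "taxi", "bus", "train"]
nesting_structure = {
    "private": ["car", "taxi"],
    "public": {"road": ["bus"], "rail": ["train"]},
}
nest_indices = collect_nest_indices(nesting_structure, alternatives)
# One (hypothetical) lambda parameter slot per nest, in insertion order.
lambda_lkup = {name: i for i, name in enumerate(nest_indices)}
```

Recording each parent nest before its children keeps the lookup ordered from the top of the tree downward.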