
User Manual for the Predict Batch Service

Introduction

The Predict Batch service applies a Machine Learning model to a dataset and returns a new dataset containing the predictions. It is designed for analysis and validation pipelines and is compatible with models produced by other Data Analytics System services that use the same libraries (e.g., scikit-learn and MLflow).

Service Features

  1. Loading the model
    Supports models trained with scikit-learn or saved via MLflow, compatible with other Data Analytics System services.

  2. Prediction on the dataset
    It executes the configured prediction method (predict, predict_proba, or other) and adds the results as new columns to the dataset.

  3. Generation of Support Charts
    If the user specifies the prediction_type parameter, charts summarizing the distribution of the results are created:

    • regression → distribution of predicted values,
    • binary classification → class distribution,
    • multiclass classification → distribution by class,
    • probability → distribution of average probabilities.

⚠️ The charts are built only on the prediction values obtained; the model is not queried further. For this reason, the prediction_type parameter must be consistent with the selected predict_method.

  4. Result Registration
    The dataset enriched with the prediction columns is saved and made available as a new asset within the Data Analytics System pipeline.
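The prediction step described above (calling the configured method and appending its output as new columns) can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the helper add_predictions and the column names "prediction" and "<predict_method>_<class>" are assumptions made for the example, and the model is a toy scikit-learn classifier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def add_predictions(model, df, predict_method="predict"):
    """Call the configured model method and append its output as new columns.

    Hypothetical helper: the column naming scheme is an assumption.
    """
    method = getattr(model, predict_method)  # e.g. model.predict or model.predict_proba
    values = np.asarray(method(df))

    out = df.copy()
    if values.ndim == 1:
        # One value per row: a single prediction column.
        out["prediction"] = values
    else:
        # One column per output, e.g. one probability per class.
        labels = getattr(model, "classes_", range(values.shape[1]))
        for i, label in enumerate(labels):
            out[f"{predict_method}_{label}"] = values[:, i]
    return out


# Toy data, for illustration only.
train = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
target = [0, 0, 1, 1]
model = LogisticRegression().fit(train, target)

labels_df = add_predictions(model, train, "predict")
proba_df = add_predictions(model, train, "predict_proba")
print(proba_df.columns.tolist())
```

Looking the method up by name with getattr is one simple way to support "predict, predict_proba, or other" without hard-coding each case; the original input columns are left untouched and the predictions are only appended.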

Configurable Parameters

  • predict_method (optional, default: predict)
    Specifies which model method is used to compute predictions. Example: predict for discrete labels or numeric values, predict_proba for probabilities.

  • feature_method (optional, default: none)
    Specifies a method or attribute of the model from which to extract the expected feature names. Useful for ensuring correct alignment between the dataset columns and the model. If not set, the dataset columns are used directly.

  • prediction_type (optional, default: none)
    Defines the type of problem and, consequently, the charts to be generated from the results. Accepted values: regression, binary, multiclass, proba.
    It must be consistent with the chosen predict_method. If not set, the service only performs the prediction and generates no charts.
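To illustrate what feature_method is for, the sketch below reads the expected feature names from either an attribute or a method of the model and reorders the dataset columns accordingly. It is only an assumption about how such alignment could work: select_features, ToyModel and expected_features are hypothetical names introduced for the example (feature_names_in_ mirrors the attribute that scikit-learn estimators set when fitted on a DataFrame).

```python
import pandas as pd


class ToyModel:
    """Stand-in for a fitted model; only the feature-name sources matter here."""

    feature_names_in_ = ["age", "income"]  # attribute-style source

    def expected_features(self):  # method-style source (hypothetical)
        return list(self.feature_names_in_)


def select_features(model, df, feature_method=None):
    """Return df restricted to, and ordered by, the model's expected features."""
    if feature_method is None:
        # Parameter not set: use the dataset columns directly.
        return df
    source = getattr(model, feature_method)
    names = list(source() if callable(source) else source)
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(f"dataset is missing expected features: {missing}")
    return df[names]


df = pd.DataFrame({"income": [10, 20], "age": [30, 40], "extra": [0, 1]})
aligned = select_features(ToyModel(), df, feature_method="feature_names_in_")
print(aligned.columns.tolist())
```

Failing early on missing columns, rather than passing a misaligned dataset to the model, is a design choice of this sketch and not documented behavior of the service.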

Use of the Service

  1. Select an input dataset and a compatible model (sklearn/MLflow).
  2. Configure, if necessary, the parameters predict_method, feature_method, and prediction_type.
  3. Start execution.
  4. Retrieve the output: a dataset enriched with predictions and, if configured, summary charts.
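Because the summary charts are derived from the prediction values alone, the quantities they plot can be computed without the model. The sketch below shows one plausible computation per prediction_type value; the function summarize_predictions, the choice of 10 histogram bins and the returned structures are assumptions for illustration, not the service's actual charting code.

```python
import numpy as np


def summarize_predictions(values, prediction_type):
    """Compute the distribution that a summary chart would display."""
    values = np.asarray(values)
    if prediction_type == "regression":
        # Histogram of the predicted values.
        counts, edges = np.histogram(values, bins=10)
        return {"counts": counts.tolist(), "bin_edges": edges.tolist()}
    if prediction_type in ("binary", "multiclass"):
        # Number of rows predicted for each class.
        labels, counts = np.unique(values, return_counts=True)
        return dict(zip(labels.tolist(), counts.tolist()))
    if prediction_type == "proba":
        # Average probability per class; needs one column per class.
        if values.ndim != 2:
            raise ValueError("'proba' expects one probability column per class")
        return values.mean(axis=0).tolist()
    raise ValueError(f"unknown prediction_type: {prediction_type!r}")


class_counts = summarize_predictions([0, 1, 1, 1], "binary")
mean_proba = summarize_predictions([[0.2, 0.8], [0.4, 0.6]], "proba")
print(class_counts, mean_proba)
```

The ValueError in the "proba" branch shows why prediction_type has to match predict_method: class labels produced by predict are one-dimensional and cannot be averaged as per-class probabilities.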

Advantages

  • Direct compatibility with models generated in Data Analytics System (sklearn, MLflow).
  • Flexibility in choosing the prediction method.
  • Visual support for understanding the distribution of the results.
  • Automation: the enriched dataset is ready to be used in further analyses or pipelines.

Useful References