User Manual for the Predict Batch Service
Introduction
The Predict Batch service applies a Machine Learning model to a dataset and produces a new dataset containing the predictions. It is designed for analysis and validation pipelines, and is compatible with models generated by other Data Analytics System services that use the same libraries (e.g., scikit-learn and MLflow).
Service Features
- Loading the model: supports models trained with scikit-learn or saved via MLflow, compatible with other Data Analytics System services.
- Prediction on the dataset: executes the configured prediction method (predict, predict_proba, or another method) and adds the results as new columns to the dataset.
- Generation of support charts: if the user specifies the prediction_type parameter, charts summarizing the distribution of results are created:
- regression → distribution of predicted values,
- binary classification → class distribution,
- multiclass classification → distribution by class,
- probability → distribution of average probabilities.
⚠️ The charts are built only on the prediction values obtained; the model is not queried further. For this reason, the prediction_type parameter must be consistent with the selected predict_method.
- Result registration: the dataset enriched with the prediction columns is saved and made available as a new asset within the Data Analytics System pipeline.
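The prediction step described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function add_predictions, the stand-in class ThresholdModel, and the output column names (prediction, prediction_0, ...) are all assumptions made for the example. The service's real column names are not documented here.

```python
import numpy as np
import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a fitted scikit-learn classifier."""

    classes_ = np.array([0, 1])

    def predict_proba(self, X):
        # Probability of class 1 is the first feature, clipped to [0, 1].
        p1 = np.clip(np.asarray(X, dtype=float)[:, 0], 0.0, 1.0)
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        # Pick the class with the highest probability for each row.
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


def add_predictions(df, model, predict_method="predict"):
    """Call the configured method and append its output as new columns."""
    method = getattr(model, predict_method)  # e.g. model.predict
    output = np.asarray(method(df.to_numpy()))
    enriched = df.copy()  # leave the input dataset untouched
    if output.ndim == 1:
        # One value per row: a single prediction column (assumed name).
        enriched["prediction"] = output
    else:
        # One column per output dimension, e.g. one probability per class.
        for i in range(output.shape[1]):
            enriched[f"prediction_{i}"] = output[:, i]
    return enriched


data = pd.DataFrame({"score": [0.2, 0.9], "age": [30, 41]})
labels = add_predictions(data, ThresholdModel())
probabilities = add_predictions(data, ThresholdModel(), predict_method="predict_proba")
```

Looking the method up by name with getattr is what lets a single parameter select predict, predict_proba, or any other method the model exposes.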
Configurable Parameters
- predict_method (optional, default: predict): specifies which model method is used to calculate predictions. Example: predict for discrete labels or numeric values, predict_proba for probabilities.
- feature_method (optional, default: none): specifies a method or attribute of the model from which to extract the expected feature names. Useful for ensuring correct alignment between the dataset columns and the model. If not set, the dataset columns are used directly.
- prediction_type (optional, default: none): defines the type of problem and, consequently, the charts to be generated from the results: regression, binary, multiclass, proba. It must be consistent with the chosen predict_method. If not set, the service only performs the prediction without generating charts.
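The parameter rules above can be illustrated with two small helpers. This is a sketch under stated assumptions: the names align_columns, is_consistent, and DemoModel are hypothetical, and the consistency rule (proba pairs with predict_proba, the other types pair with non-probability methods) is inferred from the descriptions above rather than taken from the service itself. The attribute feature_names_in_ is the one scikit-learn estimators set when fitted on named columns; whether a given model exposes it depends on how it was trained.

```python
ALLOWED_TYPES = {"regression", "binary", "multiclass", "proba"}


def is_consistent(predict_method="predict", prediction_type=None):
    """Assumed rule: 'proba' charts need probabilities, other types do not."""
    if prediction_type is None:
        return True  # no charts requested, nothing to check
    if prediction_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown prediction_type: {prediction_type!r}")
    returns_probabilities = predict_method == "predict_proba"
    wants_probabilities = prediction_type == "proba"
    return returns_probabilities == wants_probabilities


def align_columns(dataset_columns, model, feature_method=None):
    """Return the column names to pass to the model, in the expected order."""
    if feature_method is None:
        # Not set: the dataset columns are used directly.
        return list(dataset_columns)
    member = getattr(model, feature_method)
    # feature_method may name either a method or a plain attribute.
    names = [str(name) for name in (member() if callable(member) else member)]
    missing = [name for name in names if name not in dataset_columns]
    if missing:
        raise KeyError(f"dataset is missing expected features: {missing}")
    return names


class DemoModel:
    """Hypothetical model exposing its feature names as an attribute."""

    feature_names_in_ = ["age", "score"]


aligned = align_columns(["score", "id", "age"], DemoModel(), "feature_names_in_")
unchanged = align_columns(["score", "id", "age"], DemoModel())
```

With feature_method set, the extra id column is dropped and the remaining columns are reordered to match the model. Without it, the dataset columns pass through as they are.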
Use of the Service
- Select an input dataset and a compatible model (sklearn/MLflow).
- Configure, if necessary, the parameters predict_method, feature_method, and prediction_type.
- Start execution.
- Obtain as output a dataset enriched with predictions and, if configured, summary charts.
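Because the summary charts are built only from the prediction values, the data behind them can be derived without the model. The sketch below shows one plausible way to compute that data for each prediction_type. The function summarize, its bins parameter, and the choice of an equal-width histogram for regression are assumptions for illustration; the service's actual chart logic is not documented here.

```python
from collections import Counter
from statistics import mean


def summarize(predictions, prediction_type=None, bins=5):
    """Return the data a summary chart would plot, or None when no type is set."""
    if prediction_type is None:
        return None  # prediction only, no charts
    if prediction_type in ("binary", "multiclass"):
        # Class distribution: number of rows assigned to each class.
        return dict(Counter(predictions))
    if prediction_type == "proba":
        # Average probability per class; each row holds one value per class.
        return [mean(column) for column in zip(*predictions)]
    if prediction_type == "regression":
        # Equal-width histogram of the predicted values.
        low, high = min(predictions), max(predictions)
        width = (high - low) / bins or 1.0  # avoid zero width for constant data
        counts = [0] * bins
        for value in predictions:
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
        return counts
    raise ValueError(f"unknown prediction_type: {prediction_type!r}")


class_counts = summarize(["a", "b", "a"], "multiclass")
mean_probabilities = summarize([[0.75, 0.25], [0.25, 0.75]], "proba")
histogram = summarize([0.0, 1.0, 2.0, 4.0], "regression", bins=2)
```

The sketch also shows why prediction_type must match predict_method: passing class labels with prediction_type set to proba would fail or produce meaningless averages, since the values are interpreted purely according to the declared type.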
Advantages
- Direct compatibility with models generated in Data Analytics System (sklearn, MLflow).
- Flexibility in choosing the prediction method.
- Visual support to understand the distribution of the results.
- Automation: the enriched dataset is ready to be used in further analyses or pipelines.