User Manual for the Dataset to MinIO Service
Introduction
The Dataset to MinIO service allows you to save a dataset produced in a Data Analytics System pipeline to MinIO storage. The dataset is transferred from the BDA application's Kubernetes volume to a MinIO bucket and registered as a dataset on the Data Analytics System. It is a data preparation component, useful for making the intermediate or final results of a pipeline persistent.
Service Features
- Acquisition from Kubernetes volume: the service receives the dataset as input through a volume-type port (/inputs) connected to the previous block in the pipeline.
- Upload to MinIO: all files on the volume are uploaded to the configured MinIO bucket, preserving the folder structure.
- Automatic registration on the Data Analytics System: the service output is a dataset registered on the Data Analytics System with MinIO storage, ready to be reused in pipelines or downloaded from the interface.
- Transparent connection management: the user does not need to worry about endpoints, buckets, or credentials; these parameters are managed automatically through the platform's ports.
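The upload behaviour described above (every file under /inputs copied to the bucket with its relative path kept) can be sketched in Python. This is a minimal illustration under stated assumptions, not the service's actual implementation: the function names `object_name_for` and `upload_tree` and the optional `prefix` parameter are hypothetical, and the client is assumed to expose `fput_object(bucket_name, object_name, file_path)`, as the `Minio` class of the official `minio` Python SDK does.

```python
import os
from pathlib import Path
from typing import List


def object_name_for(root: Path, file_path: Path, prefix: str = "") -> str:
    """Map a local file under `root` to an object name that keeps its relative folder path.

    `prefix` is a hypothetical option for placing the tree under a sub-path of the bucket.
    """
    # as_posix() gives '/'-separated paths, which is how object names express folders
    relative = file_path.relative_to(root).as_posix()
    return f"{prefix.rstrip('/')}/{relative}" if prefix else relative


def upload_tree(client, bucket: str, root: str = "/inputs", prefix: str = "") -> List[str]:
    """Upload every file under `root` to `bucket`, preserving the folder structure.

    Assumption: `client` behaves like `minio.Minio`, i.e. it provides
    `fput_object(bucket_name, object_name, file_path)`.
    Returns the object names that were uploaded.
    """
    root_path = Path(root)
    uploaded: List[str] = []
    # os.walk visits every sub-directory below root_path
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for filename in sorted(filenames):
            file_path = Path(dirpath) / filename
            name = object_name_for(root_path, file_path, prefix)
            client.fput_object(bucket, name, str(file_path))
            uploaded.append(name)
    return uploaded
```

With the `minio` package, a client created as `Minio(endpoint, access_key=..., secret_key=...)` could be passed to `upload_tree` directly; in the service itself these connection parameters are supplied through the platform's ports, so the user never sets them. Because object storage has no real folders, the sketch keeps the structure by encoding each relative path in the object name, which also means empty directories are not uploaded.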
Use of the Service
Configuration
In the Data Analytics System interface, it is sufficient to:
- Connect the input port to the dataset produced by the previous block.
Execution
When the BDA application runs:
1. The service receives the data from the previous service through the input port.
2. It uploads the data to the corresponding MinIO bucket.
3. It automatically registers the output as a dataset in the Data Analytics System.
Output
The output dataset is available as a registered asset in the Dataset section of the Data Analytics System, ready to be linked to other services or downloaded. To view it, open the dropdown menu of the corresponding workflow and click the dataset tagged "out".
Advantages
- Data persistence: intermediate or final results of the pipeline become officially registered datasets.
- Seamless integration: no manual upload steps to MinIO.
- Clear organization: the structure of local folders is maintained in MinIO storage.
- Immediate reuse: the recorded dataset can be used in other pipelines or downloaded from the interface.