
Service User Manual: Dataset from MinIO

Introduction

The Dataset from MinIO service lets you use a dataset that is already registered on the platform and stored in MinIO within a Data Analytics System pipeline.
The service transfers the data to the Kubernetes volume of the BDA application. This makes the data available to subsequent blocks without any manual download.

Service Features

  1. Access to datasets registered in the Data Analytics System
    The input must be a dataset already present in the platform’s Dataset section and registered with MinIO storage.

  2. Loading the dataset into the pipeline
    When the input port is connected to the dataset, its contents are downloaded from MinIO and written to the volume associated with the application.

  3. Output via Kubernetes volume
    In this service, the output is made available only through a port based on a Kubernetes volume:
      • the dataset is not automatically registered as a new asset;
      • it is immediately usable by the next service;
      • it remains downloadable as a temporary asset.

  4. Simple and transparent integration
    No manual configuration is required: the platform manages the endpoint, bucket, and credentials.

Service Usage

Configuration

In the Data Analytics System interface, simply:
- Select a dataset from those available,
- Connect it to the service’s input port,
- Connect the service's output port to the next service, which must have a corresponding input port.

Execution

When running the BDA application:
1. The service accesses the dataset registered on the Data Analytics System,
2. Retrieves the corresponding files from MinIO,
3. Writes them to /outputs on the Kubernetes volume connected to the next service.
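The execution steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the service's actual implementation. It assumes the platform has already resolved the registered dataset to a MinIO bucket and key prefix (step 1). It also assumes a client object that exposes list_objects(bucket, prefix=..., recursive=True) and fget_object(bucket, object_name, file_path). These are the method names used by the MinIO Python SDK (minio.Minio). The helper name copy_dataset_to_volume and the bucket and prefix values are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Protocol


class ObjectInfo(Protocol):
    """Attributes this sketch reads from each listed object (as on minio-py's Object)."""
    object_name: str
    is_dir: bool


class ObjectStoreClient(Protocol):
    """Assumed subset of a MinIO-style client; minio.Minio provides both methods."""

    def list_objects(self, bucket_name: str, prefix: Optional[str] = None,
                     recursive: bool = False) -> Iterable[ObjectInfo]: ...

    def fget_object(self, bucket_name: str, object_name: str,
                    file_path: str) -> object: ...


def copy_dataset_to_volume(client: ObjectStoreClient, bucket: str, prefix: str,
                           volume_dir: str = "/outputs") -> List[Path]:
    """Hypothetical helper: download every object under `prefix` into `volume_dir`,
    keep the key layout below the prefix, and return the written file paths.

    Step 1 (resolving the registered dataset to `bucket` and `prefix`) is assumed
    to have been done by the platform before this function is called.
    """
    root = Path(volume_dir).resolve()
    written: List[Path] = []
    # Step 2: list the dataset's files in MinIO (recursive, so nested keys are included).
    for obj in client.list_objects(bucket, prefix=prefix, recursive=True):
        if obj.is_dir:
            continue  # object storage has no real folders, only key prefixes
        relative = obj.object_name[len(prefix):].lstrip("/") or Path(obj.object_name).name
        target = (root / relative).resolve()
        # Refuse keys such as "../x" that would land outside the volume directory.
        if root not in target.parents:
            raise ValueError(f"object key escapes the volume: {obj.object_name}")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Step 3: write the file onto the volume shared with the next block.
        client.fget_object(bucket, obj.object_name, str(target))
        written.append(target)
    return written
```

Passing the client in as a parameter keeps the sketch independent of how credentials are obtained. The document states that the platform handles credentials, so the sketch does not attempt to model them.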

Output

The dataset is available in the Kubernetes volume, accessible by other blocks in the pipeline.
It can be downloaded as a temporary asset, but it is not automatically registered in the Data Analytics System as a new dataset.
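A block placed after this service could read the dataset directly from the shared volume. The sketch below assumes that the next block sees the volume as a local directory, which is passed in as an argument. The actual mount path depends on how the platform wires the ports. The function name summarize_volume is illustrative only.

```python
from pathlib import Path
from typing import Dict


def summarize_volume(volume_dir: str) -> Dict[str, int]:
    """Return {relative path: size in bytes} for every file under volume_dir.

    Hypothetical helper for a downstream block: it only reads files that the
    Dataset from MinIO service has already written to the shared volume.
    """
    root = Path(volume_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"volume not mounted at {root}")
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The helper only reads from the volume, so it does not register anything new in the Data Analytics System.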

Advantages

  • Direct integration: datasets registered on the Data Analytics System (with MinIO storage) are immediately usable in pipelines.
  • Efficiency: no manual download or configuration steps.
  • Flexibility: writing the output to a volume lets you chain multiple services without duplicating or registering intermediate data.
  • Accessibility: data remains downloadable as temporary assets, even without persistent registration.

Useful References