Creating a Service

A Service is an autonomous dockerized application designed to process data. It can operate in one of two modes:

Batch: Executes by starting, processing data, and then terminating.
Streaming: Executes continuously, processing data indefinitely until the enclosing Workflow is manually stopped.

A Service is defined by:

I/O Ports
User-configurable execution parameters

I/O Ports represent the input and output interfaces of the Service, enabling interaction with other Data Analytics System Assets (Dataset, other Services, Model) within a Workflow.

The figure below illustrates the simplified architecture of a generic Service:

generic-service-structure

We will now proceed step by step to create a simple Service from scratch and register it on the platform. This Service will compute basic statistical indicators—such as mean and standard deviation—for an input dataset, and store the results in an output dataset.

The input and output datasets will be read from and written to MinIO, respectively.

This Service will therefore include:

One input port of type Dataset
One output port of type Dataset
One configurable execution parameter

as illustrated below:

to-be-created-service

Figure: Concept of the Service to be implemented

Workflow Steps

Develop the core program implementing the data processing logic.
Build a Docker image containing this program.
Push the Docker image to a Docker Registry.
Register the dockerized program on the Data Analytics System, thereby promoting it to a Service.

1. Core Program Development

Input Port

service-with-single-input-dataset

Prepare the Input Port for our Service by adding the following command-line arguments to the core program:

--input-dataset
--input-columns
--input-dataset.minio_bucket
--input-dataset.minIO_URL
--input-dataset.minIO_ACCESS_KEY
--input-dataset.minIO_SECRET_KEY

Example Code Snippet:

Argument Data Types

Values passed from the Data Analytics System to the core program will always be strings or numbers. Therefore, proper conversion to the desired data type must be handled in the code.

Python Example

Use str2bool for boolean values
Use str2json for JSON values
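The str2bool and str2json helpers named above are not part of the Python standard library, and this page does not show their implementation. A minimal sketch, assuming they are small user-defined converters passed to argparse through the type parameter (the accepted spellings and the --verbose and --options flags are hypothetical):

```python
import argparse
import json


def str2bool(value: str) -> bool:
    """Convert a textual flag such as 'true' or '0' to a bool (assumed spellings)."""
    normalized = str(value).strip().lower()
    if normalized in ("true", "t", "yes", "y", "1"):
        return True
    if normalized in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {value!r}")


def str2json(value: str):
    """Parse a JSON document passed as a single command-line string."""
    try:
        return json.loads(value)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"Invalid JSON: {exc}") from exc


# Hypothetical flags, used only to show the converters in action.
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", type=str2bool, default=False)
parser.add_argument("--options", type=str2json, default={})

args = parser.parse_args(["--verbose", "True", "--options", '{"decimals": 2}'])
print(args.verbose, args.options)
```

Raising argparse.ArgumentTypeError lets argparse report a bad value as a normal usage error instead of a traceback.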

main.py

import argparse
import pandas as pd
from minio import Minio
import os

parser = argparse.ArgumentParser(description="Basic Statistical Indicators")

# CLI Arguments for the Input dataset port
parser.add_argument('--input-dataset', 
  dest='input_dataset', 
  type=str, 
  required=True
)

parser.add_argument('--input-columns', 
  dest='input_columns', 
  type=str, 
  required=True
) 

parser.add_argument('--input-dataset.minio_bucket', 
  dest='input_dataset_minio_bucket', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_URL', 
  dest='input_dataset_minio_url', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_ACCESS_KEY', 
  dest='input_dataset_minio_access_key', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_SECRET_KEY', 
  dest='input_dataset_minio_secret_key', 
  type=str, 
  required=True
)

# ... see next step ...
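Because option names such as --input-dataset.minIO_URL contain dots and hyphens, each argument sets dest explicitly so that its value is reachable as a plain attribute. The sketch below parses a made-up invocation to show the mapping; the dataset path, bucket name, and endpoint are placeholder values, not values the platform is known to send.

```python
import argparse

parser = argparse.ArgumentParser(description="Input port parsing demo")
parser.add_argument("--input-dataset", dest="input_dataset", type=str, required=True)
parser.add_argument(
    "--input-dataset.minio_bucket",
    dest="input_dataset_minio_bucket",
    type=str,
    required=True,
)
parser.add_argument(
    "--input-dataset.minIO_URL",
    dest="input_dataset_minio_url",
    type=str,
    required=True,
)

# Placeholder values standing in for what the platform would pass at run time.
sample_argv = [
    "--input-dataset", "datasets/sales",
    "--input-dataset.minio_bucket", "demo-bucket",
    "--input-dataset.minIO_URL", "http://minio.example.local:9000",
]

args = parser.parse_args(sample_argv)
print(args.input_dataset, args.input_dataset_minio_bucket, args.input_dataset_minio_url)
```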

Output Port

service-with-single-output-dataset

Next, configure the Output Port for our Service by adding the following command-line arguments to the core program:

--output-dataset
--output-dataset.minio_bucket
--output-dataset.minIO_URL
--output-dataset.minIO_ACCESS_KEY
--output-dataset.minIO_SECRET_KEY

Implementation Notes:

Similar to the input port, these arguments will allow the Service to write the processed dataset to the designated MinIO bucket. Ensure proper handling of authentication credentials and secure storage of sensitive information.

Example Code Snippet:

main.py

# CLI Arguments for the Output dataset port
parser.add_argument('--output-dataset', 
  dest='output_dataset', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minio_bucket', 
  dest='output_dataset_minio_bucket', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_URL', 
  dest='output_dataset_minio_url', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_ACCESS_KEY', 
  dest='output_dataset_minio_access_key', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_SECRET_KEY', 
  dest='output_dataset_minio_secret_key', 
  type=str, 
  required=True
)

# ... see next step ...
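The input and output ports register the same dataset path and four MinIO options under different prefixes. One way to avoid the repetition, shown as a sketch only, is a small helper; add_minio_port_arguments is a hypothetical name, while the option names and dest values follow the ones used in this tutorial.

```python
import argparse


def add_minio_port_arguments(parser: argparse.ArgumentParser, port: str) -> None:
    """Register the dataset path and the four MinIO options for one port.

    `port` is the option prefix without dashes, e.g. 'input-dataset'.
    """
    dest_prefix = port.replace("-", "_")
    parser.add_argument(f"--{port}", dest=dest_prefix, type=str, required=True)
    # (option suffix, dest suffix) pairs, matching the tutorial's naming.
    for option_suffix, dest_suffix in [
        ("minio_bucket", "minio_bucket"),
        ("minIO_URL", "minio_url"),
        ("minIO_ACCESS_KEY", "minio_access_key"),
        ("minIO_SECRET_KEY", "minio_secret_key"),
    ]:
        parser.add_argument(
            f"--{port}.{option_suffix}",
            dest=f"{dest_prefix}_{dest_suffix}",
            type=str,
            required=True,
        )


parser = argparse.ArgumentParser()
add_minio_port_arguments(parser, "input-dataset")
add_minio_port_arguments(parser, "output-dataset")

# Build a made-up invocation that supplies every registered option.
argv = []
for port in ("input-dataset", "output-dataset"):
    argv += [f"--{port}", f"path/{port}"]
    for suffix in ("minio_bucket", "minIO_URL", "minIO_ACCESS_KEY", "minIO_SECRET_KEY"):
        argv += [f"--{port}.{suffix}", f"{port}-{suffix}-value"]

args = parser.parse_args(argv)
print(sorted(vars(args)))
```

The explicit listings in this tutorial are equivalent; the helper only trades verbosity for one extra level of indirection.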

User-Configurable Execution Parameters

We can also expose an auxiliary execution parameter on the Service. This lets the user choose, from the UI, which statistical indicators to compute.

setting-service-param-from-ui

To support this, add the following command-line argument to the core program:

--indicators-to-compute

The sample code for this additional argument is shown below:

main.py

# ... omitted - see previous steps ...

# CLI Argument for user-configurable execution parameter
parser.add_argument(
  '--indicators-to-compute', 
  dest='indicators_to_compute', 
  type=str, 
  required=True, 
  choices=['mean', 'standard_deviation', 'all_supported']
)

# parse_known_args allows us to ignore the other invocation arguments coming in
# from  Data Analytics System
args, unknowns = parser.parse_known_args()

# ... see next step ...
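To illustrate why parse_known_args is used here rather than parse_args: unrecognized options are collected in a second return value instead of causing an error. The --platform-internal-flag option below is invented for the demonstration; this page does not list the extra arguments the platform actually sends.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--indicators-to-compute",
    dest="indicators_to_compute",
    type=str,
    required=True,
    choices=["mean", "standard_deviation", "all_supported"],
)

# '--platform-internal-flag' is a made-up extra argument.
argv = ["--indicators-to-compute", "mean", "--platform-internal-flag", "42"]

args, unknowns = parser.parse_known_args(argv)
print(args.indicators_to_compute)  # the declared argument is parsed as usual
print(unknowns)                    # undeclared arguments are returned, not rejected
```

With parse_args, the same invocation would stop with an "unrecognized arguments" error.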

The diagram below summarizes all the arguments the core program now accepts:

to-be-created-service-with-core-program-args

Implementation Logic

Once all the command-line arguments are in place, you can complete the program with the logic that computes the statistical indicators.
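As a point of reference for the two indicators: pandas computes the sample standard deviation by default (ddof=1), which matches statistics.stdev in the standard library. The sketch below works through one column by hand with made-up values, using only the standard library so that it runs without pandas.

```python
import math
import statistics

# Made-up values standing in for one numeric column of the input dataset.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(column)

# Sample standard deviation: divide the sum of squared deviations by n - 1.
squared_deviations = sum((x - mean) ** 2 for x in column)
sample_std = math.sqrt(squared_deviations / (len(column) - 1))

print(mean, sample_std)
```

If the population standard deviation (division by n) were wanted instead, the pandas call would need ddof=0.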

The complete code is shown below:

Final Core Program Code

main.py

import argparse
import pandas as pd
from minio import Minio
import os

parser = argparse.ArgumentParser(description="Basic Statistical Indicators")

# CLI Arguments for the Input dataset port
parser.add_argument('--input-dataset', 
  dest='input_dataset', 
  type=str, 
  required=True
)

parser.add_argument('--input-columns', 
  dest='input_columns', 
  type=str, 
  required=True
) 

parser.add_argument('--input-dataset.minio_bucket', 
  dest='input_dataset_minio_bucket', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_URL', 
  dest='input_dataset_minio_url', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_ACCESS_KEY', 
  dest='input_dataset_minio_access_key', 
  type=str, 
  required=True
)

parser.add_argument('--input-dataset.minIO_SECRET_KEY', 
  dest='input_dataset_minio_secret_key', 
  type=str, 
  required=True
)

# CLI Arguments for the Output dataset port
parser.add_argument('--output-dataset', 
  dest='output_dataset', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minio_bucket', 
  dest='output_dataset_minio_bucket', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_URL', 
  dest='output_dataset_minio_url', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_ACCESS_KEY', 
  dest='output_dataset_minio_access_key', 
  type=str, 
  required=True
)

parser.add_argument('--output-dataset.minIO_SECRET_KEY', 
  dest='output_dataset_minio_secret_key', 
  type=str, 
  required=True
)

# CLI Argument for user-configurable execution parameter
parser.add_argument(
  '--indicators-to-compute', 
  dest='indicators_to_compute', 
  type=str, 
  required=True, 
  choices=['mean', 'standard_deviation', 'all_supported']
)

# parse_known_args allows us to ignore the other invocation arguments coming in
# from  Data Analytics System
args, unknowns = parser.parse_known_args()

minio_client = Minio(
    args.input_dataset_minio_url.replace("http://", "").replace("https://", ""),
    access_key=args.input_dataset_minio_access_key,
    secret_key=args.input_dataset_minio_secret_key,
    secure=False
)

objects = list(
    minio_client.list_objects(
        args.input_dataset_minio_bucket, 
        prefix=args.input_dataset, 
        recursive=True
    )
)

if not objects:
  raise FileNotFoundError("No files found in the given MinIO folder path.")

# Assume only one CSV file
csv_object = objects[0]
csv_filename = os.path.basename(csv_object.object_name)

"""
Since the program is designed to read from MinIO, we need to handle
connection to such an object storage
"""
connection_details = {
    'key': args.input_dataset_minio_access_key,
    'secret': args.input_dataset_minio_secret_key,
    'client_kwargs': {
        'endpoint_url': f'{args.input_dataset_minio_url}'
    }
}

file_path = f"s3://{args.input_dataset_minio_bucket}/{args.input_dataset}/{csv_filename}"

try:
    # Read
    df = pd.read_csv(file_path, storage_options=connection_details)
    numeric_cols_df = df.select_dtypes(include='number')

    # Compute
    operation = args.indicators_to_compute
    if operation == 'mean':
        result = numeric_cols_df.mean().to_frame(name='mean').T
    elif operation == 'standard_deviation':
        result = numeric_cols_df.std().to_frame(name='std').T
    elif operation == 'all_supported':
        mean_df = numeric_cols_df.mean().to_frame(name='mean').T
        std_df = numeric_cols_df.std().to_frame(name='std').T
        result = pd.concat([mean_df, std_df])
    else:
        raise ValueError(
            "Operation must be 'mean', 'standard_deviation', or 'all_supported'."
        )

    # Save, using the output port's MinIO connection details
    csv_filename = 'basic_statistical_indicators_results.csv'
    file_path = f"s3://{args.output_dataset_minio_bucket}/{args.output_dataset}/{csv_filename}"

    result.to_csv(
        file_path,
        index=False,
        storage_options={
            'key': args.output_dataset_minio_access_key,
            'secret': args.output_dataset_minio_secret_key,
            'client_kwargs': {
                'endpoint_url': args.output_dataset_minio_url
            }
        }
    )

except Exception as e:
    print(f"Error: {e}")
    # Exit with a non-zero status so the failure is visible to the caller
    raise SystemExit(1)
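The program strips the URL scheme by hand and always connects with secure=False, which would not work against an HTTPS endpoint. A possible refinement, offered as a sketch, derives both the host and the TLS flag from the URL with urllib.parse; split_minio_endpoint is a hypothetical helper and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def split_minio_endpoint(url: str) -> tuple[str, bool]:
    """Return (host[:port], secure) for a MinIO URL.

    A URL without a scheme is treated as plain HTTP (an assumption).
    """
    if "://" not in url:
        return url.rstrip("/"), False
    parts = urlsplit(url)
    return parts.netloc, parts.scheme == "https"


endpoint, secure = split_minio_endpoint("https://minio.example.local:9000")
print(endpoint, secure)

# The two values could then be handed to the client constructor, e.g.
# Minio(endpoint, access_key=..., secret_key=..., secure=secure)
```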

2. Docker Image Build and Push

You can now create the Docker image that contains the core program.

Note

When the Service runs, the Data Analytics System instantiates a container from this image and passes it the populated command-line arguments. The instantiated Docker container must then forward these arguments to the core program. For this to happen, the Docker image must define a proper ENTRYPOINT.

The Dockerfile for our Service is shown below (note the ENTRYPOINT!):

Dockerfile

FROM python:3.13.7-alpine3.22

RUN pip install pandas==2.3.2 \
                minio==7.2.18 \
                fsspec==2025.9.0 \
                s3fs==2025.9.0

COPY . . 

ENTRYPOINT ["python", "main.py", "$@"]

The populated arguments that the Data Analytics System passes to the Docker container are substituted for $@.

Other Languages

For other languages, the ENTRYPOINT follows the same pattern:

ENTRYPOINT ["./main", "$@"]
ENTRYPOINT ["java", "Main", "$@"]
ENTRYPOINT ["julia", "main.jl", "$@"]
etc ...

Next, build and push the Docker image by running the commands below.

Two kinds of Docker registry can be used:

A publicly accessible registry
A registry accessible from the Data Analytics System platform; in this case, contact the administrator.

docker build -t <docker-registry-domain>:<port>/<image-path>:<tag> .

docker login <docker-registry-domain>:<port> -u <docker_username>

docker push <docker-registry-domain>:<port>/<image-path>:<tag>
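The same image reference appears in the build and push commands and, prefixed with docker://, in the URL field of the registration form in step 3. A small sketch that assembles both strings from their parts; image_reference and the registry values are hypothetical placeholders.

```python
def image_reference(registry: str, port: int, image_path: str, tag: str) -> str:
    """Compose registry:port/image-path:tag as used by docker build and push."""
    return f"{registry}:{port}/{image_path.strip('/')}:{tag}"


# Placeholder values; substitute the registry you actually use.
ref = image_reference(
    "registry.example.com", 5000, "analytics/basic-statistical-indicators", "1.0.0"
)
service_url = f"docker://{ref}"

print(f"docker build -t {ref} .")
print(f"docker push {ref}")
print(service_url)
```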

3. Registering the Service

Once the image has been pushed to the registry, you can register the corresponding Service on the Data Analytics System, making it available in the catalogue for Workflow creation.

Open the Service registration form as follows:

Open the Service management page from the side menu
Click + Register Service in the top right

sidebar-with-services-item-highlighted

The Service registration form is displayed.

service-registration-form

First, set the basic metadata for the Service:

Name: basic-statistical-indicators
Version: 1.0.0
Framework: Python 3 (from the drop-down menu)
URL: docker://<your-docker-registry-domain>:<port>/<image-path>:<tag>

Next, define the Service Properties. These tell the Data Analytics System which I/O Ports and configuration parameters of the Service it must set before execution.

Start by adding the Service Property for the input port, --input-dataset.

Click on + Add Property to open the input form:

service-registration-add-property-form

Fill in the form for --input-dataset as follows:

Key: input-dataset
Description: leave empty
Type: Application Property
Mandatory: ticked
Invisible: ticked
Value Type: String
Default Value: leave empty
Data Type: Input Data
Streaming: unticked

Click Confirm.

In the same way, create the remaining Service Properties:

For --input-columns:
- Key: input-columns
- Description: leave empty
- Type: Application Property
- Mandatory: ticked
- Invisible: ticked
- Value Type: String
- Default Value: ANY
- Data Type: Input Data
- Streaming: unticked
For --output-dataset:
- Key: output-dataset
- Description: leave empty
- Type: Application Property
- Mandatory: ticked
- Invisible: ticked
- Value Type: String
- Default Value: leave empty
- Data Type: Output Data
- Streaming: unticked
For --indicators-to-compute:
- Key: indicators-to-compute
- Description: leave empty
- Type: Application Property
- Mandatory: ticked
- Invisible: unticked
- Value Type: String
- Default Value: leave empty
- Data Type: Input Data
- Streaming: unticked
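For quick reference, the four properties can be summarized as plain data. This is only an illustrative representation of the form fields above, not a platform API or import format; it also shows that each Key is the command-line option name without the leading dashes.

```python
# Illustrative summary of the registration form entries (not a platform format).
service_properties = [
    {"key": "input-dataset", "invisible": True, "default": "", "data_type": "Input Data"},
    {"key": "input-columns", "invisible": True, "default": "ANY", "data_type": "Input Data"},
    {"key": "output-dataset", "invisible": True, "default": "", "data_type": "Output Data"},
    {"key": "indicators-to-compute", "invisible": False, "default": "", "data_type": "Input Data"},
]

# Fields that are identical for every property in this tutorial.
common_fields = {
    "type": "Application Property",
    "mandatory": True,
    "value_type": "String",
    "streaming": False,
}

for prop in service_properties:
    cli_flag = f"--{prop['key']}"
    visibility = "invisible" if prop["invisible"] else "visible"
    print(f"{cli_flag}: {prop['data_type']}, {visibility}")
```

Only indicators-to-compute is left visible, which is consistent with it being the one parameter the user sets from the UI.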

The Service now has the following Service Properties:

new-service-tutorial-service-properties-summary

Register the Service, including its properties, by clicking Save at the bottom right.

After saving, the Service details page is displayed:

just-registered-service-details-page

The Service is now available in the catalogue and can be used to create a Workflow.

Next Steps

Use the newly created Service in a Workflow (as shown in Quick Start)
See sections: