Creating a Service
A Service is an autonomous dockerized application designed to process data. It can operate in one of two modes:
- Batch: Executes by starting, processing data, and then terminating.
- Streaming: Executes continuously, processing data indefinitely until the enclosing Workflow is manually stopped.
A Service is defined by:
- I/O Ports
- User-configurable execution parameters
I/O Ports represent the input and output interfaces of the Service, enabling interaction with other Data Analytics System Assets (Dataset, other Services, Model) within a Workflow.
The figure below illustrates the simplified architecture of a generic Service:


We will now proceed step by step to create a simple Service from scratch and register it on the platform. This Service will compute basic statistical indicators—such as mean and standard deviation—for an input dataset, and store the results in an output dataset.
The input and output datasets will be read from and written to MinIO, respectively.
This Service will therefore include:
- One input port of type Dataset
- One output port of type Dataset
- One configurable execution parameter
as illustrated below:
Workflow Steps
- Develop the core program implementing the data processing logic.
- Build a Docker image containing this program.
- Push the Docker image to a Docker Registry.
- Register the dockerized program on the Data Analytics System, thereby promoting it to a Service.
1. Core Program Development
Input Port

Prepare the Input Port for our Service by adding the following command-line arguments to the core program:
- --input-dataset
- --input-columns
- --input-dataset.minio_bucket
- --input-dataset.minIO_URL
- --input-dataset.minIO_ACCESS_KEY
- --input-dataset.minIO_SECRET_KEY
Argument Data Types
Values passed from the Data Analytics System to the core program will always be strings or numbers. Therefore, proper conversion to the desired data type must be handled in the code.
Python Example
- Use str2bool for boolean values
- Use str2json for JSON values
Example Code Snippet:
import argparse
import pandas as pd
from minio import Minio
import os
parser = argparse.ArgumentParser(description="Basic Statistical Indicators")
# CLI Arguments for the Input dataset port
parser.add_argument('--input-dataset',
    dest='input_dataset',
    type=str,
    required=True
)
parser.add_argument('--input-columns',
    dest='input_columns',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minio_bucket',
    dest='input_dataset_minio_bucket',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_URL',
    dest='input_dataset_minio_url',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_ACCESS_KEY',
    dest='input_dataset_minio_access_key',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_SECRET_KEY',
    dest='input_dataset_minio_secret_key',
    type=str,
    required=True
)
# ... see next step ...
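The note on argument data types mentions str2bool and str2json without defining them. The sketch below shows one way such converters could be written, assuming they are user-written helpers passed to argparse through type=. The accepted spellings of true and false, and the --verbose and --options flags, are illustrative assumptions and not arguments of the Service built here.

```python
import argparse
import json


def str2bool(value: str) -> bool:
    """Convert a string such as 'true' or '0' to a bool (hypothetical helper)."""
    normalized = value.strip().lower()
    if normalized in ("true", "t", "yes", "y", "1"):
        return True
    if normalized in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


def str2json(value: str):
    """Parse a JSON document passed as a single string (hypothetical helper)."""
    try:
        return json.loads(value)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"Invalid JSON: {exc}") from exc


# Illustrative flags only; they are not arguments of the Service built here.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("--verbose", type=str2bool, default=False)
demo_parser.add_argument("--options", type=str2json, default={})

demo_args = demo_parser.parse_args(["--verbose", "True", "--options", '{"sep": ";"}'])
print(demo_args.verbose, demo_args.options)
```

Raising argparse.ArgumentTypeError lets argparse report a bad value as a normal usage error instead of a traceback.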
Output Port

Next, configure the Output Port for our Service by adding the following command-line arguments to the core program:
- --output-dataset
- --output-dataset.minio_bucket
- --output-dataset.minIO_URL
- --output-dataset.minIO_ACCESS_KEY
- --output-dataset.minIO_SECRET_KEY
Implementation Notes:
Similar to the input port, these arguments will allow the Service to write the processed dataset to the designated MinIO bucket. Ensure proper handling of authentication credentials and secure storage of sensitive information.
Example Code Snippet:
# CLI Arguments for the Output dataset port
parser.add_argument('--output-dataset',
    dest='output_dataset',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minio_bucket',
    dest='output_dataset_minio_bucket',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_URL',
    dest='output_dataset_minio_url',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_ACCESS_KEY',
    dest='output_dataset_minio_access_key',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_SECRET_KEY',
    dest='output_dataset_minio_secret_key',
    type=str,
    required=True
)
# ... see next step ...
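The implementation notes for the output port ask for careful handling of credentials. One small precaution, sketched here on the assumption that the program logs its parsed arguments for debugging, is to mask secret values before printing them. The redact helper and the suffix list are hypothetical and simply follow the dest= names used in this tutorial.

```python
import argparse

# Suffixes of argparse destinations that hold secrets (assumed naming,
# matching the dest= values used in this tutorial).
SECRET_SUFFIXES = ("_access_key", "_secret_key")


def redact(args: argparse.Namespace) -> dict:
    """Return the parsed arguments as a dict with secret values masked."""
    return {
        name: "***" if name.endswith(SECRET_SUFFIXES) else value
        for name, value in vars(args).items()
    }


# Made-up values standing in for what the platform would pass.
example = argparse.Namespace(
    output_dataset="results/",
    output_dataset_minio_access_key="my-access-key",
    output_dataset_minio_secret_key="my-secret-key",
)
print(redact(example))
```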
User-Configurable Execution Parameters
We can also configure the Service through an auxiliary execution parameter. This allows the user to specify in the UI which statistical indicators to compute.

To support this, add the following command-line argument to the program:
--indicators-to-compute
The sample code for this additional argument is shown below:
# ... omitted - see previous steps ...
# CLI Argument for user-configurable execution parameter
parser.add_argument(
    '--indicators-to-compute',
    dest='indicators_to_compute',
    type=str,
    required=True,
    choices=['mean', 'standard_deviation', 'all_supported']
)
# parse_known_args allows us to ignore the other invocation arguments coming in
# from Data Analytics System
args, unknowns = parser.parse_known_args()
# ... see next step ...
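The effect of parse_known_args can be checked in isolation. The sketch below declares only the user-configurable parameter and parses a made-up invocation; --platform-run-id stands in for any extra argument the platform might append and is not a documented flag.

```python
import argparse

parser = argparse.ArgumentParser(description="parse_known_args demo")
parser.add_argument(
    "--indicators-to-compute",
    dest="indicators_to_compute",
    type=str,
    required=True,
    choices=["mean", "standard_deviation", "all_supported"],
)

# Hypothetical invocation: "--platform-run-id" is an invented extra argument.
argv = ["--indicators-to-compute", "mean", "--platform-run-id", "42"]

# Declared arguments land in args; everything else is returned untouched.
args, unknowns = parser.parse_known_args(argv)
print(args.indicators_to_compute)  # mean
print(unknowns)                    # ['--platform-run-id', '42']
```

With parse_args, the same invocation would stop with an "unrecognized arguments" error.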
The schema below summarizes all the arguments required by the core program:

Implementation Logic
Once all the command-line arguments have been added, complete the program with the logic that computes the statistical indicators.
The complete code is shown below:
Final Core Program Code
import argparse
import pandas as pd
from minio import Minio
import os
parser = argparse.ArgumentParser(description="Basic Statistical Indicators")
# CLI Arguments for the Input dataset port
parser.add_argument('--input-dataset',
    dest='input_dataset',
    type=str,
    required=True
)
parser.add_argument('--input-columns',
    dest='input_columns',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minio_bucket',
    dest='input_dataset_minio_bucket',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_URL',
    dest='input_dataset_minio_url',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_ACCESS_KEY',
    dest='input_dataset_minio_access_key',
    type=str,
    required=True
)
parser.add_argument('--input-dataset.minIO_SECRET_KEY',
    dest='input_dataset_minio_secret_key',
    type=str,
    required=True
)
# CLI Arguments for the Output dataset port
parser.add_argument('--output-dataset',
    dest='output_dataset',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minio_bucket',
    dest='output_dataset_minio_bucket',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_URL',
    dest='output_dataset_minio_url',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_ACCESS_KEY',
    dest='output_dataset_minio_access_key',
    type=str,
    required=True
)
parser.add_argument('--output-dataset.minIO_SECRET_KEY',
    dest='output_dataset_minio_secret_key',
    type=str,
    required=True
)
# CLI Argument for user-configurable execution parameter
parser.add_argument(
    '--indicators-to-compute',
    dest='indicators_to_compute',
    type=str,
    required=True,
    choices=['mean', 'standard_deviation', 'all_supported']
)
# parse_known_args allows us to ignore the other invocation arguments coming in
# from Data Analytics System
args, unknowns = parser.parse_known_args()
# Client used to locate the input file in the MinIO bucket
minio_client = Minio(
    args.input_dataset_minio_url.replace("http://", "").replace("https://", ""),
    access_key=args.input_dataset_minio_access_key,
    secret_key=args.input_dataset_minio_secret_key,
    secure=False
)
objects = list(
    minio_client.list_objects(
        args.input_dataset_minio_bucket,
        prefix=args.input_dataset,
        recursive=True
    )
)
if not objects:
    raise FileNotFoundError("No files found in the given MinIO folder path.")
# Assume only one CSV file
csv_object = objects[0]
csv_filename = os.path.basename(csv_object.object_name)
# Since the program is designed to read from MinIO, we need to handle
# the connection to such an object storage (pandas goes through s3fs).
input_storage_options = {
    'key': args.input_dataset_minio_access_key,
    'secret': args.input_dataset_minio_secret_key,
    'client_kwargs': {
        'endpoint_url': args.input_dataset_minio_url
    }
}
# The output port has its own connection details, so use them for writing
output_storage_options = {
    'key': args.output_dataset_minio_access_key,
    'secret': args.output_dataset_minio_secret_key,
    'client_kwargs': {
        'endpoint_url': args.output_dataset_minio_url
    }
}
file_path = f"s3://{args.input_dataset_minio_bucket}/{args.input_dataset}/{csv_filename}"
try:
    # Read
    df = pd.read_csv(file_path, storage_options=input_storage_options)
    numeric_cols_df = df.select_dtypes(include='number')
    # Compute
    operation = args.indicators_to_compute
    if operation == 'mean':
        result = numeric_cols_df.mean().to_frame(name='mean').T
    elif operation == 'standard_deviation':
        result = numeric_cols_df.std().to_frame(name='std').T
    elif operation == 'all_supported':
        mean_df = numeric_cols_df.mean().to_frame(name='mean').T
        std_df = numeric_cols_df.std().to_frame(name='std').T
        result = pd.concat([mean_df, std_df])
    else:
        raise ValueError(
            "Operation must be 'mean', 'standard_deviation', or 'all_supported'."
        )
    # Save
    csv_filename = 'basic_statistical_indicators_results.csv'
    file_path = f"s3://{args.output_dataset_minio_bucket}/{args.output_dataset}/{csv_filename}"
    result.to_csv(
        file_path,
        index=False,
        storage_options=output_storage_options
    )
except Exception as e:
    print(f"Error: {e}")
    # Re-raise so that the container exits with a non-zero status on failure
    raise
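The program strips the scheme from the MinIO URL with str.replace and always passes secure=False to the client. A possible refinement, offered as an assumption rather than as platform behaviour, derives both the host and the secure flag from the URL using the standard library. The helper name split_endpoint and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def split_endpoint(url: str) -> tuple[str, bool]:
    """Split a MinIO URL into (host[:port], secure) for the Minio client.

    URLs without a scheme are treated as plain HTTP (an assumption).
    """
    if "://" not in url:
        return url.rstrip("/"), False
    parts = urlsplit(url)
    return parts.netloc, parts.scheme == "https"


print(split_endpoint("http://minio.example.local:9000"))  # ('minio.example.local:9000', False)
print(split_endpoint("https://minio.example.local"))      # ('minio.example.local', True)
```

The two values could then be passed to the client as Minio(endpoint, access_key=..., secret_key=..., secure=secure).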
2. Docker Image Build and Push
You can now create a Docker image that includes the core program.
Note
When the Service runs, the Data Analytics System starts a container from this image and passes it the populated command-line arguments.
The instantiated Docker container must therefore pass these arguments on to the core program. For this to happen, the Docker image must have a proper ENTRYPOINT.
The Dockerfile for our Service is shown below (note the ENTRYPOINT):
FROM python:3.13.7-alpine3.22
RUN pip install pandas==2.3.2 \
minio==7.2.18 \
fsspec==2025.9.0 \
s3fs==2025.9.0
COPY . .
ENTRYPOINT ["python", "main.py", "$@"]
The populated arguments passed by the Data Analytics System to the Docker container are substituted for $@.
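A point worth checking when testing the image locally: in Docker's exec form, arguments given at container start are appended after the ENTRYPOINT elements, and no shell expands "$@". Assuming the platform relies on these standard semantics (this page does not say), the core program would receive a literal $@ ahead of the real arguments. The sketch below simulates such an argument list with illustrative values and shows that parse_known_args sets the stray token aside.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--indicators-to-compute",
    dest="indicators_to_compute",
    choices=["mean", "standard_deviation", "all_supported"],
    required=True,
)

# ENTRYPOINT ["python", "main.py", "$@"] followed by the runtime arguments:
# everything after "main.py" is what sys.argv[1:] would contain.
entrypoint = ["python", "main.py", "$@"]
runtime_args = ["--indicators-to-compute", "all_supported"]  # illustrative values
simulated_argv = (entrypoint + runtime_args)[2:]

args, unknowns = parser.parse_known_args(simulated_argv)
print(args.indicators_to_compute)  # all_supported
print(unknowns)                    # ['$@']
```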
Other languages
For programs written in other languages, the ENTRYPOINT takes a similar form:
- ENTRYPOINT ["./main", "$@"]
- ENTRYPOINT ["java", "Main", "$@"]
- ENTRYPOINT ["julia", "main.jl", "$@"]
- etc.
Next, build and push the Docker image.
Two kinds of Docker registry can be used:
- A publicly accessible registry
- A registry accessible from the Data Analytics System platform; in this case, contact the administrator.
Run the following commands:
docker build -t <docker-registry-domain>:<port>/<image-path>:<tag> .
docker login <docker-registry-domain>:<port> -u <docker_username>
docker push <docker-registry-domain>:<port>/<image-path>:<tag>
3. Registering the Service
Once the image has been pushed to the registry, you can register the related Service on the Data Analytics System, making it available in the catalogue for Workflow creation.
Open the Service registration form as follows:
- Open the Service management page from the side menu
- Click + Register Service at the top right

The Service registration form is displayed.

First, set the basic metadata for the Service:
- Name: basic-statistical-indicators
- Version: 1.0.0
- Framework: Python 3 (from the drop-down menu)
- URL: docker://<your-docker-registry-domain>:<port>/<image-path>:<tag>
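The URL field reuses the image reference pushed in step 2, prefixed with docker://. As a small illustration, the hypothetical helpers below assemble both strings from their parts. The registry host, port, image path and tag are made-up values.

```python
def image_reference(registry: str, port: int, image_path: str, tag: str) -> str:
    """Build the reference used by docker build and docker push."""
    return f"{registry}:{port}/{image_path}:{tag}"


def service_url(reference: str) -> str:
    """Build the value for the URL field of the registration form."""
    return f"docker://{reference}"


# Made-up registry coordinates, for illustration only.
ref = image_reference("registry.example.local", 5000, "demo/basic-statistical-indicators", "1.0.0")
print(ref)
print(service_url(ref))
```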
Next, set the Service Properties. These tell the Data Analytics System which I/O Ports and configuration parameters of the Service it needs to set before execution.
Start by adding the Service Property for the input port, --input-dataset.
Click + Add Property to open the input form:


Fill in the form for --input-dataset as follows:
- Key: input-dataset
- Description: leave empty
- Type: Application Property
- Mandatory: ticked
- Invisible: ticked
- Value Type: String
- Default Value: leave empty
- Data Type: Input Data
- Streaming: unticked
Then click Confirm.
In the same way, create the remaining Service Properties:
- For --input-columns:
  - Key: input-columns
  - Description: leave empty
  - Type: Application Property
  - Mandatory: ticked
  - Invisible: ticked
  - Value Type: String
  - Default Value: ANY
  - Data Type: Input Data
  - Streaming: unticked
- For --output-dataset:
  - Key: output-dataset
  - Description: leave empty
  - Type: Application Property
  - Mandatory: ticked
  - Invisible: ticked
  - Value Type: String
  - Default Value: leave empty
  - Data Type: Output Data
  - Streaming: unticked
- For --indicators-to-compute:
  - Key: indicators-to-compute
  - Description: leave empty
  - Type: Application Property
  - Mandatory: ticked
  - Invisible: unticked
  - Value Type: String
  - Default Value: leave empty
  - Data Type: Input Data
  - Streaming: unticked
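Each Service Property key matches a command-line flag of the core program with the leading -- removed. The sketch below records the four properties of this tutorial as plain data and derives the corresponding flags. The ServiceProperty class is a hypothetical illustration, not a platform API, and the comment on the invisible field reflects an assumed meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProperty:
    key: str
    data_type: str           # "Input Data" or "Output Data"
    invisible: bool          # assumed meaning: not shown to the user in the UI
    default_value: str = ""  # empty string stands for "leave empty"

    @property
    def cli_flag(self) -> str:
        """Command-line flag the core program must declare for this key."""
        return f"--{self.key}"


PROPERTIES = [
    ServiceProperty("input-dataset", "Input Data", invisible=True),
    ServiceProperty("input-columns", "Input Data", invisible=True, default_value="ANY"),
    ServiceProperty("output-dataset", "Output Data", invisible=True),
    ServiceProperty("indicators-to-compute", "Input Data", invisible=False),
]

for prop in PROPERTIES:
    print(prop.cli_flag)
```

Comparing these flags with the add_argument calls in the core program is a quick way to catch a mistyped key before registering the Service.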
The Service now has the following Service Properties:

We register the Service including its properties by clicking Save at the bottom right:

After saving, the Service details page is displayed.

The Service is now available in the catalogue and can be used to create a Workflow.
Next Steps
- Use the newly created Service in a Workflow (as shown in Quick Start)
- See sections: