Getting Started

Data Source

We use a public dataset specifically designed for testing and learning purposes, which includes five CSV files:

https://www.kaggle.com/datasets/innocentmfa/crm-sales-opportunities
├── accounts.csv
├── data_dictionary.csv
├── products.csv
├── sales_pipeline.csv
└── sales_teams.csv

Data Pipeline

AISRM provides make commands to streamline all repetitive tasks. Our entire Extract, Transform, Load (ETL) process can be executed with a single terminal command:

make data

Let's break the process down step by step:

Extract the Dataset Files

The archive name and download URL are read from a .env file, where RAW_DATA_ARCHIVE and RAW_DATA_URL are defined.

# Download and extract raw data from remote source
mkdir -p data/raw
curl -L -o ./data/raw/$(RAW_DATA_ARCHIVE) $(RAW_DATA_URL)
unzip -u data/raw/$(RAW_DATA_ARCHIVE) -d data/raw
rm -rf data/raw/$(RAW_DATA_ARCHIVE)
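
At this point the five CSV files sit under data/raw/. For a quick sanity check you can open one of them with pandas; this is only an illustration (assuming pandas is installed in your environment), not part of the pipeline:

import pandas as pd

# Quick look at one of the extracted files (illustration only)
accounts = pd.read_csv("data/raw/accounts.csv")
print(accounts.shape)
print(accounts.head())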

Merge Data into a Single CSV File

Our custom Python module merges the raw files on their shared columns (e.g., sales_agent, product), reorders, removes, or renames columns for easier use, and enforces correct data types (e.g., numeric fields).

# Compile the raw dataset for model training
python -m src.data
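
For reference, here is a minimal sketch of what this merge step could look like; the join keys, column names, and output handling are illustrative, while the actual logic lives in the src.data module:

import pandas as pd
from pathlib import Path

# Illustration only: the src.data module implements the real merge logic
RAW = Path("data/raw")
pipeline = pd.read_csv(RAW / "sales_pipeline.csv")
accounts = pd.read_csv(RAW / "accounts.csv")
products = pd.read_csv(RAW / "products.csv")
teams = pd.read_csv(RAW / "sales_teams.csv")

# Join opportunity records with account, product, and team details
df = (
    pipeline
    .merge(accounts, on="account", how="left")
    .merge(products, on="product", how="left")
    .merge(teams, on="sales_agent", how="left")
)

# Enforce numeric types and write the compiled dataset
df["close_value"] = pd.to_numeric(df["close_value"], errors="coerce")
Path("data/processed").mkdir(parents=True, exist_ok=True)
df.to_csv("data/processed/dataset.csv", index=False)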

The dataset is now ready for model training.

Your data folder structure should look like this:

data
├── processed
│   └── dataset.csv
└── raw
    ├── accounts.csv
    ├── data_dictionary.csv
    ├── products.csv
    ├── sales_pipeline.csv
    └── sales_teams.csv

3 directories, 6 files

Model Training

We provide a convenient make command to train and save a new model:

# Train and save model for development
make model

This is equivalent to running:

python -m src.model dev

This command saves a new model, its preprocessor, and metadata as Pickle binary files under models/.
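
Roughly, the saving step works along these lines; this is a sketch, and the estimator, preprocessor, and metadata keys are assumptions rather than the module's actual choices:

import pickle
import time
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Illustration only: the real training code lives in the src.model module
model = LogisticRegression()                 # assumed estimator
preprocessor = StandardScaler()              # assumed preprocessor
metadata = {"trained_at": int(time.time())}  # assumed metadata keys

# ... fit preprocessor and model on data/processed/dataset.csv here ...

# Dev models land in a timestamped folder, e.g. models/dev-1753300409
out_dir = Path("models") / f"dev-{int(time.time())}"
out_dir.mkdir(parents=True, exist_ok=True)

for name, obj in {"model": model, "preprocessor": preprocessor, "metadata": metadata}.items():
    with open(out_dir / f"{name}.pkl", "wb") as f:
        pickle.dump(obj, f)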

For production, we use model versioning with this command:

# Train and save models for deployment
make models_prod

Behind the scenes, this runs:

python -m src.model v1
python -m src.model v2

Your models folder structure should look like this:

models
├── dev-1753300409
│   ├── metadata.pkl
│   ├── model.pkl
│   └── preprocessor.pkl
├── dev-1753300434
│   ├── metadata.pkl
│   ├── model.pkl
│   └── preprocessor.pkl
├── dev-1753807777
│   ├── metadata.pkl
│   ├── model.pkl
│   └── preprocessor.pkl
├── v1
│   ├── metadata.pkl
│   ├── model.pkl
│   └── preprocessor.pkl
└── v2
    ├── metadata.pkl
    ├── model.pkl
    └── preprocessor.pkl

6 directories, 15 files

Development

# Run both backend and frontend services at once.
# Shorthand for: "docker-compose up --build".
make up

# Stop both services.
# Shorthand for: "docker-compose down".
make down

Running and Testing the Backend Only

The backend application serves predictions and is built with FastAPI. It can run locally and be shipped as a Docker container for production.
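
For orientation, a prediction endpoint in this style could look roughly like the sketch below; the route name, payload fields, and loaded model version are assumptions, not the actual API contract:

import pickle

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load one versioned set of artifacts (paths follow the models/ tree above)
with open("models/v1/model.pkl", "rb") as f:
    model = pickle.load(f)
with open("models/v1/preprocessor.pkl", "rb") as f:
    preprocessor = pickle.load(f)

class Opportunity(BaseModel):
    # Illustrative feature set; the real schema is defined in the backend code
    sales_agent: str
    product: str
    account: str

@app.post("/predict")
def predict(opportunity: Opportunity):
    features = preprocessor.transform(pd.DataFrame([opportunity.dict()]))
    return {"prediction": model.predict(features).tolist()}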

# Run locally for development
# Available at http://localhost:8500
make api_dev

# Build the Docker application
make api_docker_build

# Start and test the application
make api_docker_start
make test_api

# Stop the service
make api_docker_stop
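
A smoke test in the spirit of make test_api can be as small as the check below; /docs is FastAPI's built-in OpenAPI page and the port matches the one above, but the real test suite may cover more:

import requests

# The FastAPI service should answer once the container is up
response = requests.get("http://localhost:8500/docs", timeout=5)
assert response.status_code == 200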

Running and Testing the Frontend Only

The frontend application serves as our decision-making tool and is built with Streamlit. It can run locally and be shipped as a Docker container for production.
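
As a rough idea of the shape of such an app, a Streamlit page calling the prediction API could look like the sketch below; the widgets, payload fields, and API route are assumptions:

import requests
import streamlit as st

st.title("AISRM - Sales Opportunity Scoring")

# Illustrative inputs; the real app defines its own widgets and features
sales_agent = st.text_input("Sales agent")
product = st.text_input("Product")
account = st.text_input("Account")

if st.button("Predict"):
    # The backend API is assumed to run locally on port 8500 (see the backend section)
    payload = {"sales_agent": sales_agent, "product": product, "account": account}
    response = requests.post("http://localhost:8500/predict", json=payload, timeout=10)
    st.write(response.json())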

# Run locally for development
# Available at http://localhost:8501
make app_dev

# Build the Docker application
make app_docker_build

# Start and test the application
make app_docker_start
make test_app

# Stop the service
make app_docker_stop
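
Likewise, a minimal smoke test in the spirit of make test_app can simply check that the Streamlit page responds on port 8501; the real test suite may do more:

import requests

# The frontend should serve its page once the container is up
response = requests.get("http://localhost:8501", timeout=5)
assert response.status_code == 200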

Deployment

The backend and frontend applications are decoupled for better performance and scalability, and so that each one can be deployed independently when needed.

Deployments are triggered automatically by a GitHub Actions workflow when a new release tag is pushed:

# Check latest tags
git fetch --tags
git tag

# Tag must be named release-{app|api}-{0.0.x}
# See .github/workflows/deploy.yml -> on.push.tags
git tag -a release-app-0.0.1 -m "Describe this version"
git push --tags

# Monitor the GitHub Actions workflow
# See https://github.com/MatthieuScarset/aisrm/actions/workflows/deploy.yml

Once deployed, you can retrieve the URLs from Google Cloud using these make commands:

# Get the deployed API URL
make cloud_get_api_url

# Get the deployed APP URL
make cloud_get_app_url