Csmart-Digit Training

User Guide for Csmart Training Software

Introduction

Csmart Digit AI Training is a command-line tool for training AI models on coffee classification tasks. It supports multiple architectures and integrates with MLflow for experiment tracking. This guide walks you through installation, training, testing, explainability, exporting, and inference.


1. Installation

1.1. Install Miniconda

Miniconda is a minimal installer for Conda that helps manage Python environments efficiently. Download and install it from:

Miniconda Installation Guide

1.2. Create a Virtual Environment

Before installing dependencies, create an isolated virtual environment using Conda:

conda create -n csmart-training python=3.10

Activate the virtual environment:

conda activate csmart-training

1.3. Install Requirements

Once inside the environment, install the necessary dependencies:

pip install -r requirements.txt

1.4. Install the Project

Install the project in editable mode to allow local modifications:

pip install -e .

1.5. Install PyTorch with GPU Support

Install PyTorch with CUDA support by following the instructions at:

PyTorch Installation Guide
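
After installation, you can confirm that PyTorch detects your GPU:

python -c "import torch; print(torch.cuda.is_available())"

This prints True on a machine with a working CUDA setup.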


2. Running the MLflow Server (Optional)

MLflow is used to track experiments and save model artifacts.

To start the MLflow server, run:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlflow-artifacts --host 127.0.0.1 --port 5000

Access the MLflow UI by opening the following URL in your browser:

http://127.0.0.1:5000/

Why MLflow?

The MLflow server is useful for:

  • Logging model metrics, parameters, and hyperparameters.

  • Storing checkpoints for model recovery.

  • Providing a UI to analyze training behavior.

If you do not want to run an MLflow server, set uri: ./mlruns in config.yaml to log to a local directory instead, as sketched below.
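
For reference, a minimal sketch of the corresponding config.yaml entry (the exact nesting of the key in your copy of the config may differ; the server address assumes the mlflow server command above):

uri: http://127.0.0.1:5000

or, to log locally without a server:

uri: ./mlruns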


3. Training a Model

3.1. Training with a Real Dataset

To train with an actual dataset, run:

python src/train.py dataset_name=coffee_multiclass

3.2. Changing the Base Model

To specify a different model architecture, pass base_model_name as a command-line argument:

python src/train.py dataset_name=coffee_multiclass base_model_name=segformer-b5

Alternatively, you can modify the config.yaml file directly.
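
For example, the equivalent settings in config.yaml (these keys mirror the command-line overrides above; their exact location in the file may differ):

dataset_name: coffee_multiclass
base_model_name: segformer-b5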

3.3. Available Models

Supported architectures include:

  • resnet101

  • resnet18

  • resnet34

  • resnet50

  • resnext50_32x4d

  • wide_resnet101_2

  • convnext_base

  • convnext_large

  • vit_base_patch16_224

  • swin_base_patch4_window7_224

  • resnext101_64x4d

  • fused_network

  • efficientnetb0

  • segformer-b0

  • segformer-b1

  • segformer-b2

  • segformer-b3

  • segformer-b4

  • segformer-b5

The highest accuracy was achieved with convnext_large and segformer-b5.

3.4. Where Are the Trained Files Stored?

At the end of training, all relevant files are stored in MLflow artifacts under training_data. The training directory follows the structure:

/trained_models/version_XX/

Stored artifacts include the following (an illustrative directory layout follows the list):

  • Metrics

  • Checkpoints

  • TensorBoard logs

  • Training visualizations
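
Putting these together, a typical version directory might look like this (the subfolder names are illustrative and may differ in your runs):

trained_models/version_01/
  metrics/        logged metrics
  checkpoints/    model checkpoints
  tensorboard/    TensorBoard logs
  plots/          training visualizations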


4. Testing a Model

After training, run the test script to compute additional evaluation metrics:

python src/test.py checkpoint={path_to_ckpt_file}
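
For example, using a checkpoint from a local training run (the path is illustrative):

python src/test.py checkpoint=trained_models/version_01/checkpoints/last.ckpt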

4.1. Where Are Test Results Stored?

Test results are saved in MLflow artifacts under test_data. Each test run generates a new folder:

test_data/test_data_YYYYMMDD_HHMMSS/

Multiple test runs can be performed, each saved separately.


5. Explainable AI (XAI) Analysis

To understand how the model makes decisions, use xai_analysis.py, which applies Explainable AI (XAI) techniques such as GradCAM and Gradient SHAP.

Run with:

python src/xai_analysis.py checkpoint={path_to_ckpt_file}

5.1. Where Are XAI Results Stored?

XAI visualizations are stored in MLflow artifacts under xai_data, organized by timestamp:

xai_data/xai_plots_YYYYMMDD_HHMMSS/gradcam/

6. Exporting the Model

Once the model achieves the desired performance, export it to ONNX format for inference:

python src/export.py checkpoint={path_to_ckpt_file}

7. Running Predictions with an Exported Model

After exporting, run inference with the ONNX model:

python src/predict.py onnx_weights={path_to_onnx_file}
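
If you want to sanity-check an exported model outside predict.py, the following minimal Python sketch loads it with onnxruntime (a separate package, installed with pip install onnxruntime; the file path and the 1x3x224x224 input shape are assumptions for illustration):

import numpy as np
import onnxruntime as ort

# Load the exported model (path is illustrative).
session = ort.InferenceSession("trained_models/version_01/model.onnx")

# Inspect the input defined by the export; name and shape depend on the model.
model_input = session.get_inputs()[0]
print(model_input.name, model_input.shape)

# Run a forward pass with a dummy batch (assumes a 1x3x224x224 float32 image).
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {model_input.name: dummy})
print(outputs[0].shape)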

Summary of Commands

Task | Command
Create virtual environment | conda create -n csmart-training python=3.10
Activate virtual environment | conda activate csmart-training
Install dependencies | pip install -r requirements.txt
Install project | pip install -e .
Install PyTorch with GPU support (optional) | follow the PyTorch Installation Guide
Start MLflow server | mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlflow-artifacts --host 127.0.0.1 --port 5000
Train a model with a real dataset | python src/train.py dataset_name=coffee_multiclass
Train a model with a specific architecture | python src/train.py dataset_name=coffee_multiclass base_model_name=segformer-b5
Modify base model inside config file | edit config.yaml and change base_model_name
Test the trained model | python src/test.py checkpoint={path_to_ckpt_file}
Run XAI analysis | python src/xai_analysis.py checkpoint={path_to_ckpt_file}
Export model to ONNX | python src/export.py checkpoint={path_to_ckpt_file}
Run inference using ONNX model | python src/predict.py onnx_weights={path_to_onnx_file}


🚀 Now you’re ready to train and deploy your AI models efficiently!
