
AIM Backend

AIM is a lightweight, open-source experiment tracking system optimized for speed and simplicity.

Overview

AIM provides fast experiment tracking with a focus on performance and ease of use. It's perfect for:

  • Local development and experimentation
  • High-frequency metric logging
  • Simple deployment scenarios
  • Teams that prefer open-source solutions

Installation

# Using pip
pip install tracelet aim

# Or with uv
uv add tracelet aim

Quick Start

Local Repository

import tracelet
from torch.utils.tensorboard import SummaryWriter

# Start with local AIM repository
tracelet.start_logging(
    exp_name="aim_experiment",
    project="my_project",
    backend="aim"
)

# Use TensorBoard as normal - metrics go to AIM
writer = SummaryWriter()
writer.add_scalar("loss", 0.5, 1)
writer.add_scalar("accuracy", 0.9, 1)

tracelet.stop_logging()

View Results

Start the AIM UI:

aim up

Visit http://localhost:43800 to view your experiments.

!!! note "AIM Ports"
    - UI server (`aim up`): default port 43800
    - API server (`aim server`): default port 53800

Configuration

Current Limitation

Backend-specific configuration is currently not supported. The AIM backend uses default settings:

  • Repository path: current directory (.)
  • Experiment name: "Tracelet Experiments"
  • Remote server: not supported

Advanced configuration will be added in a future release.
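
Until then, a workaround follows from the defaults above: because the backend creates its repository in the current working directory, you can control where data lands by switching directories before starting logging. A minimal sketch (the target path is illustrative):

import os

import tracelet

# Assumption: the AIM backend writes its repository to the current
# working directory, so change to the desired location first.
os.makedirs("./experiments/my_project", exist_ok=True)
os.chdir("./experiments/my_project")

tracelet.start_logging(
    backend="aim",
    exp_name="custom_repo_run",
    project="my_project"
)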

Current Usage

# This works with default settings
tracelet.start_logging(
    backend="aim",
    exp_name="my_experiment",     # Sets experiment run name
    project="my_project"          # Sets project context
)

Planned Configuration (Future Release)

# This will be supported in future versions
tracelet.start_logging(
    backend="aim",
    config={
        "repo_path": "./aim_repo",           # Custom repository path
        "experiment_name": "My Experiments", # Custom experiment name
        "run_name": "baseline_run",         # Custom run name
        "tags": {                           # Run tags
            "model": "resnet",
            "dataset": "cifar10"
        },
        "remote_uri": "http://aim-server:53800"  # Remote server
    }
)

Features

Metrics Logging

AIM automatically captures all TensorBoard metrics:

writer = SummaryWriter()

# Scalars
writer.add_scalar("train/loss", loss, epoch)
writer.add_scalar("val/accuracy", acc, epoch)

# Histograms (converted to AIM distributions)
for name, param in model.named_parameters():
    writer.add_histogram(f"model/{name}", param, epoch)

# Images
writer.add_image("predictions", image_tensor, epoch)

# Text
writer.add_text("notes", "Training progressing well", epoch)

Parameter Logging

exp = tracelet.get_active_experiment()
exp.log_params({
    "learning_rate": 0.001,
    "batch_size": 32,
    "optimizer": "adam",
    "model_architecture": "resnet50"
})

Artifact Storage

# Log model artifacts
exp.log_artifact("model.pth", artifact_path="models/")

# Log dataset info
exp.log_artifact("data_stats.json", artifact_path="data/")

AIM Artifact Limitations

AIM doesn't have full artifact storage like MLflow. Files are referenced by path rather than uploaded to a central store.
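
Because only the path is recorded, a file that is later moved or deleted leaves a dangling reference. One way to guard against this is to copy artifacts into a stable store before logging them. A minimal sketch, where log_artifact_stable and the ./artifact_store directory are hypothetical helpers, not part of Tracelet:

import shutil
from pathlib import Path

import tracelet

def log_artifact_stable(exp, src, store_dir="./artifact_store"):
    # Copy the file to a stable location so the path AIM records
    # remains valid after temporary files are cleaned up.
    dest = Path(store_dir) / Path(src).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    exp.log_artifact(str(dest), artifact_path="models/")
    return dest

exp = tracelet.get_active_experiment()
log_artifact_stable(exp, "model.pth")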

Advanced Features

Context-based Metrics

AIM supports rich metric contexts for better organization:

# Metrics are automatically organized by source and name
writer.add_scalar("train/loss", loss, step)      # Context: train
writer.add_scalar("val/loss", val_loss, step)    # Context: val
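
For reference, the name prefix corresponds to what Aim calls a context. A sketch of the roughly equivalent direct Aim SDK calls (Tracelet's actual mapping may differ):

from aim import Run

run = Run(experiment="context_demo")

# The "train/" and "val/" prefixes become Aim contexts on one metric name.
run.track(0.42, name="loss", step=1, context={"subset": "train"})
run.track(0.58, name="loss", step=1, context={"subset": "val"})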

High-Frequency Logging

AIM is optimized for high-frequency metric logging:

# Log every batch without performance concerns
for batch_idx, (data, target) in enumerate(dataloader):
    # ... training code ...
    writer.add_scalar("batch/loss", batch_loss, batch_idx)
    writer.add_scalar("batch/lr", current_lr, batch_idx)
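
If you want to confirm that per-batch logging is negligible in your own loop, a quick timing check works. A minimal sketch with dummy values:

import time

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
log_time = 0.0

for batch_idx in range(1000):
    # ... training step would run here ...
    t0 = time.perf_counter()
    writer.add_scalar("batch/loss", 0.1, batch_idx)
    log_time += time.perf_counter() - t0

print(f"Logging overhead: {log_time:.3f}s over 1000 steps")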

Multi-Run Comparison

AIM's UI excels at comparing multiple runs:

for lr in [0.001, 0.01, 0.1]:
    tracelet.start_logging(
        backend="aim",
        exp_name=f"lr_sweep_{lr}",
        project="lr_sweeps"
    )
    # Tags via config are not yet supported; log the swept value as a param
    exp = tracelet.get_active_experiment()
    exp.log_params({"learning_rate": lr})
    # ... training with this LR ...
    tracelet.stop_logging()
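
The swept runs can also be compared programmatically with Aim's query SDK. A sketch based on Aim's documented query API (the metric name is illustrative, and methods may vary across Aim versions):

from aim import Repo

repo = Repo(".")

# Fetch every run's "epoch/loss" series and print the final value.
for run_metrics in repo.query_metrics("metric.name == 'epoch/loss'").iter_runs():
    for metric in run_metrics:
        steps, values = metric.values.sparse_numpy()
        print(metric.run.hash, "final loss:", values[-1])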

Deployment Options

Local Development

# Initialize repository
aim init

# Start tracking server
aim up --host 0.0.0.0 --port 43800

Docker Deployment

# docker-compose.yml
version: "3.8"
services:
  aim:
    image: aimhubio/aim:latest
    ports:
      - "43800:43800" # UI port
      - "53800:53800" # API port
    volumes:
      - ./aim_data:/opt/aim
    command: aim up --host 0.0.0.0 --port 43800

Production Server

# Install AIM server
pip install aim

# Run UI server
aim up --host 0.0.0.0 --port 43800

# Or run API server for remote connections
aim server --host 0.0.0.0 --port 53800
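
Tracelet cannot target a remote server yet, but the `aim server` endpoint above can already be exercised directly from the Aim SDK, which is handy for verifying the deployment. A sketch (the hostname is illustrative):

from aim import Run

# Point the Aim client at the remote tracking server started above.
run = Run(repo="aim://aim-server:53800", experiment="smoke_test")
run.track(1.0, name="ping", step=0)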

Best Practices

Repository Organization

# Current approach - organize by experiment name
tracelet.start_logging(
    backend="aim",
    exp_name="hyperparameter_tuning_resnet",
    project="computer_vision"
)

# Future: Custom repo paths (not currently supported)
# tracelet.start_logging(
#     backend="aim",
#     config={"repo_path": "./experiments/my_project"}
# )

Parameter Logging Strategy

# Use structured parameter names for organization
exp = tracelet.get_active_experiment()
exp.log_params({
    "model.architecture": "resnet50",
    "model.layers": 50,
    "data.dataset": "cifar10",
    "training.stage": "development",
    "training.version": "v1.2",
    "optimizer.name": "adam",
    "optimizer.lr": 0.001
})
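
When your configuration already lives in nested dictionaries, a small helper can generate these dotted names automatically. Continuing from the example above, flatten_params is a hypothetical utility, not part of Tracelet:

def flatten_params(d, prefix=""):
    # Recursively flatten nested dicts into dotted parameter names.
    flat = {}
    for key, value in d.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat

config = {
    "model": {"architecture": "resnet50", "layers": 50},
    "optimizer": {"name": "adam", "lr": 0.001},
}
exp.log_params(flatten_params(config))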

Metric Naming

# Use hierarchical naming
writer.add_scalar("train/loss/total", loss, step)
writer.add_scalar("train/loss/classification", cls_loss, step)
writer.add_scalar("train/metrics/accuracy", acc, step)
writer.add_scalar("val/metrics/f1_score", f1, step)

Troubleshooting

Common Issues

Repository not found:

# Initialize AIM repository
aim init

Port already in use:

# Use different port
aim up --port 43801

Remote connection failed: Currently not supported. Remote AIM server connections will be available in a future release when backend configuration is implemented.

Performance Tuning

# Current: AIM backend uses optimal defaults for performance
# Repository is created in current directory on fast local storage

# Future: Custom configuration will support
# config = {
#     "repo_path": "/fast/ssd/aim_repo",  # Use SSD storage
#     "buffer_size": 1000,               # Batch metrics
# }

Comparison with Other Backends

| Feature               | AIM    | MLflow | ClearML | W&B    |
| --------------------- | ------ | ------ | ------- | ------ |
| Setup complexity      | ⭐⭐⭐ | ⭐⭐   | ⭐⭐⭐  |        |
| Logging performance   | ⭐⭐⭐ | ⭐⭐   | ⭐⭐    | ⭐⭐   |
| Visualization quality | ⭐⭐⭐ | ⭐⭐   | ⭐⭐⭐  | ⭐⭐⭐ |
| Query capabilities    | ⭐⭐⭐ | ⭐⭐   | ⭐⭐⭐  | ⭐⭐⭐ |
| Resource usage        | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐    | ⭐⭐   |

Migration

From TensorBoard

# Before: Pure TensorBoard
# from torch.utils.tensorboard import SummaryWriter
# writer = SummaryWriter("./runs")

# After: TensorBoard + AIM via Tracelet
import tracelet
from torch.utils.tensorboard import SummaryWriter

tracelet.start_logging(backend="aim")
writer = SummaryWriter()  # Same code!

To Other Backends

# Easy switch to different backend
# tracelet.start_logging(backend="aim")      # Old
tracelet.start_logging(backend="wandb")     # New
# All TensorBoard code remains unchanged

Complete Example

import tracelet
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader, TensorDataset

# Start AIM tracking
tracelet.start_logging(
    exp_name="aim_pytorch_example",
    project="tutorials",
    backend="aim"
)

# Model setup
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Data
X = torch.randn(1000, 784)
y = torch.randint(0, 10, (1000,))
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=32)

# Training with automatic AIM logging
writer = SummaryWriter()

for epoch in range(10):
    total_loss = 0
    for batch_idx, (data, target) in enumerate(dataloader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        # Metrics automatically sent to AIM
        writer.add_scalar("batch/loss", loss.item(),
                         epoch * len(dataloader) + batch_idx)
        total_loss += loss.item()

    # Epoch metrics
    avg_loss = total_loss / len(dataloader)
    writer.add_scalar("epoch/loss", avg_loss, epoch)

    # Model weights histogram
    for name, param in model.named_parameters():
        writer.add_histogram(f"weights/{name}", param, epoch)

# Log final parameters
exp = tracelet.get_active_experiment()
exp.log_params({
    "model.type": "simple_mlp",
    "model.input_size": 784,
    "model.hidden_size": 128,
    "model.output_size": 10,
    "optimizer.type": "adam",
    "optimizer.learning_rate": 0.001,
    "training.batch_size": 32,
    "training.epochs": 10
})

# Cleanup
writer.close()
tracelet.stop_logging()

print("✅ Training completed! View results with: aim up")

Next Steps