Tracelet¶


Intelligent experiment tracking for PyTorch and PyTorch Lightning
Automagic hyperparameter detection and multi-backend logging

What is Tracelet?¶

Tracelet is a Python library that captures and logs your machine learning experiments without requiring changes to your existing logging code. Add one call to start tracking, and Tracelet will:

Detect and log hyperparameters with zero configuration (automagic instrumentation)
Automatically capture TensorBoard metrics, PyTorch Lightning logs, and system metrics
Route to multiple backends simultaneously (MLflow, ClearML, W&B, AIM)
Track scalars, histograms, images, audio, text, and artifacts
Keep existing TensorBoard workflows working with zero code changes
Expose a plugin architecture for extensions
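
This page does not describe how hyperparameter detection works internally. As a rough illustration of what zero-config detection could look like, the sketch below scans a config object for simple scalar values. The names `detect_hyperparameters` and `TrainConfig` are hypothetical and are not part of Tracelet's API.

```python
from dataclasses import dataclass, field

# Types treated as loggable hyperparameter values in this sketch.
SCALAR_TYPES = (bool, int, float, str)


@dataclass
class TrainConfig:
    """Hypothetical training config used only for this illustration."""
    learning_rate: float = 1e-3
    batch_size: int = 32
    optimizer: str = "adam"
    layers: list = field(default_factory=lambda: [64, 64])


def detect_hyperparameters(obj):
    """Return the public attributes of obj whose values are simple scalars."""
    found = {}
    for name, value in vars(obj).items():
        if name.startswith("_"):
            continue  # skip private attributes
        if isinstance(value, SCALAR_TYPES):
            found[name] = value
    return found


params = detect_hyperparameters(TrainConfig())
print(params)
```

Non-scalar values such as the `layers` list are skipped here; a real tracker would need a policy for nested or structured values.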

Quick Start¶

Installation¶

pip:

pip install tracelet

uv:

uv add tracelet

conda:

conda install -c conda-forge tracelet

Demo¶

[Demo video: getting started with automatic experiment tracking in Tracelet]

Basic Usage¶

import tracelet
import torch
from torch.utils.tensorboard import SummaryWriter

# 1. Start tracking (one line!)
tracelet.start_logging(
    exp_name="my_experiment",
    project="my_project",
    backend="mlflow"  # or "clearml", "wandb", "aim"
)

# 2. Use TensorBoard as normal - metrics automatically captured
writer = SummaryWriter()
for epoch in range(100):
    loss = train_one_epoch()  # Your existing training code
    writer.add_scalar('Loss/train', loss, epoch)
    # ✨ Metrics automatically sent to MLflow!

# 3. Flush the writer, then stop tracking
writer.close()
tracelet.stop_logging()

That's it!

Your existing TensorBoard logging calls now reach MLflow, ClearML, W&B, or AIM unchanged. The only additions are the calls that start and stop tracking.
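
One way to picture how unchanged `add_scalar` calls can reach another backend is method wrapping. The sketch below is an assumption about the general technique, not Tracelet's actual mechanism. It uses a stand-in `FakeWriter` class instead of the real `SummaryWriter` so that it runs without PyTorch, and `patch_add_scalar` is a hypothetical helper.

```python
import functools


class FakeWriter:
    """Stand-in for SummaryWriter so the sketch runs without PyTorch."""

    def __init__(self):
        self.events = []

    def add_scalar(self, tag, value, step=None):
        self.events.append((tag, value, step))


def patch_add_scalar(writer_cls, sink):
    """Wrap writer_cls.add_scalar so each call is also passed to sink."""
    original = writer_cls.add_scalar

    @functools.wraps(original)
    def wrapper(self, tag, value, step=None):
        sink(tag, value, step)  # forward a copy to the tracker
        return original(self, tag, value, step)  # keep the normal behaviour

    writer_cls.add_scalar = wrapper
    return original  # returned so the caller can restore it later


captured = []
original = patch_add_scalar(FakeWriter, lambda *event: captured.append(event))

writer = FakeWriter()
writer.add_scalar("Loss/train", 0.5, 0)  # forwarded and recorded

FakeWriter.add_scalar = original  # undo the patch
writer.add_scalar("Loss/train", 0.4, 1)  # recorded only, no longer forwarded
```

Restoring the original method when tracking stops keeps the wrapper from leaking into code that runs afterwards.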

Key Features¶

Multi-Backend Support¶

Choose from four popular experiment tracking backends:

MLflow - Open source ML lifecycle management
ClearML - Enterprise-grade MLOps platform
Weights & Biases - Collaborative ML platform
AIM - Open source experiment tracking

Automatic Instrumentation¶

Tracelet automatically captures:

TensorBoard metrics - Scalars, histograms, images, audio, text
PyTorch Lightning - Training/validation metrics, hyperparameters
System metrics - CPU, memory, GPU usage
Git information - Repository state, commit hash, branch
Environment - Python version, package versions, hardware info
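
To make the Git and environment items concrete, here is a minimal standard-library sketch of collecting that kind of snapshot. The function names `git_output` and `collect_environment` and the dictionary keys are illustrative assumptions; Tracelet's own collectors may gather different fields.

```python
import platform
import subprocess
import sys


def git_output(*args):
    """Run a git command and return its trimmed output, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return None  # git missing, not a repository, or the command timed out
    return result.stdout.strip()


def collect_environment():
    """Gather a small snapshot of the runtime environment."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "git_commit": git_output("rev-parse", "HEAD"),
        "git_branch": git_output("rev-parse", "--abbrev-ref", "HEAD"),
    }


env = collect_environment()
print(env)
```

The Git fields fall back to `None` outside a repository, so the snapshot can still be logged.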

Unified Artifact System¶

Log and manage ML artifacts with intelligent routing:

Universal API - Models, checkpoints, images, audio, datasets, reports
Intelligent routing - Automatically routes to optimal backends
Framework integration - Auto-capture Lightning checkpoints
Large file support - External references for files larger than 100 MB
Rich metadata - Comprehensive artifact descriptions
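
The large-file rule above can be sketched as a size check that decides between uploading a file and recording only a reference to it. `plan_artifact_storage` and its return format are hypothetical, and the threshold treats 100 MB as 100 × 1024 × 1024 bytes, which is an assumption.

```python
import os
import tempfile

# Assumed interpretation of the 100 MB limit mentioned above.
LARGE_FILE_THRESHOLD = 100 * 1024 * 1024


def plan_artifact_storage(path, threshold=LARGE_FILE_THRESHOLD):
    """Decide whether to upload a file or record only a reference to it."""
    size = os.path.getsize(path)
    mode = "reference" if size > threshold else "upload"
    return {"path": os.path.abspath(path), "size_bytes": size, "mode": mode}


# Demonstrate with a small temporary file and a lowered threshold.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 2048)
    tmp_path = handle.name

small_plan = plan_artifact_storage(tmp_path)  # below the default limit
forced_plan = plan_artifact_storage(tmp_path, threshold=1024)  # above a 1 KiB limit
os.remove(tmp_path)

print(small_plan["mode"], forced_plan["mode"])
```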

Rich Data Types¶

Log and visualize various data types:

Scalars - Loss curves, accuracy, learning rates
Histograms - Weight distributions, gradients
Images - Sample predictions, confusion matrices
Audio - Speech samples, music generation
Text - Training summaries, generated text
Artifacts - Models, datasets, configuration files

Performance Optimized¶

Thread-safe orchestrator for concurrent logging
Batched operations to minimize overhead
Smart buffering for high-throughput scenarios
Configurable routing for different metric types
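
As an illustration of thread-safe batching, the sketch below buffers records behind a lock and hands them to a flush callback in fixed-size batches. `BatchingBuffer` is a hypothetical class written for this example and is not taken from Tracelet's source.

```python
import threading


class BatchingBuffer:
    """Collect metric records and hand them to flush_fn in batches."""

    def __init__(self, flush_fn, batch_size=50):
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._pending = []

    def add(self, record):
        batch = None
        with self._lock:
            self._pending.append(record)
            if len(self._pending) >= self._batch_size:
                batch, self._pending = self._pending, []
        if batch is not None:
            self._flush_fn(batch)  # called outside the lock to keep it short

    def flush(self):
        """Send whatever is still pending, for example at shutdown."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._flush_fn(batch)


batches = []
batches_lock = threading.Lock()


def record_batch(batch):
    with batches_lock:
        batches.append(batch)


buffer = BatchingBuffer(record_batch, batch_size=50)


def worker(worker_id):
    for step in range(100):
        buffer.add((f"worker{worker_id}/loss", step))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
buffer.flush()

total = sum(len(batch) for batch in batches)
print(len(batches), total)
```

Swapping the pending list out while holding the lock, then flushing outside it, means a slow backend call never blocks other threads from appending.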

Architecture Overview¶

graph TB
    A[Your PyTorch Code] --> B[TensorBoard SummaryWriter]
    A --> C[PyTorch Lightning Trainer]
    A --> D[Direct Tracelet API]

    B --> E[Tracelet Orchestrator]
    C --> E
    D --> E

    E --> F[Plugin System]

    F --> G[MLflow Backend]
    F --> H[ClearML Backend]
    F --> I[W&B Backend]
    F --> J[AIM Backend]

    G --> K[MLflow Server]
    H --> L[ClearML Platform]
    I --> M[W&B Platform]
    J --> N[AIM Repository]
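
The fan-out in the diagram can be sketched as an orchestrator that forwards each record to every registered backend. The `Backend` protocol, `InMemoryBackend`, and `Orchestrator` below are simplified, hypothetical stand-ins for illustration; they do not reproduce Tracelet's real plugin interface.

```python
from typing import Protocol


class Backend(Protocol):
    """Interface each backend plugin implements in this sketch."""

    def log_scalar(self, name: str, value: float, step: int) -> None: ...


class InMemoryBackend:
    """Toy backend that stores records instead of contacting a server."""

    def __init__(self):
        self.records = []

    def log_scalar(self, name, value, step):
        self.records.append((name, value, step))


class Orchestrator:
    """Forward every metric to all registered backends."""

    def __init__(self):
        self._backends = {}

    def register(self, name, backend: Backend):
        self._backends[name] = backend

    def log_scalar(self, name, value, step):
        for backend in self._backends.values():
            backend.log_scalar(name, value, step)


orchestrator = Orchestrator()
mlflow_like = InMemoryBackend()
wandb_like = InMemoryBackend()
orchestrator.register("mlflow", mlflow_like)
orchestrator.register("wandb", wandb_like)

orchestrator.log_scalar("Loss/train", 0.25, 3)
print(mlflow_like.records, wandb_like.records)
```

Because the sources only talk to the orchestrator, adding a backend in this design means registering one more object rather than touching the training code.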

Why Tracelet?¶

Before Tracelet¶

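# Different APIs for each backend
import mlflow
import wandb
from clearml import Task

# Manual setup for each backend (must run before any logging call)
mlflow.start_run()
wandb.init(project="my-project")
task = Task.init(project_name="my-project", task_name="my-experiment")

# Separate logging calls, each with its own signature
# (loss and epoch come from your training loop)
mlflow.log_metric("loss", loss, step=epoch)
wandb.log({"loss": loss}, step=epoch)
task.get_logger().report_scalar(title="loss", series="train", value=loss, iteration=epoch)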

With Tracelet¶

# One API, any backend
import tracelet
from torch.utils.tensorboard import SummaryWriter

tracelet.start_logging(backend="mlflow")  # or any backend
writer = SummaryWriter()
writer.add_scalar("loss", loss)  # Works everywhere!

What's Next?¶

Quick Start Guide - Get up and running in under 5 minutes with your first experiment
Installation Guide - Detailed installation instructions for all backends and environments
API Reference - Complete API documentation with examples and type hints
Artifact System - Unified artifact management for models, data, and media files
Examples - Real-world examples and Jupyter notebooks to learn from

Community & Support¶

📚 Documentation - Comprehensive guides and API docs
🐛 Issues - Bug reports and feature requests
💬 Discussions - Questions and community support
📧 Email - Direct contact with maintainers

License¶

Tracelet is released under the MIT License.
