Getting Started

This guide will walk you through creating your first ML deployment with Modalkit on Modal.

Prerequisites

  1. Install Modal CLI

    pip install modal
    modal setup
    
    Follow the prompts to authenticate with your Modal account.

  2. Install Modalkit

    pip install modalkit
    

Project Structure

Create a new directory for your project:

my-ml-project/
├── app.py              # Modal app definition
├── model.py            # Your ML model code
├── modalkit.yaml       # Configuration
└── requirements.txt    # Dependencies
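
requirements.txt holds the Python dependencies installed into the container. A minimal sketch for this example (versions are illustrative; add whatever your model actually needs):

modalkit
pydantic>=2.0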

Step 1: Define Your Model

Create model.py with your inference logic:

from modalkit.inference_pipeline import InferencePipeline
from pydantic import BaseModel
from typing import List

# Define input/output schemas
class TextInput(BaseModel):
    text: str
    language: str = "en"

class TextOutput(BaseModel):
    translated_text: str
    confidence: float

# Implement your model
class TranslationModel(InferencePipeline):
    def __init__(self, model_name: str, all_model_data_folder: str, common_settings: dict, *args, **kwargs):
        super().__init__(model_name, all_model_data_folder, common_settings)
        # Load your model here
        # self.model = load_model(...)

    def preprocess(self, input_list: List[TextInput]) -> dict:
        """Prepare batch of inputs for the model"""
        texts = [item.text for item in input_list]
        languages = [item.language for item in input_list]
        return {"texts": texts, "languages": languages}

    def predict(self, input_list: List[TextInput], preprocessed_data: dict) -> dict:
        """Run batch inference"""
        # Your actual model inference
        translations = [f"Translated: {text}" for text in preprocessed_data["texts"]]
        scores = [0.95] * len(translations)
        return {"translations": translations, "scores": scores}

    def postprocess(self, input_list: List[TextInput], raw_output: dict) -> List[TextOutput]:
        """Format outputs"""
        return [
            TextOutput(translated_text=text, confidence=score)
            for text, score in zip(raw_output["translations"], raw_output["scores"])
        ]
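
Before involving Modal at all, you can exercise the three stages directly in plain Python. A quick sanity check (this assumes the InferencePipeline constructor needs nothing beyond the three arguments shown above):

# sanity_check.py - run the pipeline stages directly, no Modal involved
from model import TranslationModel, TextInput

model = TranslationModel(
    model_name="translation_model",
    all_model_data_folder="./models",
    common_settings={"device": "cpu"},
)

inputs = [TextInput(text="Hello world"), TextInput(text="Bonjour", language="fr")]
preprocessed = model.preprocess(inputs)
raw_output = model.predict(inputs, preprocessed)
for out in model.postprocess(inputs, raw_output):
    print(out.translated_text, out.confidence)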

Step 2: Configure Your Deployment

Create modalkit.yaml:

app_settings:
  app_prefix: "translation-demo"

  auth_config:
    # For development - use hardcoded key
    api_key: "dev-key-123"
    auth_header: "x-api-key"

  build_config:
    image: "python:3.11-slim"
    tag: "latest"
    workdir: "/app"

  deployment_config:
    # No GPU for this example
    gpu: null
    concurrency_limit: 10
    container_idle_timeout: 300
    # Use Modal proxy auth in production
    secure: false

  batch_config:
    max_batch_size: 8
    wait_ms: 50

model_settings:
  local_model_repository_folder: "./models"
  common:
    device: "cpu"
  model_entries:
    translation_model:
      version: "1.0"

Step 3: Create Modal App

Create app.py:

import modal
from modalkit.modal_service import ModalService, create_web_endpoints
from modalkit.modal_config import ModalConfig
from model import TranslationModel, TextInput, TextOutput

# Initialize Modalkit
modal_config = ModalConfig()
app = modal.App(name=modal_config.app_name)

# Define Modal app class
@app.cls(**modal_config.get_app_cls_settings())
class TranslationApp(ModalService):
    inference_implementation = TranslationModel
    model_name: str = modal.parameter(default="translation_model")
    modal_utils: ModalConfig = modal_config

# Create API endpoints
@app.function(**modal_config.get_handler_settings())
@modal.asgi_app(**modal_config.get_asgi_app_settings())
def web_endpoints():
    return create_web_endpoints(
        app_cls=TranslationApp,
        input_model=TextInput,
        output_model=TextOutput
    )

Step 4: Test Your App

Run your app in development mode; modal serve deploys an ephemeral version to Modal and hot-reloads it as you edit:

modal serve app.py

Test with curl, substituting the temporary URL that modal serve prints:

# Sync endpoint
curl -X POST https://<your-dev-url>/predict_sync \
  -H "Content-Type: application/json" \
  -H "x-api-key: dev-key-123" \
  -d '{"text": "Hello world", "language": "en"}'

# Batch endpoint
curl -X POST https://<your-dev-url>/predict_batch \
  -H "Content-Type: application/json" \
  -H "x-api-key: dev-key-123" \
  -d '[{"text": "Hello"}, {"text": "World"}]'

Step 5: Deploy to Modal

Deploy your app:

modal deploy app.py

Your app is now live! Modal will provide the URL:

✓ Deployed app to https://your-workspace--translation-demo.modal.run
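
Verify the production deployment with the same request, now against the deployed URL:

curl -X POST https://your-workspace--translation-demo.modal.run/predict_sync \
  -H "Content-Type: application/json" \
  -H "x-api-key: dev-key-123" \
  -d '{"text": "Hello world", "language": "en"}'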

What's Next?

Add GPU Support

deployment_config:
  gpu: "T4"  # or A10G, A100, etc.

Use Cloud Storage

cloud_bucket_mounts:
  - mount_point: "/mnt/models"
    bucket_name: "my-model-bucket"
    secret: "aws-credentials"
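
Files uploaded to the bucket then appear under the mount point inside the container. One way to use them is to point the model repository at the mount; this layout is an assumption about your bucket, not something Modalkit requires:

model_settings:
  local_model_repository_folder: "/mnt/models"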

Enable Modal Proxy Auth

deployment_config:
  secure: true  # Requires Modal-Key and Modal-Secret headers
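
Clients then authenticate with Modal proxy-auth tokens (created in your Modal workspace) instead of, or in addition to, the app-level API key:

curl -X POST https://your-workspace--translation-demo.modal.run/predict_sync \
  -H "Modal-Key: <your-token-id>" \
  -H "Modal-Secret: <your-token-secret>" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "language": "en"}'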

Add Async Processing

See the Async Example for queue-based inference; a rough sketch of the request shape follows.
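
The route name below is a guess (queue-based endpoints typically return a job handle rather than the result itself); treat the Async Example as authoritative:

# Hypothetical async endpoint -- confirm the real route in the Async Example
curl -X POST https://<your-url>/predict_async \
  -H "Content-Type: application/json" \
  -H "x-api-key: dev-key-123" \
  -d '{"text": "Hello world", "language": "en"}'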

Troubleshooting

Common Issues

  1. Import errors: Make sure all dependencies are in your container image
  2. Auth failures: Check your API key configuration
  3. GPU issues: Ensure your Modal workspace has GPU access
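
For any of these, container logs are the first thing to check. The Modal CLI can stream them for a deployed app; the app name follows the app_prefix used in this guide (run modal app list if you're unsure of the exact name):

modal app logs translation-demo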

Getting Help