API Reference
Core Components
Inference
Base class for an inference pipeline. Subclasses should implement the preprocess, predict, and postprocess methods. Each instance serves a single model version; downstream app layers (such as Modal) decide how to manage multiple versions.
Source code in modalkit/inference_pipeline.py
__init__(model_name, all_model_data_folder, common_settings, *args, **kwargs)
Initializes the InferencePipeline class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | str | Name of the model | required |
all_model_data_folder | str | Path to the folder containing all model data | required |
common_settings | dict | Common settings shared across models | required |
*args | tuple[Any, ...] | Variable length argument list. | () |
**kwargs | dict[str, Any] | Arbitrary keyword arguments. | {} |
Source code in modalkit/inference_pipeline.py
on_volume_reload()
Hook method called after a volume reload occurs.
This method is called by the Modal app layer after volumes have been reloaded. Subclasses can override this method to perform any necessary actions after a volume reload, such as reloading models or updating cached data.
By default, this method does nothing.
Source code in modalkit/inference_pipeline.py
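As a rough illustration of the hook described above, the sketch below overrides on_volume_reload to refresh a value that was read at construction time. PipelineStandIn is a hypothetical stand-in for the base class so that the snippet runs without modalkit installed; the _load_weights helper and the reload counter are assumptions made for the example, not part of the library.

```python
class PipelineStandIn:
    """Hypothetical stand-in for InferencePipeline (not the real class)."""

    def on_volume_reload(self) -> None:
        # Default behaviour described above: do nothing.
        pass


class CachedWeightsPipeline(PipelineStandIn):
    def __init__(self) -> None:
        self.reload_count = 0
        self.weights = self._load_weights()

    def _load_weights(self) -> str:
        # Placeholder for reading model files from a mounted volume.
        return f"weights-v{self.reload_count}"

    def on_volume_reload(self) -> None:
        # Called by the app layer after volumes have been reloaded:
        # refresh anything that was read from the volume earlier.
        self.reload_count += 1
        self.weights = self._load_weights()


pipeline = CachedWeightsPipeline()
pipeline.on_volume_reload()
print(pipeline.weights)
```

Keeping the reload logic inside the hook means the app layer does not need to know what each pipeline caches.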
postprocess(input_list, raw_output)
abstractmethod
Processes the raw output from the model into usable results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_list | list[BaseModel] | The list of original input data. | required |
raw_output | dict | The raw output from the model. | required |
Returns:
Type | Description |
---|---|
list[InferenceOutputModel] | The list of final processed results. |
Source code in modalkit/inference_pipeline.py
predict(input_list, preprocessed_data)
abstractmethod
Performs the prediction using the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_list | list[BaseModel] | The list of original input data. | required |
preprocessed_data | dict | The preprocessed data. | required |
Returns:
Name | Type | Description |
---|---|---|
Any | dict | The raw output from the model. |
Source code in modalkit/inference_pipeline.py
preprocess(input_list)
abstractmethod
Prepares the input data for the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_list | list[BaseModel] | The list of input data to be preprocessed. | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict | The preprocessed data. |
Source code in modalkit/inference_pipeline.py
run_inference(input_list)
Runs the full inference pipeline: preprocess -> predict -> postprocess.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_list | list[BaseModel] | A list of input messages to the inference pipeline. | required |
Returns:
Type | Description |
---|---|
list[InferenceOutputModel] | The list of final processed results after base_inference. |
Source code in modalkit/inference_pipeline.py
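The preprocess -> predict -> postprocess flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: PipelineStandIn only mirrors the documented method names and call order, and the dataclasses TextInput and LengthOutput are hypothetical substitutes for the Pydantic models the real API expects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TextInput:
    """Hypothetical input message (the real API expects Pydantic models)."""
    text: str


@dataclass
class LengthOutput:
    """Hypothetical output model with a status field."""
    status: str
    length: int


class PipelineStandIn(ABC):
    """Stand-in mirroring the documented method names and call order."""

    @abstractmethod
    def preprocess(self, input_list: list) -> dict: ...

    @abstractmethod
    def predict(self, input_list: list, preprocessed_data: dict) -> dict: ...

    @abstractmethod
    def postprocess(self, input_list: list, raw_output: dict) -> list: ...

    def run_inference(self, input_list: list) -> list:
        # preprocess -> predict -> postprocess, as described above.
        preprocessed = self.preprocess(input_list)
        raw_output = self.predict(input_list, preprocessed)
        return self.postprocess(input_list, raw_output)


class LengthPipeline(PipelineStandIn):
    def preprocess(self, input_list):
        return {"texts": [item.text.strip() for item in input_list]}

    def predict(self, input_list, preprocessed_data):
        # A toy "model": the prediction is the length of each text.
        return {"lengths": [len(t) for t in preprocessed_data["texts"]]}

    def postprocess(self, input_list, raw_output):
        return [LengthOutput(status="success", length=n) for n in raw_output["lengths"]]


results = LengthPipeline().run_inference([TextInput(" hello "), TextInput("hi")])
print(results)
```

Each stage receives the original input_list alongside the previous stage's output, so later stages can still refer back to request-level fields.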
Settings
Bases: YamlBaseSettings
Main configuration settings for Modalkit applications.
This class manages all configuration settings for both the application and model deployment. It supports loading settings from YAML files and environment variables with proper type validation.
Attributes:
Name | Type | Description |
---|---|---|
app_settings | AppSettings | Application-level configuration settings |
model_settings | ModelSettings | Model-specific configuration settings |
Configuration is loaded from:
- Environment variables with the MODALKIT_ prefix
- The modalkit.yaml file. This location can be overridden by the MODALKIT_CONFIG environment variable.
- The .env file
Source code in modalkit/settings.py
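The override of the YAML location can be pictured with a small sketch. resolve_config_path is a hypothetical helper written for this illustration; the actual lookup is handled by the settings class, and the precedence between the three sources is not stated here, so the sketch covers only the file path.

```python
import os
from pathlib import Path


def resolve_config_path(env=None) -> Path:
    """Return the YAML path: MODALKIT_CONFIG if set, else modalkit.yaml."""
    # Accept an explicit mapping so the behaviour is easy to try out
    # without touching the real process environment.
    env = os.environ if env is None else env
    return Path(env.get("MODALKIT_CONFIG", "modalkit.yaml"))


print(resolve_config_path({}))
print(resolve_config_path({"MODALKIT_CONFIG": "configs/staging.yaml"}))
```

The path configs/staging.yaml is only an example value.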
Bases: BaseModel
Represents the application settings for a model.
Attributes:
Name | Type | Description |
---|---|---|
app_prefix | str | Application prefix for the model |
build_config | BuildConfig | Build configuration for the model |
deployment_config | DeploymentConfig | Deployment configuration for the model |
batch_config | BatchConfig | Batch endpoint configuration |
queue_config | QueueConfig | Queue configuration for async messaging |
Source code in modalkit/settings.py
Bases: BaseModel
Represents the model settings for a model.
Attributes:
Name | Type | Description |
---|---|---|
local_model_repository_folder | Path | Local model repository folder for the model |
model_entries | dict[str, Any] | Model entries for the model |
common | dict[str, Any] | Common settings for the model |
Source code in modalkit/settings.py
Modal Integration
Modal Application
Base class for Modal-based ML application deployment.
This class provides the foundation for deploying ML models using Modal, handling model loading, inference, and API endpoint creation. It integrates with the InferencePipeline class to standardize model serving.
The queue backend is fully optional and supports dependency injection for maximum flexibility.
Usage Examples:
- No Queues (Default):
- Configuration-Based Queues:
- Dependency Injection with TaskIQ:

```python
import logging

from taskiq_redis import AsyncRedisTaskiqBroker

logger = logging.getLogger(__name__)


class TaskIQBackend:
    def __init__(self, broker_url="redis://localhost:6379"):
        self.broker = AsyncRedisTaskiqBroker(broker_url)

    async def send_message(self, queue_name: str, message: str) -> bool:
        @self.broker.task(task_name=f"process_{queue_name}")
        async def process_message(msg: str) -> None:
            # Your custom task processing logic
            logger.info(f"Processing: {msg}")

        await process_message.kiq(message)
        return True


# Inject your TaskIQ backend (MyService is your ModalService subclass)
taskiq_backend = TaskIQBackend("redis://localhost:6379")
service = MyService(queue_backend=taskiq_backend)
```

- Custom Queue Implementation:
Attributes:
Name | Type | Description |
---|---|---|
model_name | str | Name of the model to be served |
inference_implementation | type[InferencePipeline] | Implementation class of the inference pipeline |
modal_utils | ModalConfig | Modal config object, containing the settings and config functions |
queue_backend | Optional[QueueBackend] | Optional queue backend for dependency injection |
Source code in modalkit/modal_service.py
__init__(queue_backend=None)
Initializes ModalService with an optional queue backend.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queue_backend | Optional[QueueBackend] | Optional queue backend for dependency injection. If None, the configuration-based approach is used or queues are skipped. | None |
Source code in modalkit/modal_service.py
async_call()
staticmethod
Creates an asynchronous callable function for processing and returning inference results via queues.
This method generates a function that spawns an asynchronous task for the process_request method. It allows triggering an async inference job while returning a job ID for tracking purposes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls | type[ModalService] | The class reference for creating an instance of | required |
Returns:
Name | Type | Description |
---|---|---|
Callable | Callable[[str, BaseModel], Awaitable[AsyncOutputModel]] | A function that, when called, spawns an asynchronous task and returns an AsyncOutputModel with job ID. |
Example

```python
async_fn = ModalService.async_call(MyApp)
# The returned callable is awaitable, so call it from async code.
result = await async_fn("example_model", input_data)
print(result)
# e.g. AsyncOutputModel(job_id="some_job_id")
```
Source code in modalkit/modal_service.py
load_artefacts()
Loads model artifacts and initializes the inference instance.
This method is called when the Modal container starts up. It:
1. Retrieves model-specific settings from configuration
2. Initializes the inference implementation with the model settings
3. Sets up the model for inference
4. Initializes volume reloading if configured
The method is decorated with @modal.enter() to ensure it runs during container startup.
Source code in modalkit/modal_service.py
process_request(input_list)
Processes a batch of inference requests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_list | list[Union[SyncInputModel, AsyncInputModel]] | The list of input models containing either sync or async requests | required |
Returns:
Type | Description |
---|---|
list[InferenceOutputModel] | The list of processed outputs conforming to the model's output schema |
Source code in modalkit/modal_service.py
send_async_response(message_idx, raw_output_data, input_data)
Sends the inference result to the success or failure queues depending on the message status. Queue functionality is optional - only attempts to send if queue names are provided.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message_idx | int | Index of the message in the batch (for logging) | required |
raw_output_data | InferenceOutputModel | The processed output result | required |
input_data | AsyncInputModel | Object containing the async input data | required |
Source code in modalkit/modal_service.py
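The routing rule described for send_async_response (success or failure queue by status, and no send when the queue name is missing) can be sketched as below. This is an illustration under assumptions, not the library's code: route_result and RecordingBackend are hypothetical names, and the status value "success" is taken from the output example elsewhere on this page.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RecordingBackend:
    """Hypothetical backend that records messages instead of sending them."""
    sent: list = field(default_factory=list)

    async def send_message(self, queue_name: str, message: str) -> bool:
        self.sent.append((queue_name, message))
        return True


async def route_result(backend, status: str, payload: str,
                       success_queue: str = "", failure_queue: str = "") -> bool:
    """Send to the success or failure queue; skip if no queue name is set."""
    queue_name = success_queue if status == "success" else failure_queue
    if not queue_name:
        return False  # queues are optional: nothing to send
    return await backend.send_message(queue_name, payload)


backend = RecordingBackend()
sent_ok = asyncio.run(route_result(backend, "success", '{"label": "cat"}', success_queue="results"))
skipped = asyncio.run(route_result(backend, "error", '{"error": "boom"}'))
print(sent_ok, skipped, backend.sent)
```

Returning False for the skipped case keeps the caller's logging simple: it can report that nothing was sent without treating it as an error.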
sync_call()
staticmethod
Creates a synchronous callable function for processing inference requests. Each request is processed individually to maintain immediate response times. For batch processing, use async endpoints.
This method generates a function that triggers the process method of the ModalService class. It allows synchronous inference processing with input data passed to the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls | type[ModalService] | The class reference for creating an instance of | required |
Returns:
Name | Type | Description |
---|---|---|
Callable | Callable[[str, BaseModel], Awaitable[BaseModel]] | A function that, when called, executes a synchronous inference call and returns the result. |
Example

```python
sync_fn = ModalService.sync_call(MyApp)
# The returned callable is awaitable, so call it from async code.
result = await sync_fn("example_model", input_data)
print(result)
# e.g. InferenceOutputModel(status="success", ...)
```
Source code in modalkit/modal_service.py
Modal Utilities
Configuration class for handling Modal-specific operations. It provides helper methods that allow shorthand usage of Modal in app code.
It also has some static methods that can be used without instantiating the class.
Source code in modalkit/modal_config.py
all_volumes
property
Gets all volume mounts including both regular volumes and cloud bucket mounts.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Union[Volume, CloudBucketMount]] | Combined dictionary of Modal volumes and CloudBucketMounts |
app_name
property
Gets the complete application name.
Returns:
Name | Type | Description |
---|---|---|
str | str | The application name with prefix and postfix |
app_postfix
property
Gets the application postfix from environment.
Returns:
Name | Type | Description |
---|---|---|
str | str | The application postfix, defaults to "-dev" |
app_settings
property
Gets the application-specific settings.
Returns:
Name | Type | Description |
---|---|---|
AppSettings | AppSettings | The application configuration settings |
cloud_bucket_mounts
property
Gets the Modal cloud bucket mounts based on configuration.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, CloudBucketMount] | A dictionary mapping mount points to Modal CloudBucketMount objects |
model_settings
property
Gets the model-specific settings.
Returns:
Name | Type | Description |
---|---|---|
ModelSettings | ModelSettings | The model configuration settings |
region
property
Gets the Modal deployment region.
Returns:
Type | Description |
---|---|
Optional[str] | The Modal deployment region, as a string. |
volumes
property
Gets the Modal volumes based on the deployment config. Returns cached volumes if already computed.
Returns:
Type | Description |
---|---|
dict[str, Volume] | Dictionary of Modal volumes |
get_app_cls_settings()
Gets Modal application class settings.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Any] | Application settings with None values removed, including: container image configuration (with local mounts embedded), GPU requirements, secrets and concurrency settings, and volume configurations |
Source code in modalkit/modal_config.py
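Removing None values from a settings dictionary, as described for the return value above, can be sketched in a few lines. drop_none is a hypothetical helper and the keys in raw are made up for the example; the real method assembles its own keys from the deployment configuration.

```python
def drop_none(settings: dict) -> dict:
    """Return a copy of `settings` without keys whose value is None."""
    return {key: value for key, value in settings.items() if value is not None}


# Hypothetical settings: only unset (None) entries are removed,
# while empty-but-set values such as {} are kept.
raw = {"gpu": "T4", "secrets": None, "region": None, "volumes": {}}
print(drop_none(raw))
```

Filtering on `is not None` rather than truthiness matters here, because an empty dictionary or a zero is still a deliberate setting.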
get_asgi_app_settings()
Gets Modal ASGI app settings for web endpoints.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Any] | ASGI app settings, including requires_proxy_auth (whether to enable Modal proxy authentication) |
Source code in modalkit/modal_config.py
get_batched_method_settings()
Gets Modal batched method settings.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Any] | Batched method settings, including max_batch_size and wait_ms |
get_handler_settings()
Gets Modal request handler settings.
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Any] | Handler settings, including: application image (with local mounts embedded), required secrets, and concurrency settings |
Source code in modalkit/modal_config.py
get_image()
Creates a Modal container image configuration.
Returns:
Type | Description |
---|---|
Image | Configured Modal container image with: base image (either from registry or debian_slim), build commands, environment variables, working directory, and local file/directory mounts (added via Modal 1.0 API) |
Source code in modalkit/modal_config.py
reload_volumes()
Reloads the Modal volumes. Handles errors gracefully and provides detailed logging of the process.
Source code in modalkit/modal_config.py
Input/Output Models
Async Models
Bases: BaseModel, Generic[T]
Model for asynchronous operation inputs, wrapping the message with queue information.
Attributes:
Name | Type | Description |
---|---|---|
message | T | The actual input data, must be a Pydantic model |
success_queue | str | SQS queue name for successful results |
failure_queue | str | SQS queue name for error messages |
meta | dict | Additional metadata to be passed through the processing pipeline |
Notes
The model_config ensures no extra fields are allowed in the input
Source code in modalkit/iomodel.py
Bases: BaseModel
Model for asynchronous operation outputs.
Attributes:
Name | Type | Description |
---|---|---|
job_id | str | Unique identifier for tracking the asynchronous job |
Source code in modalkit/iomodel.py
Task Queue
Bases: Protocol
Protocol for queue backends.
Implement this interface to create custom queue backends. The interface is intentionally minimal - just message sending.
Source code in modalkit/task_queue.py
send_message(queue_name, message)
async
Send a message to the specified queue.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queue_name | str | Name/identifier of the queue | required |
message | str | Message content (JSON string) | required |
Returns:
Type | Description |
---|---|
bool | True if message was sent successfully, False otherwise |
Source code in modalkit/task_queue.py
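Because the protocol asks only for an async send_message, a custom backend can be quite small. The sketch below is one possible shape, under stated assumptions: QueueBackendSketch is a stand-in that mirrors the documented signature (it is not imported from modalkit), and ListBackend with its JSON check is a hypothetical example rather than a recommended design.

```python
import asyncio
import json
from typing import Protocol, runtime_checkable


@runtime_checkable
class QueueBackendSketch(Protocol):
    """Stand-in mirroring the documented protocol: one async method."""

    async def send_message(self, queue_name: str, message: str) -> bool: ...


class ListBackend:
    """Hypothetical custom backend that appends messages to per-queue lists."""

    def __init__(self) -> None:
        self.queues: dict[str, list[str]] = {}

    async def send_message(self, queue_name: str, message: str) -> bool:
        try:
            json.loads(message)  # the message is documented as a JSON string
        except json.JSONDecodeError:
            return False  # report failure instead of raising
        self.queues.setdefault(queue_name, []).append(message)
        return True


backend = ListBackend()
ok = asyncio.run(backend.send_message("results", json.dumps({"job_id": "abc"})))
bad = asyncio.run(backend.send_message("results", "not json"))
print(ok, bad, isinstance(backend, QueueBackendSketch))
```

ListBackend does not inherit from the protocol class; structural typing means any object with a matching send_message method qualifies.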
Simple in-memory backend for testing and development. Messages are just logged, not actually queued.
Source code in modalkit/task_queue.py
send_message(queue_name, message)
async
Send message to in-memory log
Source code in modalkit/task_queue.py
Direct AWS SQS backend implementation.
This is a basic implementation - for production use, consider implementing a custom backend with proper error handling, retry logic, etc.
Source code in modalkit/task_queue.py
send_message(queue_name, message)
async
Send message to SQS queue
Source code in modalkit/task_queue.py