ssam18 commented Dec 1, 2025

Summary

This PR adds native multimodal support for crewAI Agents and Tasks, addressing feature request #3860.

Changes

New Modules

multimodal/image.py

  • Image class for handling images from multiple sources:
    • HTTP(S) URLs via from_url()
    • Local file paths via from_file() with automatic MIME type detection
    • Base64 encoded strings via from_base64()
    • Binary data via from_binary()
    • Runtime placeholders via from_placeholder()
  • Automatic source type inference
  • Conversion methods: to_data_url(), to_message_content() (illustrated in the sketch after this list)
  • Compatible with OpenAI, Anthropic, and other vision-capable LLM APIs
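
For illustration, a minimal sketch of the constructors and conversion methods; the commented output shapes are assumptions based on the OpenAI-style image_url format, not taken from the PR:

from crewai.multimodal import Image

# Load a local file; the MIME type is detected automatically.
img = Image.from_file("diagram1.png")

# Inline data URL, e.g. "data:image/png;base64,...".
data_url = img.to_data_url()

# LLM message-content part; the exact dict shape is an assumption,
# e.g. {"type": "image_url", "image_url": {"url": data_url}}.
part = img.to_message_content()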

multimodal/multipart_content.py

  • MultipartContent class for managing compound text+image context
  • Methods: add_text(), add_image(), to_message_content()
  • Utility methods: get_text_only(), has_images() (see the sketch after this list)
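
A minimal sketch of how these methods fit together; the commented return values are assumptions, not taken from the PR:

from crewai.multimodal import Image, MultipartContent

content = MultipartContent()
content.add_text("Summarize this chart:")
content.add_image(Image.from_url("https://example.com/chart.png"))

print(content.has_images())     # True once an image has been added
print(content.get_text_only())  # text parts only, images omitted

# Mixed text+image message parts in the format expected by
# vision-capable LLM APIs; the exact part structure is an assumption.
parts = content.to_message_content()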

Integration

agent/core.py and task.py

  • Added multipart_context field to both Agent and Task classes
  • Field type: list[str | Image] | MultipartContent | None
  • Enables multimodal system prompts for agents and multimodal inputs for tasks (a sketch of the field declaration follows)
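
A minimal sketch of how the new field might be declared, assuming the Pydantic field style crewAI already uses; the defaults, validators, and surrounding class in the actual PR may differ:

from pydantic import BaseModel, Field

from crewai.multimodal import Image, MultipartContent


class Task(BaseModel):
    # Other Task fields omitted for brevity.
    description: str
    # Hypothetical declaration matching the field type listed above.
    multipart_context: list[str | Image] | MultipartContent | None = Field(
        default=None,
        description="Optional multimodal (text + image) context for this task.",
    )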

Usage Example

from crewai import Agent, Task
from crewai.multimodal import Image, MultipartContent

# Agent with multimodal context
agent = Agent(
    role="Vision Analyst",
    goal="Analyze images and diagrams",
    multipart_context=[
        "You are an expert at analyzing visual content.",
        Image.from_url("https://example.com/reference.png")
    ]
)

# Task with multipart content
content = MultipartContent()
content.add_text("Compare these two architectural diagrams:")
content.add_image(Image.from_file("diagram1.png"))
content.add_image(Image.from_file("diagram2.png"))

task = Task(
    description="Identify differences between the diagrams",
    multipart_context=content,
    agent=agent
)

Testing

Comprehensive test suite included in tests/multimodal/:

  • test_image.py: 19 test cases covering all Image class functionality
  • test_multipart_content.py: 13 test cases for MultipartContent
  • run_tests.py: Custom test runner for environments without pytest
  • validate_implementation.py: Syntax and structure validation

All validation checks pass successfully.
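
As an illustration of the style such tests might take (a hypothetical case; the mime_type keyword and the assertions are assumptions, not taken from the PR's test suite):

import base64

from crewai.multimodal import Image


def test_from_base64_round_trips_to_data_url():
    # Minimal fake payload used as a fixture; not from the PR's tests.
    raw = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()

    # The mime_type keyword is an assumption about the signature.
    img = Image.from_base64(raw, mime_type="image/png")

    data_url = img.to_data_url()
    assert data_url.startswith("data:image/png;base64,")
    assert raw in data_url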

Closes

#3860


Note

Introduces multimodal support via new Image and MultipartContent classes and integrates them as multipart_context on Agent and Task, with comprehensive tests.

  • Multimodal module (crewai/multimodal/):
    • Image: constructors for URL/file/base64/binary/placeholder; conversion to data URLs and LLM message content.
    • MultipartContent: manage mixed text+images; convert to LLM message parts; text-only extraction and helpers.
  • Core integration:
    • crewai/agent/core.py: add multipart_context: list[str | Image] | MultipartContent | None to include multimodal context in the system prompt.
    • crewai/task.py: add multipart_context: list[str | Image] | MultipartContent | None to enrich task context.
  • Tests (tests/multimodal/):
    • Unit tests for Image and MultipartContent; integration checks; standalone runner and validator.

Written by Cursor Bugbot for commit 448c815.

Implements crewAIInc#3860

This PR introduces comprehensive multimodality support for crewAI Agents
and Tasks, enabling them to work with images and mixed text+image content.

## Key Features

### Image Class (multimodal/image.py)
- Support for multiple image sources:
  - HTTP(S) URLs via from_url()
  - Local file paths via from_file() with auto MIME detection
  - Base64 encoded strings via from_base64()
  - Binary data via from_binary()
  - Runtime placeholders via from_placeholder()
- Automatic source type inference
- Conversion to data URLs and LLM message format
- Compatible with OpenAI, Anthropic, and other vision-capable LLMs

### MultipartContent Class (multimodal/multipart_content.py)
- Manage compound context with text and images
- add_text() and add_image() methods for building content
- to_message_content() for LLM API format conversion
- Utility methods: get_text_only(), has_images()

### Agent & Task Integration
- Added multipart_context field to both Agent and Task classes
- Accepts: list[str | Image], MultipartContent, or None
- Enables multimodal system prompts and task inputs

## Usage Examples

## Testing
- Comprehensive unit tests in tests/multimodal/
- test_image.py: 19 test cases for Image class
- test_multipart_content.py: 13 test cases for MultipartContent
- Custom test runner (run_tests.py) for environments without pytest

Signed-off-by: Samaresh Kumar Singh <[email protected]>