Skip to content

Bedrock

Install

To use BedrockConverseModel, you need to either install pydantic-ai, or install pydantic-ai-slim with the bedrock optional group:

pip install "pydantic-ai-slim[bedrock]"
uv add "pydantic-ai-slim[bedrock]"

Configuration

To use AWS Bedrock, you'll need an AWS account with Bedrock enabled and appropriate credentials. You can use either AWS credentials directly or a pre-configured boto3 client.

BedrockModelName contains a list of available Bedrock models, including models from Anthropic, Amazon, Cohere, Meta, and Mistral.

Environment variables

You can set your AWS credentials as environment variables (among other options):

export AWS_BEARER_TOKEN_BEDROCK='your-api-key'
# or:
export AWS_ACCESS_KEY_ID='your-access-key'
export AWS_SECRET_ACCESS_KEY='your-secret-key'
export AWS_DEFAULT_REGION='us-east-1'  # or your preferred region

You can then use BedrockConverseModel by name:

Learn about Gateway
from pydantic_ai import Agent

agent = Agent('gateway/bedrock:anthropic.claude-3-sonnet-20240229-v1:0')
...
from pydantic_ai import Agent

agent = Agent('bedrock:anthropic.claude-3-sonnet-20240229-v1:0')
...

Or initialize the model directly with just the model name:

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel

model = BedrockConverseModel('anthropic.claude-3-sonnet-20240229-v1:0')
agent = Agent(model)
...

Customizing Bedrock Runtime API

You can customize the Bedrock Runtime API calls by adding additional parameters, such as guardrail configurations and performance settings. For a complete list of configurable parameters, refer to the documentation for BedrockModelSettings.

customize_bedrock_model_settings.py
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

# Define Bedrock model settings with guardrail and performance configurations
bedrock_model_settings = BedrockModelSettings(
    bedrock_guardrail_config={
        'guardrailIdentifier': 'v1',
        'guardrailVersion': 'v1',
        'trace': 'enabled'
    },
    bedrock_performance_configuration={
        'latency': 'optimized'
    }
)


model = BedrockConverseModel(model_name='us.amazon.nova-pro-v1:0')

agent = Agent(model=model, model_settings=bedrock_model_settings)

Prompt Caching

Bedrock supports prompt caching on Anthropic models so you can reuse expensive context across requests. Pydantic AI provides four ways to use prompt caching:

  1. Cache User Messages with CachePoint: Insert a CachePoint marker to cache everything before it in the current user message.
  2. Cache System Instructions: Enable BedrockModelSettings.bedrock_cache_instructions to append a cache point after the system prompt.
  3. Cache Tool Definitions: Enable BedrockModelSettings.bedrock_cache_tool_definitions to cache your tool schemas.
  4. Cache All Messages: Set BedrockModelSettings.bedrock_cache_messages to True to automatically cache the last user message.

No TTL Support

Unlike the direct Anthropic API, Bedrock manages cache TTL automatically. All cache settings are boolean only — no '5m' or '1h' options.

Minimum Token Threshold

AWS only serves cached content once a segment crosses the provider-specific minimum token thresholds (see the Bedrock prompt caching docs). Short prompts or tool definitions below those limits will bypass the cache, so don't expect savings for tiny payloads.

Example 1: Automatic Message Caching

Use bedrock_cache_messages to automatically cache the last user message:

Learn about Gateway
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'gateway/bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='You are a helpful assistant.',
    model_settings=BedrockModelSettings(
        bedrock_cache_messages=True,  # Automatically caches the last message
    ),
)

# The last message is automatically cached - no need for manual CachePoint
result1 = agent.run_sync('What is the capital of France?')

# Subsequent calls with similar conversation benefit from cache
result2 = agent.run_sync('What is the capital of Germany?')
print(f'Cache write: {result1.usage().cache_write_tokens}')
print(f'Cache read: {result2.usage().cache_read_tokens}')
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='You are a helpful assistant.',
    model_settings=BedrockModelSettings(
        bedrock_cache_messages=True,  # Automatically caches the last message
    ),
)

# The last message is automatically cached - no need for manual CachePoint
result1 = agent.run_sync('What is the capital of France?')

# Subsequent calls with similar conversation benefit from cache
result2 = agent.run_sync('What is the capital of Germany?')
print(f'Cache write: {result1.usage().cache_write_tokens}')
print(f'Cache read: {result2.usage().cache_read_tokens}')

Example 2: Comprehensive Caching Strategy

Combine multiple cache settings for maximum savings:

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
agent = Agent(
    model,
    system_prompt='Detailed instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,       # Cache system instructions
        bedrock_cache_tool_definitions=True,   # Cache tool definitions
        bedrock_cache_messages=True,           # Also cache the last message
    ),
)


@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'


result = agent.run_sync('Search for Python best practices')
print(result.output)

Example 3: Fine-Grained Control with CachePoint

Use manual CachePoint markers to control cache locations precisely:

Learn about Gateway
from pydantic_ai import Agent, CachePoint

agent = Agent(
    'gateway/bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
)

# Manually control cache points for specific content blocks
result = agent.run_sync([
    'Long context from documentation...',
    CachePoint(),  # Cache everything up to this point
    'First question'
])
print(result.output)
from pydantic_ai import Agent, CachePoint

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
)

# Manually control cache points for specific content blocks
result = agent.run_sync([
    'Long context from documentation...',
    CachePoint(),  # Cache everything up to this point
    'First question'
])
print(result.output)

Accessing Cache Usage Statistics

Access cache usage statistics via RequestUsage:

Learn about Gateway
from pydantic_ai import Agent, CachePoint

agent = Agent('gateway/bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0')


async def main():
    result = await agent.run(
        [
            'Reference material...',
            CachePoint(),
            'What changed since last time?',
        ]
    )
    usage = result.usage()
    print(f'Cache writes: {usage.cache_write_tokens}')
    print(f'Cache reads: {usage.cache_read_tokens}')
from pydantic_ai import Agent, CachePoint

agent = Agent('bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0')


async def main():
    result = await agent.run(
        [
            'Reference material...',
            CachePoint(),
            'What changed since last time?',
        ]
    )
    usage = result.usage()
    print(f'Cache writes: {usage.cache_write_tokens}')
    print(f'Cache reads: {usage.cache_read_tokens}')

Cache Point Limits

Bedrock enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors.

How Cache Points Are Allocated

Cache points can be placed in three locations:

  1. System Prompt: Via bedrock_cache_instructions setting (adds cache point to last system prompt block)
  2. Tool Definitions: Via bedrock_cache_tool_definitions setting (adds cache point to last tool definition)
  3. Messages: Via CachePoint markers or bedrock_cache_messages setting (adds cache points to message content)

Each setting uses at most 1 cache point, but you can combine them.

Automatic Cache Point Limiting

When cache points from all sources (settings + CachePoint markers) exceed 4, Pydantic AI automatically removes excess cache points from older message content (keeping the most recent ones).

Learn about Gateway
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'gateway/bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,      # 1 cache point
        bedrock_cache_tool_definitions=True,  # 1 cache point
    ),
)

@agent.tool_plain
def search() -> str:
    return 'data'


# Already using 2 cache points (instructions + tools)
# Can add 2 more CachePoint markers (4 total limit)
result = agent.run_sync([
    'Context 1', CachePoint(),  # Oldest - will be removed
    'Context 2', CachePoint(),  # Will be kept (3rd point)
    'Context 3', CachePoint(),  # Will be kept (4th point)
    'Question'
])
# Final cache points: instructions + tools + Context 2 + Context 3 = 4
print(result.output)
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.bedrock import BedrockModelSettings

agent = Agent(
    'bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompt='Instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,      # 1 cache point
        bedrock_cache_tool_definitions=True,  # 1 cache point
    ),
)

@agent.tool_plain
def search() -> str:
    return 'data'


# Already using 2 cache points (instructions + tools)
# Can add 2 more CachePoint markers (4 total limit)
result = agent.run_sync([
    'Context 1', CachePoint(),  # Oldest - will be removed
    'Context 2', CachePoint(),  # Will be kept (3rd point)
    'Context 3', CachePoint(),  # Will be kept (4th point)
    'Question'
])
# Final cache points: instructions + tools + Context 2 + Context 3 = 4
print(result.output)

Key Points:

  • System and tool cache points are always preserved
  • The cache point created by bedrock_cache_messages is always preserved (as it's the newest message cache point)
  • Additional CachePoint markers in messages are removed from oldest to newest when the limit is exceeded
  • This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching

provider argument

You can provide a custom BedrockProvider via the provider argument. This is useful when you want to specify credentials directly or use a custom boto3 client:

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Using AWS credentials directly
model = BedrockConverseModel(
    'anthropic.claude-3-sonnet-20240229-v1:0',
    provider=BedrockProvider(
        region_name='us-east-1',
        aws_access_key_id='your-access-key',
        aws_secret_access_key='your-secret-key',
    ),
)
agent = Agent(model)
...

You can also pass a pre-configured boto3 client:

import boto3

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Using a pre-configured boto3 client
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')
model = BedrockConverseModel(
    'anthropic.claude-3-sonnet-20240229-v1:0',
    provider=BedrockProvider(bedrock_client=bedrock_client),
)
agent = Agent(model)
...

Configuring Retries

Bedrock uses boto3's built-in retry mechanisms. You can configure retry behavior by passing a custom boto3 client with retry settings:

import boto3
from botocore.config import Config

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Configure retry settings
config = Config(
    retries={
        'max_attempts': 5,
        'mode': 'adaptive'  # Recommended for rate limiting
    }
)

bedrock_client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    config=config
)

model = BedrockConverseModel(
    'us.amazon.nova-micro-v1:0',
    provider=BedrockProvider(bedrock_client=bedrock_client),
)
agent = Agent(model)

Retry Modes

  • 'legacy' (default): 5 attempts, basic retry behavior
  • 'standard': 3 attempts, more comprehensive error coverage
  • 'adaptive': 3 attempts with client-side rate limiting (recommended for handling ThrottlingException)

For more details on boto3 retry configuration, see the AWS boto3 documentation.

Note

Unlike other providers that use httpx for HTTP requests, Bedrock uses boto3's native retry mechanisms. The retry strategies described in HTTP Request Retries do not apply to Bedrock.