Fix: AWS Bedrock Not Working — Model Access, IAM, Converse API, Streaming, and Cross-Region

FixDevs

Quick Answer

How to fix AWS Bedrock errors — AccessDeniedException for model access, bedrock vs bedrock-runtime client, Converse vs InvokeModel API, streaming with ConverseStream, regional availability, and Knowledge Bases setup.

The Error

You call Bedrock and AWS denies access:

AccessDeniedException: You don't have access to the model with 
the specified model ID.

Or the client throws when invoking:

import boto3
client = boto3.client("bedrock")
response = client.invoke_model(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", body=...)
# AttributeError: 'Bedrock' object has no attribute 'invoke_model'

Or Converse returns a model-not-found error:

ValidationException: The model ID anthropic.claude-3-5-sonnet-20241022-v2:0 
is not available in this region.

Or streaming hangs:

response = client.converse_stream(modelId="...", messages=...)
for event in response["stream"]:
    print(event)
# Hangs waiting for first event.

Why This Happens

Bedrock is AWS’s managed foundation model service — a single API surface in front of Anthropic Claude, Meta Llama, Mistral, AI21, Cohere, Amazon Titan, and Stable Diffusion image models. The “managed” part is real: AWS handles provisioning, the regional capacity, and the IAM integration. The downside is that Bedrock inherits AWS’s surface area — six different ARN formats for resources, region-by-region availability matrices, and three separate client classes for what looks like one service. Most failures are not bugs in the models; they’re configuration errors at the AWS layer.

The four most common root causes:

  • Model access is opt-in. Each model (Claude, Llama, Mistral, Titan) must be enabled in the Bedrock console per AWS account, per region. New accounts have none enabled by default. Some models require accepting an EULA before AWS grants access, and a small number (notably Anthropic’s frontier releases) gate access behind a use-case justification that AWS reviews manually.
  • Two clients: bedrock vs bedrock-runtime. bedrock is the control plane (list models, manage Provisioned Throughput and Guardrails). bedrock-runtime is for actually calling models (invoke_model, converse). For RAG via Knowledge Bases you need a third client, bedrock-agent-runtime.
  • Two APIs: InvokeModel vs Converse. InvokeModel is provider-specific, with a different JSON shape per model: Claude wants anthropic_version and a specific message format, Llama wants prompt as a plain string, Titan wants inputText. Converse is unified across models. Use Converse unless you have a reason not to.
  • Regional availability. Not every model is in every region. us-east-1 and us-west-2 typically have the most. Some models require “cross-region inference” to access from other regions.
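As a rough triage aid, the root causes above can be mapped from the error code and message. This is a heuristic sketch, not an official mapping: the function name likely_cause and the message substrings are assumptions based on the errors quoted in this article. With boto3, the code is available on a caught ClientError as err.response["Error"]["Code"].

```python
def likely_cause(error_code, message=""):
    """Map an AWS error code and message to a probable root cause (heuristic)."""
    text = message.lower()
    if error_code == "AccessDeniedException":
        # The model-access error mentions the model; plain IAM denials do not.
        if "access to the model" in text:
            return "model access not enabled for this account and region"
        return "IAM policy is missing a Bedrock action or resource"
    if error_code == "ValidationException" and "region" in text:
        return "model not available in this region"
    if error_code == "ThrottlingException":
        return "on-demand quota exceeded"
    if error_code == "ResourceNotFoundException":
        return "resource lives in a different region than the client"
    return "unknown; inspect the full error message"
```

Treat the result as a starting point for the fixes below, not a diagnosis.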

There’s also a quieter category of failures: behavioral differences between Bedrock-hosted and provider-hosted versions of the same model. Anthropic’s Claude on Bedrock is the same weights as Claude on the Anthropic API, but the system prompt handling, default temperature, and safety filters can differ. If a prompt works on the Anthropic API and behaves oddly on Bedrock, this is usually why. Test prompts against both surfaces before assuming Bedrock is “broken.”

How Other Tools Handle This

Bedrock is one option in a crowded LLM gateway market. The differences are practical, not theoretical, and they affect which problems you have.

  • AWS Bedrock. Multi-vendor (Claude, Llama, Mistral, Titan, Cohere) behind one IAM-gated API. Strengths: VPC endpoints, KMS encryption, native CloudWatch metrics, Bedrock Guardrails for policy enforcement. Weaknesses: opt-in model access, regional fragmentation, the three-client awkwardness above. Pricing is per-token, with on-demand and Provisioned Throughput tiers.
  • Google Vertex AI. Hosts Gemini natively plus Anthropic Claude (in select regions) and a model garden for Hugging Face / Mistral. Strengths: tight integration with BigQuery, Cloud Storage, and Google IAM. Weaknesses: model availability skewed toward Google’s own models; Claude on Vertex lags Anthropic’s release schedule by weeks. Region setup is less of an issue because Vertex has multi-region endpoints by default.
  • Azure OpenAI. Microsoft’s hosted OpenAI deployments. Strengths: same OpenAI SDK works with api_base swap; SOC2/HIPAA compliance, regional data residency. Weaknesses: deployment quota is per-region, per-model, and per-subscription, and quota requests can take days. Streaming uses the same SSE format as OpenAI direct, no Bedrock-style event vocabulary.
  • OpenAI direct. Simplest. One API key, one endpoint, one SDK. Streaming with SSE. Weaknesses for enterprises: no VPC option, regional residency is limited to the OpenAI Enterprise tier.
  • Anthropic API direct. Same Claude models as on Bedrock, same SDK feel. Strengths: first to get new Claude releases, cleanest tool-use API. Weaknesses: no native AWS IAM, separate billing, no built-in RAG primitive.

If you’re choosing between them: pick Bedrock when you’re already on AWS and need IAM/VPC. Pick Vertex when you want Gemini specifically. Pick Azure when corporate procurement already has an EA. Pick the provider’s direct API when you want the latest model and least friction. The quota/region/model-availability problems below are most acute on Bedrock and Azure, less so on Vertex, and basically absent on the direct APIs.

Fix 1: Enable Model Access

In the AWS Console:

  1. Go to Bedrock in the AWS Console.
  2. Switch to the region you want (top-right region selector).
  3. Left sidebar → Model access.
  4. Click Modify model access (or Manage model access).
  5. Check the models you want (e.g. Claude 3.5 Sonnet, Llama 3.1, etc.).
  6. Submit. Some models require justification (1-2 sentence form); approval is usually instant or within minutes.

To verify via CLI:

aws bedrock list-foundation-models --region us-east-1 \
  --query 'modelSummaries[?contains(modelId, `claude`)]'

This lists the Claude models in the region’s catalog. If a model appears in the catalog but invoking it raises AccessDeniedException, you haven’t enabled access for that specific model.
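The same catalog check can be scripted from Python. A minimal sketch, assuming boto3 is installed and credentials are configured; the helper names filter_model_ids and list_model_ids are illustrative and not part of any SDK. Like the CLI command, it shows what is in the regional catalog, not whether access has been enabled.

```python
def filter_model_ids(model_summaries, needle):
    """Return sorted model IDs that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        summary["modelId"]
        for summary in model_summaries
        if needle in summary["modelId"].lower()
    )


def list_model_ids(region, needle):
    """Fetch the regional catalog and filter it (hypothetical helper).

    Uses the control-plane "bedrock" client, not "bedrock-runtime".
    """
    import boto3  # imported lazily so filter_model_ids works without boto3

    client = boto3.client("bedrock", region_name=region)
    summaries = client.list_foundation_models()["modelSummaries"]
    return filter_model_ids(summaries, needle)
```

For example, list_model_ids("us-east-1", "claude") would return the Claude model IDs offered in us-east-1.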

Pro Tip: Enable every model you might use upfront in your primary region. Approval is usually free and fast, and enabling only some models leads to confusing AccessDeniedException errors later when you switch models.

Fix 2: Use the Right Client

import boto3

# WRONG — bedrock (management):
client = boto3.client("bedrock")
client.invoke_model(...)  # AttributeError

# RIGHT — bedrock-runtime (data plane):
client = boto3.client("bedrock-runtime", region_name="us-east-1")
client.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", ...)

Two clients, completely separate:

  • bedrock: list models, manage Provisioned Throughput, manage Guardrails. (Knowledge Bases are created and managed through a separate bedrock-agent client.)
  • bedrock-runtime: invoke models, converse, stream.

For JS:

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });
const response = await client.send(new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  messages: [{ role: "user", content: [{ text: "Hello" }] }],
}));

The npm package @aws-sdk/client-bedrock-runtime is separate from @aws-sdk/client-bedrock. Install the runtime package to invoke models.
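One way to avoid picking the wrong client is to keep the operation-to-service mapping in one place. The sketch below covers only the boto3 operations used in this article; the table and the client_name_for helper are illustrative assumptions, not an SDK feature.

```python
# Which boto3 service name exposes which operation (subset, for illustration).
CLIENT_FOR_OPERATION = {
    "list_foundation_models": "bedrock",
    "create_provisioned_model_throughput": "bedrock",
    "invoke_model": "bedrock-runtime",
    "converse": "bedrock-runtime",
    "converse_stream": "bedrock-runtime",
    "retrieve": "bedrock-agent-runtime",
    "retrieve_and_generate": "bedrock-agent-runtime",
}


def client_name_for(operation):
    """Return the boto3 service name to pass to boto3.client() for an operation."""
    try:
        return CLIENT_FOR_OPERATION[operation]
    except KeyError:
        raise ValueError(f"operation not in the lookup table: {operation}") from None
```

Usage would look like boto3.client(client_name_for("converse"), region_name="us-east-1").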

Fix 3: Use the Converse API (Unified Across Models)

The unified API works the same way regardless of which model you target:

import boto3
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {"role": "user", "content": [{"text": "What's the capital of France?"}]},
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7,
        "topP": 0.9,
    },
)

print(response["output"]["message"]["content"][0]["text"])

For multi-turn conversations:

messages = [
    {"role": "user", "content": [{"text": "What's 2+2?"}]},
    {"role": "assistant", "content": [{"text": "4."}]},
    {"role": "user", "content": [{"text": "And 3+3?"}]},
]

response = client.converse(modelId=model_id, messages=messages)
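Converse is stateless, so the caller has to carry the history forward on every call. A minimal sketch of that bookkeeping, assuming client is a bedrock-runtime client; send_turn is a hypothetical helper name.

```python
def send_turn(client, model_id, history, user_text, **converse_kwargs):
    """Send one user turn, record both sides in `history`, return the reply text.

    `history` is only updated after the call succeeds, so a failed request
    does not leave a dangling user message behind.
    """
    user_message = {"role": "user", "content": [{"text": user_text}]}
    response = client.converse(
        modelId=model_id,
        messages=history + [user_message],
        **converse_kwargs,
    )
    assistant_message = response["output"]["message"]
    history.extend([user_message, assistant_message])
    # Join text blocks only; other block types (such as toolUse) are skipped.
    return "".join(block.get("text", "") for block in assistant_message["content"])
```

Each call sends the full history plus the new turn, which is also why input token counts grow as the conversation gets longer.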

For system prompts:

response = client.converse(
    modelId=model_id,
    system=[{"text": "You are a concise assistant. Answer in one sentence."}],
    messages=[{"role": "user", "content": [{"text": "Tell me about Python."}]}],
)

For tool use:

response = client.converse(
    modelId=model_id,
    messages=messages,
    toolConfig={
        "tools": [
            {
                "toolSpec": {
                    "name": "get_weather",
                    "description": "Get current weather",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        }
                    },
                }
            }
        ],
    },
)

# Check for tool use:
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        tool_name = block["toolUse"]["name"]
        tool_input = block["toolUse"]["input"]
        # Execute tool, send result back in next turn.
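Sending the result back means appending two messages: the assistant message that contained the toolUse block, and a user message with a matching toolResult. A minimal sketch, where run_tool is a caller-supplied (hypothetical) function that executes the tool and returns a JSON-serializable dict:

```python
def build_tool_result_messages(assistant_message, run_tool):
    """Return the messages to append after the model requests a tool.

    Returns an empty list when the assistant message has no toolUse blocks.
    """
    results = []
    for block in assistant_message["content"]:
        if "toolUse" not in block:
            continue
        tool_use = block["toolUse"]
        output = run_tool(tool_use["name"], tool_use["input"])
        results.append(
            {
                "toolResult": {
                    # Must echo the ID from the matching toolUse block.
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"json": output}],
                }
            }
        )
    if not results:
        return []
    return [assistant_message, {"role": "user", "content": results}]
```

After messages.extend(build_tool_result_messages(response["output"]["message"], run_tool)), call converse again with the same toolConfig so the model can produce its final answer.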

Pro Tip: Use Converse over invoke_model. The latter is provider-specific (different JSON for Claude vs Llama vs Titan); Converse is the same regardless of model. Easier to switch models.

Fix 4: Streaming With ConverseStream

For token-by-token streaming:

response = client.converse_stream(
    modelId=model_id,
    messages=[{"role": "user", "content": [{"text": "Write a poem about Python."}]}],
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        # Tool-use deltas carry "toolUse" instead of "text", so use .get().
        text = event["contentBlockDelta"]["delta"].get("text", "")
        print(text, end="", flush=True)
    elif "messageStop" in event:
        print(f"\nStop reason: {event['messageStop']['stopReason']}")

The event types in the stream:

  • messageStart — beginning of the assistant’s message.
  • contentBlockStart — beginning of a content block (text, toolUse).
  • contentBlockDelta — incremental update (text tokens).
  • contentBlockStop — end of a content block.
  • messageStop — end of the message with stopReason (end_turn, max_tokens, tool_use, etc.).
  • metadata — token usage stats and latency.
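When you need the full result rather than live output, the event types above can be folded into a single summary. A minimal sketch; collect_stream is a hypothetical helper that accepts any iterable of event dicts, such as response["stream"].

```python
def collect_stream(events):
    """Accumulate ConverseStream events into text, stop reason, and usage."""
    text_parts = []
    stop_reason = None
    usage = {}
    for event in events:
        if "contentBlockDelta" in event:
            # Text deltas have a "text" key; tool-use deltas do not.
            delta = event["contentBlockDelta"]["delta"]
            text_parts.append(delta.get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})
    return {"text": "".join(text_parts), "stop_reason": stop_reason, "usage": usage}
```

The metadata event arrives after messageStop, so the loop has to run to the end of the stream to capture token usage.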

For Node:

import { BedrockRuntimeClient, ConverseStreamCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });
const response = await client.send(new ConverseStreamCommand({
  modelId,
  messages: [{ role: "user", content: [{ text: "Write a poem about JavaScript." }] }],
}));

for await (const event of response.stream) {
  // Tool-use deltas have no text field, so guard before writing.
  const text = event.contentBlockDelta?.delta?.text;
  if (text) {
    process.stdout.write(text);
  }
}

response.stream is an async iterable.

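Common Mistake: Treating the stream like a promise. Consume response.stream with a single for await...of loop; each iteration yields the next event as it arrives, so there is nothing to await per event and no need to collect the events before processing them.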

Fix 5: IAM Permissions

Your IAM role or user needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream. There are no separate IAM actions for the Converse API: Converse is authorized by bedrock:InvokeModel, and ConverseStream by bedrock:InvokeModelWithResponseStream.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    }
  ]
}

For all models in a region:

"Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"

The account ID field in the ARN is empty (bedrock:us-east-1::) because foundation models are AWS-owned resources, not resources in your account.

For cross-region inference (calling a model in a different region):

"Resource": [
  "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:*:*:inference-profile/*"
]

The inference-profile/* resource is for cross-region routing — Bedrock load-balances across regions, requiring access to the profile resource too.

Common Mistake: Granting bedrock:* to everyone. Scope to specific actions for read-only or invocation-only access. bedrock:InvokeModel doesn’t let you modify models; bedrock:* does.
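A scoped policy can be generated rather than hand-edited when the model or region list changes. A minimal sketch; invoke_only_policy is a hypothetical helper. It grants only the two invocation actions, which are also the actions that authorize Converse and ConverseStream calls.

```python
import json


def invoke_only_policy(model_id, regions):
    """Build an invocation-only IAM policy for one model in the given regions."""
    resources = [
        # Foundation-model ARNs have an empty account ID field.
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for region in regions
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }
```

json.dumps(invoke_only_policy(model_id, ["us-east-1", "us-west-2"]), indent=2) yields a document you can paste into the IAM console. For cross-region inference you would still need to add the inference-profile resource shown above.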

Fix 6: Cross-Region Inference

For Claude on Bedrock in Asia or Europe, you often need cross-region inference. Use an inference profile ID:

response = client.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",  # "us." prefix
    messages=...,
)

The us. prefix routes through US regions. Prefixes typically available are us., eu., and apac. (the trailing dot is part of the prefix).

To find available inference profiles:

aws bedrock list-inference-profiles --region us-east-1

Cross-region inference improves availability (multiple regions can serve) but adds slight latency (~50ms).

Pro Tip: Always test with inference profiles first if you’re in a region with limited model availability. They’re transparent — your code is the same except for the model ID prefix.
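Because the only code change is the model ID, the prefixing can live in one small function. A minimal sketch; to_inference_profile_id is a hypothetical helper, and the prefix list reflects only the prefixes named in this article. Not every model has a profile in every geography, so confirm the resulting ID with list-inference-profiles.

```python
KNOWN_GEO_PREFIXES = ("us", "eu", "apac")


def to_inference_profile_id(model_id, geo):
    """Prefix a foundation-model ID with a geography, e.g. "us." or "eu.".

    IDs that already carry a known geo prefix are returned unchanged.
    """
    if geo not in KNOWN_GEO_PREFIXES:
        raise ValueError(f"unknown geo prefix: {geo}")
    if model_id.split(".", 1)[0] in KNOWN_GEO_PREFIXES:
        return model_id
    return f"{geo}.{model_id}"
```

Keeping the base model ID in configuration and applying the prefix at call time makes it easy to switch between direct and cross-region invocation per environment.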

Fix 7: Knowledge Bases for RAG

For built-in RAG, Bedrock Knowledge Bases connect to S3, Confluence, Salesforce, etc., embed documents, and serve queries:

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What does the company policy say about parental leave?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "ABC123XYZ",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        },
    },
)

print(response["output"]["text"])
print("Sources:", response.get("citations", []))

retrieve_and_generate does retrieval + LLM call in one API call. For separate retrieval (so you can post-process):

response = client.retrieve(
    knowledgeBaseId="ABC123XYZ",
    retrievalQuery={"text": "parental leave"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5},
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"])
    print(result["score"])

Both calls go through a third client, bedrock-agent-runtime, which is separate from bedrock-runtime.
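The citations field in a retrieve_and_generate response is nested, so printing it raw is hard to read. A minimal sketch that pulls out the source document URIs, assuming an S3-backed Knowledge Base; citation_uris is a hypothetical helper, and other source types (web, Confluence, and so on) use different location keys that this sketch ignores.

```python
def citation_uris(response):
    """Return the unique S3 URIs cited in a retrieve_and_generate response."""
    uris = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            location = reference.get("location", {})
            uri = location.get("s3Location", {}).get("uri")
            # Preserve first-seen order while dropping duplicates.
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```

An empty result with a non-empty answer is worth investigating: it can mean the model answered without grounding in the retrieved documents.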

Fix 8: Cost and Provisioned Throughput

Bedrock pricing:

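  • On-demand: pay per input/output token. No commitment. Latency can vary because capacity is shared.
  • Provisioned Throughput: pay hourly for dedicated capacity, with optional one-month or six-month commitments. Throughput is set by the model units you buy rather than by on-demand quotas, and cost is predictable.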

For on-demand monitoring:

# Token usage in Converse response:
response = client.converse(modelId=..., messages=...)
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")

For Provisioned Throughput:

aws bedrock create-provisioned-model-throughput \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --provisioned-model-name my-prod \
  --model-units 1 \
  --commitment-duration OneMonth

Then invoke using the provisioned ARN as the modelId:

response = client.converse(
    modelId="arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc-123",
    messages=...,
)

Note: Provisioned Throughput is expensive ($10K+/month per unit for top models). Use only when you have sustained, predictable load that warrants the commitment. For variable load, on-demand is much cheaper.
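To compare on-demand spend against a Provisioned Throughput commitment, the usage field from a Converse response can be turned into a rough cost. A minimal sketch; estimate_cost_usd is a hypothetical helper, and the per-1,000-token prices are parameters you must fill in from the current Bedrock pricing page, not values this article supplies.

```python
def estimate_cost_usd(usage, input_price_per_1k, output_price_per_1k):
    """Estimate the on-demand cost of one call from its token usage.

    `usage` is the "usage" dict from a Converse response. Both prices are
    USD per 1,000 tokens and are supplied by the caller.
    """
    input_cost = usage["inputTokens"] / 1000 * input_price_per_1k
    output_cost = usage["outputTokens"] / 1000 * output_price_per_1k
    return input_cost + output_cost
```

Summing this over a representative day of traffic and multiplying out to a month gives a figure to set against the monthly price of a model unit.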

Still Not Working?

A few less-obvious failures:

  • Could not load credentials. Standard AWS auth issue. Set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or run on an EC2/Lambda role with Bedrock permissions.
  • Rate limited (ThrottlingException). On-demand has per-account, per-region quotas. Retry with exponential backoff, request a quota increase through Service Quotas or AWS Support, or use Provisioned Throughput.
  • Model returns wildly different output than via Anthropic API directly. Some Bedrock models have additional safety filters or default system prompts. Compare prompt + response exactly between providers.
  • Streaming buffers entire response. Some proxies (CloudFront, ALB) buffer SSE. Use the SDK’s streaming API directly — don’t proxy.
  • Different model versions on Bedrock vs Anthropic. Bedrock model IDs include the version date (-20241022). Always pin the date — latest aliases can shift, breaking reproducibility.
  • Tool use returns weird formats. The Converse API normalizes tool format, but model behavior differs. Test tool prompts with the specific model you’ll deploy.
  • Bedrock LLM in VPC fails. Bedrock supports VPC Endpoints for private network access. Without one, your VPC-only EC2 can’t reach Bedrock.
  • Image inputs (multimodal) fail. Each model has its own image limits (size, format, count). For example, the Converse API documentation lists up to 20 images per request at no more than 3.75 MB each; check the current limits for the model you deploy.
  • Bedrock Guardrails block valid output. If you enabled a Guardrail, it filters both inputs and outputs by topic/PII rules. Test with guardrailIdentifier removed to confirm the guardrail is what’s blocking; tune sensitivity in the Bedrock console.
  • bedrock-runtime works but bedrock-agent-runtime returns ResourceNotFoundException. Knowledge Bases are region-scoped. Calling retrieve in the wrong region fails. Verify the KB’s region in the console matches your client’s region_name.
  • S3 access denied during Knowledge Base sync. The Knowledge Base needs its own IAM role with s3:GetObject on the source bucket. Even if your user has S3 access, the KB’s execution role might not. Check IAM → Roles → AmazonBedrockExecutionRoleForKnowledgeBase_*.
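For the ThrottlingException case in the list above, a retry wrapper with exponential backoff is often enough for bursty traffic. A minimal sketch; call_with_backoff and error_code are hypothetical helpers, and the error code is read the way botocore's ClientError exposes it (exc.response["Error"]["Code"]).

```python
import random
import time

RETRYABLE_CODES = {"ThrottlingException"}


def error_code(exc):
    """Extract the AWS error code from a botocore-style exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying throttling errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if error_code(exc) not in RETRYABLE_CODES or last_attempt:
                raise
            # Delays grow as base_delay * 1, 2, 4, ... plus up to 100 ms of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would look like call_with_backoff(lambda: client.converse(modelId=model_id, messages=messages)). boto3 also retries on its own; its behavior can be tuned with botocore's Config(retries={"max_attempts": ..., "mode": "adaptive"}), which may be all you need.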

For related LLM / API issues, see OpenAI API not working, LiteLLM not working, LangChain Python not working, and AWS IAM permission denied.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
