How do I deploy or use a custom fine-tuned model from Bedrock for inference once the fine-tuning job is complete?

Once your custom fine-tuned model in AWS Bedrock is ready, you can invoke it through Bedrock’s standard APIs. After the fine-tuning job completes, check its status in the Bedrock console or via the AWS CLI to confirm it’s marked “Completed.” The job produces a custom model with a unique identifier (ARN), which you’ll use in API calls; for example, using the AWS CLI, you can verify the model’s availability with aws bedrock list-custom-models and note the ARN in the response. Bedrock hosts the model for you, so there are no servers to manage, but note that some custom model types require you to purchase Provisioned Throughput before they can be invoked, while others support on-demand inference; check the requirements for your base model family.
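If you prefer the SDK to the CLI, the same checks are available through Boto3’s control-plane bedrock client (distinct from the bedrock-runtime client used for inference). A minimal sketch, assuming a hypothetical job name my-tuning-job:

import boto3

# Control-plane client for managing models and customization jobs.
bedrock = boto3.client('bedrock')

# Confirm the fine-tuning job finished; 'my-tuning-job' is a placeholder name.
job = bedrock.get_model_customization_job(jobIdentifier='my-tuning-job')
print(job['status'])  # e.g. 'Completed'

# List custom models and note the ARN you'll pass to invoke_model.
for model in bedrock.list_custom_models()['modelSummaries']:
    print(model['modelName'], model['modelArn'])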

To perform inference, use the Bedrock Runtime API with your preferred SDK or direct HTTP requests. For instance, in Python with Boto3, you’d call invoke_model from the bedrock-runtime client, specifying the model ARN and input data. Here’s a simplified example:

import json
import boto3

# Runtime client for inference (separate from the bedrock control-plane client).
client = boto3.client('bedrock-runtime')

response = client.invoke_model(
    modelId='arn:aws:bedrock:.../custom-model-123',  # your custom model's ARN
    contentType='application/json',
    body=json.dumps({'prompt': 'Translate: Hello world', 'max_tokens': 50})
)

# The body is a streaming object; read it once and parse the JSON.
# The 'generation' key is model-specific; adjust to your model's schema.
output = json.loads(response['body'].read())['generation']

The input format (e.g., prompt vs. inputText) depends on the base model used for fine-tuning. Check Bedrock’s documentation for your model type to ensure the request structure matches its requirements. You can test this locally or integrate it into applications via AWS Lambda, EC2, or containerized services.
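As a hedged illustration, here is how the request body typically differs across common base-model families on Bedrock (field names follow each provider’s documented schema; adjust for your model version):

# Amazon Titan text models expect inputText plus a generation config.
titan_body = {
    'inputText': 'Translate: Hello world',
    'textGenerationConfig': {'maxTokenCount': 50}
}

# Meta Llama models use a flat prompt with max_gen_len.
llama_body = {'prompt': 'Translate: Hello world', 'max_gen_len': 50}

# Anthropic Claude models use the Messages API schema.
claude_body = {
    'anthropic_version': 'bedrock-2023-05-31',
    'max_tokens': 50,
    'messages': [{'role': 'user', 'content': 'Translate: Hello world'}]
}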

For production use, consider monitoring and scaling. Bedrock integrates with CloudWatch to track metrics like invocation counts or latency. If your application requires low-latency responses, test performance under load and adjust parameters like max_tokens or batch size. For security, ensure IAM roles attached to your application have the bedrock:InvokeModel permission. If you need to update the model later, you’ll repeat the fine-tuning process and update the ARN in your code. Bedrock handles the underlying infrastructure, so you avoid managing servers or scaling clusters manually.
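As one example of such monitoring, you can pull Bedrock’s CloudWatch metrics with Boto3. This is a sketch, assuming the documented AWS/Bedrock namespace and InvocationLatency metric, with the model ARN as a placeholder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

# Average invocation latency for the custom model over the past hour.
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Bedrock',
    MetricName='InvocationLatency',
    Dimensions=[{'Name': 'ModelId', 'Value': 'arn:aws:bedrock:.../custom-model-123'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,  # 5-minute buckets
    Statistics=['Average'],
)
for point in stats['Datapoints']:
    print(point['Timestamp'], point['Average'])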
