MedGemma on AWS — Deployment Guide

Subscribe to MedGemma via AWS Marketplace, deploy it to Amazon SageMaker, and run medical-image inference — all from your existing AWS account.

MedGemma is a medical vision-language model packaged by Tech42 and delivered as an AWS Marketplace SageMaker model package. This guide walks you through deploying it as a real-time endpoint or a batch transform job.

← Back to Tech42 home

Prerequisites
Subscribe in AWS Marketplace
Get the Model ARN
Real-time inference
Batch transform
Known issues

Prerequisites

An AWS account with permission to use AWS Marketplace, CloudFormation, and SageMaker.
Service quota for the instance type used by the deployment:
- Real-time inference → SageMaker hosting (endpoint usage) quota.
- Batch transform → SageMaker batch transform job quota.
The quota must exist in the AWS Region where you deploy. See Known issues if your quota is zero or insufficient.

Open the AWS Marketplace.
Search for “MedGemma” and select the product offered by Tech42 (top match: MedGemma 1.5 4B).
On the listing page, click View purchase options.
Review the EULA, pricing, and purchase details, then click Subscribe.
Wait 1–2 minutes for the subscription to validate in your account.

Confirm the subscription:

In the console, go to AWS Marketplace → Manage Subscriptions → Active Subscriptions.
Filter Delivery method by “SageMaker Model”.
Confirm MedGemma appears in your product list.

Get the Model ARN

Both deployment options need the Marketplace Model package ARN. It changes with both the Region and the product version, so copy the one that matches your target Region.

Go to AWS Marketplace → Manage Subscriptions and open your MedGemma subscription, then click Configure (top right).
On the Setup page, set Launch method to AWS CloudFormation, then choose the Version and the Region you will deploy in.
In the Model ARNs panel on the right, copy the ARN for your chosen Region — you will paste it into the stack parameters below.

This page is only for copying the Model ARN. Do not click Launch CloudFormation template (or Download CloudFormation template) here — those use the default AWS Marketplace stack. Instead, deploy with the Tech42 CloudFormation templates under Real-time inference / Batch transform, which provision the full set of resources recommended for this implementation (autoscaling, CloudWatch dashboard, execution role, encryption/VPC options, and more).

Real-time inference

A persistent HTTPS endpoint for synchronous, low-latency inference — best for interactive or online use.

Real-time inference architecture diagram

Steps

Launch the real-time stack (or open it from the product’s Usage instructions page):
Set a Stack name.
Paste the Model ARN from Get the Model ARN into Marketplace Product ARN — the only required field.
Click Create stack and wait for CREATE_COMPLETE.

Template in this repo: cf/template-marketplace-realtime.json — creates the SageMaker model, endpoint config, endpoint, and a CloudWatch dashboard.

Parameters

Required

Marketplace Product ARN — the Model ARN from Get the Model ARN (your subscribed version + Region). Everything below has a sensible default.

Optional settings (defaults shown)

General

Parameter	Default	Notes
Endpoint Name	`medgemma-marketplace-endpoint`	Name of the SageMaker endpoint. 1–63 chars; letters, numbers, hyphens.
Marketplace Referrer URL	—	Optional link back to the Marketplace configuration page.

Size

Parameter	Default	Notes
Instance Type	`ml.g7e.2xlarge`	Allowed: `ml.g6.xlarge/2xlarge/4xlarge`, `ml.g6e.xlarge/2xlarge/4xlarge`, `ml.g7e.2xlarge/4xlarge`. The G5 family is not supported (CUDA/driver image incompatibility). Drives cost and the quota you need.
Initial Instance Count	`1`	Instances launched with the endpoint (min 1).

Scaling

Parameter	Default	Notes
Enable Auto Scaling	`Yes`	`Yes`/`No`. Configures Application Auto Scaling on the variant.
Minimum Instance Count	`1`	Floor for autoscaling.
Maximum Instance Count	`4`	Ceiling for autoscaling.
Invocations Per Instance Target	`5`	Target invocations/instance that triggers scaling.
Scale-In Cooldown Seconds	`300`	Wait after a scale-in before the next.
Scale-Out Cooldown Seconds	`60`	Wait after a scale-out before the next.

Advanced

Parameter	Default	Notes
SageMaker Execution Role ARN	—	Leave blank and the stack creates a least-privilege role; set an ARN to reuse your own.
Production Variant Name	`AllTraffic`	Name of the endpoint production variant.
Model Data Download Timeout Seconds	`3600`	60–3600. Max wait for model artifacts to download.
Container Startup Health Check Timeout Seconds	`1800`	60–3600. Max wait for the container to pass health checks.

Security & Encryption

Parameter	Default	Notes
KMS Key ID	—	Key ID (UUID) or ARN to encrypt endpoint config + captured data. Blank = default SSE-S3.
VPC Subnet IDs	—	Comma-separated subnet IDs to run the model in a VPC. Blank = no VPC.
VPC Security Group IDs	—	Comma-separated SG IDs. Required if VPC Subnet IDs is set.
Enable Network Isolation	`Yes`	`Yes`/`No`. Blocks outbound network from the model container.

How to use

Test the endpoint: notebooks/realtime_endpoint.ipynb invokes the endpoint with a medical image and reads the response. A sample input image is at notebooks/inputs/chest_xray.png.
Autoscaling is on by default (Enable Auto Scaling = Yes); tune it with the Scaling parameters above.
Quota: the endpoint needs SageMaker hosting quota for the chosen instance type in the Region — see Known issues.

Batch transform

An offline job that runs the model over a dataset in S3 and writes results back to S3 — no persistent endpoint. Best for bulk/asynchronous processing.

The CloudFormation template creates the infrastructure (SageMaker model, IAM role, and S3 bucket). The actual transform job is started by the notebook.

Batch transform architecture diagram

Steps

Launch the batch stack:
Set a Stack name.
Fill the required parameter (see Parameters).
Click Create stack and wait for CREATE_COMPLETE.
Open notebooks/batch_transform.ipynb, fill the CloudFormation outputs (ModelName, ExecutionRoleArn, BatchDataBucketName), and run the cells. The notebook uploads a JSONL batch, starts the transform job, waits for completion, and reads the output.

Template in this repo: cf/template-marketplace-batch-transform.json

Parameters

Required

Parameter	Value to set
Marketplace Product ARN	The Model ARN from Get the Model ARN (your subscribed version + Region).

Optional settings (defaults shown)

Parameter	Default	Notes
Input S3 Prefix	`input/`	Prefix in the managed bucket where the notebook will upload batch input.
Output S3 Prefix	`output/`	Prefix in the managed bucket where transform output will be written.
SageMaker Execution Role ARN	(created by stack)	Optional. Leave blank to let the stack create a role with the required S3 and CloudWatch permissions.
Marketplace Referrer URL	—	Optional link back to the Marketplace configuration page.

How to use

Run a job: notebooks/batch_transform.ipynb uploads a JSONL batch request, launches the transform job, and reads the output. A sample input image is at notebooks/inputs/chest_xray.png.
Instance families: the G5 family is not supported (CUDA/driver image incompatibility) — pick a newer supported GPU family in the notebook. See Known issues.
Quota: the job needs SageMaker batch transform quota for the chosen instance type in the Region.

Known issues

Symptom	Cause	Fix
Stack fails creating the endpoint	Quota not available — the account has 0 quota for the instance type.	Request a quota increase in Service Quotas → Amazon SageMaker for the endpoint instance type, then redeploy.
`ResourceLimitExceeded` during deploy	Insufficient quota in the Region.	Raise the quota in that Region, or deploy in a Region where you already have capacity.
Endpoint fails to start or batch transform crashes on a G5 instance	The MedGemma image ships the GPU driver / CUDA build required by the newer instance families. The G5 family is not compatible with that image for either real-time endpoints or batch transform.	Choose a newer supported GPU family — *do not select `ml.g5.`** for real-time or batch transform.

_{MedGemma is provided by Tech42 via AWS Marketplace. This guide is intended for AWS account owners deploying the product into their own environment.}

MedGemma on AWS — Deployment Guide

Contents

Prerequisites

Subscribe in AWS Marketplace

Get the Model ARN

Real-time inference

Steps

Parameters

How to use

Batch transform

Steps

Parameters

How to use

Known issues