Commit 8de9f7d

Merge pull request #132 from bobbywlindsey/main: Include Amazon SageMaker AI as model provider

2 parents: 16dbe50 + 7694a1e

3 files changed: +4, -5 lines

.env.template: 1 addition, 1 deletion

@@ -1,6 +1,6 @@
 # LITELLM_VERSION eg: main-v1.56.5
 # Get it from https://github.com/berriai/litellm/pkgs/container/litellm/versions?filters%5Bversion_type%5D=tagged
-LITELLM_VERSION="litellm_stable_release_branch-v1.63.2-stable"
+LITELLM_VERSION="litellm_stable_release_branch-v1.73.0-stable"
 TERRAFORM_S3_BUCKET_NAME="" #Must be globally unique
 BUILD_FROM_SOURCE="false"
 HOSTED_ZONE_NAME=""

README.md: 3 additions, 4 deletions
@@ -1,7 +1,5 @@
 # Guidance for Multi-Provider Generative AI Gateway on AWS
 
-Project ACTIVE as of Feb 15, 2025
-
 ## Table of contents
 
 - [Project Overview](#project-overview)
@@ -35,7 +33,7 @@ If you are unfamiliar with LiteLLM, it provides a consistent interface to access
 1. Tenants and client applications access the LiteLLM gateway proxy API through the [Amazon Route 53](https://aws.amazon.com/route53/) URL endpoint or [Amazon CloudFront](https://aws.amazon.com/cloudfront/) distribution, which is protected against common web exploits and bots using [AWS Web Application Firewall (WAF)](https://aws.amazon.com/waf/).
 2. AWS WAF forwards requests to [Application Load Balancer (ALB)](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/) to automatically distribute incoming application traffic to [Amazon Elastic Container Service (ECS)](https://aws.amazon.com/ecs/) tasks or [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) pods running generative AI gateway containers. TLS/SSL encryption secures traffic using a certificate issued by [AWS Certificate Manager (ACM)](https://aws.amazon.com/certificate-manager/).
 3. Container images for API/middleware and LiteLLM applications are built during guidance deployment and pushed to [Amazon Elastic Container Registry (ECR)](http://aws.amazon.com/ecr/). They are used for deployment to Amazon ECS on AWS Fargate or Amazon EKS clusters that run these applications as containers in ECS tasks or EKS pods, respectively. LiteLLM provides a unified application interface for configuring and interacting with LLM providers. The API/middleware integrates natively with [Amazon Bedrock](https://aws.amazon.com/bedrock/) to enable features not supported by the [LiteLLM open-source project](https://docs.litellm.ai/).
-4. Models hosted on [Amazon Bedrock](https://aws.amazon.com/bedrock/) and [Amazon Nova](https://aws.amazon.com/ai/generative-ai/nova/) provide model access, guardrails, prompt caching, and routing to enhance the AI gateway and additional controls for clients through a unified API. [Access to required Amazon Bedrock models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) must be properly configured.
+4. Models hosted on [Amazon Bedrock](https://aws.amazon.com/bedrock/) and [Amazon Nova](https://aws.amazon.com/ai/generative-ai/nova/) provide model access, guardrails, prompt caching, and routing to enhance the AI gateway and additional controls for clients through a unified API. Model access is also available for models deployed on [Amazon SageMaker AI](https://aws.amazon.com/sagemaker-ai/). [Access to required Amazon Bedrock models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) must be properly configured.
 5. External model providers (such as OpenAI, Anthropic, or Vertex AI) are configured using the LiteLLM Admin UI to enable additional model access through LiteLLM's unified application interface. Integrate pre-existing configurations of third-party providers into the gateway using LiteLLM APIs.
 6. LiteLLM integrates with [Amazon ElastiCache (Redis OSS)](https://aws.amazon.com/elasticache/), [Amazon Relational Database Service (RDS)](https://aws.amazon.com/rds/), and [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) services. Amazon ElastiCache enables multi-tenant distribution of application settings and prompt caching. Amazon RDS enables persistence of virtual API keys and other configuration settings provided by LiteLLM. Secrets Manager stores external model provider credentials and other sensitive settings securely.
 7. LiteLLM and the API/middleware applications send logs to the dedicated [Amazon S3](https://aws.amazon.com/s3) storage bucket for troubleshooting and access analysis.
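The unified API in step 4 means a model deployed on SageMaker AI is called with the same OpenAI-compatible request shape as any other provider; only the model identifier changes. Below is a minimal sketch of the request body a client would POST to the gateway's chat-completions route. The gateway URL and endpoint name are hypothetical placeholders; the `sagemaker/` prefix is LiteLLM's provider-routing convention.

```python
import json

# Hypothetical values: the real URL comes from your Route 53 record, and the
# endpoint name from your own SageMaker AI deployment.
GATEWAY_URL = "https://genai-gateway.example.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload for the gateway."""
    return {
        "model": model,  # "sagemaker/<endpoint>" routes to SageMaker AI
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("sagemaker/my-llm-endpoint", "Hello")
print(json.dumps(payload))
```

The same payload with an Anthropic or Bedrock model identifier would be routed to those providers instead, which is the point of the single gateway interface.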
@@ -192,6 +190,7 @@ This provides a robust defense against direct ALB access even if someone discove
 | **AWS Service** | **Role** | **Description** |
 | --------------- | -------- | --------------- |
 | [Amazon Bedrock](https://aws.amazon.com/bedrock/) | Core service | Manages single-API access to multiple foundation models |
+| [Amazon SageMaker AI](https://aws.amazon.com/sagemaker-ai/) | Core service | Manages access to any foundation model deployed on Amazon SageMaker AI |
 | [Amazon Elastic Container Service](https://aws.amazon.com/ecs/) (ECS) | Core service | Manages application platform and on-demand infrastructure for LiteLLM container orchestration. |
 | [Amazon Elastic Kubernetes Service](https://aws.amazon.com/eks/) (EKS) | Core service | Manages Kubernetes control plane and compute nodes for LiteLLM container orchestration. |
 | [Amazon Elastic Compute Cloud](https://aws.amazon.com/ec2/) (EC2) | Core service | Provides compute instances for EKS compute nodes and runs containerized applications. |
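The new table row registers SageMaker AI as a core model provider next to Bedrock. On the LiteLLM side, a SageMaker-hosted model would be declared in the proxy's `model_list`; the fragment below is a sketch with placeholder endpoint and region values, not this guidance's actual configuration:

```yaml
model_list:
  - model_name: my-sagemaker-llm          # name clients request via the gateway
    litellm_params:
      model: sagemaker/my-endpoint-name   # placeholder SageMaker AI endpoint
      aws_region_name: us-east-1          # placeholder region
```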
@@ -222,7 +221,7 @@ When implementing this guidance on AWS, it's important to understand the various
 
 The total cost of running this solution can be broadly categorized into two main components:
 
-1. **LLM Provider Costs**: These are the charges incurred for using services from LLM providers such as Amazon Bedrock, Amazon SageMaker, Anthropic, and others. Each provider has its own pricing model, typically based on factors like the number of tokens processed, model complexity, and usage volume.
+1. **LLM Provider Costs**: These are the charges incurred for using services from LLM providers such as Amazon Bedrock, Amazon SageMaker AI, Anthropic, and others. Each provider has its own pricing model, typically based on factors like the number of tokens processed, model complexity, and usage volume.
 
 2. **AWS Infrastructure Costs**: These are the costs associated with running the Gen AI Gateway proxy server on AWS infrastructure. This includes various AWS services and resources used to host and operate the solution.
 
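The token-based pricing described under LLM Provider Costs reduces to simple per-token arithmetic. A back-of-the-envelope sketch, using purely illustrative rates rather than actual Bedrock, SageMaker AI, or Anthropic pricing:

```python
# Illustrative placeholder rates in USD per 1,000 tokens: (input, output).
# Substitute your provider's published pricing for real estimates.
ASSUMED_RATES = {
    "example-model-a": (0.003, 0.015),
    "example-model-b": (0.0005, 0.0015),
}


def provider_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate provider cost: tokens / 1000 * rate, summed over both directions."""
    in_rate, out_rate = ASSUMED_RATES[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate


# e.g. 1M input + 200K output tokens on the pricier placeholder model:
print(round(provider_cost("example-model-a", 1_000_000, 200_000), 2))  # prints 6.0
```

The AWS infrastructure component (ECS/EKS, ALB, ElastiCache, RDS, etc.) is billed separately and is largely independent of token volume.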
(Third changed file: a 122 KB binary asset; diff not rendered.)