Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes #744
Open: charlesahn-scale wants to merge 30 commits into main from onPremSuport
Commits
704b51c: add support for on-prem (TarunRavikumar)
6cc02b5: clean up on-prem artificats (TarunRavikumar)
cb3023e: add back comments from initial code (TarunRavikumar)
2c20c05: fix lint (TarunRavikumar)
d1c5b8f: use ecr image repo:tag directly (TarunRavikumar)
7776b66: fix: isort import ordering (TarunRavikumar)
e94c1db: fix: remove unused infra_config import (TarunRavikumar)
5a08443: fix: mypy type annotation errors (TarunRavikumar)
45a37d4: fix: remove type annotation causing mypy no-redef error (TarunRavikumar)
af30729: fix: mypy type errors in s3_utils.py and io.py - use botocore.config.… (TarunRavikumar)
e4a7285: fix: mypy typeddict-item errors - use broad type ignore (TarunRavikumar)
93bbb2b: fix: update test mocks to use get_s3_resource from s3_utils (TarunRavikumar)
37237bf: test: add unit tests for s3_utils, onprem_docker_repository, and onpr… (TarunRavikumar)
65a2051: style: format test files with black (TarunRavikumar)
6c8b546: refactor: use filesystem_gateway abstraction for S3 operations (TarunRavikumar)
7789d12: fix: deduplicate S3 client config by using centralized s3_utils (TarunRavikumar)
9ed8d4a: fix: add pagination to list_objects to handle >1000 objects (TarunRavikumar)
50d04f5: fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR (TarunRavikumar)
deea6ee: refactor: add explicit on-prem branches in dependencies.py for clarity (TarunRavikumar)
cca8f8a: feat: implement Redis LLEN for queue depth in OnPremQueueEndpointReso… (TarunRavikumar)
9fefc15: fix: replace mutable default argument with None in _get_client (TarunRavikumar)
4f24079: refactor: extract inline import to module-level helper function (TarunRavikumar)
d223b9d: fix: reduce excessive debug logging in s3_utils (TarunRavikumar)
366dd6e: chore: remove unused TYPE_CHECKING import (TarunRavikumar)
a0b1773: fix: make Dockerfile multi-arch compatible for ARM/AMD64 (TarunRavikumar)
348f03c: style: fix black formatting in test_onprem_queue_endpoint_resource_de… (TarunRavikumar)
58fc437: fix: restore AWS_PROFILE env var fallback in s3_utils (TarunRavikumar)
04e9c6d: fix: correct isort ordering in s3_filesystem_gateway.py (TarunRavikumar)
fdeee61: fix: use Literal type for s3 addressing_style to satisfy mypy (TarunRavikumar)
e6b0e6d: Onprem Compatibility Change (charlesahn-scale)
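Commit 9ed8d4a mentions adding pagination to `list_objects` so that listings with more than 1000 objects are handled. The diff for that commit is not shown on this page, so the following is only a minimal sketch of the general technique, assuming a boto3-style S3 client that exposes `list_objects_v2` and returns `IsTruncated` / `NextContinuationToken`. The helper name `iter_object_keys` is hypothetical and not taken from the PR.

```python
from typing import Any, Dict, Iterator, Optional


def iter_object_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, one page at a time.

    `client` is assumed to behave like a boto3 S3 client: a single
    `list_objects_v2` call returns at most one page of results, and a
    truncated page carries a token for fetching the next one.
    """
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            # Resume the listing where the previous page stopped.
            kwargs["ContinuationToken"] = token
        page = client.list_objects_v2(**kwargs)

        # "Contents" is absent when the page holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]

        if not page.get("IsTruncated"):
            break
        token = page["NextContinuationToken"]
```

Writing the helper as a generator keeps memory use flat for large buckets; callers that need a list can wrap it in `list(...)`.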
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # On-premise deployment configuration | ||
| # This configuration file provides defaults for on-prem deployments | ||
| # Many values can be overridden via environment variables | ||
| | ||
| cloud_provider: "onprem" | ||
| env: "production" # Can be: production, staging, development, local | ||
| k8s_cluster_name: "onprem-cluster" | ||
| dns_host_domain: "ml.company.local" | ||
| default_region: "us-east-1" # Placeholder for compatibility with cloud-agnostic code | ||
| | ||
| # ==================== | ||
| # Object Storage (MinIO/S3-compatible) | ||
| # ==================== | ||
| s3_bucket: "model-engine" | ||
| # S3 endpoint URL - can be overridden by S3_ENDPOINT_URL env var | ||
| # Examples: "https://minio.company.local", "http://minio-service:9000" | ||
| s3_endpoint_url: "" # Set via S3_ENDPOINT_URL env var if not specified here | ||
| # MinIO requires path-style addressing (bucket in URL path, not subdomain) | ||
| s3_addressing_style: "path" | ||
| | ||
| # ==================== | ||
| # Redis Configuration | ||
| # ==================== | ||
| # Redis is used for: | ||
| # - Celery task queue broker | ||
| # - Model endpoint caching | ||
| # - Inference autoscaling metrics | ||
| redis_host: "" # Set via REDIS_HOST env var (e.g., "redis.company.local" or "redis-service") | ||
| redis_port: 6379 | ||
| # Whether to use Redis as Celery broker (true for on-prem) | ||
| celery_broker_type_redis: true | ||
| | ||
| # ==================== | ||
| # Celery Configuration | ||
| # ==================== | ||
| # Backend protocol: "redis" for on-prem (not "s3" or "abs") | ||
| celery_backend_protocol: "redis" | ||
| | ||
| # ==================== | ||
| # Database Configuration | ||
| # ==================== | ||
| # Database connection settings (credentials from environment variables) | ||
| # DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD | ||
| db_host: "postgres" # Default hostname, can be overridden by DB_HOST env var | ||
| db_port: 5432 | ||
| db_name: "llm_engine" | ||
| db_engine_pool_size: 20 | ||
| db_engine_max_overflow: 10 | ||
| db_engine_echo: false | ||
| db_engine_echo_pool: false | ||
| db_engine_disconnect_strategy: "pessimistic" | ||
| | ||
| # ==================== | ||
| # Docker Registry Configuration | ||
| # ==================== | ||
| # Docker registry prefix for container images | ||
| # Examples: "registry.company.local", "harbor.company.local/ml-platform" | ||
| # Leave empty if using full image paths directly | ||
| docker_repo_prefix: "registry.company.local" | ||
| | ||
| # ==================== | ||
| # Monitoring & Observability | ||
| # ==================== | ||
| # Prometheus server address for metrics (optional) | ||
| # prometheus_server_address: "http://prometheus:9090" | ||
| | ||
| # ==================== | ||
| # Not applicable for on-prem (kept for compatibility) | ||
| # ==================== | ||
| ml_account_id: "onprem" | ||
| profile_ml_worker: "default" | ||
| profile_ml_inference_worker: "default" |
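The comments in this config say that several values (`s3_endpoint_url`, `redis_host`, `db_host`) can be supplied or overridden through environment variables. How the PR actually resolves them is not visible in this diff, so the following is a minimal sketch under stated assumptions: a non-empty environment variable takes precedence over the file value, and the names `FILE_DEFAULTS`, `ENV_OVERRIDES`, `resolve_settings`, and `redis_url` are hypothetical. The defaults are copied from the YAML above rather than parsed from it, to keep the example dependency-free.

```python
import os
from typing import Any, Dict, Mapping, Optional

# A subset of the on-prem defaults from the config file in this diff.
FILE_DEFAULTS: Dict[str, Any] = {
    "s3_bucket": "model-engine",
    "s3_endpoint_url": "",
    "s3_addressing_style": "path",
    "redis_host": "",
    "redis_port": 6379,
    "db_host": "postgres",
    "db_port": 5432,
}

# Config key -> environment variable named in the config comments.
ENV_OVERRIDES: Dict[str, str] = {
    "s3_endpoint_url": "S3_ENDPOINT_URL",
    "redis_host": "REDIS_HOST",
    "db_host": "DB_HOST",
}


def resolve_settings(
    defaults: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return a copy of `defaults` with non-empty env vars applied on top.

    Assumption: the environment wins over the file. `env` defaults to
    os.environ and can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    resolved = dict(defaults)
    for key, var in ENV_OVERRIDES.items():
        value = env.get(var)
        if value:  # unset or empty variables leave the file value alone
            resolved[key] = value
    return resolved


def redis_url(settings: Mapping[str, Any]) -> str:
    """Build a redis:// URL, failing loudly if no host was configured."""
    if not settings["redis_host"]:
        raise ValueError("redis_host is empty; set it in the file or via REDIS_HOST")
    return f"redis://{settings['redis_host']}:{settings['redis_port']}"
```

For example, with `REDIS_HOST=redis-service` set and nothing else, `resolve_settings(FILE_DEFAULTS)` keeps `db_host` as `"postgres"` and `redis_url(...)` yields `redis://redis-service:6379`. Raising on an empty `redis_host` is a design choice for the sketch: the file ships with `redis_host: ""`, so a missing override would otherwise surface later as a harder-to-read connection error.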
nit