
Conversation


@asimurka asimurka commented Jan 13, 2026

Description

This PR adds Azure Entra ID authentication.
It also manages the automatic enrichment of the Llama Stack config with the BYOK section from the LCS config, which follows the same principle as the dynamic injection of access tokens.
Since Llama Stack needs a valid Azure API key already at service startup, mechanisms were added to generate the initial access token before the Llama Stack service starts. This works for both library and service mode.
Documentation is also included; it discusses in detail the access-token workflow for both deployment modes and provides example configs for local deployment.
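The pre-start token mechanism can be sketched as follows. This is a hypothetical helper, not the PR's actual code: `persist_initial_token`, the file name, and the env-file format are assumptions (the real flow first retrieves the token via azure-identity's `ClientSecretCredential`, then exposes it as `AZURE_API_KEY`).

```python
import os


def persist_initial_token(env_file: str, token: str) -> None:
    """Make a freshly minted Azure access token visible to Llama Stack.

    Hypothetical sketch: service mode reads AZURE_API_KEY from an env
    file at startup, while library mode reads it from the environment
    of the current process.
    """
    # Service mode: the separately started Llama Stack service sources this file.
    with open(env_file, "w", encoding="utf-8") as f:
        f.write(f"AZURE_API_KEY={token}\n")
    # Library mode: the in-process client picks it up from os.environ.
    os.environ["AZURE_API_KEY"] = token
```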

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Azure Entra ID authentication support for Azure OpenAI inference, replacing API key-based authentication with tenant/client credentials.
    • Introduced automatic token refresh management for Azure Entra ID sessions.
    • Added support for library-mode client reloading with dynamic provider configuration updates.
  • Documentation

    • Added comprehensive Azure Provider Entra ID Authentication Guide covering configuration, deployment, and security considerations.
    • Updated LLM provider compatibility table with Azure model availability information.
  • Dependencies

    • Added azure-core and azure-identity packages for Entra ID authentication support.


@asimurka asimurka marked this pull request as draft January 13, 2026 16:15

coderabbitai bot commented Jan 13, 2026

Walkthrough

This PR introduces Azure Entra ID token management to replace static API key environment variables. It adds a singleton AzureEntraIDManager for token lifecycle handling, integrates token refresh into query endpoints, updates Docker and workflow configurations with separate credential variables, provides example configurations for different deployment modes, and includes comprehensive documentation and tests for the new authentication flow.

Changes

Cohort / File(s) Summary
Azure Token Management Core
src/authorization/azure_token_manager.py, src/models/config.py
Added AzureEntraIDManager singleton class with token caching, expiration tracking (30-second leeway), and async refresh via ClientSecretCredential. Introduced AzureEntraIdConfiguration model with tenant_id, client_id, client_secret, and scope fields.
Application Startup & Configuration
src/main.py, src/configuration.py, src/lightspeed_stack.py
Integrated AzureEntraIDManager initialization at app startup to refresh tokens and populate AZURE_API_KEY environment variable. Added azure_entra_id property to AppConfig. Removed legacy Llama Stack configuration generation CLI flags.
Query & Streaming Endpoints
src/app/endpoints/query.py, src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query.py
Added Azure token refresh checks before API calls; if token expired, refreshes and reloads library client or updates provider data with new credentials. Removed informational OpenAI response logging.
Client Lifecycle Management
src/client.py
Added reload_library_client() and update_provider_data() methods to AsyncLlamaStackClientHolder; refactored client initialization to support library vs service modes; added config enrichment for BYOK RAG with dynamic reload capability.
Llama Stack Configuration Integration
src/llama_stack_configuration.py
Introduced setup_azure_entra_id_token() to write AZURE_API_KEY to env file; refactored BYOK RAG enrichment; updated generate_configuration() to handle Azure token setup and config enrichment; changed BYOK RAG typing from strongly-typed to dict-based.
Container & Deployment
test.containerfile, scripts/llama-stack-entrypoint.sh
Added azure-identity pip dependency; created entrypoint script to enrich llama-stack config when lightspeed config present; copied configuration and entrypoint scripts into image with proper permissions.
Docker Compose Configuration
docker-compose.yaml, docker-compose-library.yaml
Replaced AZURE_API_KEY environment variable with separate TENANT_ID, CLIENT_ID, CLIENT_SECRET credentials in both services; added lightspeed-stack.yaml volume mount; updated comments to reflect dynamic token retrieval.
Makefile & Build
Makefile
Added CONFIG and LLAMA_STACK_CONFIG default variables; updated run target to pass -c $(CONFIG) flag; introduced new run-llama-stack target for service initialization with env extraction.
CI/CD Workflows
.github/workflows/e2e_tests.yaml, .github/workflows/e2e_tests_providers.yaml
Updated message text to reference run-${{ matrix.environment }}.yaml; added Azure Entra ID config injection step for azure environment; removed explicit Azure API token retrieval; replaced AZURE_API_KEY with TENANT_ID, CLIENT_ID, CLIENT_SECRET in Run services steps.
Example Configurations
examples/azure-run.yaml, examples/lightspeed-stack-azure-entraid-lib.yaml, examples/lightspeed-stack-azure-entraid-service.yaml
Substantially rewrote azure-run.yaml with new provider structure, storage paths, vector_io support; added two new example YAMLs for library and service modes with Azure Entra ID credentials wired to environment variables.
Documentation
docs/providers.md, README.md
Added comprehensive Azure Provider Entra ID Authentication Guide covering configuration, token lifecycle, deployment workflows, security considerations; updated LLM provider compatibility table rows for Azure.
Unit Tests
tests/unit/authorization/test_azure_token_manager.py, tests/unit/test_configuration.py, tests/unit/test_client.py, tests/unit/test_llama_stack_configuration.py, tests/unit/models/config/test_dump_configuration.py
Added comprehensive token manager tests (singleton, expiration, refresh, error handling); added configuration validation tests for azure_entra_id field; updated client tests for reload/update functionality; adapted BYOK RAG tests to dict-based structures; updated config dump tests for new azure_entra_id field.
E2E Test Configuration & Code
tests/e2e/configs/run-azure.yaml, tests/e2e/configuration/library-mode/*, tests/e2e/configuration/server-mode/*, tests/e2e/features/steps/info.py
Minor formatting and whitespace normalization in test configs; updated info.py step to use hardcoded "openai/gpt-4o-mini" assertion instead of dynamic provider/model composition.
Dependencies
pyproject.toml
Added azure-core and azure-identity package dependencies.
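The singleton-with-leeway pattern described above can be sketched in a few lines. This is a simplified illustration, not the real `AzureEntraIDManager` (which additionally fetches tokens via `ClientSecretCredential` and writes them to `AZURE_API_KEY`); the class and method names here are invented.

```python
import time

TOKEN_EXPIRATION_LEEWAY = 30  # seconds, matching the leeway described above


class AzureTokenCacheSketch:
    """Hypothetical sketch of singleton token caching with early expiry."""

    _instance = None

    def __new__(cls) -> "AzureTokenCacheSketch":
        # Singleton: every caller shares one cached token and expiry time.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._expires_on = 0.0
        return cls._instance

    def update(self, token: str, expires_on: float) -> None:
        # Treat the token as expired LEEWAY seconds early, so a refresh
        # always happens before Azure actually rejects the old token.
        self._token = token
        self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY

    @property
    def is_token_expired(self) -> bool:
        return time.time() >= self._expires_on
```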

Sequence Diagram(s)

sequenceDiagram
    participant Client as API Client
    participant Main as App Main
    participant AEM as AzureEntraIDManager
    participant Endpoint as Query Endpoint
    participant LlamaStack as Llama Stack Service
    
    Note over Main: Application Startup
    Main->>AEM: set_config(azure_config)
    Main->>AEM: refresh_token()
    AEM->>AEM: _retrieve_access_token() via ClientSecretCredential
    AEM->>AEM: _update_access_token() store in AZURE_API_KEY env
    Main->>Main: AZURE_API_KEY set in environment
    
    Note over Client,Main: Request Processing
    Client->>Endpoint: POST /query
    Endpoint->>AEM: check is_token_expired
    alt Token Expired
        Endpoint->>AEM: refresh_token()
        AEM->>AEM: fetch new token via credentials
        AEM->>AEM: store in AZURE_API_KEY
        Endpoint->>Endpoint: reload_library_client or update_provider_data
    end
    Endpoint->>LlamaStack: call with refreshed credentials
    LlamaStack-->>Endpoint: response
    Endpoint-->>Client: response
sequenceDiagram
    participant Script as llama-stack-entrypoint.sh
    participant Config as llama_stack_configuration.py
    participant Enricher as ConfigEnricher
    participant Container as Llama Stack Container
    
    Note over Script: Container Startup
    Script->>Script: Check if lightspeed config present
    alt Lightspeed Config Exists
        Script->>Config: execute generate_configuration()
        Config->>Enricher: setup_azure_entra_id_token()
        Enricher->>Enricher: retrieve creds from environment
        Enricher->>Enricher: write AZURE_API_KEY to .env
        Config->>Enricher: enrich_byok_rag()
        Enricher->>Enricher: construct vector_dbs & vector_io
        Config->>Config: write enriched YAML
        Script->>Script: source generated .env (optional)
        Script->>Container: exec llama-stack with enriched config
    else No Lightspeed Config
        Script->>Container: exec llama-stack with original config
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

needs-ok-to-test

Suggested reviewers

  • radofuchs
  • tisnik
🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: The title 'LCORE-861: Azure Entra ID token managment' accurately reflects the main objective of implementing Azure Entra ID authentication and token management, though it contains a typo ('managment' should be 'management').
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@asimurka asimurka force-pushed the azure-entra-id-final branch from 481a427 to def9221 Compare January 14, 2026 08:27
echo "=== Recent commits ==="
git log --oneline -5
- name: Add Azure Entra ID config block to all test configs

The same set of LCS config files is shared across all the test jobs; thus, the Entra ID config block is injected only for the relevant test jobs.

def _update_access_token(self, token: str, expires_on: int) -> None:
"""Update the token in env var and track expiration time."""
self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
os.environ["AZURE_API_KEY"] = token

The Azure token is stored only as an environment variable (single-source-of-truth principle). Every uvicorn process has its own scope for environment variables. The environment variable is used for the injection into the LLS config.

def update_provider_data(self, updates: dict[str, str]) -> AsyncLlamaStackClient:
"""Update provider data headers for service client.
For use with service mode only.

The service client works on a different principle than the library client: re-injecting a new access token would require restarting the whole service, which is not feasible. Instead, the tokens in the providers' header are replaced, which is the preferred approach.
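The header-replacement idea can be sketched like this. The function name is hypothetical (the real method is `update_provider_data` on `AsyncLlamaStackClientHolder`), and the `X-LlamaStack-Provider-Data` header name is an assumption following Llama Stack's per-request provider-data convention.

```python
import json


def merged_provider_data_header(
    existing: dict[str, str], updates: dict[str, str]
) -> dict[str, str]:
    """Build a provider-data header with refreshed credentials merged in.

    Hypothetical sketch: rather than restarting the service, the stale
    token in the provider-data header is replaced on the client side.
    """
    merged = {**existing, **updates}  # refreshed values win over stale ones
    return {"X-LlamaStack-Provider-Data": json.dumps(merged)}
```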

# configuration for Llama Stack
if args.generate_llama_stack_configuration:
try:
generate_configuration(

Moved to src/llama_stack_configuration.py

@asimurka asimurka marked this pull request as ready for review January 14, 2026 09:17

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/lightspeed_stack.py (2)

26-40: Docstring references removed CLI flags.

The docstring still documents the removed -g/--generate-llama-stack-configuration, -i/--input-config-file, and -o/--output-config-file flags (lines 33-36) that are no longer part of the argument parser.

📝 Suggested fix
 def create_argument_parser() -> ArgumentParser:
     """Create and configure argument parser object.

     The parser includes these options:
     - -v / --verbose: enable verbose output
     - -d / --dump-configuration: dump the loaded configuration to JSON and exit
     - -c / --config: path to the configuration file (default "lightspeed-stack.yaml")
-    - -g / --generate-llama-stack-configuration: generate a Llama Stack
-                                                 configuration from the service configuration
-    - -i / --input-config-file: Llama Stack input configuration filename (default "run.yaml")
-    - -o / --output-config-file: Llama Stack output configuration filename (default "run_.yaml")

     Returns:
         Configured ArgumentParser for parsing the service CLI options.
     """

69-86: Docstring references removed functionality.

The main() docstring still references --generate-llama-stack-configuration behavior (lines 77-79) and mentions "Llama Stack generation fails" as a SystemExit condition (lines 84-85), but this functionality has been moved to src/llama_stack_configuration.py.

📝 Suggested fix
 def main() -> None:
     """Entry point to the web service.

     Start the Lightspeed Core Stack service process based on CLI flags and configuration.

     Parses command-line arguments, loads the configured settings, and then:
     - If --dump-configuration is provided, writes the active configuration to
       configuration.json and exits (exits with status 1 on failure).
-    - If --generate-llama-stack-configuration is provided, generates and stores
-      the Llama Stack configuration to the specified output file and exits
-      (exits with status 1 on failure).
     - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes, starts
       the quota scheduler, and starts the Uvicorn web service.

     Raises:
-        SystemExit: when configuration dumping or Llama Stack generation fails
-                    (exits with status 1).
+        SystemExit: when configuration dumping fails (exits with status 1).
     """
tests/unit/test_client.py (2)

53-66: Missing @pytest.mark.asyncio decorator.

This async test function is missing the @pytest.mark.asyncio decorator, which could cause it to not execute properly as an async test.

Proposed fix
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_remote_client() -> None:

69-83: Missing @pytest.mark.asyncio decorator.

Same issue as above - this async test function needs the @pytest.mark.asyncio decorator.

Proposed fix
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_wrong_configuration() -> None:
🤖 Fix all issues with AI agents
In `@docs/providers.md`:
- Around line 87-89: The YAML example uses AZURE_TENANT_ID / AZURE_CLIENT_ID /
AZURE_CLIENT_SECRET but the bash export example uses TENANT_ID / CLIENT_ID /
CLIENT_SECRET; make them consistent by renaming the bash exports to
AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET (or alternately rename
the YAML keys to the non-prefixed names) and update all occurrences (including
the other instance noted) or add a brief note explaining the intentional
difference so readers aren’t confused.

In `@examples/azure-run.yaml`:
- Around line 71-77: The YAML uses an outdated Azure OpenAI api_version value;
update the api_version field under the inference -> config block (the entry with
provider_id: azure and provider_type: remote::azure) to a current release—e.g.,
set api_version to "2025-04-01-preview" for the data-plane or "2025-06-01" if
you need control-plane GA features—so the inference provider uses a supported
API version.

In `@scripts/llama-stack-entrypoint.sh`:
- Around line 7-10: The issue is that INPUT_CONFIG and ENRICHED_CONFIG can point
to the same file so the existence check for ENRICHED_CONFIG can be true even
when enrichment failed; update the script so ENRICHED_CONFIG uses a distinct
default path (e.g., a different filename) or capture the enrichment script's
exit status instead of suppressing errors with "|| true" and use that exit code
to decide whether to treat the output as enriched; refer to the variables
INPUT_CONFIG and ENRICHED_CONFIG and the enrichment invocation (the command
currently followed by "|| true") and the conditional "if [ -f
\"$ENRICHED_CONFIG\" ]" when making the change.

In `@src/app/endpoints/query.py`:
- Around line 319-339: The current code uses next(...) to find azure_config and
will raise StopIteration if no provider with provider_type == "remote::azure"
exists and may convert a missing api_base into the string "None"; change the
lookup to a safe search (e.g., use next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None) or
explicitly iterate and return None if not found), then check that azure_config
is not None before calling client_holder.update_provider_data (handle the
missing provider case appropriately), and when building azure_api_base use a
safe default instead of str(None) (e.g., use azure_config.get("api_base") or ""
only if not None) so you never store the literal "None" string.

In `@src/app/endpoints/streaming_query.py`:
- Around line 896-916: The next(...) call that looks for a provider with
provider_type == "remote::azure" can raise StopIteration; change the lookup to
use a safe lookup (e.g., azure_config = next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None)) and then
check azure_config is not None before using it in
client_holder.update_provider_data; if azure_config is None, handle gracefully
(log/raise a clear error or return an appropriate response) so
AzureEntraIDManager.refresh_token() path won't crash when the Azure provider is
missing.

In `@src/llama_stack_configuration.py`:
- Around line 106-108: The except block that catches ClientAuthenticationError
and CredentialUnavailableError (the Azure Entra ID token generation block that
currently calls logger.error("Azure Entra ID: Failed to generate token: %s", e))
must not silently swallow failures; after logging the error re-raise the caught
exception (raise) so callers can handle or fail fast. Locate the except handling
those two exceptions in the token generation function and append a re-raise (or
alternatively change the function to return an explicit failure status and
propagate that to callers) so authentication failures are observable to callers.
- Around line 146-148: The code builds provider_id using
brag.get("vector_db_id") which can be None and yield a malformed "byok_" value;
update the logic around the vector_db_id/provider_id assignment (where
"vector_db_id", "provider_id", "embedding_model" are set) to validate or provide
a safe default for vector_db_id (e.g., use empty-string-safe fallback or a
sentinel like "unknown") and then construct provider_id from that validated
value (or skip/omit provider_id when vector_db_id is absent) so provider_id is
never "byok_" with an empty suffix.
🧹 Nitpick comments (22)
pyproject.toml (1)

63-64: Add version constraints to Azure dependencies for consistency with other packages.

Both azure-core and azure-identity are missing version constraints, while every other dependency in the file specifies minimum or exact versions (e.g., fastapi>=0.115.12, kubernetes>=30.1.0). Consider using azure-core>=1.37.0 and azure-identity>=1.25.1 to maintain consistency and prevent unexpected behavior from major version updates.
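If the suggestion is adopted, the pinned entries in pyproject.toml would look like this (the floor versions are the reviewer's suggestion, not verified minimums):

```toml
dependencies = [
    # ...existing dependencies...
    "azure-core>=1.37.0",
    "azure-identity>=1.25.1",
]
```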

src/app/main.py (1)

44-55: Redundant environment variable assignment.

Lines 49-51 set AZURE_API_KEY from access_token, but AzureEntraIDManager.refresh_token() already sets os.environ["AZURE_API_KEY"] internally via _update_access_token(). The explicit assignment is unnecessary.

♻️ Suggested simplification
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
         try:
             await AzureEntraIDManager().refresh_token()
-            os.environ["AZURE_API_KEY"] = (
-                AzureEntraIDManager().access_token.get_secret_value()
-            )
             logger.info("Azure Entra ID token set in environment")
         except ValueError as e:
             logger.error("Failed to refresh Azure token: %s", e)

Additionally, consider whether a failed token refresh should prevent startup. Currently, a ValueError is logged but the service continues, which could lead to authentication failures at runtime.

src/app/endpoints/query.py (1)

319-327: Consider storing the singleton reference for cleaner code.

AzureEntraIDManager() is instantiated multiple times (lines 321, 322, 324, and 336). While it's a singleton, storing a single reference improves readability.

♻️ Suggested refactor
         if provider_id == "azure":
+            azure_manager = AzureEntraIDManager()
             if (
-                AzureEntraIDManager().is_entra_id_configured
-                and AzureEntraIDManager().is_token_expired
+                azure_manager.is_entra_id_configured
+                and azure_manager.is_token_expired
             ):
-                await AzureEntraIDManager().refresh_token()
+                await azure_manager.refresh_token()

                 if client_holder.is_library_client:
                     client = await client_holder.reload_library_client()
                 else:
                     # ... provider lookup ...
                     client = client_holder.update_provider_data(
                         {
-                            "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
+                            "azure_api_key": azure_manager.access_token.get_secret_value(),
                             "azure_api_base": str(azure_config.get("api_base")),
                         }
                     )
scripts/llama-stack-entrypoint.sh (1)

15-19: Consider capturing enrichment failures for debugging.

The || true suppresses all errors from the enrichment script without logging them. If enrichment fails, operators won't have visibility into why.

♻️ Suggested improvement
     echo "Enriching llama-stack config..."
-    python3 /opt/app-root/llama_stack_configuration.py \
+    if ! python3 /opt/app-root/llama_stack_configuration.py \
         -c "$LIGHTSPEED_CONFIG" \
         -i "$INPUT_CONFIG" \
         -o "$ENRICHED_CONFIG" \
-        -e "$ENV_FILE" 2>&1 || true
+        -e "$ENV_FILE" 2>&1; then
+        echo "Warning: Config enrichment failed, will use original config"
+    fi
src/models/config.py (1)

1482-1494: Consider adding Field metadata for consistency.

The tenant_id, client_id, and client_secret fields lack the title and description metadata that other configuration fields in this module include. This affects API documentation and schema generation consistency.

♻️ Suggested addition of field metadata
 class AzureEntraIdConfiguration(ConfigurationBase):
     """Microsoft Entra ID authentication attributes for Azure."""

-    tenant_id: SecretStr
-    client_id: SecretStr
-    client_secret: SecretStr
+    tenant_id: SecretStr = Field(
+        ...,
+        title="Tenant ID",
+        description="Azure AD tenant ID for authentication.",
+    )
+    client_id: SecretStr = Field(
+        ...,
+        title="Client ID",
+        description="Azure AD application (client) ID.",
+    )
+    client_secret: SecretStr = Field(
+        ...,
+        title="Client Secret",
+        description="Azure AD client secret for authentication.",
+    )
     scope: str = Field(
         "https://cognitiveservices.azure.com/.default",
         title="Token scope",
         description="Azure Cognitive Services scope for token requests. "
         "Override only if using a different Azure service.",
     )
examples/lightspeed-stack-azure-entraid-service.yaml (2)

1-8: Consider adding a security note for auth_enabled: false.

This example has auth_enabled: false on line 5. While appropriate for development examples, consider adding a comment clarifying this should be enabled in production deployments.


9-17: Clarify the placeholder api_key value.

The api_key: xyzzy on line 17 is a placeholder. Since Azure Entra ID manages authentication dynamically, consider adding a comment explaining that this value is overridden at runtime by the token management system, or use a more descriptive placeholder like "<replaced-at-runtime>".

Makefile (1)

20-23: Handle missing .env file gracefully and clarify in-place config modification.

Two observations:

  1. Missing .env handling: If .env doesn't exist or AZURE_API_KEY isn't defined, grep will fail silently and AZURE_API_KEY will be empty, potentially causing runtime issues without clear error messages.

  2. In-place modification: Using the same file for -i and -o ($(LLAMA_STACK_CONFIG)) overwrites the original. This may be intentional but could surprise users.

Suggested improvement for `.env` handling
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
+	@test -f .env || (echo "Error: .env file not found" && exit 1)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
-	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
+	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2- || echo "") \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)
src/app/endpoints/streaming_query.py (1)

897-901: Consider caching the AzureEntraIDManager() instance.

While AzureEntraIDManager is a singleton, calling it three times in this block is slightly verbose. Consider storing it in a local variable for readability.

Suggested improvement
         if provider_id == "azure":
+            azure_manager = AzureEntraIDManager()
             if (
-                AzureEntraIDManager().is_entra_id_configured
-                and AzureEntraIDManager().is_token_expired
+                azure_manager.is_entra_id_configured
+                and azure_manager.is_token_expired
             ):
-                await AzureEntraIDManager().refresh_token()
+                await azure_manager.refresh_token()
tests/unit/authorization/test_azure_token_manager.py (1)

122-143: Type annotation improvement and assertion placement.

  1. The caplog parameter should use pytest.LogCaptureFixture instead of Any for proper type checking.
  2. The assertion at line 143 is inside the caplog.at_level context manager - while this works, placing it outside is cleaner.
Suggested improvement
     @pytest.mark.asyncio
     async def test_refresh_token_failure_logs_error(
         self,
         token_manager: AzureEntraIDManager,
         dummy_config: AzureEntraIdConfiguration,
         mocker: MockerFixture,
-        caplog: Any,
+        caplog: pytest.LogCaptureFixture,
     ) -> None:
         """Log error when token retrieval fails due to ClientAuthenticationError."""
         token_manager.set_config(dummy_config)
         mock_credential_instance = mocker.Mock()
         mock_credential_instance.get_token.side_effect = ClientAuthenticationError(
             "fail"
         )
         mocker.patch(
             "authorization.azure_token_manager.ClientSecretCredential",
             return_value=mock_credential_instance,
         )

         with caplog.at_level("ERROR"):
             await token_manager.refresh_token()
-            assert "Failed to retrieve Azure access token" in caplog.text
+        assert "Failed to retrieve Azure access token" in caplog.text
tests/unit/test_llama_stack_configuration.py (1)

157-177: Multiple type: ignore comments indicate potential model validation issues.

The extensive use of # type: ignore[call-arg] suggests the Pydantic models have required fields not being provided. While this works at runtime, consider:

  1. Creating a test fixture with all required fields
  2. Using model_construct() to bypass validation for test-only scenarios

This would improve type safety and make the test more maintainable.

docker-compose.yaml (1)

11-14: Inconsistent SELinux volume mount labels.

Line 12 uses :Z (exclusive private label) for run.yaml, while line 14 uses :z (shared label) for lightspeed-stack.yaml. This inconsistency could cause issues:

  • :Z relabels the content to be private to the container
  • :z allows sharing between containers

If both services need to read lightspeed-stack.yaml, :z is correct for that file. Consider making the labels consistent based on actual sharing requirements.
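Under the assumption that both services read lightspeed-stack.yaml, consistent shared labels might look like the following (host paths and mount points are illustrative, not taken from this PR):

```yaml
volumes:
  - ./run.yaml:/opt/app-root/run.yaml:z               # shared label on both mounts
  - ./lightspeed-stack.yaml:/opt/app-root/lightspeed-stack.yaml:z
```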

src/authorization/azure_token_manager.py (2)

59-74: Consider making token retrieval truly async or documenting blocking behavior.

The refresh_token method is async but calls the synchronous _retrieve_access_token which blocks on the Azure SDK's get_token(). This blocks the event loop during token retrieval.

Options:

  1. Run the blocking call in an executor: await asyncio.to_thread(self._retrieve_access_token)
  2. Document that brief blocking is expected (token refresh is infrequent)

For a token refresh that happens every ~hour, the brief blocking may be acceptable, but worth documenting.

Option 1: Use asyncio.to_thread
+import asyncio
+
     async def refresh_token(self) -> None:
         """Refresh the cached Azure access token.

         This is async to enforce proper ordering in the event loop -
         callers must await this before using the refreshed token.

         Raises:
             ValueError: If Entra ID configuration has not been set.
         """
         if self._entra_id_config is None:
             raise ValueError("Azure Entra ID configuration not set")

         logger.info("Refreshing Azure access token")
-        token_obj = self._retrieve_access_token()
+        token_obj = await asyncio.to_thread(self._retrieve_access_token)
         if token_obj:
             self._update_access_token(token_obj.token, token_obj.expires_on)

90-100: Error logging lacks exception details for debugging.

When token retrieval fails, only a generic message is logged without the exception details. This makes troubleshooting authentication issues harder.

Include exception in log
         except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.error("Failed to retrieve Azure access token")
+            logger.error("Failed to retrieve Azure access token", exc_info=True)
             return None
src/llama_stack_configuration.py (8)

82-100: Consider security implications of writing token to disk.

The Azure access token is written in plaintext to the .env file. While this enables the Llama Stack service to read it at startup, be aware that:

  1. Tokens persisted to disk may be accessed by other processes
  2. The file should have restrictive permissions (e.g., 0600)

Consider adding file permission restrictions after writing.

Proposed enhancement: Set restrictive file permissions (add `import os` at module level if it is not already present)
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)
+
+        # Restrict file permissions to owner-only
+        os.chmod(env_file, 0o600)

         logger.info(

126-127: Docstring type doesn't match signature.

The docstring references list[ByokRag] but the actual parameter type is list[dict[str, Any]]. Update the docstring to match the implementation.

Proposed fix
     Parameters:
         ls_config (dict[str, Any]): Existing Llama Stack configuration mapping
         used as the base; existing `vector_dbs` entries are preserved if
         present.
-        byok_rag (list[ByokRag]): List of BYOK RAG definitions to be added to
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG definitions to be added to
         the `vector_dbs` section.

139-140: Potential mutation of input data via reference assignment.

Assigning output = ls_config["vector_dbs"] creates a reference to the original list. Subsequent output.append() calls will modify the original ls_config["vector_dbs"] in place. Per coding guidelines, avoid in-place parameter modification; return new data structures instead.

Proposed fix: Create a copy
     # fill-in existing vector_dbs entries
     if "vector_dbs" in ls_config:
-        output = ls_config["vector_dbs"]
+        output = list(ls_config["vector_dbs"])

173-174: Docstring type doesn't match signature.

Same issue as construct_vector_dbs_section: docstring mentions list[ByokRag] but actual type is list[dict[str, Any]].

Proposed fix
-        byok_rag (list[ByokRag]): List of BYOK RAG specifications to convert
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG specifications to convert
         into provider entries.

187-188: Same reference assignment issue - creates mutation side effect.

As per coding guidelines, avoid in-place parameter modification.

Proposed fix
     # fill-in existing vector_io entries
     if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-        output = ls_config["providers"]["vector_io"]
+        output = list(ls_config["providers"]["vector_io"])

253-256: Missing error handling for input file operations.

If input_file doesn't exist or contains invalid YAML, the exception will propagate up unhandled. Consider wrapping in try-except for consistent error handling.

Proposed fix
     logger.info("Reading Llama Stack configuration from file %s", input_file)

-    with open(input_file, "r", encoding="utf-8") as file:
-        ls_config = yaml.safe_load(file)
+    try:
+        with open(input_file, "r", encoding="utf-8") as file:
+            ls_config = yaml.safe_load(file)
+    except FileNotFoundError:
+        logger.error("Input Llama Stack config not found: %s", input_file)
+        raise
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML in input file %s: %s", input_file, e)
+        raise

310-312: Consider using sys.exit for error conditions.

When the config file is not found, the function logs an error and returns. For CLI tools, it's more conventional to exit with a non-zero status code to signal failure to calling scripts or CI pipelines.

Proposed fix
+import sys
+
 ...
 
     except FileNotFoundError:
         logger.error("Config not found: %s", args.config)
-        return
+        sys.exit(1)

275-276: Docstring could be more descriptive.

Per coding guidelines, all functions require docstrings with brief descriptions. The current docstring "CLI entry point." is minimal. Consider expanding to describe what the CLI does.

Proposed enhancement
 def main() -> None:
-    """CLI entry point."""
+    """CLI entry point for Llama Stack configuration enrichment.
+
+    Reads Lightspeed configuration, applies Azure Entra ID token setup
+    and BYOK RAG enrichment, and writes the enriched Llama Stack config.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c8a7ff and def9221.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (38)
  • .github/workflows/e2e_tests.yaml
  • .github/workflows/e2e_tests_providers.yaml
  • Makefile
  • README.md
  • docker-compose-library.yaml
  • docker-compose.yaml
  • docs/providers.md
  • examples/azure-run.yaml
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/client.py
  • src/configuration.py
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/steps/info.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • src/app/endpoints/query_v2.py
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/models/config.py
  • src/configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/e2e/features/steps/info.py
  • tests/unit/authorization/test_azure_token_manager.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • src/lightspeed_stack.py
  • src/app/endpoints/query.py
  • src/client.py
  • src/authorization/azure_token_manager.py
  • src/app/main.py
  • src/llama_stack_configuration.py
src/models/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Configuration base class sets extra="forbid" to reject unknown fields
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Extend ConfigurationBase for configuration Pydantic models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Extend BaseModel for data Pydantic models
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/e2e/features/steps/info.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define E2E test steps in tests/e2e/features/steps/ directory

Files:

  • tests/e2e/features/steps/info.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/main.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Check pyproject.toml for supported Python versions before development
Always check pyproject.toml for existing dependencies before adding new ones
Always verify current library versions in pyproject.toml rather than assuming versions
Use pylint with source-roots = "src" configuration

Files:

  • pyproject.toml
🧠 Learnings (13)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/models/config.py
  • src/configuration.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/**/*.py : Use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests

Applied to files:

  • tests/unit/authentication/test_api_key_token.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • tests/unit/test_client.py
  • test.containerfile
  • src/app/endpoints/streaming_query.py
  • src/lightspeed_stack.py
  • src/client.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • test.containerfile
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • examples/azure-run.yaml
  • src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt`

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • .github/workflows/e2e_tests.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to pyproject.toml : Always check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
🧬 Code graph analysis (11)
src/models/config.py (1)
src/configuration.py (1)
  • azure_entra_id (345-349)
src/configuration.py (1)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
tests/unit/test_client.py (2)
src/client.py (5)
  • load (32-40)
  • get_client (101-115)
  • is_library_client (28-30)
  • update_provider_data (133-164)
  • reload_library_client (117-131)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/authorization/test_azure_token_manager.py (3)
src/authorization/azure_token_manager.py (7)
  • AzureEntraIDManager (22-100)
  • access_token (55-57)
  • is_token_expired (50-52)
  • is_entra_id_configured (45-47)
  • set_config (39-42)
  • refresh_token (59-74)
  • _update_access_token (76-83)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-100)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-74)
  • access_token (55-57)
src/client.py (5)
  • AsyncLlamaStackClientHolder (21-164)
  • get_client (101-115)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
  • load_configuration (65-74)
  • azure_entra_id (345-349)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (239-267)
  • construct_vector_dbs_section (115-157)
  • construct_vector_io_providers_section (160-210)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (4)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-74)
  • access_token (55-57)
src/client.py (4)
  • get_client (101-115)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
src/utils/types.py (1)
  • Singleton (43-57)
src/app/main.py (6)
src/authorization/azure_token_manager.py (4)
  • AzureEntraIDManager (22-100)
  • set_config (39-42)
  • refresh_token (59-74)
  • access_token (55-57)
src/app/database.py (1)
  • initialize_database (102-136)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
src/log.py (1)
  • get_logger (7-25)
src/a2a_storage/storage_factory.py (1)
  • A2AStorageFactory (20-185)
src/models/responses.py (1)
  • InternalServerErrorResponse (1787-1945)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
  • config (325-342)
  • config (1306-1314)
  • dump (1607-1619)
src/lightspeed_stack.py (1)
  • main (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (55)
tests/unit/authentication/test_api_key_token.py (1)

3-6: LGTM!

The added blank line after the module docstring improves formatting consistency with PEP 8 style conventions.

tests/e2e/configs/run-azure.yaml (2)

79-82: Verify: Should this config use the new Azure Entra ID credentials?

This e2e test configuration still uses ${env.AZURE_API_KEY} for the Azure inference provider. Given that the PR introduces Azure Entra ID token management with TENANT_ID, CLIENT_ID, and CLIENT_SECRET, please confirm whether this configuration should be updated to use the new credential pattern or if AZURE_API_KEY is intentionally retained for backward compatibility testing.


90-90: LGTM!

The whitespace normalization (blank line after provider entry and trailing newline) improves file formatting consistency.

Also applies to: 156-156

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

19-20: LGTM!

No functional changes; the file formatting is normalized with a proper trailing newline.

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

18-19: LGTM!

Trailing newline added for proper file formatting. No functional changes.

tests/e2e/features/steps/info.py (1)

113-116: Hardcoded value reduces test flexibility and creates inconsistency.

The check_shield_structure function now uses a hardcoded "openai/gpt-4o-mini" for provider_resource_id, while check_model_structure (lines 49-58) dynamically uses context.default_model and context.default_provider. This inconsistency could cause test failures if the test environment configuration changes.

Consider using the context variables for consistency:

♻️ Suggested fix
+    expected_model = f"{context.default_provider}/{context.default_model}"
-    assert found_shield["provider_resource_id"] == "openai/gpt-4o-mini", (
-        f"provider_resource_id should be 'openai/gpt-4o-mini', "
+    assert found_shield["provider_resource_id"] == expected_model, (
+        f"provider_resource_id should be '{expected_model}', "
         f"but is '{found_shield['provider_resource_id']}'"
     )

Was this hardcoding intentional to stabilize tests during the Azure credential migration? If so, please add a comment explaining the rationale.

tests/unit/models/config/test_dump_configuration.py (7)

103-103: LGTM!

The assertion correctly verifies that the new azure_entra_id field is present in the serialized configuration output.


205-206: LGTM!

The expected JSON correctly includes "azure_entra_id": None, aligning with the new optional configuration field and maintaining consistency with the assertion on line 103.


415-415: LGTM!

Consistent assertion for the new field presence in the quota limiters test.


532-533: LGTM!

Expected JSON correctly updated for the quota limiters with different values test.


749-750: LGTM!

Expected JSON correctly updated for the quota limiters with different scheduler values test.


940-941: LGTM!

Expected JSON correctly updated for the BYOK RAG configuration test.


1117-1118: LGTM!

Expected JSON correctly updated for the PostgreSQL namespace configuration test.

src/app/main.py (1)

12-12: LGTM!

Import added for the new Azure Entra ID token management functionality.

docker-compose-library.yaml (1)

23-26: LGTM with a naming consideration.

The migration from AZURE_API_KEY to separate Entra ID credentials (TENANT_ID, CLIENT_ID, CLIENT_SECRET) correctly supports dynamic token management. The comment on line 23 helpfully explains the new flow.

Consider using more specific names like AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET to avoid potential conflicts with other services or providers that might use similar generic names.

Verify that the configuration loading code expects these exact variable names and update consistently if renamed.

README.md (1)

179-180: LGTM!

Documentation updated to reflect Azure model compatibility with the new Entra ID authentication flow. The table formatting is consistent with other provider entries.

.github/workflows/e2e_tests.yaml (1)

63-63: LGTM!

The whitespace adjustment and the updated configuration source message now correctly reflect the environment-specific run YAML selection (run-${{ matrix.environment }}.yaml), aligning with the existing config selection logic.

Also applies to: 103-103

examples/lightspeed-stack-azure-entraid-lib.yaml (1)

1-30: LGTM!

This example configuration correctly demonstrates the Azure Entra ID integration for library mode, with proper environment variable references for credentials. The structure aligns with the new AzureEntraIdConfiguration model.

src/app/endpoints/query.py (2)

72-73: LGTM!

The import of AzureEntraIDManager follows the absolute import pattern as per coding guidelines.


310-311: LGTM!

The refactored client acquisition via AsyncLlamaStackClientHolder properly supports the dynamic token refresh workflow.

scripts/llama-stack-entrypoint.sh (1)

21-25: LGTM!

The .env sourcing with set -a / set +a correctly exports all variables, and the shellcheck directive is appropriately applied.

src/models/config.py (1)

1605-1605: LGTM!

The azure_entra_id field is correctly added as an optional configuration section, consistent with other optional sections like authorization and customization. Based on learnings, all config uses Pydantic models extending ConfigurationBase, which is satisfied.

test.containerfile (3)

2-2: LGTM!

Base image update is consistent with the project's image versioning approach.


6-11: LGTM!

Adding azure-identity dependency aligns with the Azure Entra ID token management feature. The directory setup and ownership configuration follow best practices for rootless container execution.


13-22: No action needed. The llama_stack_configuration.py script is invoked explicitly with the python3 interpreter in the entrypoint script (python3 /opt/app-root/llama_stack_configuration.py), so a shebang line is not required. The chmod +x is unnecessary but harmless since the file is never executed directly as a standalone executable.

src/configuration.py (2)

14-14: LGTM!

Import follows the existing pattern for configuration model imports.


344-349: LGTM!

The new azure_entra_id property follows the established pattern of other configuration accessors in this class. The Optional return type correctly reflects that Azure Entra ID configuration may not be present. The guard clause appropriately raises LogicError when configuration hasn't been loaded.

src/lightspeed_stack.py (1)

108-116: LGTM!

The core startup flow (environment setup, quota scheduler, and Uvicorn) is clean and correct. The comment clarification on line 109 improves readability.

examples/lightspeed-stack-azure-entraid-service.yaml (1)

27-30: LGTM!

The azure_entra_id section correctly wires credentials to environment variables using the ${env.VAR_NAME} syntax, consistent with Llama Stack configuration patterns.

Makefile (1)

13-18: LGTM!

Good use of ?= for configuration variables allowing command-line overrides. The run target update correctly passes the config file path.

tests/unit/test_client.py (3)

13-16: Good practice: Singleton reset fixture for test isolation.

The autouse fixture properly resets singleton state between tests, preventing test pollution.


86-131: Test coverage for update_provider_data looks good.

The test properly validates:

  • Pre-populating existing provider data
  • Merging new updates while preserving existing fields
  • Client replacement behavior

One note: accessing _custom_headers (line 102) is a private attribute access. Consider if there's a public way to set initial headers, though for test setup this is often acceptable.


133-152: Test coverage for reload_library_client is adequate.

The test verifies that:

  • Library mode is correctly identified
  • Reloading returns a new client instance
  • The holder is updated with the new client
tests/unit/test_configuration.py (2)

812-846: Good test coverage for Azure Entra ID configuration.

The test properly validates that all three required fields (tenant_id, client_id, client_secret) are loaded and accessible via get_secret_value().


849-877: Good validation error test for incomplete Azure Entra ID config.

This test ensures that partial configurations properly raise ValidationError, preventing misconfiguration at runtime.

src/app/endpoints/streaming_query.py (1)

52-52: Import follows project conventions.

The Azure token manager import aligns with the codebase's import style.

.github/workflows/e2e_tests_providers.yaml (2)

55-68: Azure Entra ID config injection step looks correct.

The script properly:

  • Only runs when matrix.environment == 'azure'
  • Uses proper escaping (\${env.VAR}) to preserve YAML env var syntax
  • Provides feedback on each modified file

190-192: Credential environment variables properly added.

The TENANT_ID, CLIENT_ID, and CLIENT_SECRET environment variables are correctly propagated to both server and library mode runs.

Also applies to: 221-223

docs/providers.md (1)

68-177: Comprehensive Azure Entra ID documentation.

The documentation covers the authentication flow thoroughly:

  • Configuration requirements with clear attribute table
  • Critical note about the ${env.AZURE_API_KEY} placeholder requirement
  • Token lifecycle for both library and service modes
  • Security considerations
tests/unit/authorization/test_azure_token_manager.py (2)

21-41: Well-structured test fixtures.

The fixtures properly handle singleton reset with autouse=True, ensuring test isolation. The dummy_config fixture provides a clean configuration object for testing.


145-158: Good dynamic expiration testing.

The time mocking approach properly validates the is_token_expired property behavior across time boundaries.

tests/unit/test_llama_stack_configuration.py (2)

28-83: Comprehensive vector_dbs section tests.

Tests properly cover empty input, preserving existing entries, adding new entries, default values, and merge behavior. Good coverage of the construct_vector_dbs_section function.


90-128: Good coverage for vector_io providers section.

Tests cover the key scenarios: empty config, preserving existing entries, adding new entries, and default provider type fallback.

src/client.py (4)

32-41: Clean separation of library vs service client initialization.

The early return pattern and clear branching between library and service modes is well-structured.


117-131: Library client reload implementation looks correct.

The reload_library_client properly reuses the stored _config_path to reinitialize the library client, ensuring env var changes (like refreshed tokens) are picked up.


133-164: Provider data update handles edge cases well.

The update_provider_data method properly handles:

  • Missing headers
  • Invalid JSON in existing header
  • Merging new values with existing provider data

This aligns with the Azure token refresh flow for service mode clients.


72-99: Use unique temporary filenames or document library-mode single-worker requirement.

The enriched config is written to a fixed path in the temp directory. If library mode is used with multiple workers (despite examples showing workers: 1), concurrent processes could overwrite each other's config files. Consider using tempfile.NamedTemporaryFile(delete=False) for unique filenames, or add validation to prevent library mode with workers > 1 and document this constraint.
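One way to avoid the collision, sketched with tempfile.NamedTemporaryFile; the prefix and placeholder content are illustrative, not the project's actual naming:

```python
import os
import tempfile

# Give each worker its own enriched-config file instead of one shared fixed path
paths = []
for _ in range(2):
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".yaml", prefix="ls-enriched-", delete=False
    ) as f:
        f.write("version: 2\n")  # placeholder for the enriched config body
        paths.append(f.name)

distinct = paths[0] != paths[1]  # no two workers share a file

for p in paths:
    os.unlink(p)
```

With delete=False the file survives the context manager, so the path can be handed to the library client; cleanup then becomes the caller's responsibility.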

docker-compose.yaml (1)

21-24: Azure Entra ID credential environment variables look correct.

The switch from AZURE_API_KEY to individual credential components (TENANT_ID, CLIENT_ID, CLIENT_SECRET) properly supports the new token management flow where the API key is obtained dynamically via client credentials grant.

Also applies to: 64-67
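Illustratively, the relevant compose fragment would look something like this (service name and exact layout assumed, not copied from the actual docker-compose.yaml):

```yaml
services:
  llama-stack:               # service name assumed for illustration
    environment:
      - TENANT_ID=${TENANT_ID}
      - CLIENT_ID=${CLIENT_ID}
      - CLIENT_SECRET=${CLIENT_SECRET}
```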

src/authorization/azure_token_manager.py (2)

34-43: Clean initialization and configuration setup.

The initialization properly sets default state, and set_config correctly stores the configuration with debug logging.


44-57: Properties are well-implemented.

The expiration check correctly handles both uninitialized state (_expires_on == 0) and actual expiry. Using SecretStr for the access token ensures it won't be accidentally logged.
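A minimal sketch of such an expiry check (the class and attribute names here are illustrative, not the real manager; the real code also wraps the token itself in pydantic's SecretStr):

```python
import time

class TokenState:
    """Illustrative expiry bookkeeping; not the real Azure token manager."""

    def __init__(self):
        self._expires_on = 0  # epoch seconds; 0 means no token acquired yet

    @property
    def is_token_expired(self):
        # Uninitialized state (== 0) counts as expired, forcing a first fetch.
        return self._expires_on == 0 or time.time() >= self._expires_on
```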

examples/azure-run.yaml (2)

114-136: Well-structured storage configuration.

The storage section properly separates backends (kv_sqlite, sql_sqlite) from stores, with clear namespace isolation. The inference store configuration with max_write_queue_size: 10000 and num_writers: 4 provides reasonable defaults for handling concurrent writes.


101-110: LGTM!

The registered resources section provides a clean starter configuration with the rag-runtime tool group properly registered. Empty lists allow users to add their own resources as needed.

src/llama_stack_configuration.py (3)

1-19: LGTM! Module setup follows coding guidelines.

The imports use absolute paths, the logger is configured correctly using logging.getLogger(__name__), and the module docstring explains both usage modes.


46-48: Add return type annotation.

The function signature appeared to be missing the -> None return type annotation required by coding guidelines. On closer review, the annotation is already present:

 def setup_azure_entra_id_token(
     azure_config: dict[str, Any] | None, env_file: str
 ) -> None:

Disregard this comment.


213-231: LGTM! In-place modification is documented.

The docstring clearly states ls_config is "modified in place", making the mutation behavior explicit and intentional. This is acceptable when documented.


Comment on lines +87 to +89
tenant_id: ${env.AZURE_TENANT_ID}
client_id: ${env.AZURE_CLIENT_ID}
client_secret: ${env.AZURE_CLIENT_SECRET}
Contributor

@coderabbitai coderabbitai bot Jan 14, 2026

⚠️ Potential issue | 🟡 Minor

Minor inconsistency in environment variable naming.

The YAML config example (lines 87-89) uses AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, but the bash export example (lines 152-154) uses TENANT_ID, CLIENT_ID, CLIENT_SECRET without the AZURE_ prefix.

Consider aligning these for consistency, or add a note explaining the difference.

Also applies to: 151-155

🤖 Prompt for AI Agents
In `@docs/providers.md` around lines 87 - 89, The YAML example uses
AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET but the bash export
example uses TENANT_ID / CLIENT_ID / CLIENT_SECRET; make them consistent by
renaming the bash exports to AZURE_TENANT_ID, AZURE_CLIENT_ID, and
AZURE_CLIENT_SECRET (or alternately rename the YAML keys to the non-prefixed
names) and update all occurrences (including the other instance noted) or add a
brief note explaining the intentional difference so readers aren’t confused.

Contributor Author

Need to fix this.

Contributor

@asimurka I'll help you fix this inconsistency. Based on my analysis of the codebase, the actual configuration files (docker-compose.yaml, workflow files, and example configs) all use TENANT_ID, CLIENT_ID, and CLIENT_SECRET without the AZURE_ prefix. The documentation example should be updated to match this convention.


Fix environment variable naming inconsistency in Azure Entra ID documentation

The YAML configuration example needs to be updated to use the same environment variable names as the rest of the codebase: TENANT_ID, CLIENT_ID, and CLIENT_SECRET (without the AZURE_ prefix).

Note: While Microsoft's official Azure Identity SDK documentation uses the AZURE_ prefix convention (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET), your codebase has consistently adopted the shorter naming convention without the prefix across all configuration files. This change maintains consistency within your project.

Comment on lines 71 to +77
 inference:
-- provider_id: azure
-  provider_type: remote::azure
-  config:
-    api_key: ${env.AZURE_API_KEY}
+- config:
     api_base: https://ols-test.openai.azure.com/
-    api_version: 2024-02-15-preview
     api_type: ${env.AZURE_API_TYPE:=}
+    api_key: ${env.AZURE_API_KEY}
+    api_version: 2025-01-01-preview
+  provider_id: azure
+  provider_type: remote::azure
 post_training:
 - provider_id: huggingface
   provider_type: inline::huggingface-gpu
   config:
     checkpoint_format: huggingface
     device: cpu
     distributed_backend: null
     dpo_output_dir: "."
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Azure OpenAI API version 2025-01-01-preview latest supported version

💡 Result:

No — 2025-01-01-preview is not the latest. The Azure OpenAI docs show a more recent data‑plane preview (2025-04-01-preview) and newer control‑plane versions (preview 2025-07-01-preview, GA 2025-06-01). (learn.microsoft.com)


Update Azure OpenAI API version to a current release.

The inference provider correctly uses environment variable substitution ${env.AZURE_API_KEY} for the API key, aligning with the Azure Entra ID token management flow. However, the API version 2025-01-01-preview is outdated. Current Azure OpenAI versions include 2025-04-01-preview (data-plane) and 2025-06-01 (control-plane GA). Update to a more recent version to benefit from latest features and stability improvements.

🤖 Prompt for AI Agents
In `@examples/azure-run.yaml` around lines 71 - 77, The YAML uses an outdated
Azure OpenAI api_version value; update the api_version field under the inference
-> config block (the entry with provider_id: azure and provider_type:
remote::azure) to a current release—e.g., set api_version to
"2025-04-01-preview" for the data-plane or "2025-06-01" if you need
control-plane GA features—so the inference provider uses a supported API
version.

Comment on lines +7 to +10
INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
ENRICHED_CONFIG="/opt/app-root/run.yaml"
LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
ENV_FILE="/opt/app-root/.env"
Contributor

⚠️ Potential issue | 🟠 Major

Potential path collision between INPUT_CONFIG and ENRICHED_CONFIG.

When LLAMA_STACK_CONFIG is not set, both INPUT_CONFIG and ENRICHED_CONFIG default to /opt/app-root/run.yaml. If the Python enrichment script fails (suppressed by || true on line 19), the check on line 27 (if [ -f "$ENRICHED_CONFIG" ]) will still succeed because the original input file exists at that path.

This means the script may incorrectly report "Using enriched config" even when enrichment failed.

🔧 Suggested fix using a distinct enriched config path
 INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
-ENRICHED_CONFIG="/opt/app-root/run.yaml"
+ENRICHED_CONFIG="/opt/app-root/run-enriched.yaml"
 LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
 ENV_FILE="/opt/app-root/.env"

Alternatively, you could check the enrichment script's exit code before deciding which config to use.
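The exit-code alternative could be sketched like this (the `false` placeholder stands in for the real enrichment invocation, which is not shown here):

```shell
#!/bin/sh
INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
ENRICHED_CONFIG="/opt/app-root/run-enriched.yaml"

# Branch on the enrichment step's exit code instead of suppressing
# failures with `|| true`. `false` simulates a failing enrichment command.
if false; then
    echo "Using enriched config: $ENRICHED_CONFIG"
    CONFIG="$ENRICHED_CONFIG"
else
    echo "Enrichment failed, falling back to: $INPUT_CONFIG"
    CONFIG="$INPUT_CONFIG"
fi
echo "Would start llama-stack with: $CONFIG"
```

This makes the fallback decision explicit even when both paths point at existing files.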

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
-ENRICHED_CONFIG="/opt/app-root/run.yaml"
+ENRICHED_CONFIG="/opt/app-root/run-enriched.yaml"
 LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
 ENV_FILE="/opt/app-root/.env"
🤖 Prompt for AI Agents
In `@scripts/llama-stack-entrypoint.sh` around lines 7 - 10, The issue is that
INPUT_CONFIG and ENRICHED_CONFIG can point to the same file so the existence
check for ENRICHED_CONFIG can be true even when enrichment failed; update the
script so ENRICHED_CONFIG uses a distinct default path (e.g., a different
filename) or capture the enrichment script's exit status instead of suppressing
errors with "|| true" and use that exit code to decide whether to treat the
output as enriched; refer to the variables INPUT_CONFIG and ENRICHED_CONFIG and
the enrichment invocation (the command currently followed by "|| true") and the
conditional "if [ -f \"$ENRICHED_CONFIG\" ]" when making the change.

Comment on lines +319 to +339
if provider_id == "azure":
    if (
        AzureEntraIDManager().is_entra_id_configured
        and AzureEntraIDManager().is_token_expired
    ):
        await AzureEntraIDManager().refresh_token()

        if client_holder.is_library_client:
            client = await client_holder.reload_library_client()
        else:
            azure_config = next(
                p.config
                for p in await client.providers.list()
                if p.provider_type == "remote::azure"
            )
            client = client_holder.update_provider_data(
                {
                    "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
                    "azure_api_base": str(azure_config.get("api_base")),
                }
            )
Contributor

@coderabbitai coderabbitai bot Jan 14, 2026

⚠️ Potential issue | 🟠 Major

Potential StopIteration exception if Azure provider is not found.

The next() call on line 329-333 will raise StopIteration if no provider with provider_type == "remote::azure" exists in the providers list. This could occur if the Azure provider is configured differently or the provider list is unexpectedly empty.

Additionally, str(azure_config.get("api_base")) on line 337 will produce the string "None" if api_base is missing from the config.
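For reference, the difference between the two next() forms in isolation:

```python
providers = []  # simulate a providers list with no Azure entry

# Without a default, next() raises StopIteration when nothing matches.
raised = False
try:
    next(p for p in providers if p == "remote::azure")
except StopIteration:
    raised = True

# With a default, next() returns the default instead of raising.
found = next((p for p in providers if p == "remote::azure"), None)
```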

🔧 Suggested fix with default handling
                 else:
-                    azure_config = next(
-                        p.config
-                        for p in await client.providers.list()
-                        if p.provider_type == "remote::azure"
-                    )
+                    azure_provider = next(
+                        (p for p in await client.providers.list()
+                         if p.provider_type == "remote::azure"),
+                        None,
+                    )
+                    if azure_provider is None:
+                        logger.warning(
+                            "Azure provider not found in providers list; skipping token update"
+                        )
+                    else:
+                        azure_config = azure_provider.config
+                        api_base = azure_config.get("api_base")
+                        if not api_base:
+                            logger.error("Azure provider config missing api_base")
+                            raise HTTPException(
+                                status_code=500,
+                                detail="Azure provider configuration missing api_base",
+                            )
-                    client = client_holder.update_provider_data(
-                        {
-                            "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
-                            "azure_api_base": str(azure_config.get("api_base")),
-                        }
-                    )
+                        client = client_holder.update_provider_data(
+                            {
+                                "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
+                                "azure_api_base": str(api_base),
+                            }
+                        )
🤖 Prompt for AI Agents
In `@src/app/endpoints/query.py` around lines 319 - 339, The current code uses
next(...) to find azure_config and will raise StopIteration if no provider with
provider_type == "remote::azure" exists and may convert a missing api_base into
the string "None"; change the lookup to a safe search (e.g., use next((p.config
for p in await client.providers.list() if p.provider_type == "remote::azure"),
None) or explicitly iterate and return None if not found), then check that
azure_config is not None before calling client_holder.update_provider_data
(handle the missing provider case appropriately), and when building
azure_api_base use a safe default instead of str(None) (e.g., use
azure_config.get("api_base") or "" only if not None) so you never store the
literal "None" string.

Contributor Author

This will never happen in practice.

Contributor

@asimurka, understood. I've noted that in your deployment context, the Azure provider is guaranteed to be present when provider_id == "azure", so the StopIteration scenario won't occur in practice.


✏️ Learnings added
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +106 to 108
    except (ClientAuthenticationError, CredentialUnavailableError) as e:
        logger.error("Azure Entra ID: Failed to generate token: %s", e)

Contributor

⚠️ Potential issue | 🟠 Major

Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing AZURE_API_KEY causes API calls to fail.

Consider either raising the exception or returning a boolean status to indicate success/failure.

Proposed fix: Re-raise after logging
     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise
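The log-and-reraise pattern in isolation (CredentialUnavailableError is a stand-in class here, not the real azure-identity import):

```python
import logging

logger = logging.getLogger(__name__)

class CredentialUnavailableError(Exception):
    """Stand-in for azure.identity's exception type."""

def generate_token():
    try:
        raise CredentialUnavailableError("no credential available")
    except CredentialUnavailableError as e:
        logger.error("Azure Entra ID: Failed to generate token: %s", e)
        raise  # bare raise re-raises the original exception with its traceback

caught = None
try:
    generate_token()
except CredentialUnavailableError as e:
    caught = e
```

The bare `raise` keeps the error observable to callers while still producing the log line.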
🤖 Prompt for AI Agents
In `@src/llama_stack_configuration.py` around lines 106 - 108, The except block
that catches ClientAuthenticationError and CredentialUnavailableError (the Azure
Entra ID token generation block that currently calls logger.error("Azure Entra
ID: Failed to generate token: %s", e)) must not silently swallow failures; after
logging the error re-raise the caught exception (raise) so callers can handle or
fail fast. Locate the except handling those two exceptions in the token
generation function and append a re-raise (or alternatively change the function
to return an explicit failure status and propagate that to callers) so
authentication failures are observable to callers.

Comment on lines +146 to +148
"vector_db_id": brag.get("vector_db_id"),
"provider_id": "byok_" + brag.get("vector_db_id", ""),
"embedding_model": brag.get("embedding_model", "all-MiniLM-L6-v2"),
Contributor

⚠️ Potential issue | 🟡 Minor

Missing vector_db_id results in malformed provider_id.

If brag.get("vector_db_id") returns None, the vector_db_id field will be None and provider_id will be "byok_" (empty suffix). Consider adding validation or using a more defensive default.

Proposed fix: Add validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id"),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
                 "embedding_model": brag.get("embedding_model", "all-MiniLM-L6-v2"),
                 "embedding_dimension": brag.get("embedding_dimension", 384),
             }
         )
📝 Committable suggestion

Suggested change
"vector_db_id": brag.get("vector_db_id"),
"provider_id": "byok_" + brag.get("vector_db_id", ""),
"embedding_model": brag.get("embedding_model", "all-MiniLM-L6-v2"),
for brag in byok_rag:
vector_db_id = brag.get("vector_db_id")
if not vector_db_id:
logger.warning("BYOK RAG entry missing vector_db_id, skipping")
continue
output.append(
{
"vector_db_id": vector_db_id,
"provider_id": f"byok_{vector_db_id}",
"embedding_model": brag.get("embedding_model", "all-MiniLM-L6-v2"),
"embedding_dimension": brag.get("embedding_dimension", 384),
}
)
🤖 Prompt for AI Agents
In `@src/llama_stack_configuration.py` around lines 146 - 148, The code builds
provider_id using brag.get("vector_db_id") which can be None and yield a
malformed "byok_" value; update the logic around the vector_db_id/provider_id
assignment (where "vector_db_id", "provider_id", "embedding_model" are set) to
validate or provide a safe default for vector_db_id (e.g., use empty-string-safe
fallback or a sentinel like "unknown") and then construct provider_id from that
validated value (or skip/omit provider_id when vector_db_id is absent) so
provider_id is never "byok_" with an empty suffix.

# Switch back to the original user
USER 1001

ENTRYPOINT ["/opt/app-root/enrich-entrypoint.sh"]
Contributor Author

First enrich the LLS config and generate the initial access token (required for the models.list call); only then start the service itself.

Contributor

@tisnik tisnik left a comment

Looks very good overall. I'll add some comments later, but LGTM

for brag in byok_rag:
    output.append(
        {
            "vector_db_id": brag.vector_db_id,
Contributor

FYI: right now the registration of vector_stores from config is not working properly. AFAIK it should work, but it doesn't, since llama-stack picks up the registration of the vector store from the metadata store.

for brag in byok_rag:
    output.append(
        {
            "provider_id": "byok_" + brag.vector_db_id,
Contributor

This has also changed in LLS 0.3.x

  vector_io:
  - config: # Define the storage backend for RAG
      persistence:
        namespace: vector_io::faiss
        backend: kv_default
    provider_id: faiss
    provider_type: inline::faiss

this is the config for vector_io, then you need to register a storage backend:

storage:
  backends:
    kv_default: # Define the storage backend type for RAG; in this case the registry and RAG are unified, i.e. information on registered resources (e.g. models, vector_stores) is saved together with the RAG chunks
      type: kv_sqlite
      db_path: ~/.llama/storage/rag/kv_store.db

"urllib3==2.6.2",
# Used for agent card configuration
"PyYAML>=6.0.0",
"azure-core",
Contributor

I would suggest adding a comment, as with the other deps.

@are-ces
Copy link
Contributor

are-ces commented Jan 14, 2026

LGTM overall, I have added some comments
