{AKS} feat(aks-agent): support cluster and client mode #9522
Conversation
| rule | cmd_name | rule_message | suggest_message |
|---|---|---|---|
| | aks agent | cmd aks agent added parameter mode | |
| | aks agent | cmd aks agent added parameter namespace | |
| | aks agent-cleanup | cmd aks agent-cleanup added parameter mode | |
| | aks agent-cleanup | cmd aks agent-cleanup added parameter namespace | |

Thank you for your contribution! We will review the pull request and get back to you soon.
The git hooks are available for the azure-cli and azure-cli-extensions repos. They can help you run required checks before creating the PR. Please sync the latest code with the latest dev branch (for azure-cli) or main branch (for azure-cli-extensions): `pip install azdev --upgrade`, then `azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>`.
Force-pushed from 2037264 to 3bee7fe.
Pull request overview
This pull request adds support for two deployment modes to the AKS agent extension: cluster mode (existing Helm-based deployment) and local mode (new Docker container-based local execution). The changes introduce mode selection during initialization, refactor the agent manager classes using an abstract base class, and add comprehensive telemetry tracking for different event types.
Changes:
- Added interactive mode selection during `az aks agent-init`, allowing users to choose between cluster mode and local mode
- Introduced the `AKSAgentManagerLocal` class for Docker-based local execution with local file-based configuration storage
- Refactored code using the abstract base class `AKSAgentManagerLLMConfigBase` to support both deployment modes
- Modified namespace handling to require explicit namespace specification for cluster mode operations
- Enhanced telemetry to track init, cleanup, and startup events with mode information
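The refactor described above can be pictured as a small class hierarchy. The following is an illustrative sketch only: the class and method names are taken from this review (`AKSAgentManagerLLMConfigBase`, `AKSAgentManagerLocal`, and the methods shown in the code below), but the bodies are invented stand-ins, not the extension's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AKSAgentManagerLLMConfigBase(ABC):
    """Interface shared by the cluster-mode and local-mode managers (sketch)."""

    @abstractmethod
    def save_llm_config(self, provider, params: dict) -> None:
        """Persist the chosen LLM provider configuration."""

    @abstractmethod
    def get_llm_config(self) -> Dict:
        """Return the stored LLM configuration, or an empty dict."""

    @abstractmethod
    def exec_aks_agent(self, command_flags: str = "") -> bool:
        """Run the agent (Helm-deployed pod vs. local `docker run`)."""

    @abstractmethod
    def uninstall_agent(self) -> bool:
        """Tear down the deployment or remove local config files."""


class AKSAgentManagerLocal(AKSAgentManagerLLMConfigBase):
    """Docker-based local mode, with trivial in-memory stand-in bodies."""

    def __init__(self) -> None:
        self._config: Dict = {}

    def save_llm_config(self, provider, params: dict) -> None:
        self._config = {"provider": provider, **params}

    def get_llm_config(self) -> Dict:
        return self._config

    def exec_aks_agent(self, command_flags: str = "") -> bool:
        return True  # the real implementation shells out to `docker run`

    def uninstall_agent(self) -> bool:
        self._config = {}
        return True
```

The cluster-mode `AKSAgentManager` would subclass the same base, so callers in `custom.py` can dispatch on the selected mode without branching on implementation details.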
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| setup.py | Version bump from 1.0.0b14 to 1.0.0b15 |
| custom.py | Major refactoring to support mode selection, separate LLM config and helm deployment setup, add local mode status checking |
| aks_agent_manager.py | Introduced abstract base class, refactored AKSAgentManager with breaking constructor changes, added AKSAgentManagerLocal class for Docker execution |
| llm_config_manager.py | Added LLMConfigManagerLocal class for file-based LLM configuration management |
| telemetry.py | Enhanced to track different event types (init, cleanup, startup) and mode information |
| _params.py | Added namespace and mode parameters for agent and cleanup commands |
| _help.py | Updated help text to document new mode parameter and local mode requirements |
| _consts.py | Added CONFIG_DIR constant for local configuration storage |
| aks.py | Improved kubelogin command to explicitly specify kubeconfig path |
| __init__.py | Exported AKSAgentManagerLocal class |
| test_aks_agent_manager.py | Removed deprecated test for set_aks_context method |
| README.rst | Updated documentation to explain both deployment modes |
| HISTORY.rst | Added release notes for version 1.0.0b15 |
Comments suppressed due to low confidence (1)
src/aks-agent/azext_aks_agent/agent/k8s/aks_agent_manager.py:1161
- The newly introduced AKSAgentManagerLocal class (lines 905-1161) lacks test coverage. Given that this file has comprehensive test coverage for AKSAgentManager, similar tests should be added for the local mode implementation to verify Docker execution, configuration file handling, and error scenarios.
```python
        # Create cluster-specific config directory to match LLMConfigManagerLocal
        self.config_dir = self.base_config_dir / subscription_id / resource_group_name / cluster_name

        # Docker image for client mode execution
        # TODO(mainred): update the docker image to the official one when available
        # self.docker_image = "mcr.microsoft.com/aks/aks-agent:89bde40-local"
        self.docker_image = "mainred/aks-agent:89bde40-local"

        self.llm_config_manager = LLMConfigManagerLocal(
            subscription_id=subscription_id,
            resource_group_name=resource_group_name,
            cluster_name=cluster_name
        )

        # Ensure custom_toolset.yaml exists
        self._ensure_custom_toolset()

    def _ensure_custom_toolset(self) -> None:
        """
        Ensure custom_toolset.yaml exists in the config directory.

        Creates the file with default MCP server configuration if it doesn't exist.
        """
        import yaml

        custom_toolset_file = self.config_dir / "custom_toolset.yaml"

        # Create config directory if it doesn't exist
        self.config_dir.mkdir(parents=True, exist_ok=True)

        # Only create if file doesn't exist
        if not custom_toolset_file.exists():
            default_config = {
                "mcp_servers": {
                    "aks-mcp": {
                        "description": "Azure MCP server exposes the Azure and Kubernetes capabilities for Azure Kubernetes Service clusters",
                        "config": {
                            "url": "http://localhost:8000/sse",
                        }
                    }
                }
            }
            try:
                with open(custom_toolset_file, 'w', encoding='utf-8') as f:
                    yaml.dump(default_config, f, default_flow_style=False, sort_keys=False)
                logger.debug("Created custom_toolset.yaml at: %s", custom_toolset_file)
            except Exception as e:
                logger.warning("Failed to create custom_toolset.yaml: %s", e)
        else:
            logger.debug("custom_toolset.yaml already exists at: %s", custom_toolset_file)

    def save_llm_config(self, provider: LLMProvider, params: dict) -> None:
        """
        Save LLM configuration using the LLMConfigManager.

        Args:
            provider: LLMProvider instance
            params: Dictionary of model parameters
        """
        self.llm_config_manager.save(provider, params)

    def get_llm_config(self) -> Dict:
        """
        Get LLM configuration from local file.

        Returns:
            Dictionary of model configurations if exists, empty dict otherwise
        """
        return self.llm_config_manager.model_list if self.llm_config_manager.model_list else {}

    def exec_aks_agent(self, command_flags: str = "") -> bool:
        """
        Execute commands on the AKS agent using Docker container.

        This method runs the AKS agent in a Docker container with the local
        LLM configuration and kubeconfig mounted.

        Args:
            command_flags: Additional flags for the aks-agent command

        Returns:
            True if execution was successful

        Raises:
            AzCLIError: If execution fails or Docker is not available
        """
        import shlex
        import subprocess
        import sys

        logger.info("Executing AKS agent command in Docker container with flags: %s", command_flags)
        try:
            # Check if configuration exists
            model_list_file = self.config_dir / "model_list.yaml"
            custom_toolset_file = self.config_dir / "custom_toolset.yaml"
            if not self.config_dir.exists() or not model_list_file.exists() or not custom_toolset_file.exists():
                raise AzCLIError(
                    "AKS agent configuration not found.\n"
                    "Please run 'az aks agent-init' first to initialize the agent."
                )

            # Check if Docker is available
            try:
                subprocess.run(
                    ["docker", "--version"],
                    check=True,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.PIPE
                )
            except (subprocess.CalledProcessError, FileNotFoundError):
                raise AzCLIError(
                    "Docker is not available. Please install Docker to use client mode.\n"
                    "Visit https://docs.docker.com/get-docker/ for installation instructions."
                )

            # Build volume mounts
            volumes = []
            if self.kubeconfig_path:
                volumes.extend(["-v", f"{self.kubeconfig_path}:/root/.kube/config:ro"])

            # Mount Azure config directory
            azure_config_dir = os.path.expanduser("~/.azure")
            if os.path.exists(azure_config_dir):
                volumes.extend(["-v", f"{azure_config_dir}:/root/.azure:ro"])
                logger.debug("Mounting Azure config directory: %s", azure_config_dir)
            else:
                logger.debug("Azure config directory not found, skipping mount: %s", azure_config_dir)

            # Mount LLM config files
            volumes.extend(["-v", f"{model_list_file}:/etc/aks-agent/config/model_list.yaml:ro"])
            volumes.extend(["-v", f"{custom_toolset_file}:/etc/aks-agent/config/custom_toolset.yaml:ro"])

            # Build environment variables for AKS context
            env_vars = [
                "-e", f"AKS_RESOURCE_GROUP={self.resource_group_name}",
                "-e", f"AKS_CLUSTER_NAME={self.cluster_name}",
                "-e", f"AKS_SUBSCRIPTION_ID={self.subscription_id}"
            ]

            # Prepare the command
            exec_command = [
                "docker", "run",
                "--rm",
                "-it",
                *volumes,
                *env_vars,
                self.docker_image,
                "ask"
            ]

            # Add command flags if provided; they may arrive as one quoted string with multiple args
            if command_flags:
                exec_command.extend(shlex.split(command_flags))

            logger.debug("Running Docker command: %s", " ".join(exec_command))

            # Execute the Docker container with interactive TTY
            result = subprocess.run(
                exec_command,
                stdin=sys.stdin,
                stdout=sys.stdout,
                stderr=sys.stderr
            )
            if result.returncode != 0:
                raise AzCLIError(f"Docker container exited with code {result.returncode}")

            logger.info("AKS agent command executed successfully in Docker container")
            return True
        except AzCLIError:
            raise
        except subprocess.CalledProcessError as e:
            logger.error("Failed to execute Docker command: %s", e)
            raise AzCLIError(f"Failed to execute AKS agent in Docker: {e}")
        except Exception as e:
            logger.error("Failed to execute AKS agent command: %s", e)
            raise

    def uninstall_agent(self) -> bool:
        """
        Uninstall AKS agent by removing local LLM configuration files.

        Returns:
            True if uninstallation was successful
        """
        logger.info("Removing local AKS agent configuration")
        try:
            config_files = [
                self.config_dir / "model_list.yaml",
                self.config_dir / "custom_toolset.yaml"
            ]
            removed_files = []
            for config_file in config_files:
                if config_file.exists():
                    config_file.unlink()
                    removed_files.append(str(config_file))
                    logger.debug("Removed config file: %s", config_file)

            if removed_files:
                logger.info("Successfully removed %d configuration file(s)", len(removed_files))
            else:
                logger.info("No configuration files found to remove")

            # Remove the config directory if it's empty
            if self.config_dir.exists() and not any(self.config_dir.iterdir()):
                self.config_dir.rmdir()
                logger.debug("Removed empty config directory: %s", self.config_dir)

            return True
        except Exception as e:
            logger.error("Failed to remove local configuration: %s", e)
            raise AzCLIError(f"Failed to uninstall local agent configuration: {e}")
```
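The suppressed review comment above notes that `AKSAgentManagerLocal` lacks test coverage. One way to make the Docker path testable without a Docker daemon is to factor the command assembly into a pure function. The sketch below is a hypothetical standalone reproduction (the helper name `build_docker_command` is invented here), mirroring the mounts, environment variables, and flag parsing in `exec_aks_agent` so the resulting argv can be asserted against directly:

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_docker_command(
    docker_image: str,
    config_dir: Path,
    resource_group: str,
    cluster_name: str,
    subscription_id: str,
    kubeconfig_path: Optional[str] = None,
    command_flags: str = "",
) -> List[str]:
    """Assemble the `docker run` argv the same way exec_aks_agent does."""
    volumes: List[str] = []
    if kubeconfig_path:
        volumes += ["-v", f"{kubeconfig_path}:/root/.kube/config:ro"]
    # LLM config files are mounted read-only into the container's config path
    volumes += ["-v", f"{config_dir / 'model_list.yaml'}:/etc/aks-agent/config/model_list.yaml:ro"]
    volumes += ["-v", f"{config_dir / 'custom_toolset.yaml'}:/etc/aks-agent/config/custom_toolset.yaml:ro"]
    env_vars = [
        "-e", f"AKS_RESOURCE_GROUP={resource_group}",
        "-e", f"AKS_CLUSTER_NAME={cluster_name}",
        "-e", f"AKS_SUBSCRIPTION_ID={subscription_id}",
    ]
    cmd = ["docker", "run", "--rm", "-it", *volumes, *env_vars, docker_image, "ask"]
    if command_flags:
        cmd += shlex.split(command_flags)  # flags may arrive as one quoted string
    return cmd
```

A unit test can then verify the argv shape (flag splitting, mount targets, env var values) with no subprocess call at all, which addresses the reviewer's "Docker execution" coverage point for the command-construction half of the method.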
```python
self.cluster_name: str = cluster_name
self.subscription_id: str = subscription_id
# ...
self.chart_repo = "oci://mcr.microsoft.com/aks/aks-agent-chart/aks-agent"
```
Need to use the official one
/azp run

Azure Pipelines successfully started running 2 pipeline(s).

@yanzhudd could you please help merge this PR? Thanks.

[Release] Update index.json for extension [ aks-agent-1.0.0b15 ]: https://dev.azure.com/msazure/One/_build/results?buildId=149894816&view=results
This checklist is used to make sure that common guidelines for a pull request are followed.

Related command

General Guidelines

- Have you run `azdev style <YOUR_EXT>` locally? (`pip install azdev` required)
- Have you run `python scripts/ci/test_index.py -q` locally? (`pip install wheel==0.30.0` required)

For new extensions:

About Extension Publish

There is a pipeline to automatically build, upload and publish extension wheels. Once your pull request is merged into the main branch, a new pull request will be created to update `src/index.json` automatically. You only need to update the version information in `setup.py` and the historical information in `HISTORY.rst` in your PR, but do not modify `src/index.json`.
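Per the file table in the review above, the version-related edits in this PR are the bump in setup.py and a matching HISTORY.rst entry. A minimal sketch of what that typically looks like in an azure-cli-extensions package (the exact surrounding contents of both files, and the HISTORY.rst wording, are assumptions here):

```python
# setup.py (excerpt, sketch)
VERSION = "1.0.0b15"  # bumped from 1.0.0b14 in this PR

# HISTORY.rst gains a matching entry at the top, along the lines of:
#
#   1.0.0b15
#   ++++++++
#   * Support cluster and local deployment modes for the AKS agent.
```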