Skip to content

Conversation

@mainred
Copy link
Member

@mainred mainred commented Jan 13, 2026


This checklist is used to make sure that common guidelines for a pull request are followed.

Related command

General Guidelines

  • Have you run azdev style <YOUR_EXT> locally? (pip install azdev required)
  • Have you run python scripts/ci/test_index.py -q locally? (pip install wheel==0.30.0 required)
  • My extension version conforms to the Extension version schema

For new extensions:

About Extension Publish

There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update src/index.json automatically.
You only need to update the version information in file setup.py and historical information in file HISTORY.rst in your PR but do not modify src/index.json.

Copilot AI review requested due to automatic review settings January 13, 2026 12:48
@azure-client-tools-bot-prd
Copy link

azure-client-tools-bot-prd bot commented Jan 13, 2026

⚠️Azure CLI Extensions Breaking Change Test
⚠️aks-agent
rule cmd_name rule_message suggest_message
⚠️ 1006 - ParaAdd aks agent cmd aks agent added parameter mode
⚠️ 1006 - ParaAdd aks agent cmd aks agent added parameter namespace
⚠️ 1006 - ParaAdd aks agent-cleanup cmd aks agent-cleanup added parameter mode
⚠️ 1006 - ParaAdd aks agent-cleanup cmd aks agent-cleanup added parameter namespace

@yonzhan
Copy link
Collaborator

yonzhan commented Jan 13, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions
Copy link

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

@github-actions
Copy link

@mainred mainred force-pushed the client-and-contianer-mode branch from 2037264 to 3bee7fe Compare January 13, 2026 12:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for two deployment modes to the AKS agent extension: cluster mode (existing Helm-based deployment) and local mode (new Docker container-based local execution). The changes introduce mode selection during initialization, refactor the agent manager classes using an abstract base class, and add comprehensive telemetry tracking for different event types.

Changes:

  • Added interactive mode selection during az aks agent-init allowing users to choose between cluster mode and local mode
  • Introduced AKSAgentManagerLocal class for Docker-based local execution with local file-based configuration storage
  • Refactored code using abstract base class AKSAgentManagerLLMConfigBase to support both deployment modes
  • Modified namespace handling to require explicit namespace specification for cluster mode operations
  • Enhanced telemetry to track init, cleanup, and startup events with mode information

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
setup.py Version bump from 1.0.0b14 to 1.0.0b15
custom.py Major refactoring to support mode selection, separate LLM config and helm deployment setup, add local mode status checking
aks_agent_manager.py Introduced abstract base class, refactored AKSAgentManager with breaking constructor changes, added AKSAgentManagerLocal class for Docker execution
llm_config_manager.py Added LLMConfigManagerLocal class for file-based LLM configuration management
telemetry.py Enhanced to track different event types (init, cleanup, startup) and mode information
_params.py Added namespace and mode parameters for agent and cleanup commands
_help.py Updated help text to document new mode parameter and local mode requirements
_consts.py Added CONFIG_DIR constant for local configuration storage
aks.py Improved kubelogin command to explicitly specify kubeconfig path
init.py Exported AKSAgentManagerLocal class
test_aks_agent_manager.py Removed deprecated test for set_aks_context method
README.rst Updated documentation to explain both deployment modes
HISTORY.rst Added release notes for version 1.0.0b15
Comments suppressed due to low confidence (1)

src/aks-agent/azext_aks_agent/agent/k8s/aks_agent_manager.py:1161

  • The newly introduced AKSAgentManagerLocal class (lines 905-1161) lacks test coverage. Given that this file has comprehensive test coverage for AKSAgentManager, similar tests should be added for the local mode implementation to verify Docker execution, configuration file handling, and error scenarios.
        # Create cluster-specific config directory to match LLMConfigManagerLocal
        self.config_dir = self.base_config_dir / subscription_id / resource_group_name / cluster_name

        # Docker image for client mode execution
        # TODO(mainred): update the docker image to the official one when available
        # self.docker_image = "mcr.microsoft.com/aks/aks-agent:89bde40-local"
        self.docker_image = "mainred/aks-agent:89bde40-local"

        self.llm_config_manager = LLMConfigManagerLocal(
            subscription_id=subscription_id,
            resource_group_name=resource_group_name,
            cluster_name=cluster_name
        )

        # Ensure custom_toolset.yaml exists
        self._ensure_custom_toolset()

    def _ensure_custom_toolset(self) -> None:
        """
        Ensure custom_toolset.yaml exists in the config directory.
        Creates the file with default MCP server configuration if it doesn't exist.
        """
        import yaml

        custom_toolset_file = self.config_dir / "custom_toolset.yaml"

        # Create config directory if it doesn't exist
        self.config_dir.mkdir(parents=True, exist_ok=True)

        # Only create if file doesn't exist
        if not custom_toolset_file.exists():
            default_config = {
                "mcp_servers": {
                    "aks-mcp": {
                        "description": "Azure MCP server exposes the Azure and Kubernetes capabilities for Azure Kubernetes Service clusters",
                        "config": {
                            "url": "http://localhost:8000/sse",
                        }
                    }
                }
            }

            try:
                with open(custom_toolset_file, 'w', encoding='utf-8') as f:
                    yaml.dump(default_config, f, default_flow_style=False, sort_keys=False)
                logger.debug("Created custom_toolset.yaml at: %s", custom_toolset_file)
            except Exception as e:
                logger.warning("Failed to create custom_toolset.yaml: %s", e)
        else:
            logger.debug("custom_toolset.yaml already exists at: %s", custom_toolset_file)

    def save_llm_config(self, provider: LLMProvider, params: dict) -> None:
        """
        Save LLM configuration using the LLMConfigManager.

        Args:
            provider: LLMProvider instance
            params: Dictionary of model parameters
        """
        self.llm_config_manager.save(provider, params)

    def get_llm_config(self) -> Dict:
        """
        Get LLM configuration from local file.

        Returns:
            Dictionary of model configurations if exists, empty dict otherwise
        """
        return self.llm_config_manager.model_list if self.llm_config_manager.model_list else {}

    def exec_aks_agent(self, command_flags: str = "") -> bool:
        """
        Execute commands on the AKS agent using Docker container.

        This method runs the AKS agent in a Docker container with the local
        LLM configuration and kubeconfig mounted.

        Args:
            command_flags: Additional flags for the aks-agent command

        Returns:
            True if execution was successful

        Raises:
            AzCLIError: If execution fails or Docker is not available
        """
        import subprocess
        import sys

        logger.info("Executing AKS agent command in Docker container with flags: %s", command_flags)

        try:
            # Check if configuration exists
            model_list_file = self.config_dir / "model_list.yaml"
            custom_toolset_file = self.config_dir / "custom_toolset.yaml"

            if not self.config_dir.exists() or not model_list_file.exists() or not custom_toolset_file.exists():
                raise AzCLIError(
                    "AKS agent configuration not found.\n"
                    "Please run 'az aks agent-init' first to initialize the agent."
                )

            # Check if Docker is available
            try:
                subprocess.run(
                    ["docker", "--version"],
                    check=True,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.PIPE
                )
            except (subprocess.CalledProcessError, FileNotFoundError):
                raise AzCLIError(
                    "Docker is not available. Please install Docker to use client mode.\n"
                    "Visit https://docs.docker.com/get-docker/ for installation instructions."
                )

            # Prepare Docker run command
            docker_image = self.docker_image

            # Build volume mounts
            volumes = []
            if self.kubeconfig_path:
                volumes.extend(["-v", f"{self.kubeconfig_path}:/root/.kube/config:ro"])

            # Mount Azure config directory
            azure_config_dir = os.path.expanduser("~/.azure")
            if os.path.exists(azure_config_dir):
                volumes.extend(["-v", f"{azure_config_dir}:/root/.azure:ro"])
                logger.debug("Mounting Azure config directory: %s", azure_config_dir)
            else:
                logger.debug("Azure config directory not found, skipping mount: %s", azure_config_dir)

            # Mount LLM config files
            model_list_file = self.config_dir / "model_list.yaml"
            custom_toolset_file = self.config_dir / "custom_toolset.yaml"

            # Mount model_list.yaml
            volumes.extend(["-v", f"{model_list_file}:/etc/aks-agent/config/model_list.yaml:ro"])

            # Mount custom_toolset.yaml
            volumes.extend(["-v", f"{custom_toolset_file}:/etc/aks-agent/config/custom_toolset.yaml:ro"])

            # Build environment variables for AKS context
            env_vars = [
                "-e", f"AKS_RESOURCE_GROUP={self.resource_group_name}",
                "-e", f"AKS_CLUSTER_NAME={self.cluster_name}",
                "-e", f"AKS_SUBSCRIPTION_ID={self.subscription_id}"
            ]

            # Prepare the command
            exec_command = [
                "docker", "run",
                "--rm",
                "-it",
                *volumes,
                *env_vars,
                docker_image,
                "ask"
            ]

            # Add command flags if provided
            if command_flags:
                # Parse command_flags - it might be a quoted string with multiple args
                import shlex
                flag_parts = shlex.split(command_flags)
                exec_command.extend(flag_parts)

            logger.debug("Running Docker command: %s", " ".join(exec_command))

            # Execute the Docker container with interactive TTY
            result = subprocess.run(
                exec_command,
                stdin=sys.stdin,
                stdout=sys.stdout,
                stderr=sys.stderr
            )

            if result.returncode != 0:
                raise AzCLIError(f"Docker container exited with code {result.returncode}")

            logger.info("AKS agent command executed successfully in Docker container")
            return True

        except AzCLIError:
            raise
        except subprocess.CalledProcessError as e:
            logger.error("Failed to execute Docker command: %s", e)
            raise AzCLIError(f"Failed to execute AKS agent in Docker: {e}")
        except Exception as e:
            logger.error("Failed to execute AKS agent command: %s", e)
            raise

    def uninstall_agent(self) -> bool:
        """
        Uninstall AKS agent by removing local LLM configuration files.

        Returns:
            True if uninstallation was successful
        """
        logger.info("Removing local AKS agent configuration")

        try:
            config_files = [
                self.config_dir / "model_list.yaml",
                self.config_dir / "custom_toolset.yaml"
            ]

            removed_files = []
            for config_file in config_files:
                if config_file.exists():
                    config_file.unlink()
                    removed_files.append(str(config_file))
                    logger.debug("Removed config file: %s", config_file)

            if removed_files:
                logger.info("Successfully removed %d configuration file(s)", len(removed_files))
            else:
                logger.info("No configuration files found to remove")

            # Optionally remove the config directory if it's empty
            if self.config_dir.exists() and not any(self.config_dir.iterdir()):
                self.config_dir.rmdir()
                logger.debug("Removed empty config directory: %s", self.config_dir)

            return True

        except Exception as e:
            logger.error("Failed to remove local configuration: %s", e)
            raise AzCLIError(f"Failed to uninstall local agent configuration: {e}")

self.cluster_name: str = cluster_name
self.subscription_id: str = subscription_id

self.chart_repo = "oci://mcr.microsoft.com/aks/aks-agent-chart/aks-agent"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use the official one

@yanzhudd
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@mainred
Copy link
Member Author

mainred commented Jan 19, 2026

@yanzhudd could you please help merge this PR? Thanks.

@yanzhudd yanzhudd merged commit 65a51da into Azure:main Jan 20, 2026
24 checks passed
@azclibot
Copy link
Collaborator

[Release] Update index.json for extension [ aks-agent-1.0.0b15 ] : https://dev.azure.com/msazure/One/_build/results?buildId=149894816&view=results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AKS Auto-Assign Auto assign by bot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants