-
Notifications
You must be signed in to change notification settings - Fork 39
fix: prevent race condition deleting shared vertex/langfuse secrets #495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
- Introduced a new endpoint for retrieving MCP server statuses in the backend, allowing the frontend to display real-time integration statuses. - Updated the `SelectWorkflow` function to handle MCP status checks during workflow selection. - Enhanced the `McpIntegrationsAccordion` component to fetch and display MCP server statuses dynamically. - Removed deprecated model configuration and session creation components to streamline the codebase. These changes improve the visibility of MCP integrations and enhance user experience by providing real-time feedback on server statuses.
When multiple sessions exist in the same namespace, the operator would delete the ambient-vertex and ambient-admin-langfuse-secret when one session stopped, even if other sessions were still Running, Creating, or Pending and needed those secrets. This fix adds a check before deleting these shared secrets to count active sessions in the namespace. If any sessions are still active, the secret deletion is skipped. Fixes the MountVolume.SetUp failed error: 'secret "ambient-vertex" not found' Related to PR #494
Claude Code ReviewSummaryThis PR fixes a race condition where stopping one session would delete shared secrets ( Issues by Severity🚫 Blocker IssuesNone - The core race condition fix is sound and can be merged after addressing critical issues below. 🔴 Critical Issues
🟡 Major Issues
🔵 Minor Issues
Positive Highlights✅ Excellent Root Cause Analysis - PR description clearly explains problem and solution Verdict✅ APPROVE with changes - The core race condition fix (operator changes) is correct and well-implemented. Address critical issues 1-4 before merge. Consider extracting OAuth work to follow-up PR. Review based on: CLAUDE.md, backend-development.md, frontend-development.md, security-standards.md, k8s-client-usage.md, error-handling.md, react-query-usage.md 🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
- Updated the `McpIntegrationsAccordion` component to display authentication status for MCP servers, including new icons and badges for authenticated and unauthenticated states. - Modified the backend to include authentication checks for known MCP servers (Google Workspace and Jira) and return relevant messages. - Added a new `webfetch` server configuration to the `.mcp.json` file for improved web content extraction. - Refactored the session API to provide clearer information on server authentication status in the response. These changes improve user awareness of MCP server states and enhance the overall integration experience.
d8a0847 to
c88998a
Compare
Claude Code ReviewSummaryThis PR fixes a critical race condition in shared secret deletion and enhances MCP integration UI with authentication status. The core fix is solid, but there are several issues that need attention before merge. Issues by Severity🚫 Blocker Issues1. Duplicate Route Registration (routes.go:93-96) // Lines 93-96 have duplicate registration
projectGroup.GET("/agentic-sessions/:sessionName/mcp/status", websocket.HandleMCPStatus)
// MCP status endpoint
projectGroup.GET("/agentic-sessions/:sessionName/mcp/status", websocket.HandleMCPStatus)Impact: Second registration overrides the first, may cause unexpected behavior or panic on startup. 🔴 Critical Issues2. Type Safety Violation (sessions.go:2147-2181) if p, ok := status["phase"].(string); ok {
phase = p
}Issue: This pattern is correct, but the unstructured access is inconsistent with project standards. phase, found, err := unstructured.NestedString(session.Object, "status", "phase")
if !found || err != nil {
phase = ""
}Fix: Use 3. Frontend Type Safety: Missing Type Definitions authenticated?: boolean;
authMessage?: string;Issue: TypeScript optional properties are authenticated?: boolean | undefined;
authMessage?: string | undefined;File: components/frontend/src/services/api/sessions.ts:24-25 🟡 Major Issues4. Error Handling Pattern Inconsistency (sessions.go:2139-2143) if err != nil {
log.Printf("Warning: failed to list sessions in namespace %s, skipping secret deletion: %v", namespace, err)
return nil // Don't delete if we can't verify no other sessions need it
}Issue: This is actually good defensive programming (safe-fail), but the pattern deviates slightly from error-handling.md. // Safe-fail: If we can't list sessions, assume some might need the secret
log.Printf("Warning: failed to list sessions in namespace %s, skipping secret deletion: %v", namespace, err)
return nil5. Missing Loading State Handling (mcp-integrations-accordion.tsx:29) const { data: mcpStatus } = useMcpStatus(projectName, sessionName)Per frontend standards (CLAUDE.md:1130): "All buttons have loading states" const { data: mcpStatus, isLoading, error } = useMcpStatus(projectName, sessionName)
if (isLoading) return <div>Loading MCP status...</div>
if (error) return <div>Error loading MCP servers</div>6. Python Type Hints Missing (main.py:287) def _check_mcp_authentication(server_name: str) -> tuple[bool | None, str | None]:Issue: Modern Python (3.10+) should use from typing import Optional
def _check_mcp_authentication(server_name: str) -> tuple[Optional[bool], Optional[str]]:🔵 Minor Issues7. Log Message Clarity (sessions.go:2160) log.Printf("Skipping %s secret deletion in namespace %s: %d active session(s) may still need it", ...)Suggestion: Add phase names for debugging: log.Printf("Skipping %s secret deletion in namespace %s: %d active session(s) (Running/Creating/Pending) still need it", ...)8. Unused Icon Import (mcp-integrations-accordion.tsx:3) import { Plug, CheckCircle2, XCircle, AlertCircle, KeyRound, KeyRoundIcon } from 'lucide-react'Issue: 9. Comment Duplication (sessions.go:2117-2118, 2173-2174) 10. Python Path Check Duplication (main.py:300-306) for cred_path in [workspace_path, secret_path]:
if cred_path.exists():
try:
if cred_path.stat().st_size > 0:Suggestion: Extract to helper function: def _check_credential_file(path: Path) -> bool:
return path.exists() and path.stat().st_size > 0Positive Highlights✅ Excellent Problem Identification: The race condition fix directly addresses a critical bug with clear root cause analysis. ✅ Safe-Fail Design: Both secret deletion functions fail-safe (skip deletion if uncertain) - this is the correct approach for shared resources. ✅ Comprehensive Solution: The fix checks all relevant session phases (Running, Creating, Pending) rather than just Running. ✅ Good Logging: Clear log messages with context (namespace, active count) aid debugging. ✅ Type-Safe Fallbacks (Frontend): The UI gracefully handles missing ✅ Clear Authentication Status: The MCP UI enhancement provides valuable user feedback about credential state. ✅ Proper Tooltip Implementation: Uses Shadcn UI components ( RecommendationsImmediate (Before Merge)
High Priority (Before Merge)
Nice-to-Have (Can Address in Follow-up)
Testing Checklist
Architecture Alignment✅ Follows operator reconciliation patterns (error-handling.md) Security Review✅ No token logging Overall Assessment: Strong fix with good architectural choices, but needs type safety improvements and duplicate route removal before merge. 🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
- Eliminated redundant registration of the MCP status endpoint in the routes.go file, streamlining the route definitions. - This change enhances code clarity and maintainability without altering existing functionality.
Claude Code ReviewSummaryThis PR fixes a critical race condition where shared secrets ( Overall Assessment: The operator fix is well-implemented and follows established patterns. The frontend/runner changes have minor issues that should be addressed. Issues by Severity🚫 Blocker IssuesNone - No blocking issues found. 🔴 Critical Issues1. Frontend: Missing Type Definition in API Types
// Current (violates standard)
export type McpServer = {
// ...
authenticated?: boolean; // These should be part of the type definition
authMessage?: string;
}
// Should use 'type' consistently throughout (already using type, so this is OK)
// But note: The actual file uses 'type' so this is NOT a blocker
🟡 Major Issues1. Operator: Code Duplication in Secret Deletion Functions
// Suggested refactor
func countActiveSessions(ctx context.Context, namespace string) (int, error) {
gvr := types.GetAgenticSessionResource()
sessions, err := config.DynamicClient.Resource(gvr).Namespace(namespace).List(ctx, v1.ListOptions{})
if err \!= nil {
return -1, err
}
activeCount := 0
for _, session := range sessions.Items {
status, _, _ := unstructured.NestedMap(session.Object, "status")
phase := ""
if status \!= nil {
if p, ok := status["phase"].(string); ok {
phase = p
}
}
if phase == "Running" || phase == "Creating" || phase == "Pending" {
activeCount++
}
}
return activeCount, nil
}
func deleteSharedSecretIfSafe(ctx context.Context, namespace, secretName string) error {
// Check if secret was copied (has the label)
// Check active session count
// Delete if safe
}2. Runner: Hardcoded Paths in Authentication Check
GOOGLE_WORKSPACE_SECRET_PATH = os.getenv("GOOGLE_WORKSPACE_SECRET_PATH", "/app/.google_workspace_mcp/credentials")
GOOGLE_WORKSPACE_WORKSPACE_PATH = os.getenv("GOOGLE_WORKSPACE_WORKSPACE_PATH", "/workspace/.google_workspace_mcp/credentials")3. Runner: Incomplete Error Handling in File Existence Check
try:
if cred_path.stat().st_size > 0:
return True, "Google OAuth credentials available"
except OSError as e:
logger.warning(f"Failed to check credentials at {cred_path}: {e}")
continue # Try next path🔵 Minor Issues1. Operator: Inconsistent Logging Verbosity
2. Frontend: Unused Import
3. Runner: Missing Type Hints
def _check_mcp_authentication(server_name: str) -> tuple[bool | None, str | None]:
4. Frontend: Redundant Conditional Logic
Positive Highlights✅ Excellent Operator Fix
✅ Proper Phase Checking
✅ Good Logging
✅ Follows Security Pattern
✅ Frontend UX Improvements
✅ MCP Auth Check Logic
RecommendationsPriority 1 (Address Before Merge)
Priority 2 (Address Soon)
Priority 3 (Nice to Have)
Architecture Alignment✅ Backend/Operator Standards: Follows established patterns
✅ Frontend Standards: Mostly follows guidelines
Testing NotesThe PR mentions stage cluster testing:
Recommended additional testing:
ConclusionRecommendation: APPROVE with minor changes The core fix for the race condition is solid and follows all critical backend/operator patterns. The code duplication should be addressed to prevent future maintenance issues, but doesn't block the merge. The frontend and runner changes are good UX improvements with minor cleanup needed. Priority fixes before merge:
Can be addressed in follow-up:
🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
Problem
When multiple sessions exist in the same namespace (project), stopping one session would delete the
ambient-vertexandambient-admin-langfuse-secretsecrets from the namespace, even if other sessions were still Running, Creating, or Pending and needed those secrets.This caused the error:
Root Cause
The
deleteAmbientVertexSecretanddeleteAmbientLangfuseSecretfunctions were called during session cleanup without checking if other sessions in the same namespace still needed these shared secrets.Solution
Added a check before deleting these shared secrets to count active sessions (Running, Creating, or Pending) in the namespace. If any active sessions exist, the secret deletion is skipped with a log message.
Testing
Related to PR #494