-
Notifications
You must be signed in to change notification settings - Fork 37
fix: Create placeholder Google OAuth secret for MCP integration #483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Create a placeholder Google OAuth secret if it doesn't exist, ensuring the volume mount is always present for K8s to sync credentials post-OAuth. - Log creation of the placeholder secret and its mounting to the ambient-code-runner container. - Maintain existing functionality to mount the Google OAuth secret, allowing for updates once the backend populates the credentials.
This comment has been minimized.
This comment has been minimized.
- Use .initial_prompt_sent marker file instead of .claude/state - Create marker file after successfully sending initial prompt - More reliable detection of session resume vs fresh start
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Claude Code ReviewSummaryThis PR implements a clever solution to the Google OAuth timing issue by creating a placeholder secret that ensures the volume mount is always present in the pod spec, allowing Kubernetes to sync credentials after OAuth completion. The implementation is generally solid and follows project patterns, but there are several important issues that should be addressed. Issues by Severity🚫 Blocker IssuesNone - No blocking issues that would prevent merge. 🔴 Critical Issues1. Missing OwnerReference APIVersion Field (operator/internal/handlers/sessions.go:1396) OwnerReferences: []v1.OwnerReference{
{
APIVersion: "vteam.ambient-code/v1", // ❌ Should be v1alpha1
Kind: "AgenticSession",
Name: currentObj.GetName(),
UID: currentObj.GetUID(),
Controller: boolPtr(true),
},
},Issue: APIVersion is set to Impact: OwnerReference won't match the actual AgenticSession resource, preventing automatic cleanup and potentially causing resource leaks. Fix: Change to Reference: CLAUDE.md line 447: "REQUIRED: Set OwnerReferences on all child resources" 2. Using Service Account Without User Authorization Check (operator/internal/handlers/sessions.go:1382) if _, err := config.K8sClient.CoreV1().Secrets(sessionNamespace).Get(context.TODO(), googleOAuthSecretName, v1.GetOptions{}); errors.IsNotFound(err) {
// Create empty placeholder secret
if _, createErr := config.K8sClient.CoreV1().Secrets(sessionNamespace).Create(context.TODO(), placeholderSecret, v1.CreateOptions{}); createErr != nil {Issue: The operator uses the service account client directly without verifying the user who created the session has permission to create secrets. While this is operator code (not handler code), the pattern deviates from the principle of least privilege. Impact: Medium - Operator has elevated permissions, but this creates a secret on behalf of user without explicit validation that the user should have this capability. Recommendation: This is acceptable for operator code since it's managing resources on behalf of the session lifecycle. However, document this as an intentional escalation in the code comments. 🟡 Major Issues3. No Error Handling for OwnerReference Field Access (operator/internal/handlers/sessions.go:1398-1400) Name: currentObj.GetName(),
UID: currentObj.GetUID(),Issue: Direct access to currentObj fields without checking if currentObj is nil (though unlikely in this context, defensive programming is important). Impact: Potential nil pointer dereference if currentObj is somehow nil (race condition, deleted during processing). Fix: Add nil check or add error return to function signature. Reference: error-handling.md line 123: "Resource Deleted During Processing" pattern 4. Silent Failure on Placeholder Secret Creation (operator/internal/handlers/sessions.go:1410) if _, createErr := config.K8sClient.CoreV1().Secrets(sessionNamespace).Create(context.TODO(), placeholderSecret, v1.CreateOptions{}); createErr != nil {
log.Printf("Warning: Failed to create placeholder Google OAuth secret %s: %v", googleOAuthSecretName, createErr)
}Issue: If secret creation fails, the code logs a warning but continues execution. The volume mount will still be added, but it will reference a non-existent secret (even with Optional: true, this could cause confusion). Impact: Google OAuth will silently fail with no clear indication to the user why MCP tools aren't available. Recommendation: Consider updating the session status with a condition indicating MCP may not be available, or return an error to prevent job creation. 5. Hardcoded APIVersion String (operator/internal/handlers/sessions.go:1396) Issue: Uses hardcoded string Better approach: APIVersion: currentObj.GetAPIVersion(), // Uses actual API version from objectReference: k8s-client-usage.md and CLAUDE.md emphasize consistent patterns. 6. Missing Test Coverage Issue: No tests added for the new placeholder secret creation logic. The existing test file (sessions_test.go) has patterns for testing secret operations. Impact: Changes to secret handling logic could break functionality without detection. Recommendation: Add test case covering:
Example test pattern available at: components/operator/internal/handlers/sessions_test.go:23 7. Context Usage - Using context.TODO() (operator/internal/handlers/sessions.go:1382, 1409) config.K8sClient.CoreV1().Secrets(sessionNamespace).Get(context.TODO(), ...)Issue: Uses Impact: Operations could hang indefinitely if Kubernetes API is slow or unresponsive. Better approach: ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
// Use ctx instead of context.TODO()🔵 Minor Issues8. Inconsistent Logging Verbosity Issue: Logs "Found existing Google OAuth secret" at info level, which isn't particularly useful in production. Recommendation: Consider debug level or remove entirely since this is expected behavior. 9. Magic String "credentials.json" (operator/internal/handlers/sessions.go:1406) Issue: Hardcoded key name with no constant definition. Recommendation: Define as constant at package level: const GoogleOAuthCredentialsKey = "credentials.json"10. Import Organization (main.py:10) from pathlib import PathIssue: Recommendation: Follow Python conventions - stdlib imports should be together at top. Move to line 7 after 11. Marker File Documentation (main.py:101-103) Good: Comment explains why custom marker file is used. Enhancement: Consider adding comment about cleanup strategy - when/if this marker should be deleted. Positive Highlights✅ Excellent Problem Analysis - The PR description clearly explains the root cause and solution flow. ✅ Good Use of Optional Field - Using ✅ OwnerReferences Set - Follows project pattern of setting OwnerReferences for automatic cleanup (though APIVersion needs fix). ✅ Detailed Logging - Good log messages that will help debugging in production. ✅ Marker File Pattern - Smart solution to prevent re-sending INITIAL_PROMPT on session resume. ✅ Comments Explain Intent - Code includes helpful comments explaining the "why" not just the "what". RecommendationsPriority 1 (Must Fix Before Merge)
Priority 2 (Should Fix Before Merge)
Priority 3 (Nice to Have)
Overall Assessment: This is a well-thought-out solution to a tricky timing problem. The core approach is sound, but there are important issues with OwnerReferences and error handling that should be addressed before merge. With the recommended fixes, this will be a solid implementation. Estimated Effort to Address: 30-60 minutes for Priority 1 items. 🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
When PARENT_SESSION_ID is set, this is a child session continuing from a parent - skip the initial prompt since the conversation already has context from the parent session.
Claude Code ReviewSummaryThis PR fixes a critical timing issue with Google Workspace MCP OAuth integration. The solution is elegant: create a placeholder secret at Job creation time to ensure the volume mount exists, allowing Kubernetes to sync credentials after the user completes OAuth. Overall Assessment: ✅ Good implementation with excellent comments, but has one blocker security issue and several important improvements needed before merge. Issues by Severity🚫 Blocker Issues1. Missing OwnerReference Validation (CRITICAL SECURITY) Location: Issue: Using Why this matters: If the AgenticSession is deleted between the initial fetch and secret creation, the OwnerReference will be invalid, and the secret won't be cleaned up automatically. Fix required: // BEFORE creating secret, verify currentObj still exists
verifyObj, err := config.DynamicClient.Resource(types.GetAgenticSessionResource()).
Namespace(sessionNamespace).Get(context.TODO(), name, v1.GetOptions{})
if errors.IsNotFound(err) {
log.Printf("AgenticSession %s no longer exists, skipping secret creation", name)
return nil // Session deleted, abort
}
if err != nil {
return fmt.Errorf("failed to verify session exists: %w", err)
}
// Use fresh object for OwnerReference
placeholderSecret := &corev1.Secret{
ObjectMeta: v1.ObjectMeta{
// ... other fields
OwnerReferences: []v1.OwnerReference{
{
APIVersion: verifyObj.GetAPIVersion(),
Kind: verifyObj.GetKind(),
Name: verifyObj.GetName(),
UID: verifyObj.GetUID(),
Controller: boolPtr(true),
},
},
},
// ...
}Reference: CLAUDE.md line 753-761, error-handling.md line 116-135 🔴 Critical Issues2. Non-Idempotent Secret Creation Location: Issue: Secret creation lacks idempotency check. If operator reconciles twice (watch reconnection, restart), it will fail on the second attempt and log a warning, but continue. This is technically safe but violates the reconciliation pattern. Why this matters: Operator reconciliation should be idempotent. Multiple reconciliations of the same event should produce the same outcome without errors. Recommendation: if _, err := config.K8sClient.CoreV1().Secrets(sessionNamespace).Get(context.TODO(), googleOAuthSecretName, v1.GetOptions{}); errors.IsNotFound(err) {
// Create secret
if _, createErr := config.K8sClient.CoreV1().Secrets(sessionNamespace).Create(context.TODO(), placeholderSecret, v1.CreateOptions{}); createErr != nil {
// Check if already exists (race condition from duplicate watch)
if errors.IsAlreadyExists(createErr) {
log.Printf("Google OAuth secret %s already exists (race condition), continuing", googleOAuthSecretName)
} else {
log.Printf("Warning: Failed to create placeholder Google OAuth secret %s: %v", googleOAuthSecretName, createErr)
}
} else {
log.Printf("Created placeholder Google OAuth secret %s (will be populated after user OAuth)", googleOAuthSecretName)
}
}Reference: CLAUDE.md line 773-777 3. Error Handling Inconsistency Location: Issue: Secret creation failure logs a warning and continues. While this allows the Job to proceed, it means the Google OAuth volume mount will fail (secret doesn't exist), and the operator won't retry. Why this matters: If secret creation fails due to a transient error (API server timeout, RBAC issue), the session will start but Google MCP integration will be permanently broken for that session. Recommendation: Return an error to trigger retry instead of just logging: if _, createErr := config.K8sClient.CoreV1().Secrets(sessionNamespace).Create(context.TODO(), placeholderSecret, v1.CreateOptions{}); createErr != nil && !errors.IsAlreadyExists(createErr) {
return fmt.Errorf("failed to create Google OAuth placeholder secret: %w", createErr)
}Reference: error-handling.md line 145-170 🟡 Major Issues4. Missing Context Timeout Location: Issue: Using Recommendation: ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if _, err := config.K8sClient.CoreV1().Secrets(sessionNamespace).Get(ctx, googleOAuthSecretName, v1.GetOptions{}); errors.IsNotFound(err) {
// ...
if _, createErr := config.K8sClient.CoreV1().Secrets(sessionNamespace).Create(ctx, placeholderSecret, v1.CreateOptions{}); createErr != nil {
// ...
}
}Reference: Go best practices for K8s client operations 5. Python: Unused Import at Top Level Location: Issue: Why this matters: While not a functional issue, the diff shows you removed it from within the function (line 83 in old code) and added it at the top. However, it's still only used in those two functions, so the import organization is inconsistent. Recommendation: Either:
Current approach is acceptable - top-level imports are more idiomatic in Python. Just flagging that it changed. 🔵 Minor Issues6. Missing APIVersion Validation Location: Issue: Hardcoded APIVersion Fix: APIVersion: "vteam.ambient-code/v1alpha1", // Match actual CRD versionReference: CLAUDE.md line 335 states API version is 7. Log Message Clarity Location: Current: Suggestion: Add context about when updates will sync: log.Printf("Created placeholder Google OAuth secret %s (K8s will sync credentials ~60s after user completes OAuth)", googleOAuthSecretName)Why: Helps with debugging when users wonder why credentials aren't available immediately. 8. Python: Potential Race in Marker File Creation Location: Issue: Marker file creation happens AFTER successful API call. If the process crashes between line 180 and 189, the marker won't be created, and the initial prompt will be re-sent on restart. Impact: Low (unlikely scenario, and re-sending is not catastrophic) Recommendation: Consider moving marker creation before the API call, or using a try/finally: # Create marker BEFORE sending to prevent double-send on crash
try:
marker_path.parent.mkdir(parents=True, exist_ok=True)
marker_path.write_text(f"sending_at={asyncio.get_event_loop().time()}\n")
logger.info(f"Created initial prompt marker: {marker_path}")
except Exception as marker_err:
logger.warning(f"Failed to create initial prompt marker: {marker_err}")
# Now send the prompt
async with session.post(url, json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status == 200:
result = await resp.json()
logger.info(f"INITIAL_PROMPT auto-execution started: {result}")
# Update marker with completion time
try:
marker_path.write_text(f"sent_at={asyncio.get_event_loop().time()}\n")
except Exception:
passPositive Highlights✅ Excellent Problem Analysis: The PR description clearly explains the race condition and why the placeholder secret approach solves it. ✅ Good Use of Comments: Code comments explain the "why" (e.g., "This ensures the volume mount is always present so K8s can sync credentials after OAuth completion"). ✅ Proper OwnerReference Pattern: Sets ✅ Correct Secret Volume Configuration: ✅ Python Marker Logic: The addition of marker file logic in the runner is clean and follows the existing pattern for resume detection. ✅ Backwards Compatible: Always mounting the secret (with Optional: true) means this works whether the secret exists or not, making it safe to deploy. Recommendations (Prioritized)Must Fix Before Merge (Blockers)
Should Fix (Critical)
Nice to Have (Major/Minor)
Testing RecommendationsBefore merge, verify:
Compliance with Repository Standards✅ K8s Client Usage: Correct use of service account client for resource creation (k8s-client-usage.md) ✅ Error Handling: Mostly follows patterns, but needs improvement on secret creation failure (error-handling.md) ✅ Logging: Good structured logs with context (namespace, resource name) Overall: This is a well-thought-out fix with clear intent. The core logic is sound, but the OwnerReference race condition is a blocker. Fix that plus the API version issue, and this will be ready to merge. 🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
…N_ID Remove marker file approach. Now only check PARENT_SESSION_ID to determine if we should skip the initial prompt. If a parent session exists, this is a continuation and we skip the initial prompt.
Claude Code ReviewSummaryThis PR fixes a Google OAuth MCP integration issue where credentials wouldn't sync to running pods. The solution creates a placeholder secret upfront and always mounts it, allowing Kubernetes to sync updates after OAuth completion. Additionally, it improves the INITIAL_PROMPT skip logic to use PARENT_SESSION_ID. Issues by Severity🚫 Blocker IssuesNone - Code is ready to merge from a blocking perspective. 🔴 Critical Issues1. Missing Context Variable (
2. Error Handling - Silent Failure on Secret Creation
🟡 Major Issues3. OwnerReferences API Version Mismatch
4. Idempotency Check Missing
🔵 Minor Issues5. Logging Improvement - Add Namespace Context
6. Code Organization - Large Function
7. Python - Slice Bounds Check
Positive Highlights✅ Excellent Problem Analysis - PR description clearly explains the race condition and solution flow ✅ Proper OwnerReference Usage - Sets ✅ Security Best Practice - Uses ✅ Good Label Usage - Proper labels for secret tracking ( ✅ Cleaner Logic - INITIAL_PROMPT skip logic simplified from marker file to PARENT_SESSION_ID (much better!) ✅ Follows Patterns - Secret structure matches existing patterns in codebase ✅ Documentation - Inline comments explain the sync timing and purpose Recommendations (Prioritized)Must Fix Before Merge
Should Fix Before Merge
Nice-to-Have (Can Address Later)
Testing RecommendationsBeyond the testing checklist in the PR description, verify:
Overall AssessmentQuality Score: 7.5/10 The PR solves a real problem with a sound architectural approach. The main issues are around following established patterns from CLAUDE.md:
Once the critical issues are addressed, this will be a solid contribution. The INITIAL_PROMPT simplification is a particularly good improvement. Review generated by Claude Code following project standards from CLAUDE.md, .claude/context/, and .claude/patterns/ 🔍 View AI decision process (logs available for 90 days) 📋 View memory system files loaded (click to expand)What Amber Loaded for Code ReviewAmber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines. |
Summary
Fixes an issue where Google Workspace MCP integration wouldn't work if the user completed OAuth after starting a session.
Problem
Previously, the operator only mounted the Google OAuth secret if it existed when creating the Job:
Solution
Now the operator:
Optional: trueThis ensures:
refresh_google_credentials()detects when content appearsFlow After Fix
Testing