fix: resolve intermittent invalid_grant error in cluster-level Google OAuth #496

Gkrumbach07 · 2026-01-07T14:35:11Z

Summary

Fixes intermittent invalid_grant errors when connecting Google Drive via the integrations page.

Problem

The OAuth2 callback handler was exchanging the authorization code twice for cluster-level OAuth flows:

First exchange happened in the generic HandleOAuth2Callback handler (line 282)
Then it parsed the state to detect cluster-level OAuth
Called HandleGoogleOAuthCallback which tried to exchange the same code again

Since OAuth authorization codes are single-use, the second exchange failed with invalid_grant.

Solution

Moved the cluster-level OAuth detection before any token exchange:

Parse the state parameter first to detect if it's a cluster-level flow
If cluster-level, route directly to HandleGoogleOAuthCallback (which does its own exchange) and return
If legacy session-specific, continue with the existing flow

This ensures the authorization code is only exchanged once.

Testing

Deployed to dev cluster and verified Google OAuth connection works consistently
Previously failed intermittently, now succeeds reliably

…ration - Moved error handling to the beginning of the OAuth2 callback process for improved clarity and efficiency. - Introduced early detection of cluster-level OAuth flows, allowing for specific handling of Google OAuth callbacks. - Consolidated callback data creation to reduce redundancy and enhance readability. - Updated legacy session-specific OAuth flow handling to ensure compatibility with new cluster-level logic. This refactor enhances the overall structure and maintainability of the OAuth2 callback handling, aligning with recent improvements in session management and Google OAuth integration.

github-actions · 2026-01-07T14:38:38Z

Claude Code Review

Summary

This PR successfully fixes the intermittent invalid_grant error in cluster-level Google OAuth by preventing double authorization code exchange. The fix moves cluster-level OAuth detection before token exchange, which is the correct approach since OAuth codes are single-use. The refactoring improves code organization and maintainability.

Issues by Severity

🚫 Blocker Issues

None - the PR is safe to merge.

🔴 Critical Issues

None identified.

🟡 Major Issues

1. Silent Error Handling for State Parsing (oauth.go:262-286)

stateBytes, err := base64.URLEncoding.DecodeString(strings.Split(state, ".")[0])
if err == nil {  // ⚠️ Silently ignores decode errors
    if jsonErr := json.Unmarshal(stateBytes, &stateMap); jsonErr == nil {  // ⚠️ Silently ignores unmarshal errors
        // ...
    }
}

Issue: Errors during base64 decoding or JSON unmarshaling are silently ignored. This makes debugging difficult if the state format is corrupted or tampered with.

Recommendation: Add logging for parsing failures:

stateBytes, err := base64.URLEncoding.DecodeString(strings.Split(state, ".")[0])
if err != nil {
    log.Printf("Warning: failed to decode state parameter: %v (continuing with legacy flow)", err)
} else {
    if jsonErr := json.Unmarshal(stateBytes, &stateMap); jsonErr != nil {
        log.Printf("Warning: failed to parse state JSON: %v (continuing with legacy flow)", jsonErr)
    } else {
        // Check for cluster-level OAuth...
    }
}

2. Missing Bounds Check on strings.Split (oauth.go:262)

strings.Split(state, ".")[0]  // ⚠️ Assumes at least 1 element

Issue: If state is empty or doesn't contain a ".", strings.Split will return a slice, but accessing [0] is still safe. However, this could be more explicit.

Recommendation: Add explicit check or comment explaining why this is safe:

stateParts := strings.Split(state, ".")
if len(stateParts) > 0 {
    stateBytes, err := base64.URLEncoding.DecodeString(stateParts[0])
    // ...
}

🔵 Minor Issues

1. Missing Unit Tests
The OAuth handler code lacks unit tests. While the fix is straightforward, adding tests would prevent future regressions.

Recommendation: Add tests for:

Cluster-level OAuth flow (code exchange happens once)
Legacy session-specific OAuth flow (code exchange happens once)
Invalid state format handling
Error cases (missing code, invalid state)

2. Code Duplication in HTML Responses (oauth.go:273-282)
The success/error HTML responses are duplicated. Consider extracting to helper functions:

func renderOAuthSuccess(c *gin.Context) {
    c.Data(http.StatusOK, "text/html; charset=utf-8", []byte(
        "<html><body><h1>Authorization Successful!</h1>...",
    ))
}

3. Error Response Uses StatusOK Instead of Error Code (oauth.go:273)

c.Data(http.StatusOK, "text/html; charset=utf-8", []byte(
    "<html><body><h1>Authorization Error</h1>...",
))

Issue: Error responses return HTTP 200 OK. While this may be intentional for user-facing OAuth flows (to render the error page), it's worth documenting why.

Recommendation: Add a comment explaining this is intentional for user experience, or consider using 4xx/5xx status codes if the frontend can handle them gracefully.

Positive Highlights

✅ Excellent Problem Diagnosis: The PR description clearly explains the root cause (double code exchange) and solution.

✅ Correct Fix Strategy: Moving cluster-level detection before token exchange is the right architectural approach.

✅ Clear Code Comments: The IMPORTANT comment on line 259-260 explains why the order matters, which is invaluable for future maintainers.

✅ Error Handling: The fix properly handles errors from HandleGoogleOAuthCallback and returns user-friendly messages while logging details server-side.

✅ Backward Compatibility: The legacy session-specific OAuth flow remains intact, ensuring no breaking changes.

✅ Security: No tokens are logged, following established security patterns.

Recommendations

High Priority

Add logging for state parsing failures (see Major Issue Outcome: Reduce Refinement Time with agent System #1)
Add unit tests to prevent future regressions

Medium Priority

Document the HTTP 200 for error responses or reconsider status codes
Extract HTML response helpers to reduce duplication

Low Priority

Consider adding integration test that verifies code is only exchanged once

Architecture Alignment

✅ Follows Error Handling Patterns: Errors are logged with context, generic messages returned to users
✅ Follows Security Standards: No tokens in logs, appropriate error messages
✅ Code Organization: Handler logic properly structured
⚠️ Testing: Missing test coverage for this critical OAuth flow

Final Assessment

Status: ✅ APPROVED - Safe to Merge

This is a well-executed bug fix that solves a real production issue. The refactoring improves code organization and the fix is architecturally sound. The main gaps are in observability (silent error handling) and test coverage, which can be addressed in follow-up work without blocking this PR.

The fix prevents a critical OAuth failure and should be merged to resolve the intermittent invalid_grant errors affecting users.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

CLAUDE.md - Master project instructions, development standards
backend-development.md - Go backend, K8s integration patterns
frontend-development.md - NextJS, Shadcn UI, React Query patterns
security-standards.md - Auth, RBAC, token handling
k8s-client-usage.md - User token vs service account patterns
error-handling.md - Consistent error patterns
react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Gkrumbach07 merged commit be90485 into ambient-code:main Jan 7, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve intermittent invalid_grant error in cluster-level Google OAuth #496

fix: resolve intermittent invalid_grant error in cluster-level Google OAuth #496

Uh oh!

Gkrumbach07 commented Jan 7, 2026

Uh oh!

github-actions bot commented Jan 7, 2026 •

edited

Loading

What Amber Loaded for Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: resolve intermittent invalid_grant error in cluster-level Google OAuth #496

fix: resolve intermittent invalid_grant error in cluster-level Google OAuth #496

Uh oh!

Conversation

Gkrumbach07 commented Jan 7, 2026

Summary

Problem

Solution

Testing

Uh oh!

github-actions bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Summary

Issues by Severity

🚫 Blocker Issues

🔴 Critical Issues

🟡 Major Issues

🔵 Minor Issues

Positive Highlights

Recommendations

High Priority

Medium Priority

Low Priority

Architecture Alignment

Final Assessment

What Amber Loaded for Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Jan 7, 2026 •

edited

Loading