CG-10827: Fix language detection bug in monorepos with mixed languages #1063
Description
This PR fixes a bug where the `--language python` flag is ignored when initializing a codebase in a monorepo with both TypeScript and Python code.
Problem
When a user explicitly specifies a language (e.g., `--language python`), the language detection logic still runs and overrides that choice when the monorepo contains more TypeScript files than Python files.
Solution
- Modified `ProjectConfig.from_path()` and `ProjectConfig.from_repo_operator()` to respect the explicitly provided language parameter and to run language detection only if no language is specified.
- Improved the language detection in `_determine_language_by_git_file_count()` so that it respects the specified subfolder path when counting files.
- Fixed the language parameter handling in the `Codebase` constructor so that it accepts both string and enum values.
- Added unit tests to verify these changes.
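The rules described in this section (explicit language wins, detection counts only files under the requested subfolder, strings and enum members are both accepted) can be sketched roughly as follows. This is not the code from the PR: `ProgrammingLanguage`, `EXTENSIONS`, `coerce_language`, `detect_language`, and `resolve_language` are hypothetical names chosen for illustration, and the list of tracked files stands in for whatever git reports in the real implementation.

```python
from collections import Counter
from enum import Enum
from pathlib import PurePosixPath


class ProgrammingLanguage(Enum):
    """Hypothetical stand-in for the SDK's language enum."""
    PYTHON = "PYTHON"
    TYPESCRIPT = "TYPESCRIPT"


# Assumed extension map; a real detector may recognise more suffixes.
EXTENSIONS = {
    ".py": ProgrammingLanguage.PYTHON,
    ".ts": ProgrammingLanguage.TYPESCRIPT,
    ".tsx": ProgrammingLanguage.TYPESCRIPT,
}


def coerce_language(value):
    """Accept an enum member or a case-insensitive string such as 'python'."""
    if isinstance(value, ProgrammingLanguage):
        return value
    return ProgrammingLanguage[value.strip().upper()]


def detect_language(tracked_files, subdir=""):
    """Return the language with the most files under `subdir`.

    `tracked_files` holds POSIX-style paths relative to the repo root.
    """
    base = PurePosixPath(subdir)
    counts = Counter()
    for name in tracked_files:
        path = PurePosixPath(name)
        if subdir and base not in path.parents:
            continue  # file lives outside the requested subfolder
        language = EXTENSIONS.get(path.suffix)
        if language is not None:
            counts[language] += 1
    if not counts:
        raise ValueError("no recognised source files found")
    return counts.most_common(1)[0][0]


def resolve_language(tracked_files, language=None, subdir=""):
    """Explicit choice takes precedence; detection is only a fallback."""
    if language is not None:
        return coerce_language(language)
    return detect_language(tracked_files, subdir)


files = ["web/a.ts", "web/b.ts", "web/c.tsx", "services/api/main.py"]
print(resolve_language(files))                         # detection over the whole repo
print(resolve_language(files, language="python"))      # explicit string is respected
print(resolve_language(files, subdir="services/api"))  # detection scoped to a subfolder
```

In this sketch detection never runs when a language is passed in, so the ratio of TypeScript to Python files cannot override the caller's choice.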
Testing
Added unit tests in `tests/unit/codegen/sdk/codebase/test_language_detection.py` to verify the fixes.
Fixes #479