Java: Add new quality query to detect String#replaceAll with non-regex first argument#19115
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds a new quality query to detect cases where String#replaceAll is used with a non-regular expression as the first argument, recommending the use of String#replace for better performance.
- Introduces a test case to validate the new query.
- Adds a change note to document the new quality query.
- Provides a performance guideline detailing the issue and recommendation.
Reviewed Changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| java/ql/test/query-tests/StringReplaceAllWithNonRegex/Test.java | Added test cases demonstrating compliant and non-compliant usage. |
| java/ql/src/change-notes/2025-03-25-string-replace-all-with-non-regex.md | Added change note for the new quality query. |
| java/ql/src/Performance/StringReplaceAllWithNonRegex.md | Added performance guideline explaining why String#replace is preferable when a non-regex literal is used. |
Files not reviewed (3)
- java/ql/src/Performance/StringReplaceAllWithNonRegex.ql: Language not supported
- java/ql/test/query-tests/StringReplaceAllWithNonRegex/StringReplaceAllWithNonRegex.expected: Language not supported
- java/ql/test/query-tests/StringReplaceAllWithNonRegex/StringReplaceAllWithNonRegex.qlref: Language not supported
Tip: Copilot code review supports C#, Go, Java, JavaScript, Markdown, Python, Ruby and TypeScript, with more languages coming soon. Learn more
| ## References | ||
|
|
||
| - Java SE Documentation: [String.replaceAll](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/String.html#replaceAll(java.lang.String,java.lang.String)). | ||
| - Common Weakness Enumeration: [CWE-1176](https://cwe.mitre.org/data/definitions/1176.html) |
There was a problem hiding this comment.
Minor: a . is missing from the end of line.
| //only contains characters that could be a simple string | ||
| firstArg.getValue().regexpMatch("^[a-zA-Z0-9]+$") | ||
| select replaceAllCall, | ||
| "This call to 'replaceAll' should be a call `replace` as its $@ is not a regular expression.", |
There was a problem hiding this comment.
Does ` work in the alert message? replaceAll and replace should probably be quoted the same way.
There was a problem hiding this comment.
In VS Code, ` doesn't seem to do anything, so I've changed it to '.
|
I ran MRVA. There are 417 results in the top 1,000 repos. 114 repos had at least one result. The most in one repo is 86 but that's an outlier - the next highest is 15. All the results look like TPs to me. |
|
I have validated autofixes. As I expected, they are all good (though we tend to fix multiple instances at once rather than just the specified one). So I think this is ready to merge now. |
|
@knewbury01 for awareness. |
tamasvajk
left a comment
There was a problem hiding this comment.
Should this query be added to the ccr.qls code-quality.qls?
|
I will review this for Docs today. |
mchammer01
left a comment
There was a problem hiding this comment.
@owen-mc 👋🏻 - Approving on behalf of Docs ✨
Highlighted minor things + asked some questions for my own understanding.
|
|
||
| ## Overview | ||
|
|
||
| The underlying implementation of `String#replaceAll` uses `Pattern#compile` and expects a regular expression as its first argument. However in cases where the argument could be represented by just a plain `String` that does not represent an interesting regular expression, a call to `String#replace` may be more performant as it does not need to compile the regular expression. |
There was a problem hiding this comment.
For my own understanding: what is an "interesting" regular expression? 🤔
Also the clause inside "however" is a bit long. Would it be possible to shorten/simplify it?
There was a problem hiding this comment.
Here, "interesting" means "uses any of the special syntax allowed in regular expressions to do more than just match text verbatim". I will think about how to improve.
|
|
||
| ## References | ||
|
|
||
| - Java SE Documentation: [String.replaceAll](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/String.html#replaceAll(java.lang.String,java.lang.String)). |
There was a problem hiding this comment.
Minor. I don't see String.replaceAll mentioned at all in the link provided. This seems to link to the String class.
There was a problem hiding this comment.
It should go to the replaceAll method in the page on the String class. It seems to work for me.
| /** | ||
| * @id java/string-replace-all-with-non-regex | ||
| * @name Use of `String#replaceAll` with a first argument which is not a regular expression | ||
| * @description Using `String#replaceAll` is less performant than `String#replace` when the first |
There was a problem hiding this comment.
As you know, I am not a developer, so I would be grateful if you could explain what the problem is with this being less performant. Does it take more time to execute?
Would something like be correct/acceptable?
(Sorry I hadn't hear of the adjective "performant" before. Discovered since that it is indeed a word in the engineering/computing area 😅 )
Using String#replaceAll instead of String#replace when the first argument is not a regular expression causes performance issues.
Feel free to ignore if you think that developers will understand what this query does with the description as-is.
mchammer01
left a comment
There was a problem hiding this comment.
@owen-mc - thanks for the tweaks, LGTM ⚡
| ## Overview | ||
|
|
||
| The underlying implementation of `String#replaceAll` uses `Pattern#compile` and expects a regular expression as its first argument. However in cases where the argument could be represented by just a plain `String` that does not represent an interesting regular expression, a call to `String#replace` may be more performant as it does not need to compile the regular expression. | ||
| The `String#replaceAll` method is designed to work with regular expressions as its first parameter. When you use a simple string without any regex patterns (like special characters or syntax), it's more efficient to use `String#replace` instead. This is because `replaceAll` has to compile the input as a regular expression first, which adds unnecessary overhead when you are just replacing literal text. |
There was a problem hiding this comment.
Love it, it is much clearer to me ✨
There was a problem hiding this comment.
The credit should go to Claude 3.7 Sonnet 😆 .
There was a problem hiding this comment.
Glad to see that someone is using Copilot 🤣
| * @name Use of `String#replaceAll` with a first argument which is not a regular expression | ||
| * @description Using `String#replaceAll` is less performant than `String#replace` when the first | ||
| * argument is not a regular expression. | ||
| * @description Using `String#replaceAll` with a first argument which is not a regular expression |
There was a problem hiding this comment.
👍🏻 for replacing "performant" with "more efficient"
CWE-1176: Inefficient CPU Computation
de3d3e4 to
acfcc6d
Compare
|
I think this is now ready to merge. |
| String s1 = "test"; | ||
| s1 = s1.replaceAll("t", "x"); // NON_COMPLIANT | ||
| s1 = s1.replaceAll(".*", "x"); // COMPLIANT |
There was a problem hiding this comment.
Just a drive-by comment after this is already merged - would it be better to label these cases as GOOD and BAD as is more common in qhelp docs; and also include a case that uses replace?
There was a problem hiding this comment.
We've been using COMPLIANT and NON_COMPLIANT for all the queries we are porting. If we do want to change it, the most efficient way would be to wait until they're all ported and then do a find-replace. Otherwise we have to remember to do it 40 times.
I can see a small benefit in including something like the following in the example (and tests, I guess). Is that what you mean?
s1 = s1.replace("t", "x"); // COMPLIANT
Added a new quality query,
java/string-replace-all-with-non-regex, to detect uses ofString#replaceAllwhen the first argument is not a regular expression, which should useString#replaceinstead for better performance.