
remove query comments #560 #749

Draft

dev-lew wants to merge 10 commits into pgdogdev:main from dev-lew:dev-lew/Remove-query-comments-#560

Conversation

dev-lew commented Feb 4, 2026

This PR addresses #560

It adds a remove_comments function, which removes all comments from a query string except those matched by zero or more regexes.

On a cache miss, we strip all comments except those used for pgdog metadata (pgdog_shard, etc.) and do another cache lookup, in the hope of increasing the cache hit rate.

Two associated helper functions were added as well.

One minor change was made to existing code in comments.rs to export the compiled regexes.
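
A minimal sketch of what such a function could look like, assuming the pg_query crate's scan API (names, signatures, and error handling here are illustrative, not the PR's exact code):

```rust
use pg_query::protobuf::Token;
use regex::Regex;

/// Hypothetical sketch: rebuild the query with every comment removed,
/// except comments matching one of the `except` regexes (e.g. pgdog metadata).
fn remove_comments(query: &str, except: &[Regex]) -> Result<String, pg_query::Error> {
    let scanned = pg_query::scan(query)?;
    let mut out = String::with_capacity(query.len());
    let mut cursor = 0usize;

    for st in &scanned.tokens {
        // Only C-style (/* ... */) and SQL (-- ...) comments are candidates for removal.
        if st.token != Token::CComment as i32 && st.token != Token::SqlComment as i32 {
            continue;
        }
        let (start, end) = (st.start as usize, st.end as usize);
        // Keep comments that match one of the exception regexes (e.g. pgdog_shard).
        if except.iter().any(|re| re.is_match(&query[start..end])) {
            continue;
        }
        // Copy everything since the last removed comment, then skip this one.
        out.push_str(&query[cursor..start]);
        cursor = end;
    }
    out.push_str(&query[cursor..]);
    Ok(out)
}
```

The except list is what lets the pgdog metadata comments survive, so a commented query can still match the cache entry of its comment-free counterpart.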

dev-lew added 9 commits January 30, 2026 15:16
The previous approach is incorrect because we need a
str representing the query instead of just a ScanResult.

This approach reconstructs the query with string slicing by using
the indices where comments are present as returned by the scanner.
This version will accept a list of regexes to keep
Add ability to keep pgdog metadata comments anywhere in the string
Not preserving these would mean queries with comments would
never match their commentless counterparts
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


dev-lew seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

.iter()
.any(|st| st.token == Token::CComment as i32 || st.token == Token::SqlComment as i32))
}

dev-lew (Author):

It is possible that this function is over-generalized and may not need the except parameter.

}
}

if comment::has_comments(query.query(), ctx.sharding_schema.query_parser_engine)? {
dev-lew (Author):

The check is here because we might as well short-circuit if there are no comments; there's no need to tokenize the query and iterate through it again for no reason.

let start = st.start as usize;
let end = st.end as usize;

out.push_str(&query[cursor..start]);
dev-lew (Author):

The code involving cursor keeps the non-token characters (between tokens and at the end), such as whitespace, so the original query is preserved. We don't want to do anything extra, like normalizing it.
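
As a standalone illustration of that cursor behavior (hypothetical code with hard-coded comment spans rather than scanner output):

```rust
// Cursor-based reconstruction with hard-coded comment spans: everything
// outside the spans, including whitespace, is copied through verbatim.
fn strip_spans(query: &str, spans: &[(usize, usize)]) -> String {
    let mut out = String::with_capacity(query.len());
    let mut cursor = 0;
    for &(start, end) in spans {
        out.push_str(&query[cursor..start]); // text before the comment
        cursor = end;                        // skip the comment itself
    }
    out.push_str(&query[cursor..]);          // trailing text after the last comment
    out
}

fn main() {
    let query = "SELECT 1 /* app */ FROM t";
    // The span (9, 18) covers "/* app */" in this string.
    println!("{:?}", strip_spans(query, &[(9, 18)]));
    // => "SELECT 1  FROM t" (both surrounding spaces are preserved)
}
```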


pub fn has_comments(query: &str, engine: QueryParserEngine) -> Result<bool, Error> {
let result = match engine {
QueryParserEngine::PgQueryProtobuf => scan(query),
Collaborator:

One small concern I have here is that we are calling scan multiple times. It's actually a pretty expensive call, because we have to go all the way to the Postgres parser, so we try to minimize it as much as possible (that's why we have the cache, for example). It would be great if we could do this all inside fn comment somehow, calling scan only once. Maybe we need to move comment detection a little higher in the call stack?

dev-lew (Author) commented Feb 5, 2026:

Worst case, we call scan 3 times: once for has_comments, once for remove_comments, and once for comment if we get to the with_context call and create an Ast. A simple solution is to make has_comments return an Option<Vec<ScanToken>> so that remove_comments can reuse it; that would make the worst case 2 scan calls.

If we wanted to call scan a single time in comment, maybe we keep a hashmap that maps each query to its tokens and populate it on every call to comment. Then has_comments and remove_comments can operate on the tokens in the hashmap for a given query. If it's not in the hashmap, we can skip the entire block and go right to the with_context call. That should reduce the worst case to a single scan.

The memory usage of that hashmap is unclear to me, though; it's probably unbounded as long as pgdog is running...
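
As a rough sketch of the first option (hypothetical names, not the PR's code), has_comments could hand back the scanned tokens so remove_comments can reuse them:

```rust
use pg_query::protobuf::{ScanToken, Token};

/// Hypothetical variant: scan once and return the tokens only when the query
/// contains comments, so remove_comments can reuse them instead of rescanning.
fn comment_tokens(query: &str) -> Result<Option<Vec<ScanToken>>, pg_query::Error> {
    let scanned = pg_query::scan(query)?;
    let has_comments = scanned.tokens.iter().any(|st| {
        st.token == Token::CComment as i32 || st.token == Token::SqlComment as i32
    });
    Ok(if has_comments { Some(scanned.tokens) } else { None })
}
```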

levkk (Collaborator) commented Feb 5, 2026:

I think an easier way to do this is to move comment detection up here:

let entry = Ast::with_context(query, ctx, prepared_statements)?;

comment will return a String without the comments, iff the string had comments. Then you can check the cache again and, if it's a hit, return the cached Ast.

Otherwise, pass the comment in as an argument to Ast struct creation, so you don't have to scan twice.
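
A self-contained sketch of that flow, with a HashMap and String standing in for pgdog's real cache and Ast, and a strip_comments closure assumed to return Some only when the query actually had comments:

```rust
use std::collections::HashMap;

// Hypothetical control flow: look up the original query, then the
// comment-stripped query, and only parse (scan) once if both miss.
fn lookup_or_parse(
    query: &str,
    cache: &mut HashMap<String, String>,
    strip_comments: impl Fn(&str) -> Option<String>,
    parse: impl Fn(&str) -> String,
) -> String {
    if let Some(ast) = cache.get(query) {
        return ast.clone();
    }
    if let Some(stripped) = strip_comments(query) {
        // The query had comments: retry the cache with the stripped form so a
        // commented query can hit the entry of its comment-free counterpart.
        if let Some(ast) = cache.get(&stripped) {
            return ast.clone();
        }
    }
    // Still a miss: parse once and cache the result.
    let ast = parse(query);
    cache.insert(query.to_string(), ast.clone());
    ast
}
```

In the real code, the result of that single scan (or the extracted comment) would also be passed into Ast creation, per the suggestion, so the query isn't scanned a second time.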

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 86.95652% with 9 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...dog/src/frontend/router/parser/cache/cache_impl.rs | 62.50% | 6 Missing ⚠️ |
| pgdog/src/frontend/router/parser/comment.rs | 94.33% | 3 Missing ⚠️ |


Logic for comment parsing happens before Ast creation, resulting in
fewer calls to scan.