
Commit 362fcdf

h3n4l and claude committed
feat(mongodb): add MongoDB Shell (mongosh) parser
Add ANTLR grammar for parsing MongoDB shell syntax with Go code generation.

Supported features:
- Shell commands: show dbs, show databases, show collections
- Database statements: db.collection.method(...) with method chains
- Collection access: dot notation, bracket notation, getCollection()
- Read methods: find(), findOne()
- Cursor modifiers: sort(), limit(), skip(), projection(), project()
- Helper functions as distinct AST nodes: ObjectId(), ISODate(), Date(), UUID(), Long(), NumberLong(), Int32(), NumberInt(), Double(), Decimal128(), NumberDecimal(), Timestamp(), RegExp()
- Document syntax: unquoted/quoted keys, nested documents, arrays, trailing commas
- Regex literals: /pattern/flags
- Literals: strings (single/double quoted), numbers, booleans, null
- Comments: line (//) and block (/* */)

The 'new' keyword is intentionally not supported for helper functions. When users write 'new ObjectId()', the parser generates a helpful error message using ANTLR's NotifyErrorListeners mechanism.

Also adds mongodb to the CI workflow parser list.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
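The commit message says that `new ObjectId()` is rejected with a custom error reported through ANTLR's NotifyErrorListeners. The parser grammar that implements this is not part of this excerpt, so the following is only a standalone sketch of the same check over a hypothetical, simplified token stream. The `Kind` and `Token` types, the `checkNew` function, and the error wording are all assumptions for illustration, not the generated parser's API or its actual message.

```go
package main

import "fmt"

// Kind is a hypothetical token kind. The names echo MongoShellLexer.g4
// (NEW, the helper-function tokens, LPAREN/RPAREN) but are not the
// generated ANTLR token types.
type Kind int

const (
	New    Kind = iota
	Helper      // ObjectId, ISODate, UUID, ...
	LParen
	RParen
)

// Token is a minimal stand-in for an ANTLR token.
type Token struct {
	Kind Kind
	Text string
}

// checkNew returns one diagnostic for every `new` that directly precedes a
// helper-function name. The wording is illustrative only.
func checkNew(toks []Token) []string {
	var errs []string
	for i := 0; i+1 < len(toks); i++ {
		if toks[i].Kind == New && toks[i+1].Kind == Helper {
			helper := toks[i+1].Text
			errs = append(errs, fmt.Sprintf("'new' is not supported: use %s() instead of new %s()", helper, helper))
		}
	}
	return errs
}

func main() {
	// Hypothetical token stream for: new ObjectId()
	toks := []Token{{New, "new"}, {Helper, "ObjectId"}, {LParen, "("}, {RParen, ")"}}
	for _, e := range checkNew(toks) {
		fmt.Println(e)
	}
}
```

In the real parser the equivalent check lives in the grammar and reports through the parser's error listeners, so the diagnostic carries the offending token's position. This sketch only shows the pattern being detected.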
1 parent 12c6f17 commit 362fcdf

22 files changed (+10684, -1 lines)

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ jobs:
 id: set-matrix
 run: |
 # List of all available parsers
-ALL_PARSERS="redshift postgresql cql snowflake tsql doris starrocks trino plsql googlesql mysql partiql tidb mariadb cosmosdb"
+ALL_PARSERS="redshift postgresql cql snowflake tsql doris starrocks trino plsql googlesql mysql partiql tidb mariadb cosmosdb mongodb"
 # Add more parsers here as they are added to the repository
 # ALL_PARSERS="redshift mysql postgresql"

mongodb/Makefile

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
all: build test

build:
	@echo "Building MongoDB Shell parser..."
	antlr -Dlanguage=Go -package mongodb -visitor -o . MongoShellLexer.g4 MongoShellParser.g4

test:
	go test -v -run TestMongoShellParser

.PHONY: all build test

mongodb/MongoShellLexer.g4

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
/*
 * MongoDB Shell (mongosh) Lexer Grammar
 * For use with ANTLR 4
 */

lexer grammar MongoShellLexer;

// Keywords
SHOW: 'show';
DBS: 'dbs';
DATABASES: 'databases';
COLLECTIONS: 'collections';
DB: 'db';
NEW: 'new';
TRUE: 'true';
FALSE: 'false';
NULL: 'null';
GET_COLLECTION: 'getCollection';
GET_COLLECTION_NAMES: 'getCollectionNames';

// Helper function names (recognized as distinct tokens)
OBJECT_ID: 'ObjectId';
ISO_DATE: 'ISODate';
DATE: 'Date';
UUID: 'UUID';
LONG: 'Long';
NUMBER_LONG: 'NumberLong';
INT32: 'Int32';
NUMBER_INT: 'NumberInt';
DOUBLE: 'Double';
DECIMAL128: 'Decimal128';
NUMBER_DECIMAL: 'NumberDecimal';
TIMESTAMP: 'Timestamp';
REG_EXP: 'RegExp';

// Read methods and cursor modifiers
FIND: 'find';
FIND_ONE: 'findOne';
SORT: 'sort';
LIMIT: 'limit';
SKIP_: 'skip';
PROJECTION: 'projection';
PROJECT: 'project';

// Punctuation
LPAREN: '(';
RPAREN: ')';
LBRACE: '{';
RBRACE: '}';
LBRACKET: '[';
RBRACKET: ']';
COLON: ':';
COMMA: ',';
DOT: '.';
SEMI: ';';

// A lone dollar sign. Query operators such as $gt and $lt are lexed as
// IDENTIFIER (see below), because the longer IDENTIFIER match wins.
DOLLAR: '$';

// Comments must come before REGEX_LITERAL: input such as /* ... */ can also
// match REGEX_LITERAL at the same length, and ANTLR resolves such ties in
// favor of the rule defined first.
LINE_COMMENT
    : '//' ~[\r\n]* -> channel(HIDDEN)
    ;

BLOCK_COMMENT
    : '/*' .*? '*/' -> channel(HIDDEN)
    ;

// Regex literal
REGEX_LITERAL
    : '/' REGEX_BODY '/' REGEX_FLAGS?
    ;

fragment REGEX_BODY
    : REGEX_CHAR+
    ;

fragment REGEX_CHAR
    : ~[/\r\n\\]
    | '\\' .
    ;

fragment REGEX_FLAGS
    : [gimsuy]+
    ;

// Numbers
NUMBER
    : '-'? INT ('.' [0-9]+)? EXPONENT?
    | '-'? '.' [0-9]+ EXPONENT?
    ;

fragment INT
    : '0'
    | [1-9] [0-9]*
    ;

fragment EXPONENT
    : [eE] [+-]? [0-9]+
    ;

// Strings - both single and double quoted
DOUBLE_QUOTED_STRING
    : '"' (ESC | ~["\\])* '"'
    ;

SINGLE_QUOTED_STRING
    : '\'' (ESC | ~['\\])* '\''
    ;

fragment ESC
    : '\\' (["\\/bfnrt] | UNICODE | '\'')
    ;

fragment UNICODE
    : 'u' HEX HEX HEX HEX
    ;

fragment HEX
    : [0-9a-fA-F]
    ;

// Identifiers - for unquoted keys, collection names, method names
// Allows $-prefixed identifiers for MongoDB operators like $gt, $in, etc.
IDENTIFIER
    : [$_a-zA-Z] [$_a-zA-Z0-9]*
    ;

// Whitespace
WS
    : [ \t\r\n]+ -> channel(HIDDEN)
    ;
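Several rules in this lexer depend on how an ANTLR lexer chooses a token: the longest match wins, and among rules that match the same length, the rule defined first wins. That is why `findOne` becomes FIND_ONE while `findOneAndUpdate` falls through to IDENTIFIER, why `$gt` is an IDENTIFIER rather than DOLLAR, and why `/* ... */` becomes BLOCK_COMMENT instead of REGEX_LITERAL. The Go sketch below re-implements that policy for a handful of rules, using regular expressions translated by hand from the grammar. It illustrates the disambiguation logic only; it is not the generated `MongoShellLexer`, and `rules` and `lexOne` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a token name with an anchored pattern translated by hand from
// MongoShellLexer.g4. Only a few rules are included, in grammar order.
type rule struct {
	name string
	re   *regexp.Regexp
}

var rules = []rule{
	{"FIND", regexp.MustCompile(`^find`)},
	{"FIND_ONE", regexp.MustCompile(`^findOne`)},
	{"DOLLAR", regexp.MustCompile(`^\$`)},
	{"BLOCK_COMMENT", regexp.MustCompile(`^/\*[\s\S]*?\*/`)},
	{"REGEX_LITERAL", regexp.MustCompile(`^/(?:[^/\r\n\\]|\\[\s\S])+/[gimsuy]*`)},
	{"IDENTIFIER", regexp.MustCompile(`^[$_a-zA-Z][$_a-zA-Z0-9]*`)},
}

// lexOne returns the token an ANTLR-style lexer would emit at the start of
// input: the longest match wins, and on a tie the earlier rule wins.
func lexOne(input string) (name, text string) {
	for _, r := range rules {
		// FindString returns "" when the rule does not match. The strict >
		// keeps the earlier rule when two matches have the same length.
		if m := r.re.FindString(input); len(m) > len(text) {
			name, text = r.name, m
		}
	}
	return name, text
}

func main() {
	inputs := []string{"findOne({})", "findOneAndUpdate({})", "$gt: 5", "$", "/* note */ db", `/^a\/b/i.test(x)`}
	for _, in := range inputs {
		name, text := lexOne(in)
		fmt.Printf("%-24q -> %s %q\n", in, name, text)
	}
}
```

Go's regexp package is leftmost-first rather than leftmost-longest, so the sketch tries each rule separately and compares match lengths explicitly instead of combining the rules into one alternation.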

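The NUMBER and DOUBLE_QUOTED_STRING rules (with their INT, EXPONENT, ESC, and UNICODE fragments) can be checked in isolation by translating them into Go regular expressions. This is a sketch under the assumption that you want to test whether a whole string forms exactly one token: the real lexer tokenizes greedily, so an input such as `007` is not rejected there but split into several NUMBER tokens. `numberRe` and `doubleQuotedRe` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberRe mirrors the NUMBER rule: an optional '-', then either an INT
// (no leading zeros) with an optional fraction, or a bare fraction such
// as .5, followed by an optional EXPONENT.
var numberRe = regexp.MustCompile(`^-?(?:(?:0|[1-9][0-9]*)(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?$`)

// doubleQuotedRe mirrors DOUBLE_QUOTED_STRING with the ESC and UNICODE
// fragments: any character except '"' or '\', or a permitted escape.
var doubleQuotedRe = regexp.MustCompile(`^"(?:\\(?:["\\/bfnrt']|u[0-9a-fA-F]{4})|[^"\\])*"$`)

func main() {
	for _, s := range []string{"42", "-3.14", ".5e-3", "007", "1.", "+1"} {
		fmt.Printf("NUMBER %-8q %v\n", s, numberRe.MatchString(s))
	}
	for _, s := range []string{`"a\"b"`, `"caf\u00e9"`, `"it\'s"`, `"bad \x"`} {
		fmt.Printf("STRING %-12s %v\n", s, doubleQuotedRe.MatchString(s))
	}
}
```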