Commit 8833e62

feat(perf): Add Unicode support to SQL parser (#59115)
This is rare, but happens sometimes. Closes JAVASCRIPT-2NQW again.
1 parent b3a134e commit 8833e62

2 files changed: 8 additions, 2 deletions


static/app/views/starfish/utils/sqlish/SQLishParser.spec.tsx

Lines changed: 3 additions & 1 deletion
@@ -18,7 +18,9 @@ describe('SQLishParser', function () {
     'columns AS `tags[column]`', // ClickHouse backtics
     'SELECT * FROM #temp', // Temporary tables
     '# Fetches', // Comments
-    '\r\n', // Windows newlinse
+    '\r\n', // Windows newlines
+    '✌🏻', // Emoji
+    'ă', // Unicode
     'SELECT id, nam*', // Truncation
     'AND created >= :c1', // PHP-Style I
     'LIMIT $2', // PHP-style II

static/app/views/starfish/utils/sqlish/sqlish.pegjs

Lines changed: 5 additions & 1 deletion
@@ -35,5 +35,9 @@ CollapsedColumns
 Whitespace
   = Whitespace:[\n\t\r ]+ { return { type: 'Whitespace', content: Whitespace.join("") } }
 
+// \u00A0-\uFFFF is the entire Unicode BMP _including_ surrogate pairs and
+// unassigned code points, which aren't parse-able naively. A more precise
+// approach would be to define all valid Unicode ranges exactly but for
+// permissive parsing we don't mind the lack of precision.
 GenericToken
-  = GenericToken:[a-zA-Z0-9"'`_\-.=><:,*;!\[\]?$%|/\\@#&~^+{}]+ { return { type: 'GenericToken', content: GenericToken.join('') } }
+  = GenericToken:[a-zA-Z0-9\u00A0-\uFFFF"'`_\-.=><:,*;!\[\]?$%|/\\@#&~^+{}]+ { return { type: 'GenericToken', content: GenericToken.join('') } }
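The widened character class can be illustrated outside the grammar with a plain regular expression. The sketch below is an approximation under stated assumptions, not the generated parser: `GENERIC_TOKEN`, `isGenericToken`, and `codeUnits` are hypothetical names, and the regex only mirrors the `GenericToken` class from the diff. It shows why the new spec inputs are accepted: without the `u` flag the class is matched one UTF-16 code unit at a time, so an emoji outside the BMP passes as two surrogate code units that each fall inside `\u00A0-\uFFFF`.

```typescript
// Hypothetical regex mirror of the GenericToken character class in the diff.
// This is NOT the parser generated from sqlish.pegjs, only an illustration.
const GENERIC_TOKEN =
  /^[a-zA-Z0-9\u00A0-\uFFFF"'`_\-.=><:,*;!\[\]?$%|\/\\@#&~^+{}]+$/;

// True when every UTF-16 code unit of the input is inside the class.
function isGenericToken(input: string): boolean {
  return GENERIC_TOKEN.test(input);
}

// Lists the UTF-16 code units of a string as lowercase hex strings.
function codeUnits(input: string): string[] {
  const units: string[] = [];
  for (let i = 0; i < input.length; i++) {
    units.push(input.charCodeAt(i).toString(16));
  }
  return units;
}

// U+270C (victory hand) followed by U+1F3FB (light skin tone modifier),
// written with escapes so the exact code units are unambiguous.
const emoji = '\u270C\uD83C\uDFFB';

console.log(isGenericToken(emoji)); // true
console.log(codeUnits(emoji)); // [ '270c', 'd83c', 'dffb' ]
console.log(isGenericToken('\u0103')); // true ('ă', U+0103)
console.log(isGenericToken('a b')); // false (space is not in the class)
```

One consequence of the permissive range is that a lone surrogate such as `'\uD83C'` also matches, which is the loss of precision the grammar comment accepts.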
