fix: better align parsing with elixir's matched/unmatched/no-parens #93
    {:error,
     [
       {:->, [line: 1, column: 3], [[{:a, [line: 1, column: 1], nil}], {:__block__, [], []}]},
       {:->, [line: 1, column: 3], [[{:a, [line: 1, column: 1], nil}], nil]},
The Elixir parser allows nil here. This PR changes Spitfire to align with the Elixir parser, so this recovery scenario test needed to be updated.
    {ast, parser}
    end

    rhs = build_block_nr(exprs)
Tbh, I have completely forgotten what _nr was supposed to mean 😭
    end
    end

    # Dot expression for struct types - never parses call arguments.
is this like %foo.bar{} ?
Yes!
Supersedes #87

I had originally intended to break this down into multiple PRs like #87, but changing one part of the parser often broke another or didn't fully fix the issues, so I apologize for the large diff. Tackling the root cause ended up being a larger change, but it fixed a whole family of parser issues that would otherwise have taken 100 PRs applying bandaids. This is also the reason this PR includes some tests from that PR.
This PR brings the `with_context` helper from that PR to help scope the parser context to specific parsing contexts.

The main issue this PR tackles is how matched, unmatched and no-parens expressions are handled.
The Elixir grammar distinguishes three kinds of expressions:

- "matched": expressions with clear boundaries, like literals, parenthesized calls, tuples, lists, maps, identifiers, modules. The key property of these is that they can safely appear as an operand without ambiguity.
- "unmatched": expressions with do-end blocks. These are "unmatched" because their boundary is the `end` keyword, not a closing delimiter. You cannot nest an unmatched expression inside another unmatched expression without parentheses:
  - `if if true do true else false end do` - invalid
  - `if(if true do true else false end) do` - valid

  Operators can take unmatched expressions as their RHS (e.g., `x = if true do 1 end`), which is why `unmatched_op_expr` exists separately from `matched_op_expr` in `elixir_parser.yrl`.
- "no-parens": calls without parentheses, like `foo a, b, c`. These cannot be nested.

No-parens expressions are further subdivided:

- `no_parens_one`: single unambiguous arg (`f a`, `f key: val`)
- `no_parens_many`: multiple args (`f a, b, c`)
- `no_parens_one_ambig`: single arg that is itself a `no_parens_many` (`f g a, b` -> `f(g(a, b))`)

The grammar enforces that operators with a no_parens RHS can only combine with a matched/unmatched LHS, not with another no_parens expression.
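One way to sanity-check claims like these against Elixir's own parser is to compare the AST of a no-parens spelling with its fully parenthesized counterpart. The sketch below uses `Code.string_to_quoted/2` from the standard library. `AstCompare` and `same_ast?/2` are hypothetical names made up for this illustration and are not part of Spitfire or this PR. It relies on the default parser options, where metadata only carries line numbers, so both spellings of the same call should compare equal.

```elixir
defmodule AstCompare do
  # Hypothetical helper for illustration only (not part of Spitfire).
  # Parses both strings with Elixir's parser and compares the resulting ASTs.
  # Returns false if either string fails to parse.
  def same_ast?(left, right) do
    with {:ok, l} <- Code.string_to_quoted(left),
         {:ok, r} <- Code.string_to_quoted(right) do
      l == r
    else
      _ -> false
    end
  end
end

pairs = [
  # no_parens_many
  {"f a, b, c", "f(a, b, c)"},
  # no_parens_one_ambig, as described above
  {"f g a, b", "f(g(a, b))"},
  # operator with an unmatched (do-end) expression on the RHS
  {"x = if true do 1 end", "x = if(true, do: 1)"}
]

for {no_parens, explicit} <- pairs do
  IO.inspect({no_parens, explicit, AstCompare.same_ast?(no_parens, explicit)})
end
```

The same comparison can be pointed at any of the snippets discussed in this description to see which grouping Elixir actually picks.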
In Spitfire we don't work with grammar rules; instead we consume tokens as we go, so to achieve the same result we have to resort to flags like `nesting`, `in_map`, `stab_state`, etc. (on top of precedence checks). This PR expands that to mimic the behavior of the grammar rules in Elixir. I clarify this because to explain the parser behavior I need to refer to the Elixir grammar, but when you look at the diff in this PR it's mostly flags and extra checks, not explicit grammar rules.

Some examples of the issues this fixes:
No-parens calls tended to consume too many tokens

Given `%{foo a, b => c}`, Elixir parses this as `%{foo(a, b) => c}` because `foo a, b` is a `no_parens_many` and `=>` is an operator that expects `matched_expr` on both sides.

Spitfire would parse this as `%{foo(a, b => c)}` because it parsed no-parens calls at the `@lowest` precedence regardless of context.

The fix uses `assoc_op` precedence when `is_map == true` to force `foo(a, b)` to complete before continuing to parse the RHS of `=>`.

`->` would be parsed as infix in the wrong contexts

`->` should only be allowed when wrapped in parens, `do`-`end` or `fn`-`end`, but never as a regular infix operator.

Spitfire would treat `->` as an infix operator controlled by `is_stab`, but this flag was not scoped.

In code like `fn a, b -> (c -> d) end`, `is_stab` from the `fn` body leaked into the grouped `(c -> d)` or vice versa.

By adding `stop_before_stab_op?` and scoping it with `with_context`, each boundary is managed independently.

Keyword lists not stopping at stab boundary
There is a grammar rule `call_args_no_parens_kw_expr -> kw_eol matched_expr`. It basically says the value of a keyword pair in a no-parens call must be a matched expression, so in this code:

`fn key: expr -> body end`

`expr` stops at `->` because `->` can't appear in a matched expression.

Spitfire would continue consuming tokens after `->` because it wasn't aware of stabs in that context. This PR adds stab-aware functions to deal with this scenario. Also, unlike #87, this does so more generally, so it takes care of the `%{a do :ok end | b c, d => e, f => g}` scenario that is broken in that PR.

Some other issues that aren't strictly related to matched/unmatched/no-parens but were still necessary:
`.` no-parens calls didn't parse args

`foo.bar 1, 2` is a valid no-parens call, but Spitfire parsed it as `foo.bar()` followed by a dangling `1, 2`. Checking the `no_parens` metadata (like Elixir does in `build_identifier`) fixes that.

`block_identifier` not treated as terminal

In the Elixir grammar, `block_identifier` (`else`, `rescue`, `catch`, `after`) only appears in rules within do-blocks. In Spitfire, `block_identifier` wasn't in `@terminals` or `@peeks`, so the parser could consume `else`/`rescue` as expression tokens instead of stopping at block boundaries. For example, a stab body inside a do-block could run past `else`.

A lot of those changes don't have corresponding tests in this PR specifically. I was working on fixing the issues in the property tests added in #78, #80 and #81. The way I tested all this was to copy the new tests from those PRs into my branch and run them locally.
After the changes here: