Support non ASCII characters in MR Sets by gorcha · Pull Request #351 · WizardMac/ReadStat

gorcha · 2025-12-23T01:57:07Z

From tidyverse/haven#788: the new MR set changes only support ASCII characters in the set name in the Ragel parser, but SPSS uses the file's code page (or UTF-8) for these names.

This PR updates the parser to allow for non-ASCII characters, and runs text from the MR set through readstat_convert() to make sure the character encoding comes in correctly.
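For illustration, the conversion step can be sketched with POSIX iconv(3). This is not ReadStat's readstat_convert(). The helper name convert_to_utf8 and the choice of ISO-8859-1 as the file's code page are assumptions made for the example. The sketch turns the single non-ASCII byte of a Latin-1 set name into its two-byte UTF-8 form.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, NOT ReadStat's readstat_convert(): converts src_len
 * bytes of src from from_encoding (e.g. the file's code page) into UTF-8.
 * Writes a NUL-terminated string to dst. Returns 0 on success, or -1 if the
 * encoding is unknown, the input is invalid, or dst is too small. */
static int convert_to_utf8(const char *from_encoding,
                           const char *src, size_t src_len,
                           char *dst, size_t dst_len) {
    assert(src != NULL && dst != NULL && dst_len > 0);

    iconv_t cd = iconv_open("UTF-8", from_encoding);
    if (cd == (iconv_t)-1)
        return -1;  /* unsupported source or target encoding */

    char *in = (char *)src;        /* iconv() takes char **, not const */
    char *out = dst;
    size_t in_left = src_len;
    size_t out_left = dst_len - 1; /* leave room for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;  /* EILSEQ, EINVAL, or E2BIG (dst too small) */

    *out = '\0';
    return 0;
}
```

For example, converting the 5-byte Latin-1 name "$caf\xE9" yields the 6-byte UTF-8 string "$café". If the destination buffer is too small, iconv() fails with E2BIG and the helper returns -1 instead of truncating silently.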

evanmiller · 2025-12-23T13:07:58Z

src/spss/readstat_sav_parse_mr_name.rl

    }

-    nc = (alnum | '_' | '.' ); # name character (including dots)
+    nc = ([^ =]); # name character (all characters except space and equals)


This seems excessively lax?

gorcha · 2025-12-31

My thinking was that we're only using this for pattern matching, to extract the components of the field rather than to validate its contents, so it doesn't matter much whether it lines up exactly with the characters SPSS actually allows.

Pegging it to the actually allowed characters, as the current version does, is what's causing the error in tidyverse/haven#788 (and it would be a massive pain to build a character class that matches every potentially valid character in UTF-8 and other encodings). Happy to tighten it up a bit if there are particular things you're worried about.
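A rough C sketch of the difference between the two character classes, assuming an MR set record that begins with "$name=" (SPSS MR set names start with `$`). The functions old_name_char, new_name_char, and match_set_name are hypothetical stand-ins for the Ragel machine, not ReadStat code. Under the old class, the UTF-8 bytes of "é" end the name before the `=`, so the match fails; under `[^ =]` it succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old rule, mirroring the Ragel class (alnum | '_' | '.'):
 * ASCII letters, digits, underscore and dot only. */
static bool old_name_char(unsigned char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '_' || c == '.';
}

/* New rule, mirroring [^ =]: any byte except space and '='. Every byte of a
 * multi-byte UTF-8 character is >= 0x80, so all of them pass. */
static bool new_name_char(unsigned char c) {
    return c != ' ' && c != '=';
}

/* Hypothetical sketch: match a leading "$name=" in an MR set record, using
 * the given name-character predicate. Assumes the record starts with '$'.
 * On success, stores the name length in bytes (including the '$') in
 * *name_len and returns true. */
static bool match_set_name(const char *rec, size_t len,
                           bool (*is_name_char)(unsigned char),
                           size_t *name_len) {
    assert(rec != NULL && name_len != NULL);
    if (len == 0 || rec[0] != '$')
        return false;
    size_t i = 1;
    while (i < len && is_name_char((unsigned char)rec[i]))
        i++;
    /* Fail on an empty name, or if the scan stopped on anything but '='. */
    if (i == 1 || i >= len || rec[i] != '=')
        return false;
    *name_len = i;
    return true;
}
```

Because no byte of a multi-byte UTF-8 character is ever 0x20 or 0x3D, a byte-level `[^ =]` cannot stop in the middle of one. The trade-off raised in the review is that it also accepts bytes, such as a newline, that can never appear in a valid SPSS name.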

gorcha · 2026-01-23

Hey @evanmiller, just following up: keen to get this merged so we can pull the change into haven. Thanks!

gorcha · 2026-01-24T04:23:25Z

Thanks @evanmiller!

Support non ASCII character sets in MR Sets

4e41124

evanmiller reviewed Dec 23, 2025


evanmiller merged commit a4984d5 into WizardMac:dev Jan 23, 2026
11 of 12 checks passed

gorcha mentioned this pull request Feb 2, 2026

Allow MR sets with no variables #358

Merged

evanmiller merged 1 commit into WizardMac:dev from gorcha:sav-mrset-convert
