@jimhester

This file has some quoted fields with escaped quotes inside of them,
which causes problems when doing multi-threaded reading in readr 2.0.0.

Forcing only a single thread allows the file to be parsed as intended.

marineleroi commented Jul 13, 2021

@jimhester I've updated the readr package, and I do see a warning message when using the spir_indicator() function, but the function runs and the data frame looks good. Do you know, by any chance, why escaped quotes would cause such multi-threading issues?

jimhester commented Jul 13, 2021

You need to use the current development version of readr to see the issue. The way the multi-threading works, the other threads don't know whether they are in a quoted field. There is automatic detection with a fallback to single-threaded mode if an unexpected newline is encountered, but in this case the issue is with the embedded quotation, not a newline, so the fallback does not trigger.
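
To illustrate the point about a worker not knowing its quote state, here is a minimal Python sketch. It is not readr's actual parser. The `split_fields` helper, the sample row, and the choice of backslash-escaped quotes are all assumptions made for illustration. It only shows that a parser starting mid-record, with no newline to reveal the mistake, splits fields differently from one that knows it began inside a quoted field.

```python
def split_fields(text, start_in_quotes=False):
    """Split CSV text on commas, honoring double quotes.

    Hypothetical helper: assumes quotes inside a field are escaped
    with a backslash, which is only one of several CSV conventions.
    """
    fields, current = [], []
    in_quotes = start_in_quotes
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep it literally, do not toggle state.
            current.append(text[i + 1])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


# One record with a quoted field that contains escaped quotes and commas.
row = 'id,"a, \\"quoted\\" note, with commas",3'

# A sequential reader sees the whole record and gets three fields.
print(split_fields(row))

# A second worker that starts in the middle of the quoted field has to
# guess its state. Guessing "not in quotes" silently gives wrong fields.
tail = row[row.index("note"):]
print(split_fields(tail))                        # wrong guess
print(split_fields(tail, start_in_quotes=True))  # correct state
```

Because the wrong guess produces no stray newline, a check that only watches for unexpected newlines would not notice the problem in this sketch.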

@marineleroi

Thanks for the details! Wouldn't it be cleaner to patch readr to detect embedded quotations as well?
