-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
pr: uniformly scan for form feed and newline chars #10303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merging this PR will degrade performance by 3.26%
Performance Changes
Comparing Footnotes
|
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
Remove unit test for `pr` that was enforcing incorrect behavior. These tests were ostensibly designed to match corresponding ones in the GNU test suite, but they don't match the current behavior of GNU `pr`, so it is not valuable to keep them.
Fix the way form feed characters are interpreted by changing the way lines and pages are found. Before this commit, a file comprising two form feed characters (`/f/f`) would result in too few trailing newlines at the end of the second page. After this change, each page is produced with the correct number of lines. This commit changes the way files are read, replacing complex iterators with a loop-based approach, iteratively scanning for newline or form feed characters. The `memchr` library is used to efficiently scan for these two characters. One downside of this implementation is that it currently reads the entire input file into memory; this can be improved in subsequent merge requests.
|
GNU testsuite comparison: |
| // TODO Read incrementally. | ||
| let buf = if path == "-" { | ||
| let mut f = stdin(); | ||
| let mut buf = vec![]; | ||
| f.read_to_end(&mut buf)?; | ||
| buf | ||
| } else { | ||
| std::fs::read(path)? | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated code with get_file_line_groups
please fix it in a different pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) -> Result<Self, FromUtf8Error> { | ||
| // TODO Don't read bytes to String just to directly write them | ||
| // out again anyway. | ||
| let line_content = String::from_utf8(buf.to_vec())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't work with buff directly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to accommodate some existing code in this file that expects a String. It should be okay to just use the bytes, I will fix that in a new merge request. Thanks
Fix the way form feed characters are interpreted by changing the way
lines and pages are found. Before this commit, a file comprising two
form feed characters (
/f/f) would result in too few trailing newlinesat the end of the second page. After this change, each page is produced
with the correct number of lines.
This merge request changes the way files are read, replacing complex iterators
with a loop-based approach, iteratively scanning for newline or form
feed characters. The
memchrlibrary is used to efficiently scan forthese two characters. One downside of this implementation is that it
currently reads the entire input file into memory; this can be improved
in subsequent merge requests. The behavior of uutils
pris still quite different from GNUpr, so hopefully this is acceptable as a temporary debt to make many more straightforward improvements.This merge request also removes a unit test that was enforcing incorrect behavior. These
tests were ostensibly designed to match corresponding ones in the GNU
test suite, but most of them don't match the current behavior of GNU
pr, so itdoes not seem valuable to keep them. As we improve uutils
pr, we can add more targeted and minimal test cases to replace them.