xal-0 (Member) commented Dec 5, 2025

The return value for the `jl_fs_write/read/sendfile` functions is an `int`, even though they all take `size_t` length arguments, so the reported counts can be truncated for large file operations. Instead, on success (including partial reads and writes), we should return the `uv_fs_t.result` field.

I have added a test for this but have left it disabled because it allocates a 5 GiB Vector to read into.

One consequence of the truncated return values is that we previously mangled large files while copying them with `sendfile`. If the return value is truncated to an arbitrary negative value, it manifests as #30723 or #39868. If it is truncated to a positive value, `sendfile` will create a very large destination file with thousands of repeated sections of the source file, causing #56537. The bug will only manifest if LLVM decides to zero-extend the return value.

This also changes `jl_fs_write` to return the actual number of bytes written, rather than the number of bytes it was asked to write. Previously, if a write returned `EAGAIN` (usually because the file is `O_NONBLOCK`/`O_NDELAY`), the caller never saw the actual number of bytes written.

xal-0 requested a review from vtjnash December 5, 2025 02:20
xal-0 added the io (Involving the I/O subsystem: libuv, read, write, etc.) and bugfix (This change fixes an existing bug) labels Dec 5, 2025
xal-0 force-pushed the raw-file-io-retval branch from 633307e to 0dae450 December 5, 2025 02:31
xal-0 added 3 commits December 5, 2025 09:08
We use raw Base.Filesystem.File IO on PTYs in the REPL precompile script.  Due
to JuliaLang#24440, spawning a subprocess using the REPL's shell mode clears the
O_NONBLOCK flag set on the PTY, resulting in deadlocks when libuv later reads or
writes to it from Julia code.  I haven't identified what caused this to become
more of a problem recently, but I don't think some extra compilation latency
when starting shell mode is worth it if it's causing mystery timeouts in CI.

xal-0 commented Dec 5, 2025

WIP: I grabbed e977d6f from #60326 to see whether this change broke something on x86_64-darwin or whether the failure is the known problem.
