Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PosixFileStream for files on POSIX #1855

Merged
merged 9 commits into from
Jan 9, 2025

Conversation

BCSharp
Copy link
Member

@BCSharp BCSharp commented Dec 30, 2024

This PR replaces FileStream with PosixFileStream on .NET/POSIX (not Mono), which offers unbuffered access to the file though its actual file descriptor. This addresses the issue #1846 on .NET/POSIX.

I decided to write own implementation of the stream, rather than encapsulating UnixStream. Not only the implementation had similar amount of work involved and complexity as a proxy class to UnixStream, but also it allowed me to make different implementation choices than UnixStream, some of which I would call problematic. The best example is UnixStream.Close(), which retries syscall to close if interrupted. This will result in error EBADF at best, and close an unrelated file descriptor at worst. It's because, on most systems (including Linux and macOS), close always closes the descriptor, even it the call fails with an error, and retry is not appropriate. See CPython source code.

Another effect of this PR is that for .NET/POSIX, there are no more "double streams", meaning that operations on file descriptors behave as expected in all cases (modulo bugs and missing pieces).

The performance of this implementation is roughly the same as CPython's. When I first tested it, I shockingly discovered that it was 2.5 times slower, but then noticed that StreamBox always calls Flush after each write and that was slowing things down significantly. I included a hack to avoid it for PosixFileStream and it helped.

Module mmap is due for some cleanup, which is outside of the scope of this PR.

Unfortunately I had to leave Mono behind, that is, on FileStream. Between the limitations/bugs of MemoryMappedFile interface and FileStream, using PosixFileStream on Mono (or even UnixStream) would create a regression in mmap.

This PR also makes sure that all file descriptors created via open or io.open on .NET/POSIX are non-inheritable, in line with #1225.


// a hack but it does improve perfomance a lot if flushing is not needed
_writeNeedsFlush = writeStream switch {
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory the FileStream should already be unbuffered since the FileIO.OpenFile helper uses a buffer size of 1. I'm not sure if it's true in practice, but it looks like at least .NET (Core) has code to handle it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked the source code of .NET Framework 4.8 and it also performs unbuffered writes when buffer size is 1. I suspect it is the same on 4.6.2. On Mono, however, buffer bypass only happens if the data length is > buffer size, meaning that single-byte writes still land in the 1-byte buffer, not on the disk. We could then optimize flushing by only flushing if IsMono and data length is 1.

But FileIO.OpenFile is not the only way of creating FileIO. One can use os.open to get a file descriptor and then open a file. Currently, PythonNT.open, when creating FileStream, uses buffer of 4K (DefaultBufferSize), which is defined since the initial commit 11 years ago. I will change it to 1 to get the consistent unbuffered behaviour, unless you have counter arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also open with a buffer size of 0 if that would help with Mono...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Mono, .NET, and .NET Framework all throw for buffer size <= 0.

Copy link
Contributor

@slozier slozier Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, in the FileStreamOptions.BufferSize docs it says 0 or 1 to disable buffering. I was assuming it'd simply pass along the value to the other constructor. Anyway, it looks like .NET Framework does indeed throw for <= 0 so I guess it doesn't help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, .NET doesn't throw with 0, I was speaking from memory, must have gotten confused with another framework. Or maybe it was throwing in the past (older framework versions).

Anyway, the issue is with Mono, and it does throw at 0, and it does buffer at 1.

_writeNeedsFlush = writeStream switch {
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono
PosixFileStream => false, // PosixFileStream is unbuffered, used on .NET/Posix
UnixStream => false, // UnixStream is unbuffered, used sometimes on Mono
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is causing the test failures since it'll try to load the Mono.Posix assembly (which isn't packaged along with Windows builds).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it makes sense... The failure on Linux is a bit more interesting as it is failing when dealing with a file opened in the "a" mode for simultaneous logging.

@BCSharp BCSharp marked this pull request as draft January 1, 2025 00:27
Copy link
Contributor

@slozier slozier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you changed it to draft. Did you have additional changes you wanted to do?

@@ -15,6 +15,8 @@
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.Runtime.Serialization is VS trying to be helpful?

@BCSharp
Copy link
Member Author

BCSharp commented Jan 4, 2025

mmap needs more work. Coming soon.

@BCSharp
Copy link
Member Author

BCSharp commented Jan 6, 2025

It was a non-trivial merge. mmap is still not ready (e.g. resize and file descriptor life cycle is incorrect), and added tests should expose some problems. @slozier, I see you are also currently working on mmap, I hope we won't step on each other toes 😄

@BCSharp BCSharp marked this pull request as ready for review January 8, 2025 01:12
@BCSharp
Copy link
Member Author

BCSharp commented Jan 8, 2025

Ready to merge. Any spellling errors? 😃

Copy link
Contributor

@slozier slozier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just a few comments.

stream = new Mono.Unix.UnixStream(res, ownsHandle: true);
// Elaborate workaround on Mono to avoid UnixStream as out
stream = new Mono.Unix.UnixStream(res, ownsHandle: false);
FileAccess fileAccess = stream.CanRead ? stream.CanWrite ? FileAccess.ReadWrite : FileAccess.Read : FileAccess.Write;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know the UnixStream details, but this seems to say that not being readable implies writable? Could there be odd cases where this isn't true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think it is possible. It's because there is actually no flag "readable". O_RDONLY is actually lack of any flags, i.e. 0b00. One can only request a writable file by setting flags 0b01 (non-readable writable) or 0b10 (readable writable). These two flags are mutually-exclusive, i.e. setting 0b11 results in EINVAL.

However, in other places I have the test inverted, i.e. stream.CanWrite ? stream.CanRead ?, which better reflects how those flags operate, so for consistency and clarity I'm going to change it here too. In other words, not being writable implies readable.

// Use PosixFileStream to operate on fd directly
// On Mono, we must use FileStream due to limitations in MemoryMappedFile
Stream s = PosixFileStream.Open(path, flags, unchecked((uint)mode), out int fd);
//Stream s = PythonIOModule.FileIO.OpenFilePosix(path, flags, mode, out int fd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the OpenFilePosix comment still relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed that one. I kept it around to switch back and forth between implementations when hunting for regressions.

// Use PosixFileStream to operate on fd directly
// On Mono, we must use FileStream due to limitations in MemoryMappedFile
var stream = PosixFileStream.Open(name, flags, 0b_110_110_110, out int fd); // mode: rw-rw-rw-
//var stream = OpenFilePosix(name, flags, 0b_110_110_110, out int fd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the OpenFilePosix comment still relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I missed that one too…

}
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux) || RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) {
// On POSIX, register the file descriptor with the file manager right after file opening
// Because the .NET case is already handled above before `switch`, this branch runs only on Mono
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add Debug.Assert(ClrModule.IsMono)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I tent to make assertions when the asserted assumptions not only are true but have to be true for the code to behave correctly. In other words, to catch the violation of the assumption early rather than risking failures later with potentially obscured errors. This code will behave correctly even if .NET enters it, e.g. the if before the switch gets (temporarily) commented out or disabled in any other way (bug workarounds for any .NET versions?). The comment was meant as informational to prevent anyone (incl. myself) in the future from trying to "optimize away" this block by deleting it.

Anyway, I have no problem of adding the assert here if you think that will be helpful. Or I can change the comment from "only on Mono" to "needed for Mono"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updating the comment works for me too.

(if any), plus `Dispose` of it if it owned the file stream.

`SafeFileHandle` closes the underlying file descriptor on dispose and sets it to invalid. It is OK
to close the handle several times, or even if it add-reffed and in use somewhere else. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge. Any spellling errors? 😃

If you insist... assuming add-reffed should only have one f? But more importantly ass below. 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, like to reef -> reefed? But "ee" is a long vowel. To maintain the short "e", I doubled the consonant, like in "to pet" -> "petted". I am inventing a new verb here so it's an uncharted territory.

But more importantly, I'm sorry to let "closed ass" go... 😄

if (Mono.Unix.Native.Syscall.fstat((int)handle.DangerousGetHandle(), out Mono.Unix.Native.Stat status) == 0) {
size = status.st_size;
} else {
Mono.Unix.Native.Errno errno = Mono.Unix.Native.Stdlib.GetLastError();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use the PythonNT.GetLastUnixError here?

Copy link
Member Author

@BCSharp BCSharp Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PythonNT.GetLastUnixError cannot be used here (yet) because it does not have proper platform guards (yet). Adding the platform guards to PythonNT.GetLastUnixError has a cascading effect to the code. I am saving it for a separate PR (soon), not to mix up too much with this one.

_view = _file.CreateViewAccessor(_offset, newsize, _fileAccess);
return;
} catch {
close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this close might be problematic (it won't release resources since _refCount is 2 at this point). Anyway, the Windows version has the same issue which I still need to figure out for #1866 so we can leave it as-is for now. (Unless the "proper" solution is obvious to you and you feel like taking it on).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is problematic. I didn't look into the solutions, but the way close and CloseWorker work perplexes me. If the code calls close while _refCount is > 1, then mmap gets marked as closed but its fields are never properly disposed? That can't be right.

And yes, I'd rather leave it as it is, since it is not a regression, and I don't want to make this PR too much about mmap. It was supposed to be about PusixFileStream w/o regressions. As I was reading through the mmap code and trying to figure out the rules of engagement and lifietime management of various things, I have noticed a few other suspicious pre-existing pieces, which I will need to examine more carefully, and PR if needed.

@slozier
Copy link
Contributor

slozier commented Jan 9, 2025

Hmm, GH ubuntu runner not wanting to install mono-vbnc... The fun never ends. 😢

@BCSharp
Copy link
Member Author

BCSharp commented Jan 9, 2025

I see that GitHub CI ubuntu-latest got upgraded to Noble Numbat (24.04). It does not have Mono anymore. We can temporarily switch back to Ubuntu Jammy Jellyfish (22.04) by changing workflow YAML to use runs-on: ubuntu-22.04 GitHub CI supports two latest LTS Ubuntu versions, so Ubuntu Jammy will still be maintained for over a year, till April 2026. In the meantime, we can figure out what to do with Mono on newer systems. Actually, the latest Jammy release is 22.04.5, I wonder if it is available as a CI image.

@BCSharp
Copy link
Member Author

BCSharp commented Jan 9, 2025

#1867. Apparently tag 22.04 is enough to select 22.04.5.

@slozier
Copy link
Contributor

slozier commented Jan 9, 2025

There are just a few tests that require mono-vbnc and those could be converted to use dotnet instead (see #1663). But if the Mono runtime is completely gone I guess that complicates testing...

@BCSharp BCSharp merged commit c02f027 into IronLanguages:main Jan 9, 2025
8 checks passed
@BCSharp BCSharp deleted the open_file_posix branch January 9, 2025 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants