-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PosixFileStream for files on POSIX #1855
Conversation
|
||
// a hack but it does improve perfomance a lot if flushing is not needed | ||
_writeNeedsFlush = writeStream switch { | ||
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the FileStream
should already be unbuffered since the FileIO.OpenFile
helper uses a buffer size of 1
. I'm not sure if it's true in practice, but it looks like at least .NET (Core) has code to handle it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just checked the source code of .NET Framework 4.8 and it also performs unbuffered writes when buffer size is 1. I suspect it is the same on 4.6.2. On Mono, however, buffer bypass only happens if the data length is > buffer size, meaning that single-byte writes still land in the 1-byte buffer, not on the disk. We could then optimize flushing by only flushing if IsMono and data length is 1.
But FileIO.OpenFile
is not the only way of creating FileIO
. One can use os.open
to get a file descriptor and then open a file. Currently, PythonNT.open
, when creating FileStream
, uses buffer of 4K (DefaultBufferSize
), which is defined since the initial commit 11 years ago. I will change it to 1 to get the consistent unbuffered behaviour, unless you have counter arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can also open with a buffer size of 0 if that would help with Mono...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Mono, .NET, and .NET Framework all throw for buffer size <= 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, in the FileStreamOptions.BufferSize
docs it says 0
or 1
to disable buffering. I was assuming it'd simply pass along the value to the other constructor. Anyway, it looks like .NET Framework does indeed throw for <= 0 so I guess it doesn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, .NET doesn't throw with 0, I was speaking from memory, must have gotten confused with another framework. Or maybe it was throwing in the past (older framework versions).
Anyway, the issue is with Mono, and it does throw at 0, and it does buffer at 1.
_writeNeedsFlush = writeStream switch { | ||
FileStream => true, // FileStream is buffered by default, used on Windows and mostly Mono | ||
PosixFileStream => false, // PosixFileStream is unbuffered, used on .NET/Posix | ||
UnixStream => false, // UnixStream is unbuffered, used sometimes on Mono |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is causing the test failures since it'll try to load the Mono.Posix
assembly (which isn't packaged along with Windows builds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it makes sense... The failure on Linux is a bit more interesting as it is failing when dealing with a file opened in the "a" mode for simultaneous logging.
a5e3975
to
64d0d4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you changed it to draft. Did you have additional changes you wanted to do?
Src/IronPython.Modules/mmap.cs
Outdated
@@ -15,6 +15,8 @@ | |||
using System.Numerics; | |||
using System.Runtime.CompilerServices; | |||
using System.Runtime.InteropServices; | |||
using System.Runtime.Serialization; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
System.Runtime.Serialization
is VS trying to be helpful?
|
It was a non-trivial merge. |
Ready to merge. Any spellling errors? 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just a few comments.
Src/IronPython.Modules/nt.cs
Outdated
stream = new Mono.Unix.UnixStream(res, ownsHandle: true); | ||
// Elaborate workaround on Mono to avoid UnixStream as out | ||
stream = new Mono.Unix.UnixStream(res, ownsHandle: false); | ||
FileAccess fileAccess = stream.CanRead ? stream.CanWrite ? FileAccess.ReadWrite : FileAccess.Read : FileAccess.Write; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know the UnixStream
details, but this seems to say that not being readable implies writable? Could there be odd cases where this isn't true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think it is possible. It's because there is actually no flag "readable". O_RDONLY
is actually lack of any flags, i.e. 0b00
. One can only request a writable file by setting flags 0b01
(non-readable writable) or 0b10
(readable writable). These two flags are mutually-exclusive, i.e. setting 0b11
results in EINVAL
.
However, in other places I have the test inverted, i.e. stream.CanWrite ? stream.CanRead ?
, which better reflects how those flags operate, so for consistency and clarity I'm going to change it here too. In other words, not being writable implies readable.
Src/IronPython.Modules/nt.cs
Outdated
// Use PosixFileStream to operate on fd directly | ||
// On Mono, we must use FileStream due to limitations in MemoryMappedFile | ||
Stream s = PosixFileStream.Open(path, flags, unchecked((uint)mode), out int fd); | ||
//Stream s = PythonIOModule.FileIO.OpenFilePosix(path, flags, mode, out int fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the OpenFilePosix
comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I missed that one. I kept it around to switch back and forth between implementations when hunting for regressions.
Src/IronPython/Modules/_fileio.cs
Outdated
// Use PosixFileStream to operate on fd directly | ||
// On Mono, we must use FileStream due to limitations in MemoryMappedFile | ||
var stream = PosixFileStream.Open(name, flags, 0b_110_110_110, out int fd); // mode: rw-rw-rw- | ||
//var stream = OpenFilePosix(name, flags, 0b_110_110_110, out int fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the OpenFilePosix comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I missed that one too…
Src/IronPython/Modules/_fileio.cs
Outdated
} | ||
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux) || RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) { | ||
// On POSIX, register the file descriptor with the file manager right after file opening | ||
// Because the .NET case is already handled above before `switch`, this branch runs only on Mono |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add Debug.Assert(ClrModule.IsMono)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I tent to make assertions when the asserted assumptions not only are true but have to be true for the code to behave correctly. In other words, to catch the violation of the assumption early rather than risking failures later with potentially obscured errors. This code will behave correctly even if .NET
enters it, e.g. the if
before the switch gets (temporarily) commented out or disabled in any other way (bug workarounds for any .NET versions?). The comment was meant as informational to prevent anyone (incl. myself) in the future from trying to "optimize away" this block by deleting it.
Anyway, I have no problem of adding the assert here if you think that will be helpful. Or I can change the comment from "only on Mono" to "needed for Mono"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updating the comment works for me too.
Src/IronPython.Modules/mmap.cs
Outdated
(if any), plus `Dispose` of it if it owned the file stream. | ||
|
||
`SafeFileHandle` closes the underlying file descriptor on dispose and sets it to invalid. It is OK | ||
to close the handle several times, or even if it add-reffed and in use somewhere else. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready to merge. Any spellling errors? 😃
If you insist... assuming add-reffed should only have one f? But more importantly ass below. 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, like to reef -> reefed? But "ee" is a long vowel. To maintain the short "e", I doubled the consonant, like in "to pet" -> "petted". I am inventing a new verb here so it's an uncharted territory.
But more importantly, I'm sorry to let "closed ass" go... 😄
if (Mono.Unix.Native.Syscall.fstat((int)handle.DangerousGetHandle(), out Mono.Unix.Native.Stat status) == 0) { | ||
size = status.st_size; | ||
} else { | ||
Mono.Unix.Native.Errno errno = Mono.Unix.Native.Stdlib.GetLastError(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't use the PythonNT.GetLastUnixError
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PythonNT.GetLastUnixError
cannot be used here (yet) because it does not have proper platform guards (yet). Adding the platform guards to PythonNT.GetLastUnixError
has a cascading effect to the code. I am saving it for a separate PR (soon), not to mix up too much with this one.
_view = _file.CreateViewAccessor(_offset, newsize, _fileAccess); | ||
return; | ||
} catch { | ||
close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this close
might be problematic (it won't release resources since _refCount
is 2
at this point). Anyway, the Windows version has the same issue which I still need to figure out for #1866 so we can leave it as-is for now. (Unless the "proper" solution is obvious to you and you feel like taking it on).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is problematic. I didn't look into the solutions, but the way close
and CloseWorker
work perplexes me. If the code calls close
while _refCount
is > 1, then mmap
gets marked as closed but its fields are never properly disposed? That can't be right.
And yes, I'd rather leave it as it is, since it is not a regression, and I don't want to make this PR too much about mmap
. It was supposed to be about PusixFileStream
w/o regressions. As I was reading through the mmap
code and trying to figure out the rules of engagement and lifietime management of various things, I have noticed a few other suspicious pre-existing pieces, which I will need to examine more carefully, and PR if needed.
Hmm, GH ubuntu runner not wanting to install |
I see that GitHub CI |
#1867. Apparently tag 22.04 is enough to select 22.04.5. |
There are just a few tests that require |
This PR replaces
FileStream
withPosixFileStream
on .NET/POSIX (not Mono), which offers unbuffered access to the file though its actual file descriptor. This addresses the issue #1846 on .NET/POSIX.I decided to write own implementation of the stream, rather than encapsulating
UnixStream
. Not only the implementation had similar amount of work involved and complexity as a proxy class toUnixStream
, but also it allowed me to make different implementation choices thanUnixStream
, some of which I would call problematic. The best example isUnixStream.Close()
, which retries syscall toclose
if interrupted. This will result in error EBADF at best, and close an unrelated file descriptor at worst. It's because, on most systems (including Linux and macOS),close
always closes the descriptor, even it the call fails with an error, and retry is not appropriate. See CPython source code.Another effect of this PR is that for .NET/POSIX, there are no more "double streams", meaning that operations on file descriptors behave as expected in all cases (modulo bugs and missing pieces).
The performance of this implementation is roughly the same as CPython's. When I first tested it, I shockingly discovered that it was 2.5 times slower, but then noticed that
StreamBox
always callsFlush
after each write and that was slowing things down significantly. I included a hack to avoid it forPosixFileStream
and it helped.Module
mmap
is due for some cleanup, which is outside of the scope of this PR.Unfortunately I had to leave Mono behind, that is, on
FileStream
. Between the limitations/bugs ofMemoryMappedFile
interface andFileStream
, usingPosixFileStream
on Mono (or evenUnixStream
) would create a regression inmmap
.This PR also makes sure that all file descriptors created via
open
orio.open
on .NET/POSIX are non-inheritable, in line with #1225.