Add test cases for the chomp option, fix for undef $/, improve some docs #20

HaraldJoerg · 2018-12-12T17:19:31Z

This is a pull request for the CPAN Pull Request Challenge.

The main plan was to add some tests for the chomp option of read_file, which are included. But as so often, this led to other discoveries...

read_file chomps the results only if they have been split into "lines" according to the current value of $/. So I tried a few of its possible values.

read_file has never worked when $/ is undefined: the split on an empty value results in an array with two elements per character of input, every other one of them undefined. The PR changes this so that with undefined $/, the whole file content is returned as the first and only element of the array. Test cases 8 and 9 exercise this and fail on the master branch.
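For comparison, here is a minimal sketch of how plain Perl I/O treats an undefined $/ in list context, which is the behavior this change aims to mirror. It reads from an in-memory filehandle rather than a real file, and the helper read_records is hypothetical, not part of File::Slurp.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Slurp): read every record from a
# string through an in-memory filehandle, honouring the current value of $/.
sub read_records {
    my ($content) = @_;
    open my $fh, '<', \$content or die "cannot open in-memory handle: $!";
    my @records = <$fh>;    # list context: one element per record
    close $fh;
    return @records;
}

my $text = "alpha\nbeta\ngamma\n";

# Default $/ ("\n"): one element per line.
my @lines = read_records($text);

# Undefined $/ (slurp mode): the whole content is the first and only element.
my @whole = do { local $/; read_records($text) };

printf "default \$/: %d elements\n", scalar @lines;
printf "undef \$/:   %d element(s)\n", scalar @whole;
```

The `local $/` inside the `do` block keeps the change to the global variable confined to that one call, which is the usual way to avoid the action-at-a-distance problems mentioned later in this thread.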

Processing fixed-length records with e.g. $/ = \1024 doesn't work either, as it tries to split on a regex like SCALAR(0x55c41c65fba0). Apparently nobody has missed it so far, so in its current state the PR only documents that it isn't supported. If desired, I'd work on it; it isn't too difficult.
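A minimal sketch of what plain Perl does with a fixed-length record separator, and of why using that same value as a split pattern yields something like SCALAR(0x...): a reference interpolated into a string or regex is stringified. The in-memory filehandle and the record size of 4 are illustrative assumptions; this does not reproduce File::Slurp's internal code.

```perl
use strict;
use warnings;

my $text = "abcdefghij";    # 10 characters, no newlines

# Fixed-length record mode: $/ is a reference to an integer (here 4),
# so each readline returns at most 4 characters.
my @records = do {
    local $/ = \4;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    <$fh>;
};
print join('|', @records), "\n";    # abcd|efgh|ij

# Interpolating the same value into a pattern, as the comment above
# describes, uses the stringified reference instead of a record length.
my $sep     = \4;
my $pattern = "$sep";
print "$pattern\n";    # something like SCALAR(0x55d0c3a1e2f8)
```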

Finally, I fixed the docs where they state that the options can be given as a flattened hash, which isn't true for any of the writing operations.

perhunter · 2018-12-12T17:50:03Z

On 12/12/18 12:19 PM, Harald Jörg wrote:
> The PR changes this so that with undefined $/, the whole file content will be returned as the first and only element of the array.

this is not a good case IMO. if you want the whole file in a scalar, just call read_file in scalar context. there is no need to ever set $/ to undef for that. if you want the whole file in the first element of an array use the scalar function: my @array = (scalar read_file( 'foo' ), scalar read_file( 'bar' ) ) ;

> Processing fixed length records with e.g. $/ = \1024 doesn't work either, as it tries to split on a regex like SCALAR(0x55c41c65fba0).

read_file is to read the whole file. processing in fixed length records is not a whole file. just use read for that. uri

HaraldJoerg · 2018-12-12T19:16:31Z

I agree! Neither of these settings for $/ provides an interesting use case for File::Slurp. However, it is in these cases that read_file behaves differently from plain Perl I/O, in a way which is impossible to predict without inspecting the code: for example, it returns arrays which contain undefined elements. I think that the behavior should either be made similar to Perl I/O (which I did for undefined $/), or at least documented (which I did for integer references), or read_file should even defend itself against these values by throwing an error. I didn't want to throw an error because of the long list of reverse dependencies of this module. File::Slurp is one of the modules which can easily be used by Perl beginners, who might not yet be aware of the dangers lurking when global variables are changed somewhere in the code...

Of course, I'm open to requests for improvement or different decisions by the owners and will adjust the Pull Request as requested.

-- haj

perhunter · 2018-12-12T19:21:23Z

On 12/12/18 2:16 PM, Harald Jörg wrote:
> I think that the behavior should either be made similar to Perl I/O (which I did for undefined $/), or at least documented (which I did for integer references), or read_file should even defend itself against these values by throwing an error.

but read_file is not supposed to be like perl i/o as it just reads a whole file. perl i/o can do all sorts of things with setting $/ and context and one line at a time. you can't compare them. the best i would say is to document that you should not change $/ and expect perl i/o behavior. read_file only uses the default $/ to split on lines and nothing else. change $/ and you are on your own. more doc warnings are ok by me. changing any behavior with regards to $/ is not ok. uri

HaraldJoerg · 2018-12-12T20:37:02Z

> read_file only uses the default $/ to split on lines and nothing else. change $/ and you are on your own.

Well, at least paragraph mode ($/ = '') has been supported and documented for a long time.
The docs say that "In list context it will return a list of lines (using the current value of $/ as the separator including support for paragraph mode when it is set to '')."
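For reference, a minimal sketch of paragraph mode in plain Perl I/O, including how Perl's built-in chomp behaves while $/ is ''. The sample text is made up, and the sketch does not exercise read_file itself.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n\n\ngamma\n";

my @paragraphs = do {
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @recs = <$fh>;
    chomp @recs;      # in paragraph mode, chomp removes all trailing newlines
    @recs;
};

print scalar(@paragraphs), " paragraphs\n";
```

Note that chomp consults $/ at the time it runs, so it has to be called while `local $/ = ''` is still in effect; with an undefined or integer-reference $/, Perl's chomp removes nothing.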

This documentation can, of course, be adjusted to discourage any current value of $/ different from newline and the empty string, and I'll remove the handling of undefined $/ together with the corresponding tests.

perhunter · 2018-12-12T21:23:34Z

On 12/12/18 3:37 PM, Harald Jörg wrote:
> This documentation can, of course, be adjusted to discourage any current value of $/ different from newline and the empty string, and I'll remove the handling of undefined $/ together with the corresponding tests.

yep. so adding a warning to not change $/ to anything but a normal line ending or to '' for paragraph mode would be a good thing. uri

karenetheridge · 2018-12-13T00:27:54Z

If the code permits a particular usecase, it should be tested for. If a particular usecase is wrong, then the code should prohibit it by dying. Documentation warnings are not enough. Murphy will always contrive the strangest usecases possible.
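One way to act on this suggestion, sketched under assumptions: a hypothetical guard (check_record_separator, not part of File::Slurp) that a list-context reader could call first. It assumes only "\n" and '' count as supported, as discussed above, and dies on anything else. Per uri's later point, a scalar-context slurp does not look at $/, so such a check would only make sense in list context.

```perl
use strict;
use warnings;

# Hypothetical guard (not part of File::Slurp): accept only "\n" and ''
# (paragraph mode) as values of $/, and die on anything else instead of
# silently producing surprising results.
sub check_record_separator {
    my $sep = $/;
    return 1 if defined($sep) && !ref($sep) && ($sep eq "\n" || $sep eq '');
    my $shown = !defined($sep) ? 'undef'
              : ref($sep)      ? "reference ($sep)"
              :                  "'$sep'";
    die "unsupported value of \$/ in list context: $shown\n";
}

my @results;
for my $case (["\n", 'newline'], ['', 'paragraph mode'],
              [undef, 'undef'], [\1024, 'fixed length']) {
    my ($value, $label) = @$case;
    local $/ = $value;    # restored at the end of each iteration
    my $ok = eval { check_record_separator(); 1 };
    push @results, "$label: " . ($ok ? 'accepted' : 'rejected');
}
print "$_\n" for @results;
```

A real check would also have to decide whether other line endings, such as "\r\n", count as a "normal line ending".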


perhunter · 2018-12-13T03:23:19Z

On 12/12/18 7:27 PM, Karen Etheridge wrote:
> If the code permits a particular usecase, it should be tested for. If a particular usecase is wrong, then the code should prohibit it by dying. Documentation warnings are not enough. Murphy will always contrive the strangest usecases possible.

how can you tell if the user really wants to mung $/ and use read_file? you can't test or check for every possible idiotic use of code. sometimes it is better to let the user learn it on their own. documenting the issue is so you can always say rtfm. is setting $/ to undef and slurping in a file to a scalar wrong? it should work as $/ is not looked at there. paragraph mode only makes sense in a list context anyway. there are too many possible rabbitholes to handle all possible cases and know what the user really wants. uri

HaraldJoerg added 3 commits December 11, 2018 17:24

3385849 Fix documentation: No writing function allows a flat hash of options
243f079 Tests for chomp, and some fixes to the underlying line splitting
6c1f10e Add a test plan to chomp tests
1479a4f Don't change $/ treatment, as requested by uri
