
file-slurp-sane's People

Contributors

dolmen, leont, mephinet, revhippie

file-slurp-sane's Issues

Without utf8_strict, write_text fails with certain long texts

So, straight to the problem. Suppose I have a Perl character string with non-ASCII content (for example, a string literal in a file with use utf8;). Its UTF-8 byte representation is longer than 4096 bytes, and the 4096-byte boundary falls in the middle of a multi-byte character, splitting that character's bytes into two parts.

Under these conditions, if I use write_text to write this string, and I don't have PerlIO::utf8_strict installed, I get the following errors:

"\x{00e2}" does not map to utf8 at /run/current-system/sw/lib/perl5/site_perl/5.24.3/File/Slurper.pm line 73.
Close with partial character at /run/current-system/sw/lib/perl5/site_perl/5.24.3/File/Slurper.pm line 73.

I'm attaching the Perl file that manifests this problem. I'm running perl 5, version 24, subversion 3 (v5.24.3) built for x86_64-linux-thread-multi, File::Slurper version 0.010, under NixOS Linux.

The underlying problem seems to lie in the PerlIO layers that write_text uses when PerlIO::utf8_strict is not available and no encoding is specified explicitly. In that case the file is written through :raw:encoding(utf-8), and that layer fails to handle this kind of output properly. However, if I pass utf8 (without a dash) as the encoding in the write_text call, or write directly through :raw:encoding(utf8), the problem seems to disappear. (By the way, :raw doesn't influence the behaviour here, so it can be taken out of the equation.)
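Until the layer behaviour is fixed, one way to sidestep it is to encode the whole string in memory with the core Encode module and write the resulting bytes through a plain :raw handle, so no encoding layer ever sees a character split across a buffer boundary. This is a sketch of a swapped-in technique, not something File::Slurper provides; the helper name write_text_encoded is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of File::Slurper): encode the whole string in
# memory, then write the bytes through :raw, so no PerlIO encoding layer ever
# sees a multi-byte character split across a buffer boundary.
sub write_text_encoded {
    my ($filename, $text, $encoding) = @_;
    $encoding //= 'UTF-8';
    # FB_CROAK dies on unencodable characters; LEAVE_SRC keeps $text intact.
    my $bytes = encode($encoding, $text, Encode::FB_CROAK | Encode::LEAVE_SRC);
    open my $fh, '>:raw', $filename or die "Can't open '$filename': $!";
    print {$fh} $bytes or die "Can't write '$filename': $!";
    close $fh or die "Can't close '$filename': $!";
    return;
}

# 4095 ASCII bytes followed by U+2014 (three bytes in UTF-8, starting with
# 0xE2), so that character straddles the 4096-byte mark.
my $text = ('a' x 4095) . "\x{2014}" . ('b' x 100);
my (undef, $path) = tempfile(UNLINK => 1);
write_text_encoded($path, $text);
```

Because encode runs before open, an unencodable character makes the helper die before the target file is touched.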

While this seems to be an issue with core Perl, it's probably appropriate to address it in File::Slurper, either by defaulting to utf8 instead of utf-8, or by declaring PerlIO::utf8_strict a required dependency.
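The first option could look roughly like the following layer chooser: prefer PerlIO::utf8_strict when it can be loaded, otherwise fall back to :encoding(utf8), which the report found unaffected. This is a hypothetical sketch (the names $HAVE_UTF8_STRICT and utf8_write_layers are made up), not File::Slurper's actual code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical: detect PerlIO::utf8_strict once, at load time.
my $HAVE_UTF8_STRICT = eval { require PerlIO::utf8_strict; 1 } ? 1 : 0;

# Hypothetical layer chooser for UTF-8 output: strict layer if available,
# otherwise the :encoding(utf8) spelling instead of :encoding(UTF-8).
sub utf8_write_layers {
    return $HAVE_UTF8_STRICT ? ':raw:utf8_strict' : ':raw:encoding(utf8)';
}

my (undef, $path) = tempfile(UNLINK => 1);
my $layers = utf8_write_layers();
open my $out, ">$layers", $path or die "Can't open '$path': $!";
print {$out} "caf\x{e9}\n";
close $out or die "Can't close '$path': $!";
```

The trade-off is that Encode's utf8 is the lax variant: unlike UTF-8, it does not reject things such as surrogates or code points above U+10FFFF, so the fallback gives up some strictness in exchange for avoiding the bug.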

write_binary always uses layers :raw:encoding('latin-1'), should be just :raw.

The write_binary code looks like:
sub write_binary { return write_text(@_[0,1], 'latin-1'); }
so it passes an encoding of 'latin-1', with a hyphen.

write_text calls _text_layers, which uses $encoding and $crlf to determine the layers.

The relevant part of _text_layers is:
if ($encoding =~ /^(latin|iso-8859-)1$/i) { return $crlf ? ':unix:crlf' : ':raw'; }

That matches "latin1" but not "latin-1", so write_binary does not write using ':raw', but with ":raw:encoding('latin-1')" instead.

This produces a "wide character" warning for each non-ASCII line written.

To fix this, either change the regex (maybe to /^(latin-?|iso-8859-)1$/i) or change write_binary to return write_text(@_[0,1], 'latin1'); with no hyphen.
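A quick way to check the proposed regex is to run both patterns against a few encoding names. The two patterns are the ones quoted in this issue; the list of names is just illustrative.

```perl
use strict;
use warnings;

# The pattern quoted above from _text_layers, and the proposed replacement
# that also accepts the hyphenated spelling that write_binary passes in.
my $current  = qr/^(latin|iso-8859-)1$/i;
my $proposed = qr/^(latin-?|iso-8859-)1$/i;

for my $name (qw(latin1 latin-1 Latin-1 iso-8859-1 latin-2)) {
    printf "%-10s current: %-3s proposed: %s\n", $name,
        ($name =~ $current  ? 'yes' : 'no'),
        ($name =~ $proposed ? 'yes' : 'no');
}
```

Only the proposed pattern matches "latin-1" and "Latin-1"; both patterns still reject "latin-2".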

I could branch and create a pull request, but that seems silly for a one-character fix.

read_text returns empty list for an empty file

Calling read_text in list context normally returns a single scalar, as expected, unless the file in question is empty, in which case it returns an empty list.

$ perl -MFile::Slurper=read_text -E 'my @foo = (read_text("/dev/null")); say 0+@foo'
0

So a statement like localise(read_text($file, "UTF-8"), $locale) passes $locale as the first argument if and only if $file is empty, assuming no prototype on the sub.

The documentation strongly implies that the return value is always a scalar:

Reads file $filename into a scalar
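The report doesn't show read_text's code, so the following is only a sketch of one way this symptom can arise, under that assumption: if a slurp function ends with return <$fh>; in slurp mode, the readline inherits the caller's context, and in list context an empty file yields an empty list rather than ''. Forcing scalar context inside the function makes it return exactly one value. Both function names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical slurper whose readline inherits the caller's context:
# in list context, slurping an empty file yields an empty list.
sub slurp_context_leak {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "Can't open '$filename': $!";
    local $/;    # slurp mode
    return <$fh>;
}

# Hypothetical fixed version: assign to a scalar first, so the caller always
# gets exactly one value ('' for an empty file; the defined check is defensive).
sub slurp_scalar {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "Can't open '$filename': $!";
    local $/;    # slurp mode
    my $content = <$fh>;
    return defined $content ? $content : '';
}

my (undef, $empty) = tempfile(UNLINK => 1);
my @leaky = (slurp_context_leak($empty));
my @fixed = (slurp_scalar($empty));
print scalar(@leaky), ' ', scalar(@fixed), "\n";
```

With the fixed version, a call like localise(slurp_scalar($file), $locale) keeps $locale in second position even for an empty file.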

read_lines problem in a long running process

I recently had to remove read_lines from a little server I am developing (Phoebe, commit). It used read_lines to read a list of "pages" from an "index" file. When I started or restarted the server, everything worked as intended, the index showed one item per page. After a few hours, the behaviour changed: read_lines returned a single item containing all the pages concatenated with newlines.

Normal operation, with an index containing three files:

  1. A
  2. B
  3. C

After a few hours, with the same index:

  1. A\nB\nC

And when I restart the process, it is back to three items.

Sadly, I have no idea how this is possible. I never pass $encoding, $crlf, or $skip_chomp, and read_text also calls _text_layers just like read_lines. I'm just confused about the situation right now and don't know how I should debug or log the issue.
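The report doesn't identify a cause. One thing that may be worth logging while debugging (an assumption, not something the report establishes) is the value of the global input record separator $/ at the time of the call: if other code in the long-running process sets it to undef without local, any readline-based line splitting collapses into a single item. The sketch below shows that failure mode and a hypothetical reader, read_lines_robust, that localizes $/ so it does not depend on the global value.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical line reader that does not trust the global $/: it localizes
# the input record separator to "\n" for the duration of the read.
sub read_lines_robust {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "Can't open '$filename': $!";
    local $/ = "\n";
    my @lines = <$fh>;
    chomp @lines;    # chomp strips the current $/, which is "\n" here
    return @lines;
}

my ($out, $index) = tempfile(UNLINK => 1);
print {$out} "A\nB\nC\n";
close $out or die "Can't close '$index': $!";

# Simulate some other code in the process clobbering $/ globally.
$/ = undef;

open my $fh, '<:raw', $index or die "Can't open '$index': $!";
my @naive = <$fh>;    # a single element: "A\nB\nC\n"
close $fh;

my @robust = read_lines_robust($index);
printf "naive: %d item(s), robust: %d item(s)\n", scalar @naive, scalar @robust;
```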

Provide read_*_if_exists() functions

A lot of workflows do not treat the nonexistence of a file as an exceptional circumstance. While it's commonplace for such code to check -e $path before reading, this sets up a race condition: the file can be created or removed between the check and the read.

It would be useful for File::Slurper to expose read functions that return undef if the given path is nonexistent.
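A minimal sketch of such a function, assuming it lives outside File::Slurper (the name read_text_if_exists and its defaults are hypothetical): instead of testing -e first, it attempts the open and treats only ENOENT as "no file", so there is no gap between a check and the open. Other failures, such as permission errors, still die.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);
use File::Temp qw(tempdir);

# Hypothetical helper, not part of File::Slurper: open first and inspect the
# error instead of testing -e beforehand, so there is no window between an
# existence check and the open.
sub read_text_if_exists {
    my ($filename, $encoding) = @_;
    $encoding //= 'UTF-8';
    open my $fh, "<:raw:encoding($encoding)", $filename or do {
        return undef if $! == ENOENT;        # missing file: not an error
        die "Can't open '$filename': $!";    # permissions etc. still die
    };
    local $/;                                # slurp mode
    my $content = <$fh>;
    return defined $content ? $content : '';
}

my $dir = tempdir(CLEANUP => 1);
my $text = read_text_if_exists("$dir/no-such-file");
print defined $text ? "found\n" : "missing\n";
```

Using an explicit return undef (rather than a bare return) means the function yields one value even in list context, which also avoids the argument-shifting problem described in the empty-file issue above.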
