
Comments (8)

johnkerl commented on May 22, 2024

The RFC-compliant CSV reader handles embedded separators within double quotes; the readers for the other formats do not handle them at all. This is a non-ideal situation, for sure.

This is a dup of #52, and all four of the current on-deck or active tasks are what I'm working on for v2.2.0.

from miller.

rbroemeling commented on May 22, 2024

@johnkerl While proper quoting can solve this issue, I'm not sure it is the only solution, and I'm not certain that it is preferable in this case. Having miller not make a best effort to parse unquoted-but-still-parsable data is a bug, IMO.

For example, given the log file that I am currently working with, adding quotes to all the fields would make it a LOT harder to read (as a human reading plaintext, I mean).

johnkerl commented on May 22, 2024

Good feedback. I'll dig harder into your request this evening.

rbroemeling commented on May 22, 2024

Thanks for looking into it further, @johnkerl -- I have not spent long looking into this, but I'm wondering if the problem isn't in /c/input/lrec_reader_mmap_dkvp.c within the lrec_parse_mmap_dkvp function. Whenever it matches on ips, it changes the ips to a null and then sets the value to the byte after the ips. Because this happens on every match rather than only the first, it would seemingly result in the behavior that is being seen (i.e. the key ends at the first ips, and the value begins after the last ips).

I wonder if this couldn't be fixed by tweaking that code to ensure that you only match on ips once per field, which should result in only the first ips being matched.
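To illustrate the suggestion above, here is a minimal sketch of splitting a field at the first pair separator only. It is not Miller's actual code: the function name `split_pair_at_first_ips` and its signature are hypothetical, and it assumes a single-character separator and a writable, null-terminated field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from Miller's source: splits "key=value"
 * in place at the FIRST occurrence of the pair separator `ips`. Any later
 * occurrences of `ips` are left untouched and remain part of the value.
 * Returns 1 if a separator was found, 0 otherwise. */
int split_pair_at_first_ips(char *field, char ips, char **pkey, char **pvalue)
{
    assert(field != NULL && pkey != NULL && pvalue != NULL);

    char *sep = strchr(field, ips);  /* strchr stops at the first match */
    *pkey = field;
    if (sep == NULL) {
        *pvalue = NULL;              /* no separator: caller decides what to do */
        return 0;
    }
    *sep = '\0';                     /* terminate the key at the first ips only */
    *pvalue = sep + 1;               /* the value may itself contain more ips */
    return 1;
}
```

Called on a writable buffer holding `url=/index.php?id=3` with `=` as the separator, this yields the key `url` and the value `/index.php?id=3`, since only the first `=` is overwritten.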

johnkerl commented on May 22, 2024

My apologies for the hasty read; I've got double-quotes on the brain & assigned too much weight to the double-quotes in your data. :^/

This is (was) definitely a bug; fixed in c2e11c0.

Thank you for the report!! :)

rbroemeling commented on May 22, 2024

No problem at all, @johnkerl -- thanks very much for the quick fix!

rbroemeling commented on May 22, 2024

Nice fix, @johnkerl -- I have confirmed that with it in place, miller can now deal very nicely with things like a web-server access.log file in dkvp format (using = as the PS now works well, even though it is used in some field values as well).

johnkerl commented on May 22, 2024

Awesome!!
