Giter Club home page Giter Club logo

Comments (2)

ohthehugemanatee avatar ohthehugemanatee commented on August 25, 2024

Really? How will you figure out which row is the real header?

I just posted an issue in the parent project ( halloffame#23) asking for a variable where we can select the number of header rows. That allows for files with 0 (just show the first row example data to choose columns), and files with several. We know that n-1=header row.

automatic would be awesome, but I can't imagine how to do it without big assumptions.

I wanted to submit a patch, but the project doesn't build on Linux (unsupported platform for fsevents, apparently). Thanks for all your work!

from ynab-csv.

cjhowald avatar cjhowald commented on August 25, 2024

@ohthehugemanatee I made one assumption: the CSV will not parse properly if an incorrect header row is used. Then, the trick is just defining what "properly" means and extracting that info from the parse result. Turns out, Papaparse does a nice job of recording errors it encounters while parsing. For our purposes, I'm defining parse failure as the first row of data throwing a FieldMismatch error (too many or too few values). This could be too strict for some, but I've found DB always sets the right number of values for data rows. Extraneous header rows, on the other hand, have far fewer "fields", so the parsing will definitely fail according to my criteria. Then, it's just a matter or detecting the failure, stripping off the first row (the bad header) and trying again. This is how I fixed #3.

For this issue specifically, it appears my failure criteria are actually not strict enough. If 2 consecutive "extra" header rows have the same number of "fields", then it will actually register as a success and continue parsing. Rows 2 and 3 in my sample data each have just one field (no delimiters), which causes this to happen. I've just solved this by also imposing a rule that there must be at least 3 fields (date, amount, payee). There are still cases where a false positive success is possible (e.g. 2 "extra" header rows have 4 fields each), but this is sufficiently unlikely for me and does not appear in any of my imports in any case.

Perhaps this approach feels too fragile for you (or the parent maintainer), and I agree a user-specified header number is generally more robust. However, it works well for me and prevents an extra step from the user.

I hope this answers your questions.

from ynab-csv.

Related Issues (16)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.