Example headers: <div class="snippet-clipboard-content notranslate position-relati

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Credit card CSV extra headers not auto stripped about ynab-csv HOT 2 CLOSED

cjhowald commented on August 25, 2024

Credit card CSV extra headers not auto stripped

from ynab-csv.

Comments (2)

ohthehugemanatee commented on August 25, 2024

Really? How will you figure out which row is the real header?

I just posted an issue in the parent project ( halloffame#23) asking for a variable where we can select the number of header rows. That allows for files with 0 (just show the first row example data to choose columns), and files with several. We know that n-1=header row.

automatic would be awesome, but I can't imagine how to do it without big assumptions.

I wanted to submit a patch, but the project doesn't build on Linux (unsupported platform for fsevents, apparently). Thanks for all your work!

from ynab-csv.

cjhowald commented on August 25, 2024

@ohthehugemanatee I made one assumption: the CSV will not parse properly if an incorrect header row is used. Then, the trick is just defining what "properly" means and extracting that info from the parse result. Turns out, Papaparse does a nice job of recording errors it encounters while parsing. For our purposes, I'm defining parse failure as the first row of data throwing a FieldMismatch error (too many or too few values). This could be too strict for some, but I've found DB always sets the right number of values for data rows. Extraneous header rows, on the other hand, have far fewer "fields", so the parsing will definitely fail according to my criteria. Then, it's just a matter or detecting the failure, stripping off the first row (the bad header) and trying again. This is how I fixed #3.

For this issue specifically, it appears my failure criteria are actually not strict enough. If 2 consecutive "extra" header rows have the same number of "fields", then it will actually register as a success and continue parsing. Rows 2 and 3 in my sample data each have just one field (no delimiters), which causes this to happen. I've just solved this by also imposing a rule that there must be at least 3 fields (date, amount, payee). There are still cases where a false positive success is possible (e.g. 2 "extra" header rows have 4 fields each), but this is sufficiently unlikely for me and does not appear in any of my imports in any case.

Perhaps this approach feels too fragile for you (or the parent maintainer), and I agree a user-specified header number is generally more robust. However, it works well for me and prevents an extra step from the user.

I hope this answers your questions.

from ynab-csv.

Credit card CSV extra headers not auto stripped about ynab-csv HOT 2 CLOSED

Comments (2)

Related Issues (16)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent