Comments (16)

missinglink commented on August 17, 2024

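yep, re: 6): as I expected, your shell is reading the whole stream into memory because you're using command substitution, $(), which runs the command in a subshell and has to capture all of its output before the outer command can start.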

the same thing happens without any of your data at all; just run:

/dev/null < $(cat /dev/urandom)

you'll see your system memory fill up until your shell eventually runs out of heap.

that doesn't happen with either of these, because the data is streamed rather than captured:

cat /dev/urandom > /dev/null

or

cat /dev/urandom | cat > /dev/null
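For comparison, here is a minimal Node.js sketch of what the receiving end of a pipe can do: read the stream line by line with the built-in readline module, so memory use stays bounded no matter how large the input is. countLines is a hypothetical helper for illustration, not part of the interpolation codebase.

```javascript
const readline = require('readline');

// Counts the lines in a readable stream while holding only the current
// line (plus readline's internal buffer) in memory at any one time.
async function countLines(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const _line of rl) {
    count += 1; // each line is discarded right away; nothing accumulates
  }
  return count;
}
```

Called as countLines(process.stdin) from a (hypothetical) count.js and fed with cat us-midwest-latest.polylines | node count.js, it sees data as it arrives; wrapping the same input in $() would make the shell buffer all of it first.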

from interpolation.

missinglink commented on August 17, 2024

hi @kylebarron, it sounds like there is a problem with the .0sv file. Could you please post the file size and the first 10 lines?

the pbf streets command wasn't originally written to support such large extracts; this morning I added missinglink/pbf#15, which should make it possible to cut your us-midwest extract on a laptop.

I'll also change that fatal error to a warning: if only one or two streets are missing their names, we can probably skip over them and warn the user that they are malformed, rather than exiting with a fatal error.
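The warn-and-skip behaviour described above could look roughly like this. It is a hypothetical sketch, not the project's actual code: the record shape { polyline, names } and the rule that a usable name must contain at least one letter or digit are assumptions made for illustration.

```javascript
// Assumption for illustration: a name is "usable" if it contains at least
// one letter or digit, so a bare punctuation name such as "\" is rejected.
function isUsableName(name) {
  return typeof name === 'string' && /[\p{L}\p{N}]/u.test(name);
}

// Keeps streets that have at least one usable name; reports the rest via
// `warn` and skips them instead of throwing a fatal error.
function keepNamedStreets(records, warn = console.warn) {
  const kept = [];
  for (const record of records) {
    const names = Array.isArray(record.names) ? record.names.filter(isUsableName) : [];
    if (names.length === 0) {
      warn(`skipping street with no usable names: ${record.polyline}`);
      continue;
    }
    kept.push({ ...record, names });
  }
  return kept;
}
```

Passing warn as a parameter keeps the function testable; in a real importer it would simply default to console.warn.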

kylebarron commented on August 17, 2024

Thanks @missinglink. Sorry, which one is the .0sv file? I don't seem to have any files with that extension. Is it the same as the .osm.pbf file?

missinglink commented on August 17, 2024

Sorry, I mean the .polylines file, a.k.a. .0sv (null-byte separated file), which has a nicer ring to it :)

missinglink commented on August 17, 2024

Closing the issue, but I'm still interested to hear what happened. I suspect the .polylines file is empty or contains invalid lines.

You can also run the import again now with the latest code from master and it'll spit out the offending polyline for debugging purposes.

kylebarron commented on August 17, 2024

I reread my initial bug report, and I should have specified that the error comes from one specific line, at around 128,000. Above, I cut some of the output to be concise; the full console output is below. us-midwest-latest.polylines is 121 MB. With your PR changing the error to a warning, this should proceed fine.

> ./interpolate polyline street.db < us-midwest-latest.polylines
0	0/sec
1000	1000/sec
3000	2000/sec
4000	1000/sec
10000	6000/sec
14000	4000/sec
16000	2000/sec
20000	4000/sec
24000	4000/sec
28000	4000/sec
32000	4000/sec
32000	0/sec
35000	3000/sec
38000	3000/sec
41000	3000/sec
45000	4000/sec
48000	3000/sec
48000	0/sec
50000	2000/sec
54000	4000/sec
57000	3000/sec
60000	3000/sec
64000	4000/sec
64000	0/sec
66000	2000/sec
69000	3000/sec
73000	4000/sec
76000	3000/sec
79000	3000/sec
80000	1000/sec
82000	2000/sec
85000	3000/sec
87000	2000/sec
90000	3000/sec
93000	3000/sec
96000	3000/sec
96000	0/sec
98000	2000/sec
101000	3000/sec
103000	2000/sec
106000	3000/sec
109000	3000/sec
112000	3000/sec
112000	0/sec
113000	1000/sec
116000	3000/sec
118000	2000/sec
121000	3000/sec
123000	2000/sec
125000	2000/sec
127000	2000/sec
/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/lib/Street.js:24
  if( !Array.isArray( names ) || !names.length ){ throw new Error( 'invalid names array' ); }
                                                  ^

Error: invalid names array
    at Street.setNames (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/lib/Street.js:24:57)
    at DestroyableTransform._transform (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/stream/street/augment.js:25:12)
    at DestroyableTransform.Transform._read (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at DestroyableTransform.Transform._write (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:83)
    at doWrite (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:418:64)
    at writeOrBuffer (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:407:5)
    at DestroyableTransform.Writable.write (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:334:11)
    at DestroyableTransform.ondata (/disk/agedisk1/medicare.work/doyle-DUA18266/barronk/npi-geocode/lib/interpolation/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:619:20)
    at emitOne (events.js:116:13)
    at DestroyableTransform.emit (events.js:211:7)
> head us-midwest-latest.polylines
cmeuoAnmljgD?udAcBi{IaBisI�West State Park Road
in_eqAv`gviD}LnCiM~@ke@v@�Sellery Street
mzknmA|eifvDpSJ�Bison Circle
ufqhnAprk`oD|FaKvCqFhG}R~Ho]dGsd@x@oObAgX^}u@RsgFZuyH{AazBQiVgAmoEy@of@Yk[a@aSc@aS}@oz@Mua@_@ohAFojAz@ifAn@wlBEmaBy@ieBcBmzAoEkbCEkk@Fo|@A}b@_@i_AKqfA?w`AOk\u@avA`@cdCi@{aCUmaDCoj@JyvCy@cnEEeg@ZicCu@{l@}Cky@e@gLgJubAkVu}ActCkjOkJ{m@wGkq@kC_\gBk`@s@c[?oKCen@VgbC@kUBuoA?kjBBky@nC_uCRcTAwc@AcwAEqmBEkyEoAcaAu@aTkCq[oH_f@iRg|@mTkk@{Qu_@q\gi@{MmSgXwa@a@q@cTo_@iHcOiAeDiKo[oC{IcJs^cEeVaDuT_C{YyA}b@M{a@vAk~FGmpAz@etA@iJfAmrAn@kxAAwPh@ic@x@umA~@gzCj@gx@tBufEYsnBa@mp@Qo\cAktBaBexCeCk|DwB{dBqB{wEY{sESspABiw@@o[MyZsPsp@cfBskIucAo|EeF}U{RifA{_@ioBs]wlBcb@szBq[q`B}TonAyUkpAy[gaBkN_w@qG_^eFqa@eCye@_@e\Ewx@C_KI}k@EihCCsc@}@mmAiAkoBMm`@Gwv@CasCPgiHN_o@^miALsc@z@i_CE}aAoE}fFqAgwA?iU~@gS~@kNl@uG`@cEfBkLlCeQxCyMxE_PdlAcfEhBsGbLud@`FuZhCmX|AwSr@iSJySEiUcEyfC_PixJeA_YyCa[qD{ViE{UqEeRsFuRaFgPyDwMoQ_n@yHgVyK_^kEmRiCeNqC{PkBsOeBkSsA{Ri@iVu@u}GAeYcAanBg@ctB{BywGqA_vDG{T[kO{BipAKwPE_HSmd@?a}@g@}mC@ym@y@s|Ai@gvD?ys@J_Wg@eFaA{CaBqAgB{@uFSo\e@oq@]eSQ{EmAmEmCwCsCsBeEkBmFcCqKcIka@{Oez@gNsw@uB}NoDg`@qFwv@eCkZqFqi@oAeLg@yLg@ax@W_qBA_aEU_jDAw`BGga@UqbCi@aiGQwTAeOv@KtJcCrpDe@rHSlG}AlC_CxCyDpB{G`@wLAcnDVg_AAoUAeDGiRMe}BJ{cCSaf@I_jALqr@U_z@?i`ATuHj@cE|@qCbBiD~C_CnFuAp_@Wj{@Bvr@o@`ZAlOArGKrC_BlCaEl@uINm`@k@uoFSgyAMmjCTioAEwYKyr@CyNCyWI_l@Gua@Gwe@EaTKir@CiYcBctEw@auCn@avBEk]kCsbJi@i_FF_fEEq[i@gzDa@}xCYifDEcq@i@af@gAaLiE}WqJ}]}JgU_RkW{FgFsEeE_[sQci@aX{FwD}YoRyHeIaMqSoIuQgJwZ{Fu_@eBoVe@k^Mar@HglAKioAc@mv@g@_YuA{MyBmNuBcLk@uCyHsV}HcQ{M}QyMyPsu@yy@iq@mv@}MsPcLsWeC_GkJga@gDu]yAyZgDmiAuCmx@uB{Z_Fc]iEkVqAiFkQwr@iO{k@{C{LoAcFgSmw@_\}nAaSiw@yDqOoPkp@oc@_cBqLsg@qGgd@eD{YqE{y@_Aka@eFumCgDw}Aa@_OM}IIwE]eSqBi~@gBw{@GyRDeTx@}UrBaXdEg\VkBfHo\|Jq_@|Lk[pK_SxPmXzQ_TpzCmvCtWwXbFmFj^c]fm@mk@ba@sa@xNkQtOsVnIuPbEgI|H_UrGsVtFsVbEyWnEm]vT{qB~Iyw@`Hgp@pBeTpAeNfAwTD_HG{PQaQOsGsAaSi@aFoMmnAsBaRkJa}@iDo]_C_ZiAgScAqRaHcjAeL{qByFudAGcA}KokB_Cob@cTcrD}H}uA{Gy_AiEeg@sGsp@cScpAsWm|A_m@clDuVexAuKkn@sGy]aK}j@_Kyk@gLar@aDuQgOqz@cCsMeB}MeEg`@mBwYmAyTg@mScAcm@YmQ_D}nBeCgeByCa}Bm@kXGa]\iX`AoTfBoTtBaOpCcQnGi[xIoYpU_r@rJ{YnN{a@xN_b@fPqg@pv@{}B|Ei
ObUuq@f_@aiAnZ{~@nIe\`FaZrEi[|Dkb@`Ckf@Xc\wBspCO_Rs@az@y@_`@_BkTiC{X_G}e@eFqW{EqSuI{Y{JkWuHgR}MyXmUs_@u_Am~AqKeUuJmU{KmYeUqt@wKof@sG__@sJmq@aZqyCcQmsAwJae@sBaJwO}i@yVyv@muAcdEaPyc@gZcy@sGeQunA}cD�County Highway F52
guxknAnrk}qDdEnApDrBfCbCbCnDjBjExSnn@l@jBrCvKhBdKpAbLt@`NdAb_@hBnn@n@pNn@zIjAvKjKp{@l@jDz@fDnBfF~BbEvn@`~@fC|DvFjKdE|JpCrHbM|\jDdMrEjP|FnT�Wilden Drive
kfvknAz`i}qDwHaI_IaKeGoJa@s@qFkKiF}LsGkPqGcNkFuJ�Wilden Drive
khsknAhjy{qDsEgOgBgJsAmJ{@uJUiEKsEEiE?oDFeERqETaENkGnAkK�Wilden Drive
{xrknAl{ywqDt@uD`@gF?cGa@yEu@}DwAeE_BsCo]mg@_BoC_BcEa@yAc@{Bm@aFoCu`@i@iEkAgEmBsDqB}BmCiB{CaAkDU�Wilden Drive
i`vknAhcfwqDyDpIiDvKgAtFoAdOIfG?vL?xB?jBArM?bITjHn@jHbA`H|AvGnBfGtCxG|^fk@|JvOlD|Fn`@vo@�Wilden Drive
{owknA~hs|qDrDrCvG~EvD~BlC~@rEhAnKxC|Dx@jCt@hA~@x@tBv@dA|@~@v@J~@?d@[l@cAJaAD{@Y_Bu@mAaAUcA?qA\wBh@w@P�Wilden Drive

kylebarron commented on August 17, 2024

I updated from master and it did spit out the warning:

street has no valid names, check your 0sv file:
oee}mAthsagDN|Jd@hDhBpB|CfAvHxAnJxChHdDzE|DxF|G|FhJlGlPhEbT~@bJb@`LB~Fb@pHdA`IhLd]

It seems to be progressing fine otherwise.

To clarify, since I couldn't find this in the documentation: is there a way to append to the SQLite db, or should I be running

./interpolate polyline street.db < us-midwest-latest.polylines \
    us-northeast-latest.polylines \
    us-south-latest.polylines \
    us-west-latest.polylines

to get a full US interpolation dataset?

missinglink commented on August 17, 2024

ok cool, it looks like the commit I pushed now lets the importer skip over the offending line(s).

could you please copy->paste the output of this grep command so I can get a copy of the exact line that's causing it?

grep -aC5 'oee}mAthsagDN' *.polylines

I don't think the syntax you posted is doing what you expect (try https://explainshell.com).

My understanding is that only the first filename after the < is redirected to stdin of the ./interpolate script; the remaining filenames are passed to ./interpolate as extra command-line arguments, which it presumably ignores.

Instead, you can use cat to concatenate the files and pipe the result to ./interpolate:

cat us-midwest-latest.polylines \
    us-northeast-latest.polylines \
    us-south-latest.polylines \
    us-west-latest.polylines \
 | ./interpolate polyline street.db
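What cat contributes here is sequential streaming: it emits each file's bytes in order, one chunk at a time, without loading any file fully into memory. A minimal Node.js sketch of the same idea using only fs.createReadStream (catFiles is a hypothetical name, not part of the interpolation tooling):

```javascript
const fs = require('fs');

// Yields the contents of each file in order, chunk by chunk, like
// `cat a b c`: the second file is only opened once the first is exhausted.
async function* catFiles(paths) {
  for (const path of paths) {
    for await (const chunk of fs.createReadStream(path)) {
      yield chunk;
    }
  }
}
```

Something like Readable.from(catFiles(process.argv.slice(2))).pipe(process.stdout), with Readable from the stream module, would then behave like a tiny cat replacement.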

missinglink commented on August 17, 2024

you could also write it like this (assuming you want to import all the .polylines files):

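./interpolate polyline street.db < <(cat *.polylines)

or

cat *.polylines | ./interpolate polyline street.db

AFAIK these two are equivalent: both stream the concatenated files to stdin. (The first form needs the extra <; a bare <(...) expands to a filename like /dev/fd/63, which gets passed as an argument instead of being fed to stdin.)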

kylebarron commented on August 17, 2024

Yes, you're right. I use | and > often, but rarely <.

> grep -aC5 'oee}mAthsagDN' *.polylines
us-midwest-latest.polylines-m~v~iAzvchrDQpuAm@lfC?tPCjtABj~@Lvb@I~yBI`m@\fZ?z_C?~ESjr@?`ZQpy@Lfm@]n_@Efq@]npAIxo@MnEm@pE{@pEoAvEqBzEkCjFqj@|dAcYbh@yPbXkEzF{KlMwQ|RgIlKaFzJaCbJsAnLwDbi@�West 126th Street
us-midwest-latest.polylines-szfeiAlqs|sDRiHjBwHpDeGdDmDv[iVbC{ChB{CzB{Hr@wDToJ?_dA?iPg@oHoAmF{BqF_JgOyCaF{BqF_AmF[aHp@gEpAqCNqBm@cBoAo@qAJs@pAOhBT|An@~Bx@xD�West 126th Street
us-midwest-latest.polylines-c`qfkAjzomcDg@cDa@gBa@gCE]c@sCi@kEYeCQkDKoCCoCCkFOmbBc@ueDe@gcC]i{Bw@_fAQswAcA{~BMy|@o@euAe@oxAQo]DsANqBb@yCTeArAuFLi@�West 126th Street
us-midwest-latest.polylines-et{~kAbljnyCyApL_Hhj@�Statler Addition
us-midwest-latest.polylines-_w{~kAtyjnyCfpAfa@�Statler Addition
us-midwest-latest.polylines:oee}mAthsagDN|Jd@hDhBpB|CfAvHxAnJxChHdDzE|DxF|G|FhJlGlPhEbT~@bJb@`LB~Fb@pHdA`IhLd]�\
us-midwest-latest.polylines-_q~zeArjikwDm^cFk}AhDanC?a{@oTumAnKkpLfD�265th Road
us-midwest-latest.polylines-yk{akAhdzxtDS~dCxDlkb@�265th Road
us-midwest-latest.polylines-gd{akA|soutD_C_G{@}H`@cvOSoG{AiG�265th Road
us-midwest-latest.polylines-si{akA`|kxtDgA}g]OwnEg@gtV�265th Road
us-midwest-latest.polylines-okeamApcfc|D}AymCm@ep@GyGuAylCcAym@YyfAm@mbAMeGWiMsAk{BQmZsAe`BmA}qCkA{tBy@o{A_@qV�265th Road

missinglink commented on August 17, 2024

@kylebarron I'm still very confused about what's happening; there should be a null byte (\0) marking the delimiter between the encoded polyline and the name(s).

For example, here is another .0sv file on my computer (I'm replacing the null byte with underscores so we can see it):

$ head -n1 file.0sv | sed 's/\x0/___/g'
kgionAlvsegCdb@oJpaAkU___10th Avenue

but when I try this on the output you posted, there is no null byte:

$ head -n1 error.0sv | sed 's/\x0/___/g'
m~v~iAzvchrDQpuAm@lfC?tPCjtABj~@Lvb@I~yBI`m@\fZ?z_C?~ESjr@?`ZQpy@Lfm@]n_@Efq@]npAIxo@MnEm@pE{@pEoAvEqBzEkCjFqj@|dAcYbh@yPbXkEzF{KlMwQ|RgIlKaFzJaCbJsAnLwDbi@�West 126th Street

so either there is a copy->paste error or something on your computer is removing them?

edit: I do see a character there. What operating system are you using?

missinglink commented on August 17, 2024

The format is:

encodedpolyline\0name\0name\0name\n
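A sketch of splitting one line of this format in JavaScript. parse0svLine is a hypothetical helper, not taken from the interpolation source; it returns null when a line has no NUL delimiter at all (the symptom being chased in this thread) and drops empty name fields.

```javascript
// Splits one line of the form encodedpolyline\0name\0name\0name\n.
// Returns { polyline, names } or null if no NUL delimiter is present.
function parse0svLine(line) {
  const fields = line.replace(/\r?\n$/, '').split('\u0000');
  if (fields.length < 2) return null; // no delimiter: malformed line
  const [polyline, ...names] = fields;
  return { polyline, names: names.filter((name) => name.length > 0) };
}
```

Note that a line whose only name is a single punctuation character still parses here; deciding whether such a name is usable is a separate validation step.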

kylebarron commented on August 17, 2024

[screenshot]

> cat /etc/*-release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Scientific Linux release 6.9 (Carbon)
Scientific Linux release 6.9 (Carbon)

kylebarron commented on August 17, 2024

And that screenshot in text form:

> grep -aC5 'oee}mAthsagDN' *.polylines | sed 's/\x0/___/g'
us-midwest-latest.polylines-m~v~iAzvchrDQpuAm@lfC?tPCjtABj~@Lvb@I~yBI`m@\fZ?z_C?~ESjr@?`ZQpy@Lfm@]n_@Efq@]npAIxo@MnEm@pE{@pEoAvEqBzEkCjFqj@|dAcYbh@yPbXkEzF{KlMwQ|RgIlKaFzJaCbJsAnLwDbi@___West 126th Street
us-midwest-latest.polylines-szfeiAlqs|sDRiHjBwHpDeGdDmDv[iVbC{ChB{CzB{Hr@wDToJ?_dA?iPg@oHoAmF{BqF_JgOyCaF{BqF_AmF[aHp@gEpAqCNqBm@cBoAo@qAJs@pAOhBT|An@~Bx@xD___West 126th Street
us-midwest-latest.polylines-c`qfkAjzomcDg@cDa@gBa@gCE]c@sCi@kEYeCQkDKoCCoCCkFOmbBc@ueDe@gcC]i{Bw@_fAQswAcA{~BMy|@o@euAe@oxAQo]DsANqBb@yCTeArAuFLi@___West 126th Street
us-midwest-latest.polylines-et{~kAbljnyCyApL_Hhj@___Statler Addition
us-midwest-latest.polylines-_w{~kAtyjnyCfpAfa@___Statler Addition
us-midwest-latest.polylines:oee}mAthsagDN|Jd@hDhBpB|CfAvHxAnJxChHdDzE|DxF|G|FhJlGlPhEbT~@bJb@`LB~Fb@pHdA`IhLd]___\
us-midwest-latest.polylines-_q~zeArjikwDm^cFk}AhDanC?a{@oTumAnKkpLfD___265th Road
us-midwest-latest.polylines-yk{akAhdzxtDS~dCxDlkb@___265th Road
us-midwest-latest.polylines-gd{akA|soutD_C_G{@}H`@cvOSoG{AiG___265th Road
us-midwest-latest.polylines-si{akA`|kxtDgA}g]OwnEg@gtV___265th Road
us-midwest-latest.polylines-okeamApcfc|D}AymCm@ep@GyGuAylCcAym@YyfAm@mbAMeGWiMsAk{BQmZsAe`BmA}qCkA{tBy@o{A_@qV___265th Road

kylebarron commented on August 17, 2024

Just FYI, I'm not sure that

cat *.polylines | ./interpolate polyline street.db

actually uses all the files. At least, here is what happens on the following step:

> cat ../../data/openaddresses/us/ak/anchorage.csv | ./interpolate oa address3.db full-us-street.db
0       0/sec
> cat ../../data/openaddresses/us/**/*.csv | ./interpolate oa address4.db full-us-street.db
0       0/sec
> md5sum address3.db address4.db
541954acebe2cbe74a3533795b02e5a1  address3.db
541954acebe2cbe74a3533795b02e5a1  address4.db

When I run

./interpolate oa address.db street.db < $(cat ../../data/openaddresses/us/**/*.csv)

I get the error "zsh: fatal error: out of heap memory", so that is at least doing something different.

missinglink commented on August 17, 2024

heya, so...

  1. regarding the chars, this seems to be some sort of copy->paste error.

  2. looking at the line ending in Ld]___\, I can see that there is no valid name: the name is just '\'. I tried running the import with your data and still couldn't reproduce the error message, so I'm going to let this one go as a one-off.

  3. I'm pretty sure the cat *.polylines command works as expected; you can run ls *.polylines to check that the glob pattern expands correctly.

  4. You can query the SQLite files directly to see what's in them, and count the records to confirm whether your glob pattern is inserting more records:

$ sqlite3 street.db 
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.

sqlite> .tables
names         rtree         rtree_parent
polyline      rtree_node    rtree_rowid 

sqlite> select count(*) from polyline;
10

sqlite> select count(*) from names;
10

sqlite> .exit

  5. using cat on a file format that has a header (like .csv) will include the header multiple times in the output stream, which will cause issues.

I have included a script in this repo that concatenates OpenAddresses CSV files:

try this:

export OAPATH=../../data/openaddresses/us
./script/concat_oa.sh

you should get a single stream of all the files combined and deduplicated, with the header line included only once.

  6. the error "zsh: fatal error: out of heap memory" is a zsh error. I'm not 100% sure why it's happening, but it could be something to do with the < operator. Try using a pipe on the left-hand side instead, so that you're ingesting the stream incrementally rather than buffering the whole stream into memory before sending it to the process.
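Points 5 and 6 can be combined: drop the repeated header lines while still processing input one line at a time. A minimal Node.js sketch, assuming the caller knows which line is the first line of each file; makeHeaderOnceFilter is a hypothetical name, and unlike script/concat_oa.sh it does not deduplicate rows.

```javascript
// Returns a per-line filter for several CSV files read back to back that
// share one header row. Only the first file's header is kept; the first
// line of every later file is dropped. Rows are NOT deduplicated here.
function makeHeaderOnceFilter() {
  let headerSeen = false;
  return function filterLine(line, isFirstLineOfFile) {
    if (isFirstLineOfFile) {
      if (headerSeen) return null; // repeated header from a later file
      headerSeen = true;
    }
    return line;
  };
}
```

Because it only keeps one boolean of state, memory use does not grow with input size. In a shell, awk 'FNR == 1 && NR != 1 { next } { print }' *.csv does the same header-once job.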
