joto commented on July 22, 2024

Did you check whether the MD5 matches (see planet-210524.osm.bz2.md5)?
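
For readers who want to do that check without a separate tool, here is a minimal Python sketch. It assumes the .md5 file uses the GNU md5sum layout ("<32 hex digits>  <filename>"); the function names are illustrative. The file is read in 1 MiB chunks because planet files are far too large to load whole.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time; never load the whole file

def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

def md5_matches(data_path: Path, md5_path: Path) -> bool:
    """Compare a file against an md5sum-style checksum file.

    Assumption: the checksum file starts with the hex digest, followed by
    whitespace and a filename; only the first field is used.
    """
    expected = md5_path.read_text().split()[0].lower()
    return md5_of_file(data_path) == expected

# Usage (paths are illustrative):
# md5_matches(Path("planet-210524.osm.bz2"), Path("planet-210524.osm.bz2.md5"))
```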

from planet-dump-ng.

zerebubuth commented on July 22, 2024

I figured that if the file was corrupt, it would be very unlikely for bzip2 to output anything other than garbage. But playing around with it now, it does seem as if a corrupt bz2 file can decompress into something that isn't completely noise.

Unhelpfully, it seems that bzcat doesn't stop its output when it detects a CRC error. It just prints a warning to stderr, keeps processing the rest of the file, and exits with a non-zero code at the end. So if you're not checking stderr or the exit code, it would be easy to think it had succeeded.
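
If you want a damaged block to be impossible to miss, one option is to decompress with Python's standard-library bz2 module, which raises an exception instead of warning and carrying on. This is a sketch, not how bzcat or planet-dump-ng work; the function name and chunk size are illustrative. It also computes the size and MD5 of the decompressed data in a single pass, so the decompressed output never has to be written to disk just to be checksummed.

```python
import bz2
import hashlib

CHUNK_SIZE = 1 << 20  # decompressed bytes requested per read

def decompressed_md5_and_size(path: str) -> tuple[str, int]:
    """Stream-decompress a .bz2 file; return (MD5 hex digest, size in bytes)
    of the decompressed data without writing it anywhere.

    Damaged input raises instead of being passed through: OSError for an
    invalid stream or a block whose CRC does not match, EOFError if the
    file ends before the end-of-stream marker.
    """
    digest = hashlib.md5()
    size = 0
    with bz2.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size
```

One caveat: bz2.open reads multi-stream files, and if the bytes after a complete stream do not start a valid new stream, they are treated as trailing garbage and reading simply stops. Comparing the reported size against a known-good value is therefore still worthwhile.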

I started testing the original file on the planet server, but it is taking a very, very long time. I'll update here when it's finished.

gartenkralle commented on July 22, 2024

Did you check whether the MD5 matches (see planet-210524.osm.bz2.md5)?

Yes, it matched.

zerebubuth commented on July 22, 2024

This is a bit weird: the planet file on the server looks completely fine. I grepped it for the way ID you mention, and the result is:

<way id="933805767" timestamp="2021-04-22T09:46:48Z" version="1" changeset="103400299" user="lipsigal" uid="438670">
  <nd ref="8654953875"/>
  <nd ref="8654953876"/>
  <nd ref="8654953877"/>
  ...

with no "chaer type=" and no skipping into the relations section.
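
A check like that amounts to bzcat piped into grep. A rough Python equivalent that streams the compressed file and yields each matching line with a few lines of trailing context might look like this. The function name and parameters are hypothetical, and unlike grep -A, a second match that falls inside the trailing context is not reported separately.

```python
import bz2
from itertools import islice
from typing import Iterator

def grep_bz2(path: str, needle: str, after: int = 3) -> Iterator[list[str]]:
    """Stream a .bz2 text file and yield each line containing `needle`
    together with up to `after` following lines (roughly bzcat | grep -A)."""
    with bz2.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            if needle in line:
                # consume the next `after` lines from the same iterator
                context = [nxt.rstrip("\n") for nxt in islice(lines, after)]
                yield [line.rstrip("\n"), *context]

# Usage (path is illustrative):
# for hit in grep_bz2("planet-210524.osm.bz2", '<way id="933805767"', after=4):
#     print("\n".join(hit))
```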

So if the file on the server is OK, and the MD5sum matches, and it matches your downloaded file too, does that mean that whatever problem is occurring must be during or after decompression? How are you decompressing? Using bzcat on the fly, or bunzip2, or something else?

gartenkralle commented on July 22, 2024

I used the 7-Zip File Manager, version 19, under Windows 10 x64.

I will try another decompressor. Thanks for investigating so far.

gartenkralle commented on July 22, 2024

This time I tried decompressing with another tool (https://github.com/philr/bzip2-windows/releases), but got the same result.

Any more guesses?

joto commented on July 22, 2024

Looks to me like you (@gartenkralle) might have a problem with your hardware, such as faulty memory. I suggest running a memory tester.

zerebubuth commented on July 22, 2024

I think it's unlikely that a hardware fault would affect the decompression in exactly the same way with two different programs (with different memory layouts, etc...).

@gartenkralle, are you decompressing the whole file? (In other words, do you have an uncompressed file called planet-210524.osm?) If so, could you tell me how big it is, and what the MD5sum of the decompressed file is?

gartenkralle commented on July 22, 2024

I ran a two-pass memory check; no faulty memory was found.

Yes, I decompressed the whole file. I'm decompressing it again and will then run MD5sum on it. I will report the results in a few days...

gartenkralle commented on July 22, 2024

Size: 1,542,302,591,588 bytes

MD5 now running...

gartenkralle commented on July 22, 2024

MD5 checksum: dfdff2778d0dfad6569ecc2b3613fbb4

zerebubuth commented on July 22, 2024

Here's what I got for the same input file (our MD5s match for the .osm.bz2). I guess the computer I was using was much slower!

MD5: 2cf5fcca63685b13440902f0f1fa24e6
Size: 1,542,302,591,588

We get the same size but different MD5s. I think something might be going wrong because it's a 1.4 TiB file, and that might be pushing the limits of what the decompression software has been tested with (perhaps some subtle bugs when the file length or offset exceeds 40 bits?).
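
The 40-bit guess is easy to sanity-check with a little arithmetic. The sketch below only shows that the reported size crosses the 2^40 boundary; it says nothing about whether any particular decompressor actually mishandles it. The helper name is illustrative.

```python
SIZE_BYTES = 1_542_302_591_588  # decompressed size reported in both comments

def size_summary(n: int) -> dict[str, object]:
    """Express a byte count in TiB and in the number of bits needed to hold it."""
    return {
        "tib": n / 2**40,
        "bits_needed": n.bit_length(),
        "fits_in_32_bits": n < 2**32,
        "fits_in_40_bits": n < 2**40,
    }

summary = size_summary(SIZE_BYTES)
print(f"{summary['tib']:.2f} TiB, {summary['bits_needed']} bits")  # 1.40 TiB, 41 bits
```

Since 2^40 = 1,099,511,627,776 and 2^41 = 2,199,023,255,552, the file size sits between them, so any offset or length field limited to 40 bits would overflow partway through the file.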

It might be worth trying some other software. I'm using bzip2 version 1.0.8 (13-Jul-2019) on Linux, so it might be worth trying to replicate that setup, either in a virtual machine or with the Windows Subsystem for Linux.

Alternatively, is it possible to do what you wanted without decompressing the whole file? If whatever is parsing the OSM file is capable of streaming (e.g. a SAX or other event-based parser), then you could run bzcat planet.osm.bz2 | whatever and not need to decompress the whole thing.
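
A minimal sketch of the streaming approach in Python, assuming the standard library's xml.etree.ElementTree.iterparse as the event parser (any SAX-style parser would work similarly). It reads the compressed file directly through bz2.open; for the bzcat pipe form, one would pass sys.stdin.buffer to iterparse instead. The function name and the "relation_without_members" counter are illustrative. Relations with no member children are counted separately because the planet file does contain them, and a parser should not assume every relation has members.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter

def count_osm_elements(path: str) -> Counter:
    """Count nodes, ways and relations in an .osm.bz2 file by streaming it,
    so the decompressed XML never has to be written to disk."""
    counts: Counter = Counter()
    with bz2.open(path, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # first event is the start of the <osm> root
        for event, elem in context:
            if event != "end" or elem.tag not in ("node", "way", "relation"):
                continue
            counts[elem.tag] += 1
            if elem.tag == "relation" and elem.find("member") is None:
                counts["relation_without_members"] += 1
            root.clear()  # detach finished elements so memory use stays flat
    return counts
```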

Finally, if all those things won't work, then it might be worth rewriting your parser to use the PBF binary file. The data inside is exactly the same, but the PBF is about half the size of the XML and 10 or more times quicker to parse. @joto's excellent https://github.com/osmcode/libosmium is a well-tested and fast library for parsing PBFs, and there's a suite of utilities (https://github.com/osmcode/osmium-tool) for common tasks such as making geographic extracts and filtering by tags. (I think it builds on Windows, but I don't know enough about Windows to say for sure.)

gartenkralle commented on July 22, 2024

Thanks for all your tips. Even with bzip2 under Cygwin I got the wrong MD5 checksum. Maybe it's a very low-level bug or a file system bug. Now I'll try doing it on Linux and transferring the file to Windows. Otherwise I will go with the PBF.

mmd-osm commented on July 22, 2024

@gartenkralle : do you have any updates on this? Can this issue be closed now?

gartenkralle commented on July 22, 2024

Yes, issue can be closed.

The tool which calculated the checksum after decompression was wrong: I made a mistake in my parsing method. The XML file contains relations that have no members, and I had not considered that case. Additionally, I did not consider that UTF-8 characters are variable-sized. After fixing these, it worked fine.
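
The variable-width UTF-8 issue typically bites a hand-written parser that reads fixed-size byte chunks: a character can occupy 1 to 4 bytes, so a chunk boundary can fall inside one, and decoding each chunk on its own then fails or garbles the text. A minimal Python sketch (function name hypothetical) that uses the standard library's incremental decoder to carry partial characters over to the next chunk:

```python
import codecs
from typing import Iterable, Iterator

def decode_utf8_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text from UTF-8 byte chunks without splitting multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # holds back an incomplete trailing sequence
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the input ended mid-character
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

data = 'name="Zürich"'.encode("utf-8")
cut = data.index("ü".encode("utf-8")) + 1  # split between the two bytes of "ü"
print("".join(decode_utf8_chunks([data[:cut], data[cut:]])))  # name="Zürich"
```

Calling data[:cut].decode("utf-8") directly would raise UnicodeDecodeError, which is exactly the failure the incremental decoder avoids.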
