osmpbf

A Rust library for reading the OpenStreetMap PBF file format (*.osm.pbf). It strives to offer the best performance through parallelization and lazy decoding, with a simple interface, while also exposing iterators for items at every level of a PBF file.

Usage

Add this to your Cargo.toml:

[dependencies]
osmpbf = "0.2"

Here's a simple example that counts all the ways in a file:

use osmpbf::{ElementReader, Element};

let reader = ElementReader::from_path("tests/test.osm.pbf")?;
let mut ways = 0_u64;

// Increment the counter by one for each way.
reader.for_each(|element| {
    if let Element::Way(_) = element {
        ways += 1;
    }
})?;

println!("Number of ways: {ways}");

In this second example, we also count the ways but make use of all cores by decoding the file in parallel:

use osmpbf::{ElementReader, Element};

let reader = ElementReader::from_path("tests/test.osm.pbf")?;

// Count the ways
let ways = reader.par_map_reduce(
    |element| {
        match element {
            Element::Way(_) => 1,
            _ => 0,
        }
    },
    || 0_u64,      // Zero is the identity value for addition
    |a, b| a + b   // Sum the partial results
)?;

println!("Number of ways: {ways}");

Build Features

  • rust-zlib (default) -- use the pure Rust zlib implementation miniz_oxide
  • zlib -- use the widely available zlib library
  • zlib-ng -- use the zlib-ng library for better performance.

The PBF format

To use the lower-level features of this library effectively, it helps to have an overview of the structure of a PBF file. For a more detailed format description see here or take a look at the .proto files in this repository.

The PBF format as a hierarchy (square brackets [] denote arrays):

Blob[]
├── HeaderBlock
└── PrimitiveBlock
    └── PrimitiveGroup[]
        ├── Node[]
        ├── DenseNodes
        ├── Way[]
        └── Relation[]

At the highest level a PBF file consists of a sequence of blobs. Each Blob can be decoded into either a HeaderBlock or a PrimitiveBlock.

Iterating over blobs is very fast, but decoding might involve a more expensive decompression step. For larger files in particular, it is advisable to parallelize at the blob level, since each blob can be decompressed independently. (See the reader module in this library for parallel methods.)
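The blob-level pattern can be illustrated with the standard library alone. In this rough sketch, blobs are plain byte vectors and `decode_and_count` is a hypothetical stand-in for the expensive decompress-and-decode step. Neither is part of this library's API, and this is not a description of how the reader module is implemented.

```rust
use std::thread;

/// Hypothetical stand-in for decompressing and decoding one blob.
/// Here it just counts the bytes equal to b'w' (pretend these are ways).
pub fn decode_and_count(blob: &[u8]) -> u64 {
    blob.iter().filter(|&&byte| byte == b'w').count() as u64
}

/// Splits the blobs into roughly `workers` chunks, processes each chunk on
/// its own scoped thread, and sums the partial counts.
pub fn par_count(blobs: &[Vec<u8>], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((blobs.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = blobs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|blob| decode_and_count(blob))
                        .sum::<u64>()
                })
            })
            .collect();

        // Reduce: blobs are independent, so partial sums can simply be added.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Because no blob depends on another, the workers share no mutable state and the only coordination is the final sum.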

Usually the first Blob of a file decodes to a HeaderBlock which holds global information for all following PrimitiveBlocks, such as a list of required parser features.
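A reader would typically compare the required features against the ones it understands before going further. The helper below is a minimal sketch of that check on plain strings; the function name and the `supported` list are assumptions for illustration and not part of this library's API.

```rust
/// Returns every required feature that is missing from `supported`.
/// `supported` is a hypothetical list chosen by the caller.
pub fn unsupported_features<'a>(required: &'a [String], supported: &[&str]) -> Vec<&'a str> {
    required
        .iter()
        .map(String::as_str)
        .filter(|feature| !supported.contains(feature))
        .collect()
}
```

If the returned list is non-empty, a cautious reader would stop rather than risk misinterpreting the data in the blocks that follow.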

A PrimitiveBlock contains an array of PrimitiveGroups. Each PrimitiveGroup contains only one element type: Node, Way, Relation or DenseNodes. A DenseNodes item is an alternative, space-saving representation of a Node array, so do not forget to check for DenseNodes when aggregating all nodes in a file.
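To make the DenseNodes point concrete, here is a simplified sketch that gathers node IDs from both representations. The types are hypothetical stand-ins, not this library's types; they keep only the ID fields and rely on the PBF format storing dense node IDs as deltas from the previous ID.

```rust
/// Hypothetical, simplified stand-ins for the structures described above.
pub struct Node {
    pub id: i64,
}

/// Dense nodes store each ID as a delta from the previous one.
pub struct DenseNodes {
    pub id_deltas: Vec<i64>,
}

/// Each group holds exactly one kind of element.
pub enum PrimitiveGroup {
    Nodes(Vec<Node>),
    Dense(DenseNodes),
    Ways(Vec<i64>),
    Relations(Vec<i64>),
}

/// Collects node IDs from plain and dense groups alike.
pub fn collect_node_ids(groups: &[PrimitiveGroup]) -> Vec<i64> {
    let mut ids = Vec::new();
    for group in groups {
        match group {
            PrimitiveGroup::Nodes(nodes) => ids.extend(nodes.iter().map(|node| node.id)),
            PrimitiveGroup::Dense(dense) => {
                // Undo the delta coding with a running sum.
                let mut current = 0_i64;
                for delta in &dense.id_deltas {
                    current += delta;
                    ids.push(current);
                }
            }
            // Ways and relations contribute no nodes of their own.
            PrimitiveGroup::Ways(_) | PrimitiveGroup::Relations(_) => {}
        }
    }
    ids
}
```

Leaving out the `Dense` arm would still compile if it were folded into a wildcard pattern, which is why matching every variant explicitly is a useful guard here.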

Elements reference each other using integer IDs. Corresponding elements could be stored in any blob, so finding them can involve iterating over the whole file. Some files declare an optional feature "Sort.Type_then_ID" in the HeaderBlock to indicate that elements are stored sorted by their type and then ID. This can be used to dramatically reduce the search space.
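One way the sorted order could be exploited is a binary search over a per-blob index. The sketch below assumes the caller has already recorded the smallest (type, ID) key of each blob in file order; that index, the `Key` alias and `candidate_blob` are all hypothetical and not provided by this library.

```rust
/// Element types in the order used by "Sort.Type_then_ID".
/// The derived `Ord` follows declaration order: Node < Way < Relation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ElementType {
    Node,
    Way,
    Relation,
}

/// Sort key of an element: type first, then ID.
pub type Key = (ElementType, i64);

/// Given the smallest key of each blob (ascending, in file order), returns
/// the index of the only blob that can contain `target`, or `None` if
/// `target` sorts before the first blob.
pub fn candidate_blob(first_keys: &[Key], target: Key) -> Option<usize> {
    // Number of blobs whose first key is <= target, found by binary search.
    let count = first_keys.partition_point(|&first| first <= target);
    count.checked_sub(1)
}
```

With such an index, a lookup only needs to decode a single candidate blob instead of iterating over the whole file.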

License

This project is licensed under either of

at your option.
