openstreetmap / osmdbt
OSM Database Replication Tools
License: GNU General Public License v3.0
I do realize that this repo is still in an early stage. Do you have plans to provide a state.txt file in the future, and if so to what extent would you be able to replicate today’s file contents?
Hi,
Great work on this tool. We're planning to start using it in production in a few months; hopefully it will have a stable release by then.
I wanted to contribute back to this project so I'm posting the Docker files here for osmdbt.
We have the full cluster for OSM: osmcgimap, an nginx proxy in front of it, osmdbt, postgres, planet-dump-ng, imposm3, renderd+mapnik, and apache+mod_tile. I'm not sure whether to break it into parts and post it here, or post it as a whole to the OpenStreetMap wiki. The services are containerized individually, as they should be (except for renderd and render_expire, since render_expire cannot work over TCP).
Our workflow is to have a bit of the whole environment (osm-website + tiles) as a private server for our clients.
This is the docker-compose setup I've patched together using Docker multi-stage builds.
docker-compose.yml
services:
  osmpostgres:
    environment:
      - POSTGRES_PASSWORD
    volumes:
      - osmpostgres:/var/lib/postgresql/data
  osmdbt:
    build: build/osmdbt
    environment:
      - PGDATABASE
      - PGHOST
      - PGPASSWORD
      - PGPORT
      - PGUSER
    volumes:
      - osmdbt:/data:rw
  osmdbt-builder:
    build: build/osmdbt-builder
  osmchanges:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - osmdbt:/usr/share/nginx/html:ro
volumes:
  osmdbt:
  osmpostgres:
build/osmdbt-builder/Dockerfile
FROM debian:buster-slim
ARG OSMDBT_SRC=https://github.com/openstreetmap/osmdbt/archive/master.tar.gz
RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        cmake \
        curl \
        gettext-base \
        gosu \
        libboost-filesystem-dev \
        libboost-program-options-dev \
        libbz2-dev \
        libexpat1-dev \
        libosmium2-dev \
        libpqxx-dev \
        libprotozero-dev \
        libyaml-cpp-dev \
        pandoc \
        zlib1g-dev \
        postgresql-common \
        postgresql-server-dev-all; \
    useradd -r -u 999 osmdbt; \
    mkdir /osmdbt-src /osmdbt; \
    curl -sL "$OSMDBT_SRC" | tar -xz -C /osmdbt-src --strip-components=1; \
    chown -R osmdbt /osmdbt-src /osmdbt; \
    cd /osmdbt; \
    gosu osmdbt cmake /osmdbt-src; \
    gosu osmdbt cmake --build .; \
    rm -rf /osmdbt-src; \
    apt-get clean; \
    rm -rf /var/lib/apt/lists/*;
WORKDIR /osmdbt/src
The binaries are available at /osmdbt/src.
This image uses yacron as a proper crontab replacement in Docker. It is based on the builder image.
build/osmdbt/Dockerfile
FROM osmdbt-builder
FROM debian:buster-slim
ARG YACRON_VERSION=0.11.2/yacron-0.11.2-x86_64-unknown-linux-gnu
RUN set -eu; \
    useradd -r -u 999 osmdbt; \
    mkdir -p /opt/osmdbt/data; \
    chown -R osmdbt:osmdbt /opt/osmdbt; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        libexpat1 \
        libboost-filesystem1.67.0 \
        libboost-program-options1.67.0 \
        libpqxx-6.2 \
        libyaml-cpp0.6 \
        postgresql-client-common \
        gettext \
        zlib1g; \
    curl -sL https://github.com/gjcarneiro/yacron/releases/download/$YACRON_VERSION -o /usr/local/bin/yacron; \
    chmod +x /usr/local/bin/yacron
COPY --from=osmdbt-builder /osmdbt/src/osmdbt-* /usr/local/bin/
COPY entrypoint.sh /usr/local/bin/entrypoint
COPY --chown=osmdbt:osmdbt *.yaml* /opt/osmdbt/
VOLUME /data
WORKDIR /opt/osmdbt
USER osmdbt
ENTRYPOINT ["entrypoint"]
CMD ["run"]
build/osmdbt/entrypoint.sh
#!/bin/sh
set -eu
envsubst < osmdbt-config.yaml.tpl > osmdbt-config.yaml
case "$1" in
enable) osmdbt-enable-replication;;
disable) osmdbt-disable-replication;;
test) osmdbt-testdb;;
run) exec yacron -c yacron.yaml;;
*) exec "$@" ;;
esac
build/osmdbt/osmdbt-config.yaml.tpl
---
database:
  host: "$PGHOST"
  port: "$PGPORT"
  dbname: "$PGDATABASE"
  user: "$PGUSER"
  password: "$PGPASSWORD"
  replication_slot: "$PGSLOT"
log_dir: /data/logs
changes_dir: /data/changes
tmp_dir: /data/tmp
build/osmdbt/yacron.yaml
jobs:
  - name: replicate
    command: |
      mkdir -p /data/logs /data/changes /data/tmp
      osmdbt-get-log --catchup --quiet
      osmdbt-catchup --quiet
      osmdbt-create-diff --quiet --max-changes=50000
    schedule: "* * * * *"
  - name: cleanup
    command: find /data/logs/ -name '*.log.done' -mtime +7 -delete
    schedule: "0 0 * * 0"
The database image is based on the osmdbt-builder and postgis images; both are based on debian:buster-slim.
FROM osmdbt-builder
FROM postgis/postgis:11-3.1
COPY --from=osmdbt-builder --chown=root:root /osmdbt/postgresql-plugin/osm-logical.so /usr/lib/postgresql/11/lib/osm-logical.so
CMD ["postgres", "--wal_level=logical", "--max_replication_slots=5"]
When I run the sequence of commands for the first time, I see this error:
[ 0:00] Started osmdbt-create-diff
[ 0:00] osmdbt version 0.6
[ 0:00] libosmium version 2.18.0
[ 0:00] Reading config from '/etc/osmdbt-config.yaml'
[ 0:00] Config:
[ 0:00] Database:
[ 0:00] Host: localhost
[ 0:00] Port: 5432
[ 0:00] Name: osm
[ 0:00] User: openstreetmap
[ 0:00] Password: (not shown)
[ 0:00] Replication Slot: osm_repl
[ 0:00] Directory for log files: /tmp/
[ 0:00] Directory for change files: /planet/replication/minute/data/
[ 0:00] Directory for tmp files: /tmp/
[ 0:00] Directory for run files: /tmp/
[ 0:00] No log files on command line. Looking for log files in log directory...
[ 0:00] 1 log files to read.
[ 0:00] Connecting to database...
[ 0:00] Database version: PostgreSQL 14.11 (Ubuntu 14.11-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
[ 0:00] Reading log file '/tmp/osm-repl-2024-03-09T15:54:16Z-lsn-17-67B2B088.log'...
[ 0:00] Got 6 nodes, 1 ways, 0 relations.
[ 0:00] Populating changeset cache...
[ 0:00] Got 1 changesets.
[ 0:00] Opening output file '/tmp/new-change.osc.gz'...
[ 0:00] Opening output file '/tmp/new-change.osc.pbf'...
[ 0:00] Processing 6 nodes, 1 ways, 0 relations...
[ 0:00] Wrote and synced output file.
[ 0:00] Writing state file '/tmp/new-state.txt'...
Missing state file: '/planet/replication/minute/data/state.txt'
How can I create the 'state.txt' file for the first time? Can anybody help with this?
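For what it's worth, one way to bootstrap is to write an initial state.txt by hand so that osmdbt-create-diff finds one on its first run. This is only a sketch: the field names follow the osmosis state.txt format (Java Properties, hence the escaped colons), while the sequence number, timestamp source, and local directory are assumptions you would adapt to your setup.

```shell
# Hypothetical bootstrap: write an initial state.txt by hand.
# In production this would be /planet/replication/minute/data.
STATE_DIR=${STATE_DIR:-state-demo}
mkdir -p "$STATE_DIR"
cat > "$STATE_DIR/state.txt" <<EOF
#$(date -u '+%a %b %d %H:%M:%S UTC %Y')
sequenceNumber=0
timestamp=$(date -u '+%Y-%m-%dT%H\:%M\:%SZ')
EOF
cat "$STATE_DIR/state.txt"
```

The timestamp should be at or before the point where replication starts, so consumers do not miss any data.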
Osmosis has a somewhat complex logic to determine ranges of transaction xids. They are used as a baseline for one diff generation run. The calculation includes the current maximum xid as well as information from the previous state.txt file. Due to (perceived) SQL query length restrictions, those xid ranges have an upper bound, which effectively limits the total number of transactions to be extracted in one go.
Long story short, the effect of those limits is that osmosis never creates diff files that exceed a certain number of underlying database transactions. I should note that xid is a global PostgreSQL transaction counter that gets increased by any Rails changes, database backups, probably vacuuming, etc. Depending on overall system load, that might translate into something like 5-15 minutes (that's my current best guess).
Why do I bring this up? In the (unlikely) event that replication processing stops and needs to be restarted, say 0.5 - 1 hour later, we want to have some control over the size of the generated diff files. It is better to more evenly distribute those changes over a period of time, rather than overwhelm data consumers with one single huge diff file.
IMHO the most difficult task is to find the number of changes for logical replication that best matches the current osmosis logic. Possibly a configurable parameter would be best. The idea is to pass this value to pg_logical_slot_get_changes as the upto_nchanges parameter (instead of NULL).
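As a sketch, the call would look like this; the slot name and the limit are placeholders, and in a live setup the statement would be issued through psql or libpqxx rather than echoed:

```shell
# Sketch: cap the number of changes consumed per run by passing a limit
# as upto_nchanges (third argument) instead of NULL.
SLOT=osm_repl       # assumed slot name
MAX_CHANGES=50000   # assumed configurable limit
SQL="SELECT * FROM pg_logical_slot_get_changes('$SLOT', NULL, $MAX_CHANGES);"
echo "$SQL"
# against a live database: psql -d osm -c "$SQL"
```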
Some pointers to the relevant osmosis code as reference:
The effect can be seen here: https://gist.github.com/mmd-osm/b7347d23c73aa869d19841d081710b8c
e.g. on 2014-04-18 after a 13 hours outage with several minutes of ~450000 OPL line items (excludes start/end of transaction marker)
And right now here: https://planet.openstreetmap.org/replication/minute/004/095/ , starting at file 147.
I didn’t find a unit test covering the relations table. Is this still missing, and could we add it to have full unit test coverage for all of our tables?
pqxx::work transaction assumes an isolation level read_committed, which could result in both non-repeatable and phantom reads (see https://www.postgresql.org/docs/9.5/transaction-iso.html).
This may lead to inconsistent results in fake-log when several SQL statements for nodes, ways and relations are executed in a row while users are still uploading changesets at the same time. As an example: the "way" query fetches new ways that reference nodes which weren't committed yet at the time the "node" query was executed, so the log file is missing some "node" entries later on.
It would be better to use isolation level repeatable read:
This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction, not as of the start of the current statement within the transaction. Thus, successive SELECT commands within a single transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started.
The other issue that fake-log needs to address are "in-flight" transactions, as mentioned by @zerebubuth in openstreetmap/operations#154 (comment): This is the problem with timestamps - we have no way to ensure that all the changes before a particular timestamp are visible to the replicator's transaction.
Osmosis currently evaluates txid_current_snapshot to check for any in-flight transactions (you can still see them in today's state.txt files as "txnActiveList=..."), or in the example below as "3043".
It would probably be best to retry a few times until no more in-flight transactions are around, and then start the fake-log extraction process.
SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
3043:3045:3043
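A minimal sketch of that check: the snapshot has the form xmin:xmax:xip_list, and a non-empty xip_list means transactions are still in flight. The snapshot value here is the example from above; in a real run it would come from psql.

```shell
# Sketch: detect in-flight transactions from txid_current_snapshot().
snapshot="3043:3045:3043"   # example value; normally:
                            # snapshot=$(psql -Atc 'SELECT txid_current_snapshot();')
xip_list=$(echo "$snapshot" | cut -d: -f3)
if [ -n "$xip_list" ]; then
    echo "in-flight transactions: $xip_list -- retry later"
else
    echo "no in-flight transactions -- safe to start fake-log extraction"
fi
```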
By the way, Osmosis source code states the following rule re. transaction timestamp in the state.txt file:
A key rule in replication is that the timestamp we specify in our replication state is always equal to or later than all data timestamps. This allows consumers to know that when they pick a timestamp to start replicating from that all data created after that timestamp will be included in subsequent replications.
(Screenshot taken from https://habr.com/en/company/postgrespro/blog/479512/)
I'm moving my comment to a new issue, as requested by @joto
Since compiling and deploying a custom plugin seemed a bit cumbersome, I've been exploring the option of using pgoutput, a fast binary-format plugin that's built into PostgreSQL:
pgoutput is the standard logical decoding output plug-in in PostgreSQL 10+. It is maintained by the PostgreSQL community, and used by PostgreSQL itself for logical replication. This plug-in is always present so no additional libraries need to be installed.
osmdbt-pgoutput interprets the raw replication event stream directly and translates it into the same text representation as the osm-logical plugin produces today. Most of what the osm-logical plugin has been doing has moved to osmdbt-get-log.cpp and pgoutput.[ch]pp. All command line tools should work as before. Configuration-wise, a new database parameter publication was added to the osmdbt.yaml file.
In the long term, this approach could simplify our setup and make it easier to use osmdbt in cloud environments with limited options for deploying custom plugins.
Link: https://github.com/mmd-osm/osmdbt-pgoutput
Follow up for #6
As Ubuntu PPA build for Focal would need to be fetched from https://launchpad.net/~osmadmins/+archive/ubuntu/ppa for deployment to production, the build process needs to run successfully on https://launchpad.net as well.
I've mentioned some changes needed to produce PPA builds for Focal (#6 (comment)). In addition, the osm-logical shared library would need to be installed to successfully pass the unit tests on launchpad.net
You can also run the PPA build process locally via
debuild -k{GPG key ID} -S -sa -d
sudo pbuilder build --distribution focal ../osmdbt_0.1.dsc
pbuilder needs to be set up initially via sudo pbuilder --create --distribution focal ...
https://wiki.ubuntu.com/PbuilderHowto and https://blog.packagecloud.io/eng/2015/05/18/building-deb-packages-with-pbuilder/ ("Initializing and deleting the pbuilder chroot") have a bit more detail on this initial setup step.
The overall process is a bit tricky and tends to be somewhat annoying after a while. If you're stuck somewhere or get some strange error message, I can see if I can help sort it out.
Caveat: this topic is somewhat involved, and I've only started to realize that more research about the old osmosis logic is needed here to stay compatible with the old logic.
I've noticed that osmdbt uses a different approach to determine the timestamp for a new state.txt file (see create_replication_log_name: osmium::Timestamp now{time};) compared to the previous osmosis implementation.
Previously, state.txt would include the current timestamp at execution time as a comment in the very first line.
To highlight the difference between both values, I'm using the example from the July 5 nine-hour outage again, showing both the actual execution time (column "comment in header") and the timestamp field.
As you can see in the following table, timestamp= matches the actual object timestamps contained in a minutely diff file, rather than the execution time. The timestamp grows in roughly 30-minute steps, which matches the maximum number of changes per osmosis run at that time.
File | comment in header | timestamp |
---|---|---|
https://planet.openstreetmap.org/replication/minute/004/095/148.state.txt | #Sun Jul 05 16:17:14 UTC 2020 | 2020-07-05T09:54:46Z |
https://planet.openstreetmap.org/replication/minute/004/095/149.state.txt | #Sun Jul 05 16:18:10 UTC 2020 | 2020-07-05T10:19:44Z |
https://planet.openstreetmap.org/replication/minute/004/095/150.state.txt | #Sun Jul 05 16:19:12 UTC 2020 | 2020-07-05T10:46:31Z |
The actual logic that is used by osmosis is much more involved, and can basically be found in https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/Replicator.java
It provides some clues about the timestamp field, such as:
A key rule in replication is that the timestamp we specify in our replication state is always equal to or later than all data timestamps. This allows consumers to know that when they pick a timestamp to start replicating from that all data created after that timestamp will be included in subsequent replications.
For sure, this is only the tip of the iceberg, and as stated initially, more research is needed.
Interesting point that I've seen is that osmosis doesn't trust the local system to provide the correct timestamp, and resorts to an SQL query instead: https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/TimeDao.java#L37
That may not have the highest priority (due to NTP usage across servers), but I still wanted to mention this peculiarity.
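To make the "timestamp follows the data, not the clock" rule concrete, here is a rough sketch of deriving a state.txt timestamp from the newest object timestamp in a diff. The two XML lines are made-up sample data; a real run would read the (decompressed) .osc.gz instead.

```shell
# Sketch: pick the newest object timestamp contained in a diff as the
# candidate state.txt timestamp (sample data; real input is the .osc.gz).
printf '%s\n' \
  '<node id="1" version="2" timestamp="2020-07-05T09:54:46Z"/>' \
  '<way id="7" version="1" timestamp="2020-07-05T09:54:40Z"/>' |
sed -n 's/.*timestamp="\([^"]*\)".*/\1/p' | sort | tail -n1
# → 2020-07-05T09:54:46Z
```

ISO 8601 UTC timestamps sort correctly as plain strings, which is why a lexicographic sort is enough here.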
The CI (using GitHub Actions) of this repository is broken because GitHub keeps changing the images it runs on. We need a better, stable CI. Ideas and help welcome.
Heads up: Ubuntu 20.04 comes with a new libpqxx version 6. Some of the current code uses patterns which have been deprecated in the meantime and now cause compiler warnings (like executing prepared statements).
In CGImap, I'm using variadic templates to provide a libpqxx 6 like interface even when using older libpqxx versions, to avoid cluttering the code with tons of #ifdefs: https://github.com/mmd-osm/openstreetmap-cgimap/blob/patch/focal/include/cgimap/backend/apidb/transaction_manager.hpp#L74-L80
https://travis-ci.org/github/openstreetmap/osmdbt/jobs/689928538 reports a linker error with boost-filesystem:
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `app(osmium::util::VerboseOutput&, Config const&, CreateDiffOptions const&)':
/home/travis/build/openstreetmap/osmdbt/src/osmdbt-create-diff.cpp:163: undefined reference to `boost::filesystem::path::extension() const'
/usr/bin/ld: /home/travis/build/openstreetmap/osmdbt/src/osmdbt-create-diff.cpp:164: undefined reference to `boost::filesystem::path::filename() const'
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `boost::program_options::error_with_option_name::~error_with_option_name()':
/usr/include/boost/program_options/errors.hpp:124: undefined reference to `vtable for boost::program_options::error_with_option_name'
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)':
Although it's only an example file, I'm getting an error message for the osm-repl slot name on PostgreSQL 12.2:
ERROR: replication slot name "osm-repl" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
Changing "-" to an underscore should fix this issue. The same applies in config.hpp: std::string m_replication_slot{"osm-repl"};
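A quick sketch of validating a candidate slot name up front, using the character set from the hint above (lower case letters, numbers, underscore):

```shell
# Sketch: check slot names against the characters PostgreSQL allows.
for name in osm-repl osm_repl; do
    if echo "$name" | grep -Eq '^[a-z0-9_]+$'; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
# → osm-repl: invalid
# → osm_repl: valid
```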
I did a small performance regression test comparing the current osmdbt with osmosis, and noticed some overall slowdown. While get-log did pretty OK, there's quite a slowdown in create-diff when creating osc.gz files.
As it stands, this is mainly caused by a large number of database selects: for each single node, way, and relation and their respective tags/way_nodes and members, a single SQL statement is sent to the database.
Depending on the object type, 500k new objects will result in about 1 to 1.5 million queries, which would need to fit into a minutely diff time frame. Currently, I can't get osmdbt to fit in here, even when running on the same host as the database (see test below).
Osmosis already uses bulk database access during replication. For this reason, I labeled this issue as osmosis performance regression.
Respective code from osmosis: https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityDao.java#L114-L142
CGImap as an example uses the following pattern for bulk fetches: https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/src/backend/apidb/readonly_pgsql_selection.cpp#L136-L159 (ignore the LEFT JOIN on node_tags, that's not relevant here).
Ideally, bulk processing should come with a configurable parameter to control the number of objects to be fetched in one query.
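The bulk pattern could be sketched like this: collect a batch of ids and issue one query using PostgreSQL's = ANY(array) syntax. The table and column names follow the apidb schema but are assumptions here, and the batch would come from the parsed log file rather than a hard-coded list; the batch size would be the proposed configurable parameter.

```shell
# Sketch: one bulk query per batch of ids instead of one query per object.
BATCH="1 2 3"                      # placeholder batch of node ids
ids=$(echo "$BATCH" | tr ' ' ',')  # "1,2,3"
echo "SELECT node_id, k, v FROM node_tags WHERE node_id = ANY('{$ids}');"
# against a live database, run the printed statement via psql/libpqxx
```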
Database version: PostgreSQL 9.5.22, local database, running on SSD. Results consistent across several test runs.
500k ways added, no way_nodes, no tags
Step | Runtime |
---|---|
osmosis | 8s |
osmdbt-get-log --catchup | 12s |
osmdbt-create-diff | 101s |
- (includes parsing log file) | 2s |
Osmosis: ./osmosis --replicate-apidb authFile=dbauth.txt allowIncorrectSchemaVersion=true validateSchemaVersion=no --write-replication workingDirectory=data
I want to change "sanity check" / "paranoia checks" in osm-logical to emit WARNING (+ ERROR?) log messages. This is in line with how wal2json handles unexpected critical situations: https://github.com/eulerto/wal2json/blob/master/wal2json.c#L1373
Previously, failed paranoia checks would simply add a string like "X NULL id or version" to the output. Once a critical error occurs, osmdbt-get-log should at least report this somehow, and depending on the severity of the issue return an error code to stop further processing.
At the moment, osmdbt-get-log would only display:
[ 0:00] Reading replication log...
[ 0:00] No changes found.
[ 0:00] Did not write log file
With the proposed changes in #17, this turns into:
[ 0:00] Reading replication log...
[ 0:00] Logical decoding output plugin 'osm-logical' log messages:
[ 0:00] WARNING: NULL id, version or changeset id in table "relations" (10001 times)
[ 0:00]
[ 0:00] There are 1 entries in the replication log.
Alternative as ERROR:
[ 0:00] Reading replication log...
ERROR: NULL id, version or changeset id in table "relations"
CONTEXT: slot "rs2", output plugin "osm-logical", in the change callback, associated LSN AB/5F7E9390
A bit of fine tuning might be required here to decide which log messages should be raised as WARNING (hence not aborting the process), and which are ERRORs that abort the process. One of the downsides of ERROR is that catching up via pg_logical_slot_get_changes will no longer be possible, as PostgreSQL has to process the faulty entries first in order to get rid of them (which is kind of futile). WARNING seems like the better choice here.
To simulate this error I used the following steps:
psql -d openstreetmap
alter table relations rename column changeset_id to changeset_id2;
Insert a new row into table relations
Run osmdbt-get-log
Cleanup: Change column name back after test
There are probably a number of ways to trigger an issue. Something like alter table relations alter column changeset_id drop not null; and inserting NULL values for changeset_id would of course also do.
Originally posted by @mmd-osm in #12 (comment)
Can we get a launchpad build for jammy aka 22.04 please @mmd-osm?
To fully replace Osmosis, we need to merge minutely diff files into hourly and daily diffs and I am trying to figure out exactly how this should be done.
In normal operation the program will run
So far that's easy enough. But the question is: What should happen in special situations, say when the program didn't run for some reason and there are older minutelies lying around etc.
What I am a bit unclear about is how osmdbt should figure out which minutelies to assemble into an hourly. If it just takes the next 60 it hasn't processed yet, the result might not correspond to full hours, because there might be minutelies missing. But it can't really look at the timestamps either, because it's a bit fuzzy how they are generated. The Osmosis code is really hard to read, so I haven't figured out yet what exactly it is doing.
In addition to that I would expect something like the following behaviour:
Any opinions on these issues?
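For illustration, a timestamp-based selection could look roughly like this. The sequence numbers and timestamps are example data only (taken from the July outage discussed elsewhere in this thread); the real tool would read them from the minutely .state.txt files, and this sidesteps exactly the fuzziness about how those timestamps are generated.

```shell
# Sketch: pick the minutelies whose state timestamp falls inside a given
# hour, rather than blindly taking the next 60 sequence numbers.
HOUR=2020-07-05T10   # hour to assemble (assumption: UTC, ISO prefix match)
printf '%s\n' \
  '148 2020-07-05T09:54:46Z' \
  '149 2020-07-05T10:19:44Z' \
  '150 2020-07-05T10:46:31Z' |
while read -r seq ts; do
  case "$ts" in
    "$HOUR"*) echo "include $seq" ;;
    *)        echo "skip $seq" ;;
  esac
done
```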
osmdbt-disable-replication does not check for any remaining entries to be processed in the replication slot. This may be dangerous as it could lead to loss of data.
pg_logical_slot_peek_changes offers a way to check for existing entries without removing them. Maybe this could be used for an additional check here.
An additional --force flag may be useful, in case you know what you're doing.
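A sketch of that safety check; the slot name is a placeholder, and in a live setup the printed statement would be run via psql before dropping the slot:

```shell
# Sketch: peek at the slot without consuming anything; refuse to disable
# replication while unprocessed entries remain.
SLOT=osm_repl
SQL="SELECT count(*) FROM pg_logical_slot_peek_changes('$SLOT', NULL, 1);"
echo "$SQL"
# against a live database:
#   pending=$(psql -Atc "$SQL")
#   [ "$pending" -eq 0 ] || { echo "slot $SLOT still has entries"; exit 1; }
```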
osmdbt-create-diff has quite a bit of logic inside. However, the unit test seems to check only for an existing node/way id+version.
I think it would be good to check a few additional things:
<create>, <modify>, <delete>
Apologies, if I missed something that is already covered in the existing tests.
I know this may sound unlikely, but there's no easy way for the create diff tool to validate whether the log file is in fact complete and not corrupted in any way.
Is there anything special we need to do to handle redactions?
osmdbt-fake-log may already be the solution for zerebubuth/osm-logical#2, which is good news.
I have a question about what the overall process looks like:
some time later:
Operator re-establishes logical replication on the database without calling any of the osmdbt-tools, except for enable/disable-replication maybe?
Operator runs osmdbt-fake-log
As the last .osc.gz file might still contain some in-flight transactions, I cannot simply use the latest timestamp I find in that osc file, but need to go back some time (a few minutes) into the past. Otherwise, there's a risk that I might miss some object versions. Although that risk has been greatly reduced since about a year ago, it's not zero.
(that's the old discussion about time based vs. transaction based replication -> wiki)
Operator continues with normal osmdbt-get-log and *-create-diff?
I'm not exactly sure what a smooth transition from logical replication to fake-log and back to logical replication looks like. In particular, I'm a bit worried that the diff files may include the same object version multiple times across different osc files, or that there might even be some going back and forth between versions of the same object.
Somehow I was expecting to see a comparison between the existing osc files and objects to be written in new osc files, as part of the fake-log recovery process, to filter out any anomalies.
As @zerebubuth noted in openstreetmap/operations#154 (comment), transitioning from the old to the new replication may have similar requirements.
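The "go back a few minutes" step could be sketched like this; the margin value is an assumption to be tuned against the expected in-flight transaction window, and the snippet requires GNU date for the relative-date arithmetic:

```shell
# Sketch: restart fake-log a safety margin behind the newest timestamp
# seen in the last diff, instead of exactly at that timestamp.
LAST_TS=2020-07-05T10:46:31Z   # newest object timestamp in the last .osc.gz
MARGIN_MIN=5                   # assumed safety margin in minutes
date -u -d "$LAST_TS - $MARGIN_MIN minutes" '+%Y-%m-%dT%H:%M:%SZ'
# → 2020-07-05T10:41:31Z
```

The duplicate object versions this produces would then need to be deduplicated, or simply tolerated by consumers, which is exactly the open question above.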
Could you possibly include the database port number in the configuration file as well?
As we were having lots of discussions on the plugin topic recently, and more challenges ahead of us when it comes to packaging the plugin for different Postgresql versions, I started exploring the option to use an existing plugin, that is part of the Postgresql repository. wal2json seemed like a good option for this exercise:
This extension is supported on those platforms that PostgreSQL is. The installation steps depend on your operating system. PostgreSQL yum repository and PostgreSQL apt repository provide wal2json packages.
There's one downside I see at the moment: they still have to figure out the same issue with excessive BEGIN/COMMIT entries that we've already solved (see eulerto/wal2json#106). Apart from that, I believe this plugin would be feature complete for our requirements.
I prepared a small demo in https://github.com/mmd-osm/osmdbt/tree/json as a proof of concept. It moves the mapping that was previously done in osm-plugin to a JSON parser in osmdbt-get-log.
In the long run this might be cheaper for us, as we can leave all those PostgreSQL plugin details to others.
By the way, https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html also lists wal2json as a supported plugin, albeit in a somewhat older version. So this all seems like production-ready code.
The current code of osmdbt-create-diff will write out tags of an OSM object ordered by the key. This makes sure that there is a defined order and that running the code twice will get you the same result.
I believe this is different from
We recently had the case where somebody complained to me that updating a planet file with changes and comparing the resulting objects to the next planet file dump showed differences between them because of differently ordered tags. Yes, we don't guarantee any kind of tag order in OSM files, but it would occasionally be useful, like in this case.
Should tags be ordered? Opinions?
db.cpp, function catchup_to_lsn has a comment "If the advance didn't work it probably means we already have advanced, so that's okay.".
To better understand whether there might be an issue if this step doesn't work as expected, I took the README example and commented out a call to osmdbt-catchup. As the "catch" in that function doesn't raise an error code, the net result should be the same.
Even if I execute this simulation script only once, and then switch back to the original file, I'm getting multiple log files with the same LSN but different timestamps. This translates into identical osc.gz files later on.
I was wondering whether it would be reasonable for osmdbt-get-log to check for existing (un-)processed log files with the same LSN, and complain, warn or abort processing.
The other question is whether osmdbt-catchup can check that the catch-up actually happened. There seem to be some limits to that for versions below 10, according to the code in osmdbt-testdb.
#!/bin/sh
set -e
./osmdbt-catchup
./osmdbt-get-log
# optionally copy log file(s) to other hosts
# Simulate catchup_to_lsn did not work (-> no error code)
#./osmdbt-catchup
./osmdbt-create-diff
Multiple files in log directory with same LSN:
915 Jul 16 16:16 osm-repl-2020-07-16T14:16:43Z-lsn-60-40994D28.log.done
915 Jul 16 16:16 osm-repl-2020-07-16T14:16:47Z-lsn-60-40994D28.log.done
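The proposed check could be sketched like this, relying only on the osm-repl-&lt;timestamp&gt;-lsn-&lt;LSN&gt;.log naming shown above (here with hard-coded file names; a real check would list the log directory):

```shell
# Sketch: flag log files that share the same LSN before creating diffs.
printf '%s\n' \
  osm-repl-2020-07-16T14:16:43Z-lsn-60-40994D28.log.done \
  osm-repl-2020-07-16T14:16:47Z-lsn-60-40994D28.log.done |
sed 's/.*-lsn-//; s/\.log.*//' | sort | uniq -d |
while read -r lsn; do
  echo "duplicate LSN: $lsn"
done
# → duplicate LSN: 60-40994D28
```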
I have documented the state.txt file format here. It's a bit difficult because it is a "legacy" format. Any comments on that are welcome.
We should probably also document the file system structure of the diff/state files.