openstreetmap / osmdbt
OSM Database Replication Tools
License: GNU General Public License v3.0
I do realize that this repo is still in an early stage. Do you have plans to provide a state.txt file in the future, and if so to what extent would you be able to replicate today’s file contents?
Hi,
Great work on this tool. We're planning to start using it in production in a few months; hopefully it will have a stable release by then.
I wanted to contribute back to this project so I'm posting the Docker files here for osmdbt.
We have the full cluster for OSM: osmcgimap, an nginx proxy in front of it, osmdbt, postgres, planet-dump-ng, imposm3, renderd+mapnik, and apache+mod_tile. I'm not sure whether to break it into parts and post it here, or post it as a whole to the OpenStreetMap wiki. The services are containerized individually, as they should be (except for renderd and render_expire, since render_expire cannot work over TCP).
Our workflow is to have a bit of the whole environment (osm-website + tiles) as a private server for our clients.
This is the docker-compose setup I've patched together using Docker multi-stage builds.
docker-compose.yml
services:
  osmpostgres:
    environment:
      - POSTGRES_PASSWORD
    volumes:
      - osmpostgres:/var/lib/postgresql/data
  osmdbt:
    build: build/osmdbt
    environment:
      - PGDATABASE
      - PGHOST
      - PGPASSWORD
      - PGPORT
      - PGUSER
    volumes:
      - osmdbt:/data:rw
  osmdbt-builder:
    build: build/osmdbt-builder
  osmchanges:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - osmdbt:/usr/share/nginx/html:ro
volumes:
  osmdbt:
  osmpostgres:
build/osmdbt-builder/Dockerfile
FROM debian:buster-slim
ARG OSMDBT_SRC=https://github.com/openstreetmap/osmdbt/archive/master.tar.gz
RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        cmake \
        curl \
        gettext-base \
        gosu \
        libboost-filesystem-dev \
        libboost-program-options-dev \
        libbz2-dev \
        libexpat1-dev \
        libosmium2-dev \
        libpqxx-dev \
        libprotozero-dev \
        libyaml-cpp-dev \
        pandoc \
        zlib1g-dev \
        postgresql-common \
        postgresql-server-dev-all; \
    useradd -r -u 999 osmdbt; \
    mkdir /osmdbt-src /osmdbt; \
    curl -sL "$OSMDBT_SRC" | tar -xz -C /osmdbt-src --strip-components=1; \
    chown -R osmdbt /osmdbt-src /osmdbt; \
    cd /osmdbt; \
    gosu osmdbt cmake /osmdbt-src; \
    gosu osmdbt cmake --build .; \
    rm -rf /osmdbt-src; \
    apt-get clean; \
    rm -rf /var/lib/apt/lists/*;
WORKDIR /osmdbt/src
The binaries are available at /osmdbt/src.
This image uses yacron as a proper crontab replacement in Docker. It is based on the builder image.
build/osmdbt/Dockerfile
FROM osmdbt-builder
FROM debian:buster-slim
ARG YACRON_VERSION=0.11.2/yacron-0.11.2-x86_64-unknown-linux-gnu
RUN set -eu; \
    useradd -r -u 999 osmdbt; \
    mkdir -p /opt/osmdbt/data; \
    chown -R osmdbt:osmdbt /opt/osmdbt; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        libexpat1 \
        libboost-filesystem1.67.0 \
        libboost-program-options1.67.0 \
        libpqxx-6.2 \
        libyaml-cpp0.6 \
        postgresql-client-common \
        gettext \
        zlib1g; \
    curl -sL https://github.com/gjcarneiro/yacron/releases/download/$YACRON_VERSION -o /usr/local/bin/yacron; \
    chmod +x /usr/local/bin/yacron
COPY --from=osmdbt-builder /osmdbt/src/osmdbt-* /usr/local/bin/
COPY entrypoint.sh /usr/local/bin/entrypoint
COPY --chown=osmdbt:osmdbt *.yaml* /opt/osmdbt/
VOLUME /data
WORKDIR /opt/osmdbt
USER osmdbt
ENTRYPOINT ["entrypoint"]
CMD ["run"]
build/osmdbt/entrypoint.sh
#!/bin/sh
set -eu
envsubst < osmdbt-config.yaml.tpl > osmdbt-config.yaml
case "$1" in
enable) osmdbt-enable-replication;;
disable) osmdbt-disable-replication;;
test) osmdbt-testdb;;
run) exec yacron -c yacron.yaml;;
*) exec "$@" ;;
esac
build/osmdbt/osmdbt-config.yaml.tpl
---
database:
  host: "$PGHOST"
  port: "$PGPORT"
  dbname: "$PGDATABASE"
  user: "$PGUSER"
  password: "$PGPASSWORD"
  replication_slot: "$PGSLOT"
log_dir: /data/logs
changes_dir: /data/changes
tmp_dir: /data/tmp
build/osmdbt/yacron.yaml
jobs:
  - name: replicate
    command: |
      mkdir -p /data/logs /data/changes /data/tmp
      osmdbt-get-log --catchup --quiet
      osmdbt-catchup --quiet
      osmdbt-create-diff --quiet --max-changes=50000
    schedule: "* * * * *"
  - name: cleanup
    command: find /data/logs/ -name '*.log.done' -mtime +7 -delete
    schedule: "0 0 * * 0"
The database image is based on the osmdbt-builder and postgis images; both are based on debian:buster-slim.
FROM osmdbt-builder
FROM postgis/postgis:11-3.1
COPY --from=osmdbt-builder --chown=root:root /osmdbt/postgresql-plugin/osm-logical.so /usr/lib/postgresql/11/lib/osm-logical.so
CMD ["postgres", "--wal_level=logical", "--max_replication_slots=5"]
When I run the sequence of commands for the first time, I see this error:
[ 0:00] Started osmdbt-create-diff
[ 0:00] osmdbt version 0.6
[ 0:00] libosmium version 2.18.0
[ 0:00] Reading config from '/etc/osmdbt-config.yaml'
[ 0:00] Config:
[ 0:00] Database:
[ 0:00] Host: localhost
[ 0:00] Port: 5432
[ 0:00] Name: osm
[ 0:00] User: openstreetmap
[ 0:00] Password: (not shown)
[ 0:00] Replication Slot: osm_repl
[ 0:00] Directory for log files: /tmp/
[ 0:00] Directory for change files: /planet/replication/minute/data/
[ 0:00] Directory for tmp files: /tmp/
[ 0:00] Directory for run files: /tmp/
[ 0:00] No log files on command line. Looking for log files in log directory...
[ 0:00] 1 log files to read.
[ 0:00] Connecting to database...
[ 0:00] Database version: PostgreSQL 14.11 (Ubuntu 14.11-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
[ 0:00] Reading log file '/tmp/osm-repl-2024-03-09T15:54:16Z-lsn-17-67B2B088.log'...
[ 0:00] Got 6 nodes, 1 ways, 0 relations.
[ 0:00] Populating changeset cache...
[ 0:00] Got 1 changesets.
[ 0:00] Opening output file '/tmp/new-change.osc.gz'...
[ 0:00] Opening output file '/tmp/new-change.osc.pbf'...
[ 0:00] Processing 6 nodes, 1 ways, 0 relations...
[ 0:00] Wrote and synced output file.
[ 0:00] Writing state file '/tmp/new-state.txt'...
Missing state file: '/planet/replication/minute/data/state.txt'
How can I create the 'state.txt' file for the first time? Can anybody help with this?
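For what it's worth, one way to bootstrap is to write an initial state.txt by hand so that osmdbt-create-diff finds one on its first run. This is only a sketch: the field names follow the osmosis state.txt format (Java Properties, hence the escaped colons), while the sequence number, timestamp source, and local directory are assumptions you would adapt to your setup.

```shell
# Hypothetical bootstrap: write an initial state.txt by hand.
# In production this would be /planet/replication/minute/data.
STATE_DIR=${STATE_DIR:-state-demo}
mkdir -p "$STATE_DIR"
cat > "$STATE_DIR/state.txt" <<EOF
#$(date -u '+%a %b %d %H:%M:%S UTC %Y')
sequenceNumber=0
timestamp=$(date -u '+%Y-%m-%dT%H\:%M\:%SZ')
EOF
cat "$STATE_DIR/state.txt"
```

The timestamp should be at or before the point where replication starts, so consumers do not miss any data.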
Osmosis has a somewhat complex logic to determine ranges of transaction xids. They are used as a baseline for one diff generation run. The calculation includes the current maximum xid as well as information from the previous state.txt file. Due to (perceived) SQL query length restrictions, those xid ranges have an upper bound, which effectively limits the total number of transactions to be extracted in one go.
Long story short, the effect of those limits is that osmosis never creates diff files that exceed a certain number of underlying database transactions. I should note that xid is a global PostgreSQL transaction counter that gets increased by any Rails changes, database backups, probably vacuuming, etc. Depending on overall system load, that might translate into something like 5-15 minutes (that's my current best guess).
Why do I bring this up? In the (unlikely) event that replication processing stops and needs to be restarted, say 0.5 - 1 hour later, we want to have some control over the size of the generated diff files. It is better to more evenly distribute those changes over a period of time, rather than overwhelm data consumers with one single huge diff file.
IMHO the most difficult task is to find the number of changes for logical replication that best matches the current osmosis logic. Possibly a configurable parameter would be best. The idea is to pass this value to pg_logical_slot_get_changes as the upto_nchanges parameter (instead of NULL).
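As a sketch, the call would look like this; the slot name and the limit are placeholders, and in a live setup the statement would be issued through psql or libpqxx rather than echoed:

```shell
# Sketch: cap the number of changes consumed per run by passing a limit
# as upto_nchanges (third argument) instead of NULL.
SLOT=osm_repl       # assumed slot name
MAX_CHANGES=50000   # assumed configurable limit
SQL="SELECT * FROM pg_logical_slot_get_changes('$SLOT', NULL, $MAX_CHANGES);"
echo "$SQL"
# against a live database: psql -d osm -c "$SQL"
```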
Some pointers to the relevant osmosis code as reference:
The effect can be seen here: https://gist.github.com/mmd-osm/b7347d23c73aa869d19841d081710b8c
e.g. on 2014-04-18 after a 13 hours outage with several minutes of ~450000 OPL line items (excludes start/end of transaction marker)
And right now here: https://planet.openstreetmap.org/replication/minute/004/095/ , starting at file 147.
I didn’t find a unit test covering the relations table. Is this still missing, and could we add it to have full unit test coverage for all of our tables?
pqxx::work transaction assumes an isolation level read_committed, which could result in both non-repeatable and phantom reads (see https://www.postgresql.org/docs/9.5/transaction-iso.html).
This may lead to inconsistent results in fake-log when several SQL statements for nodes, ways and relations are executed in a row while users are still uploading changesets at the same time. As an example: the "way" query fetches new ways that reference nodes which weren't committed yet at the time the "node" query was executed, so the log file is missing some "node" entries later on.
It would be better to use isolation level repeatable read:
This level is different from Read Committed in that a query in a repeatable read transaction sees a snapshot as of the start of the first non-transaction-control statement in the transaction, not as of the start of the current statement within the transaction. Thus, successive SELECT commands within a single transaction see the same data, i.e., they do not see changes made by other transactions that committed after their own transaction started.
The other issue that fake-log needs to address are "in-flight" transactions, as mentioned by @zerebubuth in openstreetmap/operations#154 (comment): This is the problem with timestamps - we have no way to ensure that all the changes before a particular timestamp are visible to the replicator's transaction.
Osmosis currently evaluates txid_current_snapshot to check for any in-flight transactions (you can still see them in today's state.txt files as "txnActiveList=..."), or in the example below as "3043".
It would probably be best to retry a few times until no more in-flight transactions are around, and then start the fake-log extraction process.
SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
3043:3045:3043
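A minimal sketch of that check: the snapshot has the form xmin:xmax:xip_list, and a non-empty xip_list means transactions are still in flight. The snapshot value here is the example from above; in a real run it would come from psql.

```shell
# Sketch: detect in-flight transactions from txid_current_snapshot().
snapshot="3043:3045:3043"   # example value; normally:
                            # snapshot=$(psql -Atc 'SELECT txid_current_snapshot();')
xip_list=$(echo "$snapshot" | cut -d: -f3)
if [ -n "$xip_list" ]; then
    echo "in-flight transactions: $xip_list -- retry later"
else
    echo "no in-flight transactions -- safe to start fake-log extraction"
fi
```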
By the way, Osmosis source code states the following rule re. transaction timestamp in the state.txt file:
A key rule in replication is that the timestamp we specify in our replication state is always equal to or later than all data timestamps. This allows consumers to know that when they pick a timestamp to start replicating from that all data created after that timestamp will be included in subsequent replications.
(Screenshot taken from https://habr.com/en/company/postgrespro/blog/479512/)
I'm moving my comment to a new issue, as requested by @joto
Since compiling and deploying a custom plugin seemed a bit cumbersome, I've been exploring the option of using pgoutput, a fast binary-format plugin that's built into PostgreSQL:
pgoutput is the standard logical decoding output plug-in in PostgreSQL 10+. It is maintained by the PostgreSQL community, and used by PostgreSQL itself for logical replication. This plug-in is always present so no additional libraries need to be installed.
osmdbt-pgoutput interprets the raw replication event stream directly and translates it into the same text representation as the osm-logical plugin produces today. Most of what the osm-logical plugin has been doing has moved to osmdbt-get-log.cpp and pgoutput.[ch]pp. All command line tools should work as before. Configuration-wise, a new database parameter publication was added to the osmdbt.yaml file.
In the long term, this approach could simplify our setup and make it easier to use osmdbt in cloud environments with limited options for deploying custom plugins.
Link: https://github.com/mmd-osm/osmdbt-pgoutput
Follow up for #6
As Ubuntu PPA build for Focal would need to be fetched from https://launchpad.net/~osmadmins/+archive/ubuntu/ppa for deployment to production, the build process needs to run successfully on https://launchpad.net as well.
I've mentioned some changes needed to produce PPA builds for Focal (#6 (comment)). In addition, the osm-logical shared library would need to be installed to successfully pass the unit tests on launchpad.net
You can also run the PPA build process locally via
debuild -k{GPG key ID} -S -sa -d
sudo pbuilder build --distribution focal ../osmdbt_0.1.dsc
pbuilder needs to be set up initially via sudo pbuilder --create --distribution focal ...
https://wiki.ubuntu.com/PbuilderHowto and https://blog.packagecloud.io/eng/2015/05/18/building-deb-packages-with-pbuilder/ ("Initializing and deleting the pbuilder chroot") have a bit more detail on this initial setup step.
The overall process is a bit tricky and tends to be somewhat annoying after a while. If you're stuck somewhere or get some strange error message, I can see if I can help sort it out.
Caveat: this topic is somewhat involved, and I've only started to realize that more research about the old osmosis logic is needed here to stay compatible with the old logic.
I've noticed that osmdbt uses a different approach to determine the timestamp for a new state.txt file (see create_replication_log_name: osmium::Timestamp now{time};) compared to the previous osmosis implementation.
Previously, state.txt would include the current timestamp at execution time as a comment in the very first line.
To highlight the difference between both values, I'm using the example from the July 5 nine-hour outage again, showing both the actual execution time (column "comment in header") and the timestamp field.
As you can see in the following table, timestamp= matches the actual object timestamps contained in a minutely diff file, rather than the execution time. The timestamp grows in roughly 30-minute steps, which matches the maximum number of changes per osmosis run at that time.
File | comment in header | timestamp |
---|---|---|
https://planet.openstreetmap.org/replication/minute/004/095/148.state.txt | #Sun Jul 05 16:17:14 UTC 2020 | 2020-07-05T09:54:46Z |
https://planet.openstreetmap.org/replication/minute/004/095/149.state.txt | #Sun Jul 05 16:18:10 UTC 2020 | 2020-07-05T10:19:44Z |
https://planet.openstreetmap.org/replication/minute/004/095/150.state.txt | #Sun Jul 05 16:19:12 UTC 2020 | 2020-07-05T10:46:31Z |
The actual logic that is used by osmosis is much more involved, and can basically be found in https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/Replicator.java
It provides some clues about the timestamp field, such as:
A key rule in replication is that the timestamp we specify in our replication state is always equal to or later than all data timestamps. This allows consumers to know that when they pick a timestamp to start replicating from that all data created after that timestamp will be included in subsequent replications.
For sure, this is only the tip of the iceberg, and as stated initially, more research is needed.
Interesting point that I've seen is that osmosis doesn't trust the local system to provide the correct timestamp, and resorts to an SQL query instead: https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/TimeDao.java#L37
That may not have the highest priority (due to NTP usage across servers), but I still wanted to mention this peculiarity.
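To make the "timestamp follows the data, not the clock" rule concrete, here is a rough sketch of deriving a state.txt timestamp from the newest object timestamp in a diff. The two XML lines are made-up sample data; a real run would read the (decompressed) .osc.gz instead.

```shell
# Sketch: pick the newest object timestamp contained in a diff as the
# candidate state.txt timestamp (sample data; real input is the .osc.gz).
printf '%s\n' \
  '<node id="1" version="2" timestamp="2020-07-05T09:54:46Z"/>' \
  '<way id="7" version="1" timestamp="2020-07-05T09:54:40Z"/>' |
sed -n 's/.*timestamp="\([^"]*\)".*/\1/p' | sort | tail -n1
# → 2020-07-05T09:54:46Z
```

ISO 8601 UTC timestamps sort correctly as plain strings, which is why a lexicographic sort is enough here.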
The CI (using GitHub Actions) of this repository is broken because GitHub keeps changing the images it runs on. We need a better, stable CI. Ideas and help welcome.
Heads up: Ubuntu 20.04 comes with a new libpqxx version 6. Some of the current code uses patterns which have been deprecated in the meantime and now cause compiler warnings (like executing prepared statements).
In CGImap, I'm using variadic templates to provide a libpqxx 6 like interface even when using older libpqxx versions, to avoid cluttering the code with tons of #ifdefs: https://github.com/mmd-osm/openstreetmap-cgimap/blob/patch/focal/include/cgimap/backend/apidb/transaction_manager.hpp#L74-L80
https://travis-ci.org/github/openstreetmap/osmdbt/jobs/689928538 reports a linker error with boost-filesystem:
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `app(osmium::util::VerboseOutput&, Config const&, CreateDiffOptions const&)':
/home/travis/build/openstreetmap/osmdbt/src/osmdbt-create-diff.cpp:163: undefined reference to `boost::filesystem::path::extension() const'
/usr/bin/ld: /home/travis/build/openstreetmap/osmdbt/src/osmdbt-create-diff.cpp:164: undefined reference to `boost::filesystem::path::filename() const'
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `boost::program_options::error_with_option_name::~error_with_option_name()':
/usr/include/boost/program_options/errors.hpp:124: undefined reference to `vtable for boost::program_options::error_with_option_name'
/usr/bin/ld: CMakeFiles/osmdbt-create-diff.dir/osmdbt-create-diff.cpp.o: in function `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)':
Although it's only an example file, I'm getting an error message for the osm-repl slot name on PostgreSQL 12.2:
ERROR: replication slot name "osm-repl" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
Changing "-" to an underscore should fix this issue. The same applies in config.hpp: std::string m_replication_slot{"osm-repl"};
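A quick sketch of validating a candidate slot name up front, using the character set from the hint above (lower case letters, numbers, underscore):

```shell
# Sketch: check slot names against the characters PostgreSQL allows.
for name in osm-repl osm_repl; do
    if echo "$name" | grep -Eq '^[a-z0-9_]+$'; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
# → osm-repl: invalid
# → osm_repl: valid
```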
I did a small performance regression test comparing the current osmdbt with osmosis, and noticed some overall slowdown. While get-log did pretty OK, there's quite a slowdown in create-diff when creating osc.gz files.
As it stands, this is mainly caused by a large number of database selects: for each single node, way, and relation and their respective tags/way_nodes and members, a single SQL statement is sent to the database.
Depending on the object type, 500k new objects will result in about 1 to 1.5 million queries, which would need to fit into a minutely diff time frame. Currently, I can't get osmdbt to fit in here, even when running on the same host as the database (see test below).
Osmosis already uses bulk database access during replication. For this reason, I labeled this issue as osmosis performance regression.
Respective code from osmosis: https://github.com/openstreetmap/osmosis/blob/master/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityDao.java#L114-L142
CGImap as an example uses the following pattern for bulk fetches: https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/src/backend/apidb/readonly_pgsql_selection.cpp#L136-L159 (ignore the LEFT JOIN on node_tags, that's not relevant here).
Ideally, bulk processing should come with a configurable parameter to control the number of objects to be fetched in one query.
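The bulk pattern could be sketched like this: collect a batch of ids and issue one query using PostgreSQL's = ANY(array) syntax. The table and column names follow the apidb schema but are assumptions here, and the batch would come from the parsed log file rather than a hard-coded list; the batch size would be the proposed configurable parameter.

```shell
# Sketch: one bulk query per batch of ids instead of one query per object.
BATCH="1 2 3"                      # placeholder batch of node ids
ids=$(echo "$BATCH" | tr ' ' ',')  # "1,2,3"
echo "SELECT node_id, k, v FROM node_tags WHERE node_id = ANY('{$ids}');"
# against a live database, run the printed statement via psql/libpqxx
```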
Database version: PostgreSQL 9.5.22, local database, running on SSD. Results consistent across several test runs.
500k ways added, no way_nodes, no tags
Step | Runtime |
---|---|
osmosis | 8s |
osmdbt-get-log --catchup | 12s |
osmdbt-create-diff | 101s |
- (includes parsing log file) | 2s |
Osmosis: ./osmosis --replicate-apidb authFile=dbauth.txt allowIncorrectSchemaVersion=true validateSchemaVersion=no --write-replication workingDirectory=data
I want to change "sanity check" / "paranoia checks" in osm-logical to emit WARNING (+ ERROR?) log messages. This is in line with how wal2json handles unexpected critical situations: https://github.com/eulerto/wal2json/blob/master/wal2json.c#L1373
Previously, failed paranoia checks would simply add a string like "X NULL id or version" to the output. Once a critical error occurs, osmdbt-get-log should at least report this somehow, and depending on the severity of the issue return an error code to stop further processing.
At the moment, osmdbt-get-log would only display:
[ 0:00] Reading replication log...
[ 0:00] No changes found.
[ 0:00] Did not write log file
With the proposed changes in #17, this turns into:
[ 0:00] Reading replication log...
[ 0:00] Logical decoding output plugin 'osm-logical' log messages:
[ 0:00] WARNING: NULL id, version or changeset id in table "relations" (10001 times)
[ 0:00]
[ 0:00] There are 1 entries in the replication log.
Alternative as ERROR:
[ 0:00] Reading replication log...
ERROR: NULL id, version or changeset id in table "relations"
CONTEXT: slot "rs2", output plugin "osm-logical", in the change callback, associated LSN AB/5F7E9390
A bit of fine tuning might be required here to decide which log messages should be raised as WARNING (hence not aborting the process), and which are ERRORs that abort the process. One of the downsides of ERROR is that catching up via pg_logical_slot_get_changes will no longer be possible, as PostgreSQL has to process the faulty entries first in order to get rid of them (which is kind of futile). WARNING seems like the better choice here.
To simulate this error I used the following steps:
psql -d openstreetmap
alter table relations rename column changeset_id to changeset_id2;
Insert a new row into table relations
Run osmdbt-get-log
Cleanup: Change column name back after test
There are probably a number of ways to trigger an issue. Something like alter table relations alter column changeset_id drop not null; and inserting NULL values for changeset_id would of course also do.
Originally posted by @mmd-osm in #12 (comment)
Can we get a launchpad build for jammy aka 22.04 please @mmd-osm?
To fully replace Osmosis, we need to merge minutely diff files into hourly and daily diffs and I am trying to figure out exactly how this should be done.
In normal operation the program will run
So far that's easy enough. But the question is: What should happen in special situations, say when the program didn't run for some reason and there are older minutelies lying around etc.
What I am a bit unclear about is how osmdbt should figure out which minutelies to assemble into an hourly. If it just takes the next 60 it hasn't processed yet, the result might not correspond to full hours, because there might be minutelies missing. But it can't really look at the timestamps either, because it's a bit fuzzy how they are generated. The Osmosis code is really hard to read, so I haven't figured out yet what exactly it is doing.
In addition to that I would expect something like the following behaviour:
Any opinions on these issues?
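For illustration, a timestamp-based selection could look roughly like this. The sequence numbers and timestamps are example data only (taken from the July outage discussed elsewhere in this thread); the real tool would read them from the minutely .state.txt files, and this sidesteps exactly the fuzziness about how those timestamps are generated.

```shell
# Sketch: pick the minutelies whose state timestamp falls inside a given
# hour, rather than blindly taking the next 60 sequence numbers.
HOUR=2020-07-05T10   # hour to assemble (assumption: UTC, ISO prefix match)
printf '%s\n' \
  '148 2020-07-05T09:54:46Z' \
  '149 2020-07-05T10:19:44Z' \
  '150 2020-07-05T10:46:31Z' |
while read -r seq ts; do
  case "$ts" in
    "$HOUR"*) echo "include $seq" ;;
    *)        echo "skip $seq" ;;
  esac
done
```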
osmdbt-disable-replication does not check for any remaining entries to be processed in the replication slot. This may be dangerous as it could lead to loss of data.
pg_logical_slot_peek_changes offers a way to check for existing entries without removing them. Maybe this could be used for an additional check here.
An additional --force flag may be useful, in case you know what you're doing.
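A sketch of that safety check; the slot name is a placeholder, and in a live setup the printed statement would be run via psql before dropping the slot:

```shell
# Sketch: peek at the slot without consuming anything; refuse to disable
# replication while unprocessed entries remain.
SLOT=osm_repl
SQL="SELECT count(*) FROM pg_logical_slot_peek_changes('$SLOT', NULL, 1);"
echo "$SQL"
# against a live database:
#   pending=$(psql -Atc "$SQL")
#   [ "$pending" -eq 0 ] || { echo "slot $SLOT still has entries"; exit 1; }
```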
osmdbt-create-diff has quite a bit of logic inside. However, the unit test seems to check only for an existing node/way id+version.
I think it would be good to check a few additional things:
<create>, <modify>, <delete>
Apologies, if I missed something that is already covered in the existing tests.
I know this may sound unlikely, but there's no easy way for the create diff tool to validate whether the log file is in fact complete and not corrupted in any way.
Is there anything special we need to do to handle redactions?
osmdbt-fake-log may already be the solution for zerebubuth/osm-logical#2, which is good news.
I have a question about what the overall process looks like:
some time later:
Operator re-establishes logical replication on the database without calling any of the osmdbt-tools, except for enable/disable-replication maybe?
Operator runs osmdbt-fake-log
As the last .osc.gz file might still contain some in-flight transactions, I cannot simply use the latest timestamp I find in that osc file, but need to go back some time (a few minutes) into the past. Otherwise, there's a risk that I might miss some object versions. Although that risk has been greatly reduced since about a year ago, it's not zero.
(that's the old discussion about time based vs. transaction based replication -> wiki)
Operator continues with normal osmdbt-get-log and *-create-diff?
I'm not exactly sure what a smooth transition from logical replication to fake-log and back to logical replication looks like. In particular, I'm a bit worried that the diff files may include the same object version multiple times across different osc files, or that there might even be some going back and forth between versions of the same object.
Somehow I was expecting to see a comparison between the existing osc files and objects to be written in new osc files, as part of the fake-log recovery process, to filter out any anomalies.
As @zerebubuth noted in openstreetmap/operations#154 (comment), transitioning from the old to the new replication may have similar requirements.
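The "go back a few minutes" step could be sketched like this; the margin value is an assumption to be tuned against the expected in-flight transaction window, and the snippet requires GNU date for the relative-date arithmetic:

```shell
# Sketch: restart fake-log a safety margin behind the newest timestamp
# seen in the last diff, instead of exactly at that timestamp.
LAST_TS=2020-07-05T10:46:31Z   # newest object timestamp in the last .osc.gz
MARGIN_MIN=5                   # assumed safety margin in minutes
date -u -d "$LAST_TS - $MARGIN_MIN minutes" '+%Y-%m-%dT%H:%M:%SZ'
# → 2020-07-05T10:41:31Z
```

The duplicate object versions this produces would then need to be deduplicated, or simply tolerated by consumers, which is exactly the open question above.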
Could you possibly include the database port number in the configuration file as well?
As we were having lots of discussions on the plugin topic recently, and more challenges ahead of us when it comes to packaging the plugin for different Postgresql versions, I started exploring the option to use an existing plugin, that is part of the Postgresql repository. wal2json seemed like a good option for this exercise:
This extension is supported on those platforms that PostgreSQL is. The installation steps depend on your operating system. PostgreSQL yum repository and PostgreSQL apt repository provide wal2json packages.
There's one downside I see at the moment: they still have to figure out the same issue with excessive BEGIN/COMMIT entries that we've already solved (see eulerto/wal2json#106). Apart from that, I believe this plugin would be feature complete for our requirements.
I prepared a small demo in https://github.com/mmd-osm/osmdbt/tree/json as a proof of concept. It moves the mapping that was previously done in osm-plugin to a JSON parser in osmdbt-get-log.
In the long run this might be cheaper for us, as we can leave all those PostgreSQL plugin details to others.
By the way, https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html also lists wal2json as a supported plugin, albeit in a somewhat older version. So this all seems like production-ready code.
The current code of osmdbt-create-diff will write out tags of an OSM object ordered by the key. This makes sure that there is a defined order and that running the code twice will get you the same result.
I believe this is different from
We recently had the case where somebody complained to me that updating a planet file with changes and comparing the resulting objects to the next planet file dump showed differences between them because of differently ordered tags. Yes, we don't guarantee any kind of tag order in OSM files, but it would occasionally be useful, like in this case.
Should tags be ordered? Opinions?
db.cpp, function catchup_to_lsn has a comment "If the advance didn't work it probably means we already have advanced, so that's okay.".
To better understand whether there might be an issue if this step doesn't work as expected, I took the README example and commented out a call to osmdbt-catchup. As the "catch" in that function doesn't raise an error code, the net result should be the same.
Even if I execute this simulation script only once, and then switch back to the original file, I'm getting multiple log files with the same LSN but different timestamps. This translates into identical osc.gz files later on.
I was wondering whether it would be reasonable for osmdbt-get-log to check for existing (un-)processed log files with the same LSN, and complain, warn or abort processing.
The other question is whether osmdbt-catchup can check that the catch-up actually happened. There seem to be some limits to that for versions below 10, according to the code in osmdbt-testdb.
#!/bin/sh
set -e
./osmdbt-catchup
./osmdbt-get-log
# optionally copy log file(s) to other hosts
# Simulate catchup_to_lsn did not work (-> no error code)
#./osmdbt-catchup
./osmdbt-create-diff
Multiple files in log directory with same LSN:
915 Jul 16 16:16 osm-repl-2020-07-16T14:16:43Z-lsn-60-40994D28.log.done
915 Jul 16 16:16 osm-repl-2020-07-16T14:16:47Z-lsn-60-40994D28.log.done
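The proposed check could be sketched like this, relying only on the osm-repl-&lt;timestamp&gt;-lsn-&lt;LSN&gt;.log naming shown above (here with hard-coded file names; a real check would list the log directory):

```shell
# Sketch: flag log files that share the same LSN before creating diffs.
printf '%s\n' \
  osm-repl-2020-07-16T14:16:43Z-lsn-60-40994D28.log.done \
  osm-repl-2020-07-16T14:16:47Z-lsn-60-40994D28.log.done |
sed 's/.*-lsn-//; s/\.log.*//' | sort | uniq -d |
while read -r lsn; do
  echo "duplicate LSN: $lsn"
done
# → duplicate LSN: 60-40994D28
```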
I have documented the state.txt file format here. It's a bit difficult because it is a "legacy" format. Any comments on that are welcome.
We should probably also document the file system structure of the diff/state files.