Comments (2)
I have looked into this some more and believe the comment you found in the Osmosis code isn't really that helpful. And the Osmosis code is overly complex.
The basic problem with replication in our case is this:
There can be multiple transactions open in the database at the same time with objects being written to the database with whatever timestamp is current at that point in time. When a transaction is committed, the objects show up in the log file, but there might be other transactions still open with objects with an unknown timestamp (which might be smaller or larger than any timestamp we have in the committed transaction). So we cannot know whether there are unprocessed objects with older and/or newer timestamps than we currently have. So it might be that there are objects with timestamp t in change file 1234.osc.gz and in 1235.osc.gz there is an object with a timestamp < t. That's why we can't use the timestamps as some kind of marker for the replication, but we use the sequence numbers instead.
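To make the overlap concrete, here is a minimal sketch with made-up data. The two sequence numbers match the example above, but the timestamps and the `diffs` structure are purely hypothetical and not taken from any real change file.

```python
from datetime import datetime

def ts(text):
    """Parse an ISO 8601 timestamp with an explicit UTC offset."""
    return datetime.fromisoformat(text)

# Hypothetical object timestamps per change file, keyed by sequence number.
# The transaction behind 1235 was opened before the one behind 1234 committed,
# so it contains an object that is older than the newest object in 1234.
diffs = {
    1234: [ts("2020-06-01T10:00:05+00:00"), ts("2020-06-01T10:00:20+00:00")],
    1235: [ts("2020-06-01T10:00:10+00:00"), ts("2020-06-01T10:00:30+00:00")],
}

newest_in_1234 = max(diffs[1234])
oldest_in_1235 = min(diffs[1235])

# Timestamps overlap across files, so they cannot serve as a replication marker.
timestamps_overlap = oldest_in_1235 < newest_in_1234

# Sequence numbers, by contrast, are strictly increasing by construction.
sequence_order = sorted(diffs)

print(timestamps_overlap, sequence_order)
```

A consumer that tracked "last timestamp seen" after 1234 would wrongly skip the older object in 1235; tracking the sequence number avoids that.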
So it doesn't really matter all that much what exact timestamp is in the file. For an ongoing replication you simply keep incrementing the sequence number. When you set up a new replication you have to find out at what point in time you want to start the replication, then go back "a bit" and start from there. Usually you will start from a planet dump, but they use a separate mechanism that is not synced with the change files, so there is no clean way to start consuming change files exactly at that point where a planet dump is. So you go back "a bit" and hope that it all works out. We should document what this "a bit" is and tell users how to properly consume changes, but this isn't really a concern for us here.
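The "go back a bit" step could be sketched roughly as follows. This is only an illustration: the function name `pick_start_sequence`, the `states` mapping and the size of the safety margin are all assumptions, since what "a bit" should be is exactly the part that is not documented yet.

```python
from datetime import datetime, timedelta

def pick_start_sequence(states, dump_time, margin):
    """Return a conservative sequence number to start replication from.

    states    -- hypothetical mapping: sequence number -> timestamp in its state file
    dump_time -- timestamp of the planet dump being imported
    margin    -- how far to go back "a bit" (an assumption, not a documented value)
    """
    cutoff = dump_time - margin
    start = None
    # Walk the sequence numbers in order and stop at the first state that is
    # newer than the cutoff. Timestamps are not guaranteed to be monotonic,
    # so stopping early is the cautious choice.
    for seq in sorted(states):
        if states[seq] > cutoff:
            break
        start = seq
    if start is None:
        raise ValueError("no change file is old enough for this dump and margin")
    return start

# Made-up state timestamps, one change file per minute.
states = {
    100: datetime(2020, 6, 1, 9, 0),
    101: datetime(2020, 6, 1, 9, 1),
    102: datetime(2020, 6, 1, 9, 2),
    103: datetime(2020, 6, 1, 9, 3),
}
start = pick_start_sequence(states, datetime(2020, 6, 1, 9, 3, 30), timedelta(minutes=2))
print(start)
```

Starting too early only means some changes get applied a second time, while starting too late loses data, which is why the sketch errs on the early side.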
I have just changed the code in 58513a5 and put the largest timestamp of any of the objects in the .osc.gz file into the state.txt, because I believe that is probably the most useful timestamp here. It is arguably a bit more useful than the timestamp from the log file I had used before, because that doesn't really tell us anything about the data, just the random time when the transaction was closed.
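For illustration, picking the largest object timestamp out of a change file might look like the sketch below. This is not the osmdbt implementation (that is in the commit mentioned above); the sample data, the function names and the exact state.txt layout (Osmosis-style `key=value` lines with escaped colons) are assumptions. A real .osc.gz file would also need to be decompressed first.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up, uncompressed osmChange content.
OSC = """<osmChange version="0.6">
  <create>
    <node id="1" version="1" timestamp="2020-06-01T10:00:10Z" lat="0" lon="0"/>
  </create>
  <modify>
    <node id="2" version="3" timestamp="2020-06-01T10:00:30Z" lat="1" lon="1"/>
    <way id="7" version="2" timestamp="2020-06-01T10:00:05Z"/>
  </modify>
</osmChange>"""

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def largest_timestamp(osc_xml):
    """Return the newest timestamp of any node, way or relation in the change."""
    root = ET.fromstring(osc_xml)
    stamps = [
        datetime.strptime(element.get("timestamp"), TIME_FORMAT)
        for element in root.iter()
        if element.tag in ("node", "way", "relation")
    ]
    return max(stamps)

def render_state(sequence, timestamp):
    """Render state file content (assumed layout, colons escaped in the value)."""
    stamp = timestamp.strftime(TIME_FORMAT).replace(":", "\\:")
    return f"sequenceNumber={sequence}\ntimestamp={stamp}\n"

newest = largest_timestamp(OSC)
state_text = render_state(1235, newest)
print(state_text)
```

Note that the objects are not sorted by timestamp inside the file, so the whole file has to be scanned to find the maximum.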
from osmdbt.
I am a bit confused here. I don't care about the comment in the state.txt file; osmdbt doesn't even generate it. As for the actual timestamp, I have to check where it comes from. I don't remember off the top of my head.
Concerning the origin of the timestamp information: Different clocks were historically a problem with OSM data. That's probably where this came from.