
state.txt timestamp - and osmosis compatibility (osmdbt issue, 2 comments, closed)

openstreetmap commented on August 27, 2024

state.txt timestamp - and osmosis compatibility


Comments (2)

joto commented on August 27, 2024

I have looked into this some more and believe the comment you found in the Osmosis code isn't really that helpful. And the Osmosis code is overly complex.

The basic problem with replication in our case is this:

There can be multiple transactions open in the database at the same time, and each one writes its objects with whatever timestamp is current at that moment. When a transaction is committed, its objects show up in the log file, but other transactions may still be open, holding objects whose timestamps we don't know yet (they might be smaller or larger than any timestamp in the committed transaction). So we cannot know whether there are unprocessed objects with older and/or newer timestamps than the ones we currently have. It can happen that change file 1234.osc.gz contains objects with timestamp t while 1235.osc.gz contains an object with a timestamp earlier than t. That's why we can't use the timestamps as a marker for replication progress; we use the sequence numbers instead.
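A toy model may make the ordering problem concrete. The sketch below is hypothetical and not how osmdbt works internally: `Txn`, `build_change_files`, and `cut_times` are invented names, timestamps are plain integers, and change files are simply cut by commit time. It shows a long-running transaction pushing an old timestamp into a later change file.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    # Hypothetical model of one database transaction, for illustration only.
    name: str
    object_timestamps: list  # timestamps the objects got when they were written
    commit_time: int         # when the transaction committed


def build_change_files(txns, cut_times):
    """Assign committed objects to numbered change files.

    Change file n holds the objects of every transaction that committed after
    the previous cut and at or before cut_times[n - 1]. Files are cut by
    commit order, not by object timestamp.
    """
    files = {}
    prev_cut = float("-inf")
    for seq, cut in enumerate(cut_times, start=1):
        objs = []
        for txn in txns:
            if prev_cut < txn.commit_time <= cut:
                objs.extend(txn.object_timestamps)
        files[seq] = sorted(objs)
        prev_cut = cut
    return files


# B is long-running: it writes an object at t=5 but only commits at t=70.
txns = [
    Txn("A", [10, 11], commit_time=12),
    Txn("B", [5], commit_time=70),
    Txn("C", [40], commit_time=41),
]
files = build_change_files(txns, cut_times=[60, 120])
print(files)  # {1: [10, 11, 40], 2: [5]}
```

File 2 contains an object (t=5) older than the newest object in file 1 (t=40), which is why a consumer has to track sequence numbers rather than timestamps.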

So it doesn't really matter all that much which exact timestamp is in the file. For an ongoing replication you simply keep incrementing the sequence number. When you set up a new replication, you have to find out at what point in time you want to start, then go back "a bit" and start from there. Usually you will start from a planet dump, but planet dumps are produced by a separate mechanism that is not synced with the change files, so there is no clean way to start consuming change files exactly at the point where a planet dump ends. So you go back "a bit" and hope that it all works out. We should document what this "a bit" is and tell users how to properly consume changes, but that isn't really a concern for this issue.
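The "go back a bit" step could look roughly like this. It is a minimal sketch under stated assumptions: the state timestamps have already been read into a dict, the `margin` value is a hypothetical user choice (the thread explicitly leaves "a bit" undefined), and re-applying changes that are already in the planet is assumed to be harmless for the consumer.

```python
from datetime import datetime, timedelta, timezone


def choose_start_sequence(states, planet_time, margin):
    """Pick the first change-file sequence number to apply after a planet dump.

    states:      dict of sequence number -> timestamp read from that state.txt
    planet_time: timestamp of the planet dump (timezone-aware datetime)
    margin:      how far to "go back a bit" (hypothetical, user-chosen timedelta)
    """
    cutoff = planet_time - margin
    seqs = sorted(states)
    if states[seqs[0]] > cutoff:
        # We can't see far enough back to be sure nothing older is missing.
        raise ValueError("oldest known state is newer than the cutoff; fetch older states")
    for seq in seqs:
        if states[seq] > cutoff:
            # First file that may contain data newer than the cutoff: start here.
            return seq
    # Every known file is older than the cutoff; start with the next one published.
    return seqs[-1] + 1


def utc(h, m, s=0):
    # Helper for building example timestamps on a fixed day.
    return datetime(2024, 8, 27, h, m, s, tzinfo=timezone.utc)


states = {100: utc(10, 0), 101: utc(10, 1), 102: utc(10, 2), 103: utc(10, 3)}
planet = utc(10, 2, 30)
print(choose_start_sequence(states, planet, margin=timedelta(minutes=1)))  # 102
```

Scanning for the first state newer than the cutoff, rather than trusting the timestamps to be strictly increasing, keeps the choice conservative if state timestamps are ever out of order.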

I have just changed the code in 58513a5 to put the largest timestamp of any of the objects in the .osc.gz file into the state.txt, because I believe that is probably the most useful timestamp here. It is arguably a bit more useful than the log-file timestamp I had used before, because that one doesn't really tell us anything about the data, only the essentially random time at which the transaction was committed.
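For illustration, here is a sketch of deriving the state.txt timestamp from the newest object in a change file. This is not osmdbt's code (osmdbt is written in C++); `make_state_txt` is a hypothetical name, and the output assumes the Osmosis-style Java properties layout, where colons in the timestamp value are escaped as `\:`.

```python
from datetime import datetime, timezone


def make_state_txt(sequence_number, object_timestamps):
    """Render a state.txt whose timestamp is the newest object timestamp.

    Colons in the timestamp are escaped with a backslash, as in Java
    properties files used for Osmosis-style state.txt files.
    """
    if not object_timestamps:
        # An empty change file needs some other rule (not covered in this sketch).
        raise ValueError("no objects in this change file")
    newest = max(object_timestamps)
    ts = newest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    escaped = ts.replace(":", "\\:")
    return f"sequenceNumber={sequence_number}\ntimestamp={escaped}\n"


objs = [
    datetime(2024, 8, 27, 10, 4, 59, tzinfo=timezone.utc),
    datetime(2024, 8, 27, 10, 5, 0, tzinfo=timezone.utc),
    datetime(2024, 8, 27, 10, 1, 0, tzinfo=timezone.utc),
]
print(make_state_txt(1234, objs), end="")
# sequenceNumber=1234
# timestamp=2024-08-27T10\:05\:00Z
```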


joto commented on August 27, 2024

I am a bit confused here. I don't care about the comment in the state.txt file; osmdbt doesn't even generate it. As for the actual timestamp, I have to check where it comes from; I don't remember off the top of my head.
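Since the comment line carries no data, a consumer can simply skip it. The parser below is a hedged sketch, not part of osmdbt or Osmosis: `parse_state_txt` is a hypothetical name, it handles only the simple subset of the Java properties format these files use (`key=value` lines, `#`/`!` comments, `\:` escapes), and the sample text is made up.

```python
from datetime import datetime, timezone


def parse_state_txt(text):
    """Return (sequence_number, timestamp) from an Osmosis-style state.txt."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # comments, such as a date line, carry no replication data
        key, _, value = line.partition("=")
        # Undo the Java-properties escaping of colons.
        props[key.strip()] = value.strip().replace("\\:", ":")
    seq = int(props["sequenceNumber"])
    ts = datetime.strptime(props["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return seq, ts.replace(tzinfo=timezone.utc)


sample = "#Tue Aug 27 10:05:02 UTC 2024\nsequenceNumber=1234\ntimestamp=2024-08-27T10\\:05\\:00Z\n"
print(parse_state_txt(sample))
```

The same function accepts files with or without the comment line, so it works whether or not the producer writes one.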

Concerning the origin of the timestamp information: clocks that didn't agree with each other were historically a problem with OSM data. That's probably where this came from.

