
Comments (10)

joto commented on August 27, 2024

This should now be fixed.

from osmdbt.

joto commented on August 27, 2024

@mmd-osm Thanks for finding this. Just fixed it.


mmd-osm commented on August 27, 2024

Thanks for implementing this limit. I have a quick follow-up question if you don’t mind:

According to the Readme, create-diff can run in parallel to the other command line tools and can also pick up multiple log files. What would be the best way for a sysadmin to make sure that create-diff respects the same upper limit that get-log applied before?

"osmdbt-create-diff can handle any number of log files, so if it is not run for a while it will recover by reading all log files it finds and creating one replication diff file with the (sorted) data from all of them."


joto commented on August 27, 2024

Sorry, I overlooked that create-diff merges the data from several log files.


mmd-osm commented on August 27, 2024

I'm just testing your changes and noticed something odd when no limit is set. I assume options.max_changes() == 0 is supposed to mean "unlimited". However, with a limit of 0 the comparison below is true as soon as there is at least one entry in the todo list. The condition should probably only be true when options.max_changes() > 0; alternatively, max_changes needs to be initialized with a really large value (max_int or something).

    if (objects_todo.size() > options.max_changes()) {
        vout << "  Reached limit of " << options.max_changes()
             << " objects.\n";
        break;
    }

[ 0:00] Got 3 objects.
[ 0:00] Reached limit of 0 objects.
[ 0:00] Populating changeset cache...
[ 0:00] Got 2 changesets.
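To make the suggestion concrete, here is a minimal sketch of a guard that treats 0 as "unlimited". The helper name `limit_reached` is hypothetical and not part of osmdbt; it only illustrates the condition proposed above.

```cpp
#include <cstddef>

// Hypothetical helper (not from osmdbt): returns true once the number of
// collected objects exceeds the configured limit.
// Assumption: max_changes == 0 means "no limit", so it never triggers.
inline bool limit_reached(std::size_t num_objects, std::size_t max_changes) noexcept {
    return max_changes > 0 && num_objects > max_changes;
}
```

With such a check, the loop above would only break when a non-zero limit is configured, e.g. `if (limit_reached(objects_todo.size(), options.max_changes())) { ... }`.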


mmd-osm commented on August 27, 2024

Thanks, I'm closing this for the time being...


mmd-osm commented on August 27, 2024

Is there anything we need to do for fake-log? As we've seen recently, even a time period of 6 hours can quickly accumulate a few million entries. fake-log currently only takes a start timestamp and otherwise has no limits.

One thing I could imagine is to calculate a timestamp_to (as an exclusive upper bound) in addition to the timestamp_from, and to check that the total number of changes across nodes, ways and relations stays below some upper limit (--split-every-n-items). This can be accomplished by the following index-only operation:

SELECT date_trunc('second', MAX(timestamp)), count(*) FROM (
  SELECT * FROM (
    SELECT timestamp FROM nodes UNION ALL
    SELECT timestamp FROM ways UNION ALL
    SELECT timestamp FROM relations
  ) a
  WHERE a.timestamp >= '2020-05-23T11:34:44Z'   -- current timestamp_from
  ORDER BY a.timestamp
  LIMIT 500000   -- limit
) b;

Then iterate: create a log file, set timestamp_from to the previous timestamp_to, and repeat.
This approach would be best suited to quickly processing a large backlog. As an alternative to a dynamically determined range, a user might want to specify a fixed window size (like 60 seconds) to mimic the usual diff processing (--split-every-n-minutes).
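The iteration could look roughly like the following sketch, which works on an in-memory list of timestamps instead of the database. Everything here is an assumption for illustration: `Window` and `split_by_count` are hypothetical names, timestamps are whole seconds since the epoch, a limit of 0 means "unlimited", and the handling of the last window and of a single second holding more entries than the limit is my own choice, not something fake-log does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One log-file window covering timestamps t with from <= t < to.
struct Window {
    std::int64_t from;   // inclusive lower bound (seconds since epoch)
    std::int64_t to;     // exclusive upper bound (seconds since epoch)
    std::size_t count;   // number of changes inside the window
};

// sorted_ts: timestamps of all node/way/relation versions, ascending and
// already truncated to whole seconds. limit == 0 means "unlimited".
std::vector<Window> split_by_count(const std::vector<std::int64_t>& sorted_ts,
                                   std::int64_t from, std::size_t limit) {
    std::vector<Window> windows;
    auto it = std::lower_bound(sorted_ts.begin(), sorted_ts.end(), from);
    while (it != sorted_ts.end()) {
        const auto remaining = static_cast<std::size_t>(sorted_ts.end() - it);
        const std::size_t n = (limit == 0) ? remaining : std::min(limit, remaining);
        std::int64_t to = 0;
        if (n == remaining) {
            // Backlog exhausted: close the window just after the last entry.
            to = sorted_ts.back() + 1;
        } else {
            // Like MAX(timestamp) in the query: the newest timestamp among
            // the first n entries becomes the exclusive upper bound.
            to = *(it + static_cast<std::ptrdiff_t>(n) - 1);
            if (to <= from) {
                // All n entries share one second: widen the window so the
                // loop makes progress (it may then exceed the limit).
                to = from + 1;
            }
        }
        const auto end_it = std::lower_bound(it, sorted_ts.end(), to);
        windows.push_back({from, to, static_cast<std::size_t>(end_it - it)});
        from = to;
        it = end_it;
    }
    return windows;
}
```

Because the upper bound is exclusive, entries that share the newest second are pushed into the next window, so a window can hold noticeably fewer entries than the limit.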


joto commented on August 27, 2024

I think we probably want to mimic the usual get-log processing here, i.e. create minutely diffs (with some upper bound on the size). We should also make sure we don't create any empty files.

This would give us diff files which are as "similar" as possible to those from normal operation, which hopefully minimizes the chance of breaking some data consumer.
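As a rough illustration of that idea (not the actual fake-log code), the sketch below groups sorted timestamps into one bucket per minute, never emits a bucket for a minute without changes, and starts a new bucket when a minute exceeds a size cap. The names `Bucket` and `minutely_buckets` are hypothetical, timestamps are assumed to be non-negative seconds since the epoch, and splitting an overfull minute into several buckets is just one possible way to enforce an upper bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bucket {
    std::int64_t minute_start;  // start of the minute (seconds since epoch)
    std::size_t count;          // number of changes in this bucket
};

// sorted_ts: change timestamps in ascending order, non-negative seconds.
// max_items == 0 means "no size cap".
std::vector<Bucket> minutely_buckets(const std::vector<std::int64_t>& sorted_ts,
                                     std::size_t max_items) {
    std::vector<Bucket> buckets;
    for (const std::int64_t ts : sorted_ts) {
        const std::int64_t minute_start = ts - (ts % 60);
        const bool same_minute =
            !buckets.empty() && buckets.back().minute_start == minute_start;
        const bool has_room =
            same_minute && (max_items == 0 || buckets.back().count < max_items);
        if (has_room) {
            ++buckets.back().count;
        } else {
            // New minute, or the current bucket is full: open a new bucket.
            // Minutes without any change never get a bucket, so no empty
            // files would be written.
            buckets.push_back({minute_start, 1});
        }
    }
    return buckets;
}
```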


joto commented on August 27, 2024

osmdbt-fake-log now writes minutely log files (8ca509c).


joto commented on August 27, 2024

I believe we have addressed all issues here. Closing.

