
Comments (10)

haydenyoung commented on June 11, 2024

The remote peer only needs to be aware of the latest record to be able to verify and build a replica of the source db's entries. Therefore, only the latest addition is available via 'update'. For example:

await db1.add('a')
await db1.add('b')

// State of db1's oplog is 'a' <- 'b', where 'b' follows 'a'.

const db2 = await orbitdb2.open(db1.address)

db2.events.on('update', (entry) => {
  // 'update' only gives 'b' because 'b' follows 'a'. db2 does not need to be
  // notified of the update 'a'; 'b' is enough to determine and replicate the
  // correct oplog entries.
})

await db1.add('c') // fires db2's 'update' listener again because db1's oplog has been appended to; (entry) will be 'c'.

I'll provide a link to https://github.com/orbitdb/orbitdb/blob/main/docs/OPLOG.md which may provide clarity.

koh-osug commented on June 11, 2024

Thank you, this clarifies it. Is there any chance to get the full delta of changes? I have a business process that has to post-process only new documents on each node. Could I get some current value, counter, or hash before opening the other database and then compute the delta on my own? For example, I see these functions, which I imagine combining roughly as sketched below:

log.iterator({ amount: 1 }) -> get the last update
gt: return all entries after the entry with the specified hash (exclusive) -> get all new updates
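
Roughly what I have in mind (just a sketch; db stands for the opened database and lastHash is a value my application would persist):

// Before syncing with other peers: remember the hash of the latest entry.
let lastHash
for await (const entry of db.log.iterator({ amount: 1 })) {
  lastHash = entry.hash
}

// Later, after syncing: fetch only the entries newer than that one.
for await (const entry of db.log.iterator({ gt: lastHash })) {
  console.log('new entry', entry)
}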

haydenyoung commented on June 11, 2024

Yes, the gt iterator param is one option.

If items are coming in from one peer you could loop over the database using iterator, processing each item until you reach the most recently "processed" document.
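
A minimal sketch of that loop, assuming the iterator yields the newest entries first and that lastProcessedHash was saved by your application after the previous run (both names are hypothetical):

const newEntries = []
for await (const entry of db.log.iterator()) {
  if (entry.hash === lastProcessedHash) break // everything older was already handled
  newEntries.push(entry)
}
// Reverse so the new entries are handled oldest-first.
for (const entry of newEntries.reverse()) {
  // process the entry, then store entry.hash as the new lastProcessedHash
}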

If items are being synced from various peers, you may need to keep a record of the documents already processed and use it to determine whether a document requires processing.

However, the best method will be determined by your technical requirements.

Also, remember, if you are working at the oplog level you are working with operations. You may want to consider working with the data at the database level.
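
At the database level, the "record of processed documents" idea could look roughly like this (a sketch only; processed stands for whatever persistent set your application keeps, and I am assuming the documents iterator yields objects with hash and value fields):

db.events.on('update', async () => {
  for await (const doc of db.iterator()) {
    if (processed.has(doc.hash)) continue // already handled on this node
    await handleDocument(doc.value)       // hypothetical application-level handler
    processed.add(doc.hash)
  }
})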

koh-osug commented on June 11, 2024

If items are coming in from one peer you could loop over the database using iterator

Does this mean that using log.iterator({ amount: 1 }) before connecting to any other peer's DB does not give the latest hash of the database? Otherwise I would use it as the latest element while syncing with all the other databases.

Also, remember, if you are working at the oplog level you are working with operations. You may want to consider working with the data at the database level.

But at the database level I do not have something like log.iterator({ amount: 1 }) to get the last document, and I also have no option to get documents newer than a specific hash, right?

haydenyoung commented on June 11, 2024

You have the iterator available at the db level, yes. The params available to you will depend on the data store being used. I would recommend looking at https://api.orbitdb.org/ for more information about the functions available.
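
For example, with a documents database something along these lines should work (a sketch; the exact fields yielded and the supported params depend on the store type):

// Get the most recent document at the database level.
for await (const doc of db.iterator({ amount: 1 })) {
  console.log(doc) // roughly { hash, key, value } for a documents store
}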

koh-osug commented on June 11, 2024

My logic now is to get the log heads and then iterate over all log entries after them. This seems to be functional; I can get all new changes:

const db2 = await orbitdb2.open('subscription-data', { type: 'documents' })

// Open the remote database without syncing to capture its current heads,
// then re-open it with syncing enabled.
let db2a = await orbitdb2.open(db1.address, { sync: false })
const heads = await db2a.log.heads()
await db2a.close()
db2a = await orbitdb2.open(db1.address)

db2a.events.on('update', async (entry) => {
  let filter = {}
  if (heads.length > 0) {
    filter = { gt: heads[0].hash }
  }
  // Iterate over every oplog entry newer than the previously captured head.
  for await (const record of db2a.log.iterator(filter)) {
    console.log('new entry', record)
  }
})

But according to the documentation, log.heads() returns an array. I'm wondering why multiple heads are possible and what I'm missing here. Also, I open the database with sync: false to get the original state, then close it and open it again. Somehow this feels inefficient.

haydenyoung commented on June 11, 2024

But according to the documentation, log.heads() returns an array.

You can have multiple heads because peers can add records at the same (logical) time.
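
It also means that filtering on heads[0].hash alone may not behave as expected when there are several heads. One rough alternative is to de-duplicate by entry hash instead (a sketch; processed is a set of hashes your application would persist):

db2a.events.on('update', async () => {
  for await (const record of db2a.log.iterator()) {
    if (processed.has(record.hash)) continue // skip entries handled earlier
    console.log('new entry', record)
    processed.add(record.hash)
  }
})

This walks the whole log on every update, so it trades some efficiency for not depending on a single head.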

I'm not sure why you are working at the log level. Why not process records from the document store then mark them as processed?

koh-osug commented on June 11, 2024

The first issue I see is that I do not have write access to the synced database replica. I assume your approach would be to overwrite the document with the same document and set a marker attribute in the JSON structure, i.e. write access is necessary, right? My understanding here is that the database I open with orbitdb.open is the same instance as the remote one, just a local copy, and that changes to this copy would also be reflected in the original database of the owning peer.

Another issue I see is that I will miss deletions. How can I see a DELETE operation? I cannot iterate over all documents, I assume, and do a full database comparison to find out which documents have been deleted. My business process needs to know this, hence I was hoping to get all updates, including deletions, with the update listener.
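
At the oplog level I can at least see the operation type, if I read OPLOG.md correctly; this is roughly what I am experimenting with (a sketch, assuming the entry payload carries op and key fields):

db2a.events.on('update', async (entry) => {
  // For a documents store the oplog records the operation itself, so a
  // deletion should show up as a 'DEL' payload (my assumption from OPLOG.md).
  if (entry.payload.op === 'DEL') {
    console.log('document deleted', entry.payload.key)
  }
})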

Also, if I do this, the generic marker would also be seen by other peers, and those peers would then think that they have already processed the document on their end, which is not the case. I could add a peer-specific marker instead, but that would mean having 100 peer-specific marker attributes in the document to support 100 peers, and all peers would need write access.

Also, from what I see in the code, the query function uses the iterator as well, and I don't think I can afford a complete database scan: with 10000 documents it looks like the iterator visits all 10000 of them. If a specific query were possible, like a WHERE clause in SQL, then I could consider this. Am I wrong here with my assumptions?
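
For reference, this is the kind of call I mean (a sketch; as far as I can tell it still visits every document, and 'status' is just a hypothetical field):

// A WHERE-like filter over the documents store: query() applies the
// predicate to each document, which is effectively a full scan.
const matches = await db2a.query((doc) => doc.status === 'new')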

haydenyoung commented on June 11, 2024

What you're describing seems more like a software architecture problem than a bug or issue with OrbitDB. If you're looking for help with integrating OrbitDB into your software systems, you may get better traction on the OrbitDB Lobby, where other developers may have run into similar implementation issues.

koh-osug commented on June 11, 2024

I have added a question in the Lobby. I will add the answer here if my problem can be solved.
