See PR <a class="issue-link js-issue-link" data-error-text="Failed to load title" data

Improve the implementation of in-memory, general purpose, non-transactional graphs. about jena HOT 4 CLOSED

apache commented on June 7, 2024

Improve the implementation of in-memory, general purpose, non-transactional graphs.

from jena.

Comments (4)

arne-bdt commented on June 7, 2024

My work on an alternative to GraphMem is progressing slowly.
So I decided to contribute a few improvements to the existing GraphMem on the way.
Please review my PR.

from jena.

arne-bdt commented on June 7, 2024

There's a straightforward way to improve the speed of GraphMem for Graph#add by nearly 50% and Graph#delete by 10-35%. This change has almost no impact on any other operation. However, it uses an additional array for the hashCodes in HashCommon. As a result, the memory consumption of the indexing structures (excluding the space for the triples) would increase by 31-51%.

I propose not to implement this for the deprecated GraphMem, but instead to create a successor GraphMem2 with the following changes:

Speed up Graph#add and Graph#delete accepting the given memory cost
Remove support for Iterator#remove to reduce complexity
(other implementations like DatasetGraphInMemory don't support this either)
Switch to term-equality as proposed in issue#1867 mostly by:
- Removing calls to org.apache.jena.graph.Node#sameValueAs
- Replacing usages of org.apache.jena.graph.Node#getIndexingValue with the Node itself

What do you say?

from jena.

afs commented on June 7, 2024

Yes - term-quality.

A GraphMem2 sounds like a good plan; just the switch to term-equality means we need to migrate somehow. Jena5 is likely later this year.

No support Iterator#remove is generally a good idea. It is at best an "optional extra" not a core part of iterators IMO.

The extra space usage - this I am in two minds about. Faster is better but so is having an storage that optimizes for space. It doesn't sound if it is so fundamental as to be irreversible.

from jena.

arne-bdt commented on June 7, 2024

Thanks, that's the information I needed to proceed.

Adding and removing the array with the hashes is straightforward. It might also be quite simple to have two variants of the graph store. That way, users can choose between the fast variant or the low memory one.

Speaking of which:
There are two fields tracking changes for potential concurrent modifications:

org.apache.jena.mem.ArrayBunch#changes
org.apache.jena.mem.HashCommon#changes

The one in ArrayBunch is volatile, but the one in HashCommon is not. ArrayBunch is totally fine with integer overflow, since it checks for equality when verifying:
if (changes != initialChanges) throw new ConcurrentModificationException();

HashCommon, however, wouldn't handle an integer overflow correctly:
if (changes > initialChanges) throw new ConcurrentModificationException();

The document header of java.util.HashMap states:
` *

Note that the fail-fast behavior of an iterator cannot be guaranteed

as it is, generally speaking, impossible to make any hard guarantees in the
presence of unsynchronized concurrent modification. Fail-fast iterators
throw {@code ConcurrentModificationException} on a best-effort basis.
Therefore, it would be wrong to write a program that depended on this
exception for its correctness: the fail-fast behavior of iterators

should be used only to detect bugs.`

Tracking the changes here might be the safest possible way. However, HashCommon should then also make changes volatile and compare for equality rather than using a greater than comparison.

On the other hand, if we removed both fields and checked org.apache.jena.mem.HashCommon#size and org.apache.jena.mem.ArrayBunch#size instead, we would save memory (since there are many instances of both classes) and avoid incrementing a second variable on #add and #remove operations. The iterators should still help detect concurrent modifications in almost any scenario. Only if there are an equal number of #add and #remove calls, so that the total size of the graph is not affected, could there be undetected concurrent modifications while executing an iterator. This seems unlikely enough to me that it would prevent anyone from detecting a concurrency bug.
What is your opinion in this?

from jena.

Improve the implementation of in-memory, general purpose, non-transactional graphs. about jena HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent