Whilst fine in most situations, this doesn't work for IndexedCorpus which loads an ent

Unable to load large corpora into memory because PatternPointer length can't exceed 2^32 bytes (32 bit size descriptor) [colibri-core, closed]

proycon commented on June 6, 2024

Unable to load large corpora into memory because PatternPointer length can't exceed 2^32 bytes (32 bit size descriptor)
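To make the limit in the title concrete, here is a minimal Python sketch of what a 32-bit size descriptor can and cannot represent. It is not colibri-core code; `pack_size` and `fits` are hypothetical helper names, and the standard `struct` module merely stands in for a fixed-width unsigned field.

```python
import struct

UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned descriptor can hold


def pack_size(n_bytes: int, fmt: str = "<I") -> bytes:
    """Pack a byte length into a fixed-width unsigned field (hypothetical helper)."""
    return struct.pack(fmt, n_bytes)


def fits(n_bytes: int, fmt: str) -> bool:
    """Return True if n_bytes is representable in the given descriptor format."""
    try:
        pack_size(n_bytes, fmt)
        return True
    except struct.error:
        # struct refuses values outside the range of the field
        return False


five_gib = 5 * 1024**3          # a corpus larger than 4 GiB
print(fits(five_gib, "<I"))     # 32-bit descriptor
print(fits(five_gib, "<Q"))     # 64-bit descriptor
```

Under these assumptions, a length of 5 GiB is rejected by the 32-bit field and accepted by the 64-bit one, at a cost of four extra bytes per descriptor.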

from colibri-core.

Comments (5)

wanthalf commented on June 6, 2024

As for me, loading the whole corpus in a single file is rather a burden anyway. A three level index would be more useful (file, sentence, token). But that would probably require larger changes of the whole architecture, I am afraid. And adding a third number to the index wouldn't solve the problem, even if both the "file" and "sentence" pointers remain 32-bit - together they still make up 64 bits.
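The arithmetic behind that last remark can be sketched as follows. The field layouts are assumptions made purely for illustration (a 32-bit sentence number plus a 16-bit token position for the two-level index, and an extra 32-bit file number for the three-level one); they are not taken from the colibri-core source.

```python
import struct

# Hypothetical layouts, little-endian and without padding:
TWO_LEVEL = "<IH"     # 32-bit sentence, 16-bit token (assumed widths)
THREE_LEVEL = "<IIH"  # 32-bit file, 32-bit sentence, 16-bit token (assumed widths)


def index_bytes(fmt: str, n_entries: int) -> int:
    """Total bytes needed to store n_entries index references of the given layout."""
    return struct.calcsize(fmt) * n_entries


# File and sentence pointers together occupy 64 bits, as noted above.
print(struct.calcsize("<II") * 8)

# Extra memory for 100 million index entries when a file number is added.
n = 100_000_000
print(index_bytes(THREE_LEVEL, n) - index_bytes(TWO_LEVEL, n))
```

With these assumed widths, every index entry grows by four bytes, so the penalty scales linearly with the number of indexed tokens.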

proycon commented on June 6, 2024

This current line of implementation is a rather major refactoring which will take some more time, so I'm gonna start again and try a more 'quick' fix, although that might come at an increased memory cost in certain computations.

proycon commented on June 6, 2024

> As for me, loading the whole corpus in a single file is rather a burden anyway. A three level index would be more useful (file, sentence, token). But that would probably require larger changes of the whole architecture, I am afraid.

Yeah, that would be another major refactoring. I don't think we can go in that direction. You'd have to solve that in some wrapper stage (mapping file, sentence to some kind of aggregated sentence).

> And adding a third number to the index wouldn't solve the problem, even if both the "file" and "sentence" pointers remain 32-bit - together they still make up 64 bits.

Indeed, that would come at an extra memory penalty, so I don't want to go there.
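For what it's worth, the wrapper stage mentioned above (mapping file, sentence to an aggregated sentence) could look roughly like the sketch below. This is only an illustration outside colibri-core: `SentenceMapper` and its methods are hypothetical names, and indices are assumed to be zero-based.

```python
from bisect import bisect_right
from itertools import accumulate


class SentenceMapper:
    """Maps (file, sentence) pairs to one aggregated sentence index and back."""

    def __init__(self, sentences_per_file):
        # starts[i] is the aggregated index of the first sentence of file i;
        # the final element is the total number of sentences.
        self.starts = [0] + list(accumulate(sentences_per_file))

    def to_global(self, file_idx, sentence_idx):
        if not 0 <= file_idx < len(self.starts) - 1:
            raise IndexError("file index out of range")
        size = self.starts[file_idx + 1] - self.starts[file_idx]
        if not 0 <= sentence_idx < size:
            raise IndexError("sentence index out of range for this file")
        return self.starts[file_idx] + sentence_idx

    def to_local(self, global_idx):
        if not 0 <= global_idx < self.starts[-1]:
            raise IndexError("aggregated sentence index out of range")
        # Binary search for the last file whose start is <= global_idx.
        file_idx = bisect_right(self.starts, global_idx) - 1
        return file_idx, global_idx - self.starts[file_idx]


mapper = SentenceMapper([3, 5, 2])  # three files with 3, 5 and 2 sentences
print(mapper.to_global(1, 4))       # → 7
print(mapper.to_local(8))           # → (2, 0)
```

The index itself then only ever sees the aggregated sentence number, so it stays two-level (sentence, token), and the per-file bookkeeping costs one integer per file rather than one per index entry.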

proycon commented on June 6, 2024

Solved in v2.5.0 release

proycon commented on June 6, 2024

Last but not least, benchmarks of 32-bit size descriptors vs 64-bit size descriptors (in PatternPointers) show a small increase in peak memory:

benchmarks32.txt
benchmarks64.txt
