
Comments (14)

ArtificialOwl commented on June 9, 2024

The next release will display more information about the time spent indexing your files and the number of documents scanned within the last 60 seconds.

Also, local tests show that indexing has been sped up by at least 25%.
I am running tests on bigger files than yours (15 MB per PDF), so indexing your smaller files should be even faster.

from fulltextsearch.

SlavikCA commented on June 9, 2024

Solr dashboard: (screenshot)


ArtificialOwl commented on June 9, 2024

Yes, I am currently working on optimizing the commit during the indexing. The schema will also need a good cleanup.

But you should not have more than one process running the Solr servlet.

Also, is everything local? You have no external storage?


SlavikCA commented on June 9, 2024

Yes, everything is local on that VM.


Pant commented on June 9, 2024

I had the same problem and fixed it by changing the Timeout option in the Nextant settings in Nextcloud.
(screenshot: nextant_timeout)
I also had another problem: on the first indexing, and on a normal start of Solr, the JVM memory limit was 512 MB by default. To change the limit I edited the last lines of /etc/init.d/solr to this:

# -m sets both -Xms and -Xmx for the Solr JVM
MEMRAM="-m 1024m"
if [ -n "$RUNAS" ]; then
    su -c "SOLR_INCLUDE=\"$SOLR_ENV\" \"$SOLR_INSTALL_DIR/bin/solr\" $SOLR_CMD $MEMRAM" - "$RUNAS"
else
    SOLR_INCLUDE="$SOLR_ENV" "$SOLR_INSTALL_DIR/bin/solr" "$SOLR_CMD" $MEMRAM
fi
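As an alternative to patching the init script (which may be overwritten on upgrade), the standard Solr startup script sources an include file, typically /etc/default/solr.in.sh on Linux installs; setting SOLR_HEAP there has the same effect as passing -m to bin/solr. A minimal sketch, assuming the default include-file location:

```shell
# /etc/default/solr.in.sh -- sourced by bin/solr at startup.
# SOLR_HEAP sets both -Xms and -Xmx; equivalent to "bin/solr -m 1024m".
SOLR_HEAP="1024m"
```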


SlavikCA commented on June 9, 2024

For the timeout, I'm using 30 seconds, just like you have in the screenshot.

@Pant, is the MEMRAM value different from SOLR_HEAP, which is suggested on this page?
In my case SOLR_HEAP is set to 2048m. So, are you advising to increase MEMRAM too?


ArtificialOwl commented on June 9, 2024

Please try this: https://github.com/nextcloud/nextant/releases/tag/v0.6.5

I would do it like this, if I were you:

./occ nextant:index --debug --force

This will re-index everything; also check your number of segments at the end:

./occ nextant:check


SlavikCA commented on June 9, 2024

So, indexing completed in about 2.5 hours. Great!
However, after it completed, it started all over again. Is that expected?

root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:index --debug --force
nextant v0.6.5 (beta)

/StanM                           0/     0 [>---------------------------]   0%
/slavik/files               307919/307919 [============================] 100%
/vvooaz/files                 5595/  5595 [============================] 100%


  313514 file(s) processed ; 0 orphan(s) removed
  313514 documents indexed ; 38828 fully extracted

/StanM                           0/     0 [>---------------------------]   0%
/slavik/files                38700/307919 [===>------------------------]  12%
 (02:58:26) [preparing]  Solr memory: 759.4 MB (%38.7)


SlavikCA commented on June 9, 2024

Also, here is the output of check:

root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:check
Pinging 127.0.0.1:8983/solr/nextant : ok
Checking Solr schema fields
 * Checking dynamic-field 'nextant_attr_*' : ok
 * Checking field 'nextant_path' : ok
 * Checking field 'text' : ok
 * Checking field 'nextant_owner' : ok
 * Checking field 'nextant_mtime' : ok
 * Checking field 'nextant_share' : ok
 * Checking field 'nextant_sharegroup' : ok
 * Checking field 'nextant_deleted' : ok
 * Checking field 'nextant_source' : ok
 * Checking field 'nextant_tags' : ok
 * Checking field 'nextant_extracted' : ok
 * Checking field 'nextant_ocr' : ok
 * Checking field 'nextant_unmounted' : ok
 * Checking field-type 'text_general' : ok

Your solr contains 313514 documents :
 - 313514 files
 - 0 bookmarks
 - 16 segments

What does this tell me? Is 16 segments good for performance?


ArtificialOwl commented on June 9, 2024

Optimizing your index works like a defrag: each time you update the index (i.e., creating/editing a file, moving a file, editing sharing rights, deleting files) you fragment it into segments.

16 is insignificant. I honestly have no idea what would be a big enough number of segments to trigger an optimization from a background job. And even though your index will lose some bytes in the end, the optimize operation itself requires a lot of disk space (more or less +30% of your current index).

It also takes time; don't be afraid if the command runs for hours (I should add some animation during the operation).
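For reference, an optimize can also be triggered directly against the Solr core over HTTP; this uses Solr's standard update-handler `optimize` parameter, not a Nextant command, and assumes the `nextant` core on the default host/port shown by `nextant:check`:

```shell
# Ask Solr to merge the index down to fewer segments ("defrag").
# Needs roughly +30% free disk space and can run for a long time.
curl 'http://127.0.0.1:8983/solr/nextant/update?optimize=true&waitSearcher=false'
```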


SlavikCA commented on June 9, 2024

Thank you.

What about nextant:index --debug --force starting all over again?


ArtificialOwl commented on June 9, 2024

Oops, sorry, I hadn't seen the good news. In fact, it is seriously good news! But it may be because you're not indexing the audio files (in 0.6.5 you can filter what you want to index in the Admin UI, and audio/image are not included by default).
You might want to select Index File Tree if you want to index at least the folders/filenames, but not the file content.

It is not starting all over again: the first step extracts documents, and the second step updates data on sharing rights (mostly) plus a few minor things. Of course, when you start a full index, the second step (updating) doesn't do much (in fact, it might not do anything).


ArtificialOwl commented on June 9, 2024

Looks like we're good?


SlavikCA commented on June 9, 2024

Yes, thank you.

