
Comments (14)

ArtificialOwl commented on June 9, 2024

The next release will display more information about the time spent indexing your files and the number of documents scanned within the last 60 seconds.

Also, local tests show that indexing has been sped up by at least 25%.
I am running tests on bigger files than yours (15 MB per PDF), so indexing your smaller files should be even faster.

from fulltextsearch.

SlavikCA commented on June 9, 2024

Solr dashboard: (screenshot)


ArtificialOwl commented on June 9, 2024

Yes, I am currently working on optimizing the commit during the indexing. The schema will also need a good cleanup.

But you should not have more than one process running the Solr servlet.

Also, is everything local? You have no external storage?


SlavikCA commented on June 9, 2024

Yes, everything is local on that VM.


Pant commented on June 9, 2024

I had the same problem and fixed it by changing the Timeout option in the Nextant settings in Nextcloud.
(screenshot: nextant_timeout)
I also had another problem: on the first indexing, and on a normal start of Solr, the JVM memory limit was 512 MB by default. To change the limit I edited the last lines of /etc/init.d/solr to this:

# -m sets both -Xms and -Xmx for the Solr JVM
MEMRAM="-m 1024m"
if [ -n "$RUNAS" ]; then
    su -c "SOLR_INCLUDE=\"$SOLR_ENV\" \"$SOLR_INSTALL_DIR/bin/solr\" $SOLR_CMD $MEMRAM" - "$RUNAS"
else
    SOLR_INCLUDE="$SOLR_ENV" "$SOLR_INSTALL_DIR/bin/solr" "$SOLR_CMD" $MEMRAM
fi
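As an alternative to patching the init script (which may be overwritten on upgrade), the standard Solr startup script sources an include file, typically /etc/default/solr.in.sh on Linux installs; setting SOLR_HEAP there has the same effect as passing -m to bin/solr. A minimal sketch, assuming the default include-file location:

```shell
# /etc/default/solr.in.sh -- sourced by bin/solr at startup.
# SOLR_HEAP sets both -Xms and -Xmx; equivalent to "bin/solr -m 1024m".
SOLR_HEAP="1024m"
```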


SlavikCA commented on June 9, 2024

For the timeout, I'm using 30 seconds, just like you have in the screenshot.

@Pant, is the MEMRAM value different from SOLR_HEAP, which is suggested on this page?
In my case SOLR_HEAP is set to 2048m. So, are you advising to increase MEMRAM too?


ArtificialOwl commented on June 9, 2024

Please try this: https://github.com/nextcloud/nextant/releases/tag/v0.6.5

I would do it like this, if I were you:

./occ nextant:index --debug --force

This will re-index everything; also check your number of segments at the end:

./occ nextant:check


SlavikCA commented on June 9, 2024

So, indexing completed in about 2.5 hours. Great!
However, after it completed, it started all over again. Is that expected?

root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:index --debug --force
nextant v0.6.5 (beta)

/StanM                           0/     0 [>---------------------------]   0%
/slavik/files               307919/307919 [============================] 100%
/vvooaz/files                 5595/  5595 [============================] 100%


  313514 file(s) processed ; 0 orphan(s) removed
  313514 documents indexed ; 38828 fully extracted

/StanM                           0/     0 [>---------------------------]   0%
/slavik/files                38700/307919 [===>------------------------]  12%
 (02:58:26) [preparing]  Solr memory: 759.4 MB (%38.7)


SlavikCA commented on June 9, 2024

Also, here is the output of check:

root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:check
Pinging 127.0.0.1:8983/solr/nextant : ok
Checking Solr schema fields
 * Checking dynamic-field 'nextant_attr_*' : ok
 * Checking field 'nextant_path' : ok
 * Checking field 'text' : ok
 * Checking field 'nextant_owner' : ok
 * Checking field 'nextant_mtime' : ok
 * Checking field 'nextant_share' : ok
 * Checking field 'nextant_sharegroup' : ok
 * Checking field 'nextant_deleted' : ok
 * Checking field 'nextant_source' : ok
 * Checking field 'nextant_tags' : ok
 * Checking field 'nextant_extracted' : ok
 * Checking field 'nextant_ocr' : ok
 * Checking field 'nextant_unmounted' : ok
 * Checking field-type 'text_general' : ok

Your solr contains 313514 documents :
 - 313514 files
 - 0 bookmarks
 - 16 segments

What does this tell me? Is 16 segments good for performance?


ArtificialOwl commented on June 9, 2024

Optimizing your index works like a defrag: each time you update the index (i.e., creating/editing a file, moving a file, editing sharing rights, deleting files) you fragment it into segments.

16 is insignificant. I honestly have no idea what would be a big enough number of segments to trigger an optimization from a background job. And even though your index will lose some bytes in the end, the optimize operation itself requires a lot of disk space (more or less +30% of your current index).

It also takes time; don't be afraid if the command runs for hours (I should add some animation during the operation).
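For reference, an optimize can also be triggered directly against the Solr core over HTTP; this uses Solr's standard update-handler `optimize` parameter, not a Nextant command, and assumes the `nextant` core on the default host/port shown by `nextant:check`:

```shell
# Ask Solr to merge the index down to fewer segments ("defrag").
# Needs roughly +30% free disk space and can run for a long time.
curl 'http://127.0.0.1:8983/solr/nextant/update?optimize=true&waitSearcher=false'
```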


SlavikCA commented on June 9, 2024

Thank you.

What about nextant:index --debug --force starting all over again?


ArtificialOwl commented on June 9, 2024

Oops, sorry, I hadn't seen the good news. In fact, it is seriously good news! But it may be because you're not indexing the audio files (in 0.6.5 you can filter what you want to index in the Admin UI, and audio/image are not included by default).
You might want to select Index File Tree if you want to index at least the folders/filenames, but not the file content.

It is not starting all over again: the first step extracts documents, and the second step updates data on sharing rights (mostly) plus a few minor things. Of course, when you start a full index, the second step (updating) doesn't do much (in fact, it might not do anything).


ArtificialOwl commented on June 9, 2024

Looks like we're good?


SlavikCA commented on June 9, 2024

Yes, thank you.

