Comments (14)
The next release will display more information about the time spent indexing your files and the number of documents scanned within the last 60 seconds.
Also, local tests show that indexing has been sped up by at least 25%.
I am running tests on bigger files than yours (15 MB per PDF), so I think indexing your small files should be even faster.
from fulltextsearch.
Yes, I am currently working on optimizing the commit during indexing. The schema will also need a good cleanup.
But you should not have more than one process running the Solr servlet.
Also, is everything local? You have no external storage?
from fulltextsearch.
Yes, everything is local on that VM.
from fulltextsearch.
I had the same problem and fixed it by changing the Timeout option in the Nextant settings in Nextcloud.
I also had another problem: during the first indexing, and on a normal start of Solr, the JVM memory limit was 512 MB (the default). To change the limit, I edited the last lines of /etc/init.d/solr to this:
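MEMRAM="-m 1024m"
if [ -n "$RUNAS" ]; then
  # Inner quotes are escaped so they survive inside the su -c string.
  su -c "SOLR_INCLUDE=\"$SOLR_ENV\" \"$SOLR_INSTALL_DIR/bin/solr\" $SOLR_CMD $MEMRAM" - "$RUNAS"
else
  # Note: this branch does not pass $MEMRAM.
  SOLR_INCLUDE="$SOLR_ENV" "$SOLR_INSTALL_DIR/bin/solr" "$SOLR_CMD"
fi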
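For comparison, here is a minimal sketch of the other common way to raise the heap, assuming a stock Solr service install. As far as I know, the `-m` option of `bin/solr` sets the JVM minimum and maximum heap size, which is what `SOLR_HEAP` controls as well. The value below simply mirrors the 1024m used above, and the file location is an assumption: it is whatever include file `$SOLR_ENV` points to on your system.

```shell
# Sketch of a line for the Solr include file (the file referenced by $SOLR_ENV).
# Assumption: the include file is sourced by bin/solr at startup.
# 1024m mirrors the MEMRAM value above; it is not a recommendation.
SOLR_HEAP="1024m"
```

With this approach the init script itself stays unmodified, which may be easier to keep across Solr upgrades.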
from fulltextsearch.
For the timeout, I'm using 30 seconds, just like you have in the screenshot.
@Pant, is the MEMRAM value different from SOLR_HEAP, which is suggested on this page?
In my case SOLR_HEAP is set to 2048m. So, are you advising me to increase MEMRAM, too?
from fulltextsearch.
Please try this : https://github.com/nextcloud/nextant/releases/tag/v0.6.5
If I were you, I would do it like this:
./occ nextant:index --debug --force
This will re-index everything. Also check your number of segments at the end:
./occ nextant:check
from fulltextsearch.
So, indexing completed in about 2.5 hours. Great!
However, after it completed, it started all over again. Is that expected?
root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:index --debug --force
nextant v0.6.5 (beta)
/StanM 0/ 0 [>---------------------------] 0%
/slavik/files 307919/307919 [============================] 100%
/vvooaz/files 5595/ 5595 [============================] 100%
313514 file(s) processed ; 0 orphan(s) removed
313514 documents indexed ; 38828 fully extracted
/StanM 0/ 0 [>---------------------------] 0%
%slavik/files 38700/307919 [===>------------------------] 12%
(02:58:26) [preparing] Solr memory: 759.4 MB (%38.7)
from fulltextsearch.
Also, here is the output of check:
root@sf-hosting:/var/www/html# sudo -u www-data ./occ nextant:check
Pinging 127.0.0.1:8983/solr/nextant : ok
Checking Solr schema fields
* Checking dynamic-field 'nextant_attr_*' : ok
* Checking field 'nextant_path' : ok
* Checking field 'text' : ok
* Checking field 'nextant_owner' : ok
* Checking field 'nextant_mtime' : ok
* Checking field 'nextant_share' : ok
* Checking field 'nextant_sharegroup' : ok
* Checking field 'nextant_deleted' : ok
* Checking field 'nextant_source' : ok
* Checking field 'nextant_tags' : ok
* Checking field 'nextant_extracted' : ok
* Checking field 'nextant_ocr' : ok
* Checking field 'nextant_unmounted' : ok
* Checking field-type 'text_general' : ok
Your solr contains 313514 documents :
- 313514 files
- 0 bookmarks
- 16 segments
What does this tell me? Is 16 segments good for performance?
from fulltextsearch.
Optimizing your index works like a defrag. Each time you update the index (i.e. creating or editing a file, moving a file, editing sharing rights, deleting files), you fragment it into segments.
16 is insignificant. I honestly have no idea what number of segments is big enough to justify having a background job start an optimization. Also, even if your index ends up a few bytes smaller, the optimize operation itself requires a lot of disk space (roughly +30% of your current index).
It also takes time, so don't be afraid if the command runs for hours (I should add some animation during the operation).
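If you want to keep an eye on fragmentation yourself, here is a rough sketch that reads the segment count out of the `nextant:check` output. It assumes the "- N segments" line format shown in the output posted earlier in this thread. The function names and the threshold of 50 are hypothetical, since no recommended value is known.

```shell
#!/bin/sh
# Print the segment count found in `occ nextant:check` output read from stdin.
# Assumes a line of the form "- 16 segments", as in the output posted above.
segment_count() {
    awk '$1 == "-" && $3 == "segments" { print $2; exit }'
}

# Hypothetical threshold: the thread does not give a recommended value.
SEGMENT_THRESHOLD=50

# Succeed (exit status 0) when the count read from stdin exceeds the threshold.
needs_optimize() {
    count=$(segment_count)
    [ -n "$count" ] && [ "$count" -gt "$SEGMENT_THRESHOLD" ]
}
```

For example, `sudo -u www-data ./occ nextant:check | segment_count` would print only the number. Keep the extra disk space in mind before acting on the result.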
from fulltextsearch.
Thank you.
What about nextant:index --debug --force starting all over again?
from fulltextsearch.
Oops, sorry. I hadn't seen the good news. In fact, it is some fucking good news! But that may be because you're not indexing the audio files (in 0.6.5, you can filter what you want to index in the Admin UI, and audio/image files are not included by default).
You might want to select Index File Tree if you want to index at least the folder and file names but not the file content.
It is not starting all over again: the first step extracts documents, and the second step updates data, mostly sharing rights plus a few minor things. When you start a full index, the second step (updating) does not do much (in fact, it might not do anything).
from fulltextsearch.
Looks like we're good?
from fulltextsearch.
Yes,
Thank you.
from fulltextsearch.