kdejaeger commented on May 28, 2024

Update: I have added a @Scheduled task in Spring Boot to trigger compaction every day. So far it has succeeded twice. I'll post an update if I notice a compaction failing.

kinow commented on May 28, 2024

> Yes, I think so. Sometimes we're lucky and it gets released in time, sometimes not, as you can see. Anyway, I need to figure out how to delete those files. Is that hard?

I'm not 100% sure that's the problem, but that's where I would start investigating.

> What do I need to do for that? Or should I wait for an update to the Fuseki code?

After you have compacted the database and deleted the old files, if the space still hasn't been reclaimed by the OS, you can try lsof +L1 and search the file names to see whether you can identify any Fuseki files.

If you find any files, share a list here and we can see if there's a file/resource not closed properly in Java.

afs commented on May 28, 2024

Summary of compact improvements to consider:

  1. Improve compaction handling across a crash/restart (partially written compacted databases).
  2. Make sure files are getting closed.
  3. (Maybe) allow existing readers to continue on the old database after the changeover.

A possible TDB2 improvement for many small updates is to front the database with an update buffer: wait for 10 or 20 updates, or a timeout, and then execute one TDB transaction for all of the small updates. This loses the boundaries between the individual updates, which would only matter if old versions of the database were ever exposed; currently they aren't.
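The buffering idea could look roughly like this. It is a minimal client-side sketch, assuming updates are SPARQL Update strings and that the caller-supplied `flush` callback sends a whole batch as one update request (SPARQL Update allows several operations separated by `;`), which would then run as a single transaction. `UpdateBatcher`, `max_updates`, `max_wait` and `tick` are illustrative names, not Jena API.

```python
import time
from typing import Callable, List, Optional


class UpdateBatcher:
    """Buffer small SPARQL updates and hand them on as one batch.

    A batch is flushed when `max_updates` updates are pending, or when the
    oldest pending update has waited at least `max_wait` seconds (checked in tick()).
    Not thread-safe: call it from a single thread or add a lock.
    """

    def __init__(self, flush: Callable[[List[str]], None], max_updates: int = 20,
                 max_wait: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self._flush = flush            # e.g. send " ;\n".join(batch) as ONE update request
        self._max_updates = max_updates
        self._max_wait = max_wait
        self._clock = clock
        self._pending: List[str] = []
        self._first_at: Optional[float] = None

    def add(self, update: str) -> None:
        if not self._pending:
            self._first_at = self._clock()   # the timeout runs from the oldest update
        self._pending.append(update)
        if len(self._pending) >= self._max_updates:
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer); flushes if the oldest update is too old."""
        if self._pending and self._clock() - self._first_at >= self._max_wait:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending, self._first_at = self._pending, [], None
        self._flush(batch)
```

Injecting the clock keeps the flushing rule testable; in production the default `time.monotonic` is used and something has to call `tick()` regularly.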

kdejaeger commented on May 28, 2024

I don't know, unfortunately. We'll add a backup command before the compact command, to be safe.

afs commented on May 28, 2024

I think I have identified the problem: an update during the compaction can cause the transaction coordinator to block. It is a deadlock; no amount of waiting will unblock the system.
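Until this is fixed in Fuseki, one client-side workaround, assuming the deadlock is triggered by updates that arrive while a compaction is running, is to route the application's own writes and its scheduled compaction through one gate so they never overlap. `CompactionGate` is a hypothetical helper, not part of Jena. It cannot see writes from other clients, and because the compact HTTP call returns before the work is done, the body of `with gate.compaction():` should also wait for the compaction task to finish.

```python
import threading
from contextlib import contextmanager


class CompactionGate:
    """Keep this application's writes and its compaction calls from overlapping.

    compaction(): stops new writes from starting, waits for in-flight writes
    to finish, then runs exclusively. write(): waits while a compaction runs.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._writers = 0          # writes currently in flight
        self._compacting = False   # a compaction holds (or is acquiring) the gate

    @contextmanager
    def write(self):
        with self._cond:
            while self._compacting:
                self._cond.wait()
            self._writers += 1
        try:
            yield
        finally:
            with self._cond:
                self._writers -= 1
                self._cond.notify_all()

    @contextmanager
    def compaction(self):
        with self._cond:
            while self._compacting:        # one compaction at a time
                self._cond.wait()
            self._compacting = True        # block new writes first ...
            while self._writers:           # ... then drain the ones in flight
                self._cond.wait()
        try:
            yield
        finally:
            with self._cond:
                self._compacting = False
                self._cond.notify_all()
```

Setting the compaction flag before draining means a steady stream of writes cannot starve the scheduled compaction.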

afs commented on May 28, 2024

Hi, Fuseki has an operation for compacting TDB2 databases: /$/compact/fedora.

The documentation for secoresearch/fuseki/ implies the admin operations are available and are password protected.

The dataSet mentioned is a Java object in the same JVM as the database, so it lives in Fuseki, not on the client side.
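Because the compaction runs inside Fuseki, a client triggers it over HTTP. A minimal sketch that only builds the request, assuming Fuseki on localhost:3030 with the admin endpoints behind HTTP basic authentication; `compact_request` is a hypothetical helper, the credentials are placeholders, and `deleteOld` is the request parameter discussed in this thread.

```python
import base64
import urllib.parse
import urllib.request


def compact_request(base_url: str, dataset: str, user: str, password: str,
                    delete_old: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to Fuseki's /$/compact/{name} admin endpoint."""
    name = urllib.parse.quote(dataset.strip("/"))
    url = f"{base_url.rstrip('/')}/$/compact/{name}"
    if delete_old:
        url += "?" + urllib.parse.urlencode({"deleteOld": "true"})
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": "Basic " + credentials})
```

Sending it is then `urllib.request.urlopen(compact_request("http://localhost:3030", "fedora", "admin", "<password>"))`, the equivalent of a curl POST to the same URL.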

kdejaeger commented on May 28, 2024

OK, I'll install curl and use the URL. Thanks!

kdejaeger commented on May 28, 2024

OK, I tried the URL and it seems to be stuck somewhere.

bash-5.0# pwd
/fuseki-base/databases/fedora

bash-5.0# ls -al
drwxr-xr-x 2 root root 4096 Apr 6 07:44 Data-0001
drwxr-xr-x 2 root root 4096 Apr 9 12:32 Data-0002
-rw-r--r-- 1 root root 2 Apr 8 13:04 tdb.lock

bash-5.0# du -h
74M ./Data-0002
38G ./Data-0001

bash-5.0# date
Sat Apr 9 17:24:37 UTC 202

And in the logs:
12:32:23 INFO Admin :: [135306] Compact dataset /fedora
12:32:23 INFO Server :: Task : 1 : Compact
12:32:23 INFO Server :: [Task 1] starts : Compact
12:32:23 INFO Compact :: [135306] >>>> Start compact /fedora

What now?

afs commented on May 28, 2024

Compaction can take some time and there will need to be sufficient space in the container.

You can check the Linux process state to see whether the Fuseki process, with no requests in flight, is showing as busy.
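On Linux, one way to do this check is to sample the process's CPU time from /proc/<pid>/stat twice. A minimal sketch, assuming the PID of the Fuseki JVM is known (for example from `pgrep java`); the function names are hypothetical. The single state letter reflects one thread, so for a multi-threaded JVM the CPU-time delta is the more useful signal: if it stays near zero while no requests are running, the compaction is probably waiting rather than working.

```python
import os
import time


def parse_stat(line: str) -> tuple:
    """Return (state, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is in parentheses and may contain spaces: split after the last ')'.
    fields = line[line.rindex(")") + 2:].split()
    state = fields[0]                                # field 3: R, S, D, Z, ...
    ticks = int(fields[11]) + int(fields[12])        # fields 14 (utime) and 15 (stime)
    return state, ticks


def read_stat(pid: int) -> tuple:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def cpu_seconds_during(pid: int, interval: float = 5.0) -> float:
    """CPU seconds the whole process used during `interval` seconds of wall time."""
    _, before = read_stat(pid)
    time.sleep(interval)
    _, after = read_stat(pid)
    return (after - before) / os.sysconf("SC_CLK_TCK")
```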

kdejaeger commented on May 28, 2024

I guess you need at least 50% of the disk free for this?
/dev/sde 47G 38G 8.2G 83% /fuseki-base/databases

kdejaeger commented on May 28, 2024

Oh, of course not; the new DB in this example would be maybe 10GB.

afs commented on May 28, 2024

There needs to be space for a new DB, which is at most the same size as the old one.

The old DB is not deleted by the compact function, but it is no longer needed. You won't see a drop in disk usage until you move, compress or delete it (e.g. it can be kept as an archive or backup).
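Both numbers can be checked before compacting: the free space needed for the new generation (at most the size of the current one) and the space held by old generations. A minimal sketch, assuming the TDB2 layout shown earlier in the thread (Data-NNNN directories next to tdb.lock) and that the highest-numbered directory is the one in use; the function names are hypothetical. Sizes are apparent file sizes, which can differ from what du reports.

```python
import os
import re
import shutil

GENERATION = re.compile(r"^Data-(\d+)$")


def dir_size(path: str) -> int:
    """Total apparent size in bytes of the files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def generations(db_dir: str) -> list:
    """Data-NNNN directories of a TDB2 database, oldest first (by number)."""
    found = []
    for name in os.listdir(db_dir):
        m = GENERATION.match(name)
        if m and os.path.isdir(os.path.join(db_dir, name)):
            found.append((int(m.group(1)), os.path.join(db_dir, name)))
    return [path for _, path in sorted(found)]


def compaction_check(db_dir: str) -> dict:
    gens = generations(db_dir)
    current, old = gens[-1], gens[:-1]       # assumption: highest number = in use
    return {
        "current": current,
        "needed_bytes": dir_size(current),   # new generation is at most this size
        "free_bytes": shutil.disk_usage(db_dir).free,
        "old_generations": old,              # kept unless deleteOld is used
        "reclaimable_bytes": sum(dir_size(g) for g in old),
    }
```

Deleting an old generation while the Fuseki JVM still has its files open may not free the space until the process closes them.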

kdejaeger commented on May 28, 2024

So I rebooted the server, and it started filling up Data-0002, from 74M to 128M. Then I called the compact URL again:

bash-5.0# du -h
128M    ./Data-0002
74M     ./Data-0003
38G     ./Data-0001
39G     .

It created a Data-0003, again at 74MB, without the command finishing. I'm not sure what to make of this.

kdejaeger commented on May 28, 2024

I rebooted the server yet again, and it started filling up Data-0003. I called the URL again, this time with deleteOld=true as a request parameter.
Now the compaction starts and finishes within a few seconds, without blocking. It deleted Data-0003 and continues filling up Data-0004.

kdejaeger commented on May 28, 2024

So it compacted a 38GB DB into 74MB? I doubt it; Data-0004 is already 200MB again after adding some items.

kdejaeger commented on May 28, 2024

OK, sorry for the rant. I ran another compaction, again with the deleteOld parameter, and Data-0005 turned into 74MB again. But this time the compact command blocks again and never finishes.

kdejaeger commented on May 28, 2024

I discovered the 'tasks' endpoint:

[
  {
    "task" : "Compact" ,
    "taskId" : "2" ,
    "started" : "2022-04-09T19:28:00.006+00:00"
  } ,
  {
    "task" : "Compact" ,
    "taskId" : "1" ,
    "started" : "2022-04-09T19:18:30.446+00:00" ,
    "finished" : "2022-04-09T19:18:38.939+00:00" ,
    "success" : true
  }
]

As you can see, the first time I tried the deleteOld parameter, the compaction went through after a few seconds and the server continued. The second call, a few minutes later, behaves like the earlier ones: the server seems to block. Read calls are also being blocked. Maybe a bug?
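The task list can also be checked programmatically. A minimal sketch, assuming the JSON shape shown above (a list of objects with task, taskId, started, and finished/success once done); the helper names are hypothetical. A task with a started time but no finished time, like task 2, is one that is still running or stuck.

```python
import json
from datetime import datetime


def unfinished_tasks(tasks_json: str, now: datetime) -> list:
    """(taskId, task, seconds running) for every task without a 'finished' time."""
    result = []
    for task in json.loads(tasks_json):
        if "finished" not in task:
            started = datetime.fromisoformat(task["started"])
            result.append((task["taskId"], task["task"], (now - started).total_seconds()))
    return result


def failed_tasks(tasks_json: str) -> list:
    """taskId of every finished task that did not report success."""
    return [task["taskId"] for task in json.loads(tasks_json)
            if "finished" in task and not task.get("success", False)]
```

Against a live server, pass the body of GET /$/tasks and a timezone-aware `now`, e.g. `datetime.now(timezone.utc)`, since the timestamps carry a UTC offset.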

afs commented on May 28, 2024

> read calls are being blocked

At what point? Compaction is a write operation, but it does need to take exclusive access to switch the storage databases over.

To do the switchover, it has to (1) stop new requests and (2) let existing requests finish. Outstanding reads do mean a longer wait. Although it might be possible to let outstanding reads finish on the old database without blocking, that interacts with deleting the old database.

kdejaeger commented on May 28, 2024

We now do it every 4 hours, and it shrinks a 25GB database to 100MB in 15 seconds. Unfortunately, Ubuntu doesn't seem to realize that the disk isn't full anymore, and Fuseki still hangs on a full disk. Sigh...

bash-5.0# pwd
/fuseki-base
bash-5.0# du -h
1.2G    ./databases
bash-5.0# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdi         19G   19G     0 100% /fuseki-base/databases

The disk itself is an Azure CSI disk.

kinow commented on May 28, 2024

> Ubuntu doesn't seem to realize that the disk isn't full anymore, and Fuseki still hangs on a full disk. Sigh...

That might be because the JVM process is still running and holding the deleted files open, so the OS holds back on reclaiming the space. You should be able to confirm this with lsof +L1 (from the lsof man page: "+aL1 <file_system> will select unlinked open files on the specified file system.")

You could try deleting the file descriptors, but I think a better option would be to figure out (once it's confirmed that the JVM is the process holding the deleted files) whether we can close the files so the OS can reclaim the space.
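The same information that lsof +L1 reports is available from /proc on Linux: a file descriptor whose link target ends in " (deleted)" points to an unlinked file that is still open. A minimal sketch for a single process, assuming you know the Fuseki JVM's PID and may read its /proc entries (same user, or root in the container); `deleted_open_files` is a hypothetical name.

```python
import os


def deleted_open_files(pid: int) -> list:
    """List (fd, path, size_bytes) for files the process holds open but that have
    been unlinked; Linux marks these with a ' (deleted)' suffix in /proc/<pid>/fd."""
    suffix = " (deleted)"
    fd_dir = f"/proc/{pid}/fd"
    result = []
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            if not target.endswith(suffix):
                continue
            size = os.stat(link).st_size    # stat follows the fd to the unlinked file
        except OSError:
            continue                        # fd closed meanwhile, or not accessible
        result.append((int(fd), target[: -len(suffix)], size))
    return sorted(result)
```

If the sizes of entries under /fuseki-base/databases add up to roughly the gap between df and du, the space is being held by the JVM rather than by files on disk.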

kdejaeger commented on May 28, 2024

Yes, I think so. Sometimes we're lucky and it gets released in time, sometimes not, as you can see. Anyway, I need to figure out how to delete those files. Is that hard? What do I need to do for that? Or should I wait for an update to the Fuseki code?

kdejaeger commented on May 28, 2024

Also, I still wanted to point out that we previously used version 4.1.0, and it didn't have this problem of a growing disk. It would stay at a few hundred megabytes for days. Our configuration and workflow remained the same: just some queries and saving a few triples.

afs commented on May 28, 2024

About 4.1.0: could the change be that the Fuseki UI defaults to creating TDB2 databases, whereas at 4.1.0 it was TDB1?

You can still use TDB1: create a configuration file at run/configuration/databasename.ttl, or copy the run/ setup from your 4.1.0 installation.

kdejaeger commented on May 28, 2024

I'm stumbling from one issue to the next, it seems. Now the server that did several compactions gives an error on startup:

08:36:45 INFO  System          :: Journal recovery start
08:36:45 ERROR Server          :: Exception in initialization: caught: Failed to read the journal entry data: wanted 24 bytes, got -1

I guess the DB is corrupt now and I have to start over?

afs commented on May 28, 2024

(sorry - missed this)

Yes, probably. Do you happen to know how it got in this state? Was a compaction done around that time?

kdejaeger commented on May 28, 2024

Can you document the 'deleteOld' parameter in the documentation? And could you also add a deleteOld parameter to the backup command?

afs commented on May 28, 2024

@kdejaeger - the website source is at https://github.com/apache/jena-site/ if you want to put in a pull request.

kdejaeger commented on May 28, 2024

I had another compaction today that didn't end properly:

04:14:27 INFO Admin :: [86966] Compact dataset /fedora
04:14:27 INFO Server :: Task : 4 : Compact
04:14:27 INFO Server :: [Task 4] starts : Compact
04:14:27 INFO Compact :: [86966] >>>> Start compact /fedora
04:14:28 INFO Fuseki :: [86967] POST http://localhost:3030/fedora?default

The old directory was not deleted, so 'deleteOld' didn't work:

root@ibron--surf-acc-ibron-2-57888f6745-b9nl8:/fuseki-base/databases/fedora# ls -l
total 12
drwxr-xr-x 2 root root 4096 Jul 7 11:57 Data-0093
drwxr-xr-x 2 root root 4096 Jul 8 04:14 Data-0094
-rw-r--r-- 1 root root 2 Jul 7 11:57 tdb.lock

I think for now we will just disable the compact operation and pay for more disk space.

afs commented on May 28, 2024

Is task 4 still running? (Compaction takes a while and is executed asynchronously to the request: the request schedules the compaction rather than performing and completing it.)
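Because the request only schedules the compaction, a client that wants the outcome has to poll the task. A minimal sketch with an injected fetch function, assuming it returns one task's JSON in the shape the tasks endpoint uses (finished and success appear once done), for example the body of GET /$/tasks/{id}; `wait_for_task` is a hypothetical helper. A timeout turns "still running after 4 hours" into an explicit result instead of a silent hang.

```python
import json
import time
from typing import Callable, Optional


def wait_for_task(fetch_task: Callable[[str], str], task_id: str,
                  timeout: float, poll_every: float = 5.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Poll a task until it has a 'finished' field; return it, or None on timeout."""
    deadline = clock() + timeout
    while True:
        task = json.loads(fetch_task(task_id))
        if "finished" in task:
            return task                 # check task.get("success") for the outcome
        if clock() >= deadline:
            return None                 # still running: report it, don't wait forever
        sleep(poll_every)
```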

kdejaeger commented on May 28, 2024

Usually it takes a few seconds for our server to go from gigabytes to a few megabytes after compacting. It was 4 hours after the start when I rebooted the server.

afs commented on May 28, 2024

rdfs:seeAlso #1432.

afs commented on May 28, 2024

@kdejaeger
After #1432, I have reloaded my understanding of compaction.

Is the setup of this issue the same - a text dataset over TDB2? Or does the report apply when there is just TDB2, no layer on top of it?

When a compaction freezes the system, are there any parallel requests happening during the compaction? I'm wondering especially about write requests. You show this in one case; is it always the case?

04:14:27 INFO Compact :: [86966] >>>> Start compact /fedora
04:14:28 INFO Fuseki :: [86967] POST http://localhost:3030/fedora?default
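One way to answer this from the logs is to collect the requests Fuseki logged after the compaction started. A minimal sketch, assuming the log line format shown above; since the excerpt has no completion line for the compaction, the end marker is an optional, caller-supplied string, and `requests_after_compact_start` is a hypothetical helper. A POST can also be a SPARQL query, so a hit only flags a possible write.

```python
import re

# Matches lines like "04:14:28 INFO Fuseki :: [86967] POST http://localhost:3030/fedora?default"
REQUEST = re.compile(
    r"^(\S+)\s+INFO\s+Fuseki\s+::\s+\[(\d+)\]\s+(GET|POST|PUT|DELETE|PATCH)\s+(\S+)")


def requests_after_compact_start(log_lines, end_marker=None):
    """(time, request id, method, url) for Fuseki requests logged after
    '>>>> Start compact' and before a line containing `end_marker` (if given)."""
    in_compaction = False
    found = []
    for line in log_lines:
        if ">>>> Start compact" in line:
            in_compaction = True
            continue
        if end_marker is not None and end_marker in line:
            in_compaction = False
            continue
        m = REQUEST.match(line.strip())
        if in_compaction and m:
            found.append(m.groups())
    return found
```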

kdejaeger commented on May 28, 2024

It's indeed the same configuration as in #1432. We run the compaction from a Spring @Scheduled task, every 12 hours or so. So yes, other parts of our code that read from or write to Fuseki will just carry on. I cannot say whether that was the case here and caused the block.
