trapexit avatar trapexit commented on June 1, 2024

I need more detailed information about the setup, and please use a simplified setup that can be replicated.

https://github.com/trapexit/mergerfs#support

What OS? How was it installed? Is the offset for diff always different? What is your exact config?

from mergerfs.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Hi.

I will try to give you all the information:

The problem has existed on my system since 16.10.2023.

All downloads prior to that date are OK.

I download files via JDownloader, wget, or curl.

I did a system update on the 16th via "apt update && apt upgrade" and rebooted.

The only change relevant to this case, IMHO, is an update of glibc from version 2.36-9+deb12u1 to 2.36-9+deb12u3.

mergerfs is not running under any virtualisation such as Docker.


  • OS:

root@pve:/ext-usb/mergerfs/union/data# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@pve:/ext-usb/mergerfs/union/data#

root@pve:/ext-usb/mergerfs/union/data# uname -a

Linux pve 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z) x86_64 GNU/Linux


  • MergerFS Version:
    root@pve:/ext-usb/mergerfs/union/data# mergerfs -V
    mergerfs v2.37.1

I downloaded the deb package from GitHub and installed it via dpkg.


The offset for diff is not always the same:

First Download:
root@pve:/ext-usb/mergerfs/union/data# cmp PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf
PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf differ: byte 56137241, line 224719
root@pve:/ext-usb/mergerfs/union/data#

Second Download:
root@pve:/ext-usb/sdh1/data# cmp PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf
PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf differ: byte 197153817, line 1336303
root@pve:/ext-usb/sdh1/data#

Third Download:
--> Files are the same

Fourth Download:
root@pve:/ext-usb/sdh1/data# cmp PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf
PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf differ: byte 25270809, line 105234
root@pve:/ext-usb/sdh1/data#
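
Plain cmp stops at the first differing byte, so the offsets above show only where each corruption starts. A minimal sketch of how the full extent could be summarized with cmp -l follows; the helper name summarize_diff is hypothetical and not part of the original report.

```shell
#!/bin/sh
# summarize_diff A B: report how many bytes differ and the first and last
# differing offsets (1-based, the same numbering cmp uses).
# Hypothetical helper, not taken from the thread.
summarize_diff() {
    # cmp -l prints one line per differing byte: offset, octal byte in A, octal byte in B.
    # Its stderr is discarded so an "EOF on ..." notice for unequal lengths stays quiet.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR == 0) print "identical"
            else printf "differing_bytes=%d first=%d last=%d\n", NR, first, last
        }'
}
```

Something like `summarize_diff PDF200MB_sdh1.pdf PDF200MB_mergerfs.pdf` would then show whether a single byte, one block, or a long run is affected. Note that "identical" only means no differing byte was found in the common length; a length mismatch needs a separate size check.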

I mount the mergerfs via fstab:
root@pve:/ext-usb/sdh1/data# cat /etc/fstab

/dev/pve/root / ext4 errors=remount-ro 0 1
UUID=F391-570F /boot/efi vfat defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0
UUID=fd756d7d-a57b-4447-951a-1cba87230415 /ext-usb/sdc1 xfs defaults 0 1
UUID=6486f55c-8481-4881-af2e-3f1dd5609c99 /ext-usb/sda1 xfs defaults 0 1
UUID=53318944-360b-4192-a796-04fca3b0125b /ext-usb/sdf1 xfs defaults 0 1
UUID=ace6e4c2-8246-4259-939d-238ee5eeee10 /ext-usb/sdg1 xfs defaults 0 1
UUID=bc7b716e-c420-40b5-af6e-8b2d89b7a9f6 /ext-usb/sdb1 xfs defaults 0 1
UUID=dcde2686-23a9-4cf8-866b-b158b1253042 /ext-usb/sdh1 xfs defaults 0 1
/ext-usb/sda1:/ext-usb/sdb1:/ext-usb/sdc1:/ext-usb/sdf1:/ext-usb/sdg1:/ext-usb/sdh1 /ext-usb/mergerfs/union fuse.mergerfs defaults,nonempty,allow_other,use_ino,ignorepponrename=true,dropcacheonclose=true,category.create=mfs,moveonenospc=true,posix_acl=true,func.getattr=newest,fsname=mergerfs,cache.files=per-process 0 0
root@pve:/ext-usb/sdh1/data#

As a further test I downloaded the file to another disk and copied it to the mergerfs mount:

root@pve:/ext-usb/sdh1/data# cd /root/
root@pve:~# wget https://link.testfile.org/PDF200MB
--2023-10-22 09:14:40-- https://link.testfile.org/PDF200MB
Resolving link.testfile.org (link.testfile.org)... 188.114.96.3, 188.114.97.3, 2a06:98c1:3121::3, ...
Connecting to link.testfile.org (link.testfile.org)|188.114.96.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://files.testfile.org/PDF/200MB-TESTFILE.ORG.pdf [following]
--2023-10-22 09:14:40-- https://files.testfile.org/PDF/200MB-TESTFILE.ORG.pdf
Resolving files.testfile.org (files.testfile.org)... 188.114.96.3, 188.114.97.3, 2a06:98c1:3121::3, ...
Connecting to files.testfile.org (files.testfile.org)|188.114.96.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 214119654 (204M) [application/pdf]
Saving to: ‘PDF200MB’

PDF200MB 100%[========================================>] 204.20M 115MB/s in 1.8s

2023-10-22 09:14:42 (115 MB/s) - ‘PDF200MB’ saved [214119654/214119654]

root@pve:~# cp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

root@pve:~# cmp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs differ: byte 11872793, line 53254

root@pve:~# cp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

root@pve:~# cmp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

root@pve:~# cp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

root@pve:~# cmp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs differ: byte 149810201, line 587626

root@pve:~#

So it seems that this is a general problem with file operations on the mergerfs mount on my system.

I made an strace of a cp command that resulted in a corrupt file on the mergerfs mount.

app.strace.txt

trapexit avatar trapexit commented on June 1, 2024

> The problem exists on my system since the 15.10.2023.

What do you mean by this?

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

> > The problem exists on my system since the 15.10.2023.
>
> What do you mean by this?

I'm sorry, I meant 16.10.2023, as that is when I made the update I described above.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

I made a mergerfs trace while copying the file "PDF200MB" to the mergerfs mount, which resulted in a corrupt file.

mergerfs.strace.txt

trapexit avatar trapexit commented on June 1, 2024

Have you tried any settings relevant to narrow down things? Such as cache.files, threads, etc.?

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

I have not altered the configuration yet, as this system has now run for about a year without any problems.
I installed mergerfs 2.37.1 only because this problem occurred all of a sudden.
Prior to this I had 2.33.5, which was bundled with Debian.

As a test I added threads=-1 and also threads=6. Same result.

What occurred to me is that there is a slight timing variation in this.
I ran a small test:
cp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs && echo "Compare A: " && cmp -l PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs && sleep 2 && echo -n "Compare B: " && cmp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs

Output is:
root@pve:~# cp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs && echo "Compare A: " && cmp -l PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs && sleep 2 && echo -n "Compare B: " && cmp PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs
Compare A:
Compare B: PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs differ: byte 171749122, line 841596

So... directly after the copy the files are the same. After 2 seconds they differ.
And the offset is different every time.
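
For anyone repeating this, the immediate-versus-delayed comparison can be wrapped in a small function. This is only a sketch: the name verify_after_delay is hypothetical, and it compares SHA-256 checksums via sha256sum instead of using cmp as the one-liner above does.

```shell
#!/bin/sh
# verify_after_delay SRC DST [DELAY]: copy SRC to DST, then compare checksums
# immediately and again after DELAY seconds (default 2, as in the test above).
# Hypothetical helper; returns 0 only if both comparisons match.
verify_after_delay() {
    src=$1
    dst=$2
    delay=${3:-2}
    cp "$src" "$dst" || return 2
    want=$(sha256sum < "$src")
    now=$(sha256sum < "$dst")
    sleep "$delay"
    later=$(sha256sum < "$dst")
    status=0
    if [ "$now" = "$want" ]; then
        echo "immediate: match"
    else
        echo "immediate: MISMATCH"
        status=1
    fi
    if [ "$later" = "$want" ]; then
        echo "after ${delay}s: match"
    else
        echo "after ${delay}s: MISMATCH"
        status=1
    fi
    return "$status"
}
```

A call such as `verify_after_delay PDF200MB /ext-usb/mergerfs/union/data/PDF200MB_mergerfs` reproduces the two-step check. A result that matches immediately but not after the delay would be consistent with the first read being served from memory and the later one from disk, though that is an assumption the test alone does not prove.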

trapexit avatar trapexit commented on June 1, 2024

This is proxmox? Not regular debian? I just installed Debian 12 on a system, installed bookworm x86_64 version from releases page, wget'ed the file... works fine. Copied files around through mergerfs. No issues. Same settings.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Sorry... I did not mention that. Yes, it is Proxmox, latest stable version. The underlying OS is bookworm.
I ran the tests directly on bare metal, not in a virtual machine.

trapexit avatar trapexit commented on June 1, 2024

> I installed 2.37.1 of mergerfs only because this problem occurred all of a sudden.

You are saying this happened after an update of the OS across multiple versions of mergerfs? This is important information. Please... I truly mean all info when I say I need all details about what your setup is and what you've tried.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

No... this problem also occurred on my system with 2.33.5.
I installed 2.37.1 manually ONLY to check whether it would fix it.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

I should run tests with other file sizes.
I think the error occurs only with files bigger than 100MB.
I will get back to you with my results.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

I ran the copy-and-compare test with 3 more file sizes (100MB, 50MB, 20MB).
These files also get corrupted.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Base configuration:
First installation: 12.01.2023, Proxmox 7.3 (Bullseye)
Updated in March to 7.4 (Bullseye)
Updated in August to Proxmox 8.0 (Bookworm)
Updated to 8.0.3 and 8.0.4
Proxmox 8.0.4 (Debian 12) + mergerfs 2.33.5: everything was fine.
Updated the system on 16.10.2023.
The only relevant change besides Samba updates was the update of libc-bin to version 2.36-9+deb12u3.
From that point on, all downloaded files were corrupt.
I installed mergerfs 2.37.1 yesterday to see if the problem persisted.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Downgraded to the previously installed libc version 2.36-9+deb12u1.
Same behavior. So I think libc has nothing to do with it.

trapexit avatar trapexit commented on June 1, 2024

This sounds like a hardware or OS issue. mergerfs has changed a decent amount between 2.33.5 and 2.37.1 and the fact it happened after an OS update after a year of working fine suggests similar. It could be that something in the OS triggered a bug in mergerfs but it would be unique to Proxmox. They seem to use a non-Debian kernel. Have you updated the kernel lately?

I am unable to install proxmox on a spare machine I have. I'll have to try a VM.

trapexit avatar trapexit commented on June 1, 2024

And have you tried any other FUSE filesystems? There were a number of changes to FUSE in the 6.2 kernel and I'm pretty sure some bugs were introduced.

trapexit avatar trapexit commented on June 1, 2024

image

I can't seem to replicate the issue in a VM with a fresh install of Proxmox.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Hi.
Have you tried updating your Proxmox installation to 8.0.4?
I have now configured unionfs as a workaround. No problems thus far.
It is slow, but it gets the job done for now.
As a test I configured mergerfs and unionfs in parallel with the same disks and mountpoints.
I ran the copy-and-compare test with a 100MB file on both union filesystems.
unionfs works. mergerfs does not.
It must be something with the Proxmox kernel then, or with FUSE.

trapexit avatar trapexit commented on June 1, 2024

I'm not a Proxmox user. How does one upgrade?

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

In addition: the update I made also upgraded the kernel from 6.2.16-12-pve to 6.2.16-15-pve.

Regarding the Update:
https://tteck.github.io/Proxmox/
-> Proxmox VE Tools -> Proxmox VE Post Install

Run the command below in the Proxmox VE Shell.
bash -c "$(wget -qLO - https://github.com/tteck/Proxmox/raw/main/misc/post-pve-install.sh)"

After the reboot you can update/upgrade via apt.

trapexit avatar trapexit commented on June 1, 2024

I'm having nothing but problems with networking with virtualbox and proxmox. I'll have to try again another time.

trapexit avatar trapexit commented on June 1, 2024

Just as I posted the above I got it working.... and after the update it still works fine.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

OK.
Since we have tried everything to reproduce this, it must be an error on my side.
In the meantime unionfs has worked flawlessly for the last 4 hours,
so I will use unionfs until I figure out where the problem with mergerfs lies.
Thank you for your time and support.

trapexit avatar trapexit commented on June 1, 2024

There shouldn't be "an error on [your] side." There is no valid situation where corruption will happen that wouldn't happen on another filesystem.

unionfs is very old and uses a simple set of FUSE options. This is why it is important to play with different options related to writing to see if any of them change things. cache.files, writeback, moveonenospc, etc.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Here in Germany it is now 7:00 AM and I have to go to work. After work I will change my mergerfs options to the ones shown in your image. I will get back to you in a couple of hours.
Thank you.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Eureka, I found it.
I had to remove the option "dropcacheonclose=true", although I use cache.files=partial.
The same configuration without the option now works flawlessly. No corruption.
I ran the test again with the option enabled... corruption every time I create a file on mergerfs.
So there must be a problem with the kernel.
Could it be this?
https://lore.kernel.org/all/[email protected]/

trapexit avatar trapexit commented on June 1, 2024

Curious. It shouldn't be, given that you are running a 6.2 kernel, unless Proxmox backported it from 6.3+. Also... I'm using "DONTNEED", not "NOREUSE", in part because NOREUSE didn't work as desired.

To be clear: with cache.files=partial, dropcacheonclose=true == corruption and dropcacheonclose=false == no corruption? What about cache.files=off? Or auto-full?

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

cache.files=partial + dropcacheonclose=true == corruption
cache.files=off + dropcacheonclose=true == corruption
cache.files=auto-full + dropcacheonclose=true == corruption

cache.files=[partial/off/auto-full] + dropcacheonclose=false == no corruption.
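
The six combinations above can be enumerated mechanically. The sketch below is a dry run that only prints one mount command per combination; the branch list and mountpoint are copied from the fstab line earlier in the thread, while the function name option_matrix and the choice to leave out the remaining fstab options are assumptions made for brevity.

```shell
#!/bin/sh
# Dry run: print a mergerfs mount command for every tested combination of
# cache.files and dropcacheonclose. Nothing is mounted.
# BRANCHES and MOUNTPOINT come from the fstab line posted earlier;
# other options from that line are omitted here (an assumption).
BRANCHES=/ext-usb/sda1:/ext-usb/sdb1:/ext-usb/sdc1:/ext-usb/sdf1:/ext-usb/sdg1:/ext-usb/sdh1
MOUNTPOINT=/ext-usb/mergerfs/union

option_matrix() {
    for cache in partial off auto-full; do
        for drop in true false; do
            echo "mergerfs -o allow_other,cache.files=$cache,dropcacheonclose=$drop $BRANCHES $MOUNTPOINT"
        done
    done
}

option_matrix
```

Each printed command would be run by hand after unmounting the previous attempt, followed by the same copy-and-compare test, so that only one option changes between runs.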

trapexit avatar trapexit commented on June 1, 2024

Interesting. dropcacheonclose is used in literally one place in all the code, and it just calls fadvise with DONTNEED twice. If you have the time... could you try https://github.com/Feh/nocache ? Just copy a file with the nocache tool as it shows. And perhaps test with mergerfs pointing to just /tmp?

It could be an XFS bug too.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

I compiled nocache and ran the tests.
Copied a file with nocache onto my mergerfs mount (dropcacheonclose=true, cache.files=partial) on XFS and compared it = corruption
Configured a mergerfs mount with the same configuration on /tmp and did the copy/compare = no corruption

I think you are right. XFS is the problem.

trapexit avatar trapexit commented on June 1, 2024

I meant to use nocache directly with the underlying filesystem. Like /ext-usb/sdc1. Since nocache and mergerfs are effectively doing the same thing.
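
A per-branch version of that check could look roughly like the sketch below. The function name check_branch and the temporary file name are hypothetical; the copy command is passed in as extra arguments so that `nocache cp` can be used where the tool is installed, and plain cp is only a fallback.

```shell
#!/bin/sh
# check_branch SRC DIR [COPY_CMD...]: copy SRC into DIR with the given copy
# command (default: cp), compare the copy against SRC, then remove the copy.
# Hypothetical helper, not part of mergerfs or nocache.
check_branch() {
    src=$1
    dir=$2
    shift 2
    # Any remaining arguments form the copy command, e.g. "nocache cp".
    [ "$#" -gt 0 ] || set -- cp
    dst="$dir/.copytest.$$"
    if ! "$@" "$src" "$dst"; then
        echo "$dir: copy failed"
        rm -f "$dst"
        return 2
    fi
    if cmp -s "$src" "$dst"; then
        echo "$dir: ok"
        rc=0
    else
        echo "$dir: CORRUPT"
        rc=1
    fi
    rm -f "$dst"
    return "$rc"
}

# Example loop over the six branches from the fstab line (run as root):
# for d in /ext-usb/sda1 /ext-usb/sdb1 /ext-usb/sdc1 /ext-usb/sdf1 /ext-usb/sdg1 /ext-usb/sdh1; do
#     check_branch /root/PDF200MB "$d" nocache cp
# done
```

With plain cp the comparison may be answered from the page cache, so a bad write path could go unnoticed; going through nocache is what makes the check comparable to dropcacheonclose=true.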

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Sorry. This was a misunderstanding.
I ran the test on all 6 hard disks.
The copy/compare test works on sda1, sdb1, sdc1, sdf1 and sdg1.
sdh1, my new HDD (only 3 months old), a Seagate Exos X18 SATA III 18TB, corrupts the file.
smartctl logged no errors on a short test.
Could it be the SATA cable that causes this?
But why did it work for so long?

trapexit avatar trapexit commented on June 1, 2024

So all drives... you ran something like nocache cp /tmp/randomfile /ext-usb/sdX/ and all worked fine except sdh1?

Do you know if you formatted that one with different settings? A different version of mkfs.xfs or whatnot? As I understand it, XFS has been getting numerous enhancements over the past year or two. You might want to use the XFS tooling to check each filesystem's settings and run xfs_repair or whatnot.

PetrosStavrakakis avatar PetrosStavrakakis commented on June 1, 2024

Hi.
I owe you an apology for wasting your time.
I found the problem.
The hard disk which had the problem was connected to a PCI-E to SATA adapter.
I think that something changed in the kernel, or the adapter became faulty all of a sudden, as it had worked flawlessly until now.
I ordered a new adapter and swapped it in, and there is no more corruption on this disk after writing a file to it.
The adapter is an ASUS U3S6 Rev 1.0. A Marvell SE9123 chip is responsible for both SATA ports.
As I said... sorry for wasting your time.
This can be closed.

trapexit avatar trapexit commented on June 1, 2024

I appreciate the apology, I did spend several hours looking into this, but... it happens. No worries.

So the chipset is the same on both the non-working and the working adapter? No errors in the kernel logs? That is an oddly specific issue.
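
One way to look for such errors is to filter the kernel log for lines that tend to accompany SATA link or block I/O trouble. The sketch below is an assumption about useful patterns, not an exhaustive list, and the function name scan_kernel_log is hypothetical.

```shell
#!/bin/sh
# scan_kernel_log: read kernel log text on stdin and print lines that look
# like libata port problems or block-layer I/O errors.
# The patterns are illustrative assumptions, not a complete list.
scan_kernel_log() {
    grep -Ei '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?: .*(error|exception|reset|fail)|I/O error'
}

# Typical use (as root):
#   dmesg | scan_kernel_log
#   journalctl -k | scan_kernel_log
```

An empty result does not rule out hardware, since silent corruption of the kind described in this thread may leave no trace in the log at all.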
