/usr/bin/cp very slow

Hello,
We recently noticed that running cp on a directory with many files takes much longer on AlmaLinux 9.3 than it did on Oracle Linux 7.9. The machines are the same hardware, just an OS reinstallation.

After trying many things, I copied the cp binary from Oracle Linux 7 onto the AlmaLinux 9.3 machine, and the copy time went from 830 minutes with Alma's cp to 14 minutes.

What could possibly have changed in cp that would have such an impact?

Oracle Linux cp:
cp (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Torbjörn Granlund, David MacKenzie, and Jim Meyering.

AlmaLinux cp:
cp (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later https://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund, David MacKenzie, and Jim Meyering.

Thanks for any hints on how to get that to work as fast as it used to.

Pierre,

It looks like nobody is interested in this problem but, as a user, I feel concerned. If this is a general problem with cp, everyone should feel worried!

But I’m trying to understand, and I don’t have enough data.

  • How many files is “many files”? And how small are they?
  • What filesystem? And on what kind of devices?
  • Does rsync suffer from the same problem, or is it much faster?
  • Have you tried timing a smaller set of files that still shows a meaningful difference between the cp versions?
  • Any relevant warnings or errors, say in dmesg or elsewhere?

This is stunning.

Hello,
For the tests, I'm using the gcc-13.2 tarball, which contains around 120000 files. I don't know their exact sizes, but it's source code, so most of them are probably quite small.

The filesystem that shows this problem is an NFS export from a server running TrueNAS. I tried a few different NFS clients in case it was specific to one machine; now I'm running all the tests from the same machine so the numbers are comparable.

rsync works as expected. scp is as slow as cp (I used scp without a hostname, so maybe it calls cp underneath).
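
For reference, the comparison I mean is roughly this (the destination paths are just placeholders):

time cp -r gcc-13.2.0 /mnt/nfs/test-cp
time rsync -a gcc-13.2.0/ /mnt/nfs/test-rsync/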

I haven’t tried on fewer files.

I haven’t seen anything that seems relevant in logs but I could’ve missed it.

I downloaded the source code for coreutils 8.32 (the same version as Alma), compiled it, and that build seems to be fine. I also compiled coreutils 9.0, and that version is slow (15m25.343s vs 954m49.535s). I don't know what that means, but it might be related. I also tried the CentOS Stream cp binary, and it behaves the same as the AlmaLinux one.
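
In case it matters, by "compiled it" I mean a plain vanilla build along these lines (the NFS paths are just placeholders):

wget https://ftp.gnu.org/gnu/coreutils/coreutils-9.0.tar.xz
tar xf coreutils-9.0.tar.xz && cd coreutils-9.0
./configure && make
time ./src/cp -r /mnt/nfs/gcc-13.2.0 /mnt/nfs/gcc-copy-test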

I also asked someone in a different department who is also using AlmaLinux 9 with NFS, and cp runs at normal speed for them. Maybe we can learn something by comparing our systems.

Thanks.

  • So the problem only occurs on non-local filesystems?

  • I’m pretty sure that scp does not call cp.

  • It's very strange that 8.32 recompiled by hand works fine but 9.0 does not. CentOS Stream 9 still has coreutils 8.32, so it makes sense that it behaves like Alma's. But why does compiling it by hand give different results?

The mystery doesn’t seem to give up.

Ah, but no. You should build it from the SRPM, i.e. https://repo.almalinux.org/vault/9.3/BaseOS/Source/Packages/coreutils-8.32-34.el9.src.rpm, because coreutils.spec then applies 36 patch files! One of those patches is likely the reason for your performance issue!
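
Roughly, on an AlmaLinux 9 box with rpm-build and dnf-plugins-core installed, something like this should give you binaries with all the distribution patches applied (exact steps may vary a bit):

rpm -ivh coreutils-8.32-34.el9.src.rpm           # unpacks spec and patches into ~/rpmbuild
sudo dnf builddep ~/rpmbuild/SPECS/coreutils.spec
rpmbuild -ba ~/rpmbuild/SPECS/coreutils.spec     # or -bp to only unpack the sources and apply the patches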

I suspect this one:

# basic support for checking NFSv4 ACLs (#2137866)
Patch19:  coreutils-nfsv4-acls.patch

Do you happen to have an alias on your system, so that cp actually performs a cp --preserve=xattr? I would rather use alias cp="rsync -ah --progress" :wink:
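
You can check quickly with something like:

type cp          # shows whether cp is an alias, a shell function, or /usr/bin/cp
alias | grep cp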

Otherwise, maybe the rsize and wsize values in fstab for that mount are too small.
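
For comparison, a mount with explicitly large buffers might look like this in fstab (server name, export and mount point are placeholders):

server.example.com:/export/data  /mnt/data  nfs4  rw,rsize=1048576,wsize=1048576  0 0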

You might be right about the ACL patch; I was thinking of something ACL-related, or a backport of some code from a more recent version.

With a manual mount and larger wsize and rsize options, it looks like the server rejects the higher values and negotiates lower ones.
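
For anyone following along, the values actually in effect on the client can be seen with, for example:

nfsstat -m                   # per-mount NFS options as negotiated with the server
grep ' nfs4 ' /proc/mounts   # raw mount options, including rsize and wsize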

We'll have to see whether it's possible to change that on the server, but I don't want to touch the production servers, so we need to figure out how to test this safely.

I think the ACL patch is also worth looking into in more detail.

Thanks for all the inputs.

I did many more tests and eventually rebuilt coreutils with the nfsv4-acls patch disabled, but that didn't have any effect on the time.

Looking through coreutils.spec, I noticed other cp patches and this comment:
# cp: default to --reflink=auto (#1861108)

So I did a test with --reflink=never and got the normal time back. It seems this option is very costly, for us at least.
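
A possible workaround, if --reflink=auto really is the culprit (just an idea, we haven't rolled anything out yet), would be an alias:

alias cp='cp --reflink=never'    # forces the old plain-copy behaviour for interactive use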

I forget how I stumbled across this thread, but I have successfully reproduced the observation against a TrueNAS-13.0-U6.2 NFS server (vers=4.2). When I tried reproducing it against an Ubuntu 24.04 NFS server, that one seemed unaffected (vers=4.2).

[root@alma9 export]# time timeout 10m cp --reflink=never -r gcc-13.2.0 gcc-13.2.0c && time timeout 10m cp -r gcc-13.2.0 gcc-13.2.0d

real    4m41.712s
user    0m1.410s
sys     0m28.554s

real    3m15.751s
user    0m1.020s
sys     0m21.993s
[root@alma9 export]# uname -a
Linux localhost 5.14.0-427.33.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 30 09:45:56 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Another observation is that when cp is not given --reflink=never, it calls copy_file_range() instead of just opening the files and doing plain reads and writes; according to the man page, copy_file_range() will use server-side copy when supported (and it looks like the Ubuntu server supports it, since that copy was faster than the one with --reflink=never).
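
This is easy to see with strace; something like the following (the destination is just an example) prints a syscall count summary for the copy:

strace -f -c -e trace=copy_file_range,openat,read,write \
    cp -r gcc-13.2.0 /mnt/nfs/gcc-strace-test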

The TrueNAS community seems to have concluded that this is a bug in coreutils when working against a ZFS-backed NFS export. But since I could not reproduce it against the Ubuntu server, I think I will do further testing with vanilla FreeBSD 13, FreeBSD 14, and TrueNAS Scale when time allows.