Comments (33)

DrNeutron commented on September 24, 2024

Hi @bitprophet

We use Paramiko to move large CSV files between machines over long-distance WAN connections (anywhere from a 1 Mbps to a 1 Gbps link, depending on location). We've noticed the long transfer times reported here in #60; in our use case the issue arose when the transport was using compression. I know this might not apply to all the folks here with #60 issues, but if you use compression, read on.

Our test case is a 1.3 GB CSV file, to be moved between two machines connected by a 100 Mbps LAN segment. We're running RHEL 7.3, paramiko-1.15.2 and Python 3.4.1. The sending machine is using Paramiko and the receiving end is running the RHEL sshd.

After getting poor reports from our production instances using Paramiko with compression, we ran some tests comparing multiple ways to move the file between the same servers; see the results below.

Transfer type               Avg time in secs (3 runs)   Effective data rate
regular scp                 116                         93 Mbps
scp with compression        54                          201 Mbps
gzip, scp, gunzip combo     69                          158 Mbps
Paramiko, no compression    117                         93 Mbps
Paramiko with compression   278                         39 Mbps

This showed that Paramiko with compression was by far the slowest option, so we went digging...

The root cause of the issue is the choice of compression level in paramiko/paramiko/compress.py:

import zlib

class ZlibCompressor (object):
    def __init__(self):
        self.z = zlib.compressobj(9)

It's hard-coded to 9, which is zlib's maximum compression level (zlib's default is the same as gzip's: 6).

To prove this out, we ran the same test with the zlib.compressobj level set to 0 through 9, shown below. We used sshd set to DEBUG-level logging on the far end to record exactly how many bytes passed over the WAN/LAN.

zlib compression level   Time to send (secs)   Data over WAN (MBytes)   Compression %
0                        117                   1,301                    0%
1                        33                    317                      76%
2                        35                    291                      78%
3                        37                    269                      79%
4                        41                    266                      80%
5                        53                    245                      81%
6                        64                    232                      82%
7                        74                    227                      83%
8                        159                   221                      83%
9                        278                   219                      83%

[Figure: CSV transfer times vs. compression level]
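
For anyone wanting to reproduce the CPU side of this sweep locally, a rough sketch (the filename is a placeholder; this measures compression time and ratio only, not network transfer):

import time
import zlib

# Rough local sweep of zlib levels 0-9 over a single file. Compression
# time and output size only - network transfer is not simulated.
data = open("big.csv", "rb").read()  # placeholder filename
for level in range(10):
    start = time.time()
    z = zlib.compressobj(level)
    out = z.compress(data) + z.flush()
    saved = 100 - (100 * len(out) // len(data))
    print("level %d: %5.1f s, %d%% smaller" % (level, time.time() - start, saved))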

Clearly this data is only valid for our test case of a very big CSV file, but in general we find that compression level 9 is a bad trade-off of CPU and time against the actual compression gained. From our data, setting the level to 8 or 9 is a terrible choice, while levels 1-4 give you pretty good bang for the buck; a lot of this will depend on just how constrained each end user's WAN or LAN connection really is.

The simple fix for many people would be to alter the call to zlib.compressobj() and let zlib default to 6; a more elegant solution would be to allow the API to pass the user's requested level of compression. We will be happy to submit a PR for this change.
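
As a rough sketch of the second option (the keyword argument here is just an illustration, not the final API; see the PR for the real change), the compressor could take a caller-supplied level:

import zlib

class ZlibCompressor(object):
    # Sketch only: accept a caller-supplied level instead of hard-coding 9.
    # zlib.Z_DEFAULT_COMPRESSION (-1) maps to level 6, the same as gzip.
    def __init__(self, level=zlib.Z_DEFAULT_COMPRESSION):
        self.z = zlib.compressobj(level)

    def __call__(self, data):
        # Compress the chunk and flush so the peer can decode each SSH
        # packet as soon as it arrives.
        return self.z.compress(data) + self.z.flush(zlib.Z_FULL_FLUSH)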

DrNeutron commented on September 24, 2024

Just one more note to say that the compression slowness I commented on above may also explain why a lot of people have uni-directional slowness.

The compression level is chosen by the sending end. So, for example, if we are doing a Paramiko -> sshd put of a file with compression, we see slow times and high CPU; but getting exactly the same file from sshd -> Paramiko performs well, as it's the end running sshd that's doing the compression.

It could also explain the issue noted above by @Dreamsorcerer, where txt files are quick and .pngs are slow: trying to re-compress an already-compressed file, especially at level 9, takes much longer than squishing a txt file.

The pull request for this is in: #1037

[deleted user] commented on September 24, 2024

I also experienced similar problems - let me know if I can help to debug the issue!

Bluehorn commented on September 24, 2024

I am seeing this as well, and it is even worse on Windows machines. So far I have been unable to find the root cause through profiling.

damiendube commented on September 24, 2024

The speed problems were a blocker for me. I have tested Paramiko and the fork ssh (https://github.com/bitprophet/ssh).
I have now moved to pylibssh2 (https://github.com/wallix/pylibssh2).

wwh commented on September 24, 2024

I have seen this issue as well, and have attempted to trace the problem with the profiler.

During lengthy 'put' transfers, there is significant time spent reading the response packets from the connection. My suspicion is that self.sftp.sock.recv_ready() in SFTPFile._write() is returning true (since there is at least 1 byte available) before the response has been fully received, so the subsequent call to self.sftp._read_response(req) (also in SFTPFile._write()) blocks waiting for the complete response. This makes the transfer behave similarly to the non-pipelined mode, since there is still a blocking read for each packet written.

My kludgey workaround -- placing a timer around the 'if not self.pipelined...' block that limits it to once per second -- increases the same-host put rate from 11 MB/sec to 23 MB/sec (vs. command-line sftp at 74 MB/sec).
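
Roughly, the workaround looks like this (a sketch, not the actual patch; it reuses the recv_ready()/_read_response() internals named above):

import time

DRAIN_INTERVAL = 1.0  # seconds between opportunistic response reads
_last_drain = [0.0]

def maybe_drain_responses(sftp_client):
    # Rather than polling recv_ready() after every written chunk (and then
    # blocking on a half-received reply), only drain outstanding write
    # responses once per interval so the pipeline stays full in between.
    now = time.time()
    if now - _last_drain[0] >= DRAIN_INTERVAL:
        _last_drain[0] = now
        while sftp_client.sock.recv_ready():
            sftp_client._read_response()  # may block until a full packet arrives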

dagwieers commented on September 24, 2024

@wwh Indeed, there is a loop with no delay that's likely doing nothing but consuming cycles while checking for data as fast as possible -- so in effect it's reading in chunks as small as possible. Clients like lftp implement something smarter: lftp only reads out the buffer when the data in the buffer is larger than the size of the read (which is 64k), or it suspends the connection. (Beware that lftp is multithreaded, though.)

https://github.com/lavv17/lftp/blob/89487eb2d7ff3d320ee8441ad3e7f4eec0d28fc5/src/SFtp.cc#L237

bitprophet commented on September 24, 2024

I've definitely noticed that SFTP is pretty slow, and have complaints about that from Fabric users as well. Experienced it myself just the other night, even.

Fabric itself recently fixed a potentially similar problem in its command execution stuff, where it was using Paramiko's channel read methods to read byte-by-byte -- super inefficient. If Paramiko itself is doing something similar in the SFTP lib, fixing that would be pretty great. I'll try to take a look soon.

knktc commented on September 24, 2024

I am wondering: has this problem been solved?

bitprophet commented on September 24, 2024

Unfortunately I still haven't had a chance to dig into this and it hasn't been a pain point for me recently. Totally willing to field a pull request if somebody has more time to dig & figure it out, though.

lndbrg commented on September 24, 2024

I have managed to push it up to 33 MB/sec localhost -> localhost (which seems to be the limit) without compression, by bulk-checking the responses every 32 MB transferred (i.e. every 100th _write call in SFTPFile.py). To me this is an okay compromise: you are not skipping the server's ACK checks entirely, and you also make sure not to learn of an error too late in a large transfer. This works for smaller files too, since there is a cleanup routine called once the file has been transferred.
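
The batching logic is roughly the following (a standalone sketch of the idea, not the actual paramiko change; send_request and read_response stand in for the internal calls):

BATCH = 100  # ACK check interval: ~32 MB at 32 KB per write

def pipelined_writes(chunks, send_request, read_response):
    # Fire off write requests without waiting on each reply, then check the
    # accumulated ACKs in bulk every BATCH writes (and once at the end), so
    # a server-side error still surfaces reasonably early in a big transfer.
    pending = []
    for i, chunk in enumerate(chunks, 1):
        pending.append(send_request(chunk))
        if i % BATCH == 0:
            for req in pending:
                read_response(req)
            pending = []
    for req in pending:  # cleanup pass for the final partial batch
        read_response(req)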

dioni21 commented on September 24, 2024

Is this bug still open? I noticed very heavy CPU usage during a Fabric put of a file. It's not even a big file, just some 11 KB, but it takes 7 seconds and full CPU usage just to send it. The same transfer with scp takes only 0.2 seconds.

System: Fedora 20
Fabric 1.7.0
Paramiko 1.15.1

bitprophet commented on September 24, 2024

Could swear I saw recent chatter about this, yes, but I cannot find any other open tickets matching appropriate keywords. @dioni21, if you're able to do some quick bisecting with versions (e.g. try 1.10, 1.13, etc., and narrow down whether this got worse recently or has always been constant), that might help give us a clue. Thanks!

dioni21 commented on September 24, 2024

Hi @bitprophet,

My bisecting found no difference at all, but I managed to "locate" the problem.

Using a test file of 11583 bytes, my results were:

Fabric==1.5.0
ecdsa==0.11
paramiko==1.15.1
pycrypto==2.6.1

=> Elapsed time: 7.620926 seconds

Fabric==1.4.4
pycrypto==2.6.1
ssh==1.8.0

=> Elapsed time: 0.423845 seconds

This elapsed time is just around the put operation:

import time

t1 = time.time()
put(tmpfile, tmpfile)  # Fabric's put()
print 'Elapsed: %f seconds' % (time.time() - t1)

Apparently, this problem has been around ever since Fabric migrated from ssh to Paramiko.

dioni21 commented on September 24, 2024

FYI,

I managed to do some profiling with these versions (latest):

Fabric==1.10.0
ecdsa==0.11
paramiko==1.15.1
pycrypto==2.6.1

The code used:

import cProfile

pr = cProfile.Profile()
pr.enable()
put(tmpfile, tmpfile)
pr.disable()
pr.dump_stats("profile.stats")

And the top results, sorted by cumulative time:

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   13.042   13.042 network.py:640(host_prompting_wrapper)
        1    0.000    0.000   13.042   13.042 operations.py:253(put)
        1    0.000    0.000   13.024   13.024 sftp.py:29(__init__)
        1    0.000    0.000   12.995   12.995 network.py:153(__getitem__)
        1    0.000    0.000   12.994   12.994 network.py:138(connect)
        1    0.000    0.000   12.994   12.994 network.py:381(connect)
        1    0.000    0.000   12.765   12.765 client.py:75(load_system_host_keys)
        1    0.017    0.017   12.765   12.765 hostkeys.py:79(load)
     2831    0.006    0.000   12.322    0.004 hostkeys.py:188(check)
     2833    1.755    0.001   12.287    0.004 hostkeys.py:127(lookup)
   119315    1.233    0.000    5.465    0.000 hostkeys.py:258(hash_host)
   119315    3.101    0.000    4.158    0.000 util.py:315(constant_time_bytes_eq)
   119331    0.799    0.000    1.417    0.000 hmac.py:27(__init__)
 14323543    1.013    0.000    1.013    0.000 {ord}
   238630    0.543    0.000    0.903    0.000 base64.py:310(encodestring)
   240293    0.186    0.000    0.830    0.000 py3compat.py:48(u)
  4101533    0.718    0.000    0.718    0.000 {method 'startswith' of 'str' objects}
   240293    0.240    0.000    0.603    0.000 {method 'decode' of 'str' objects}
...

Apparently, something in hostkeys is taking about 7 seconds (without profiling). Not much in a single run, but over hundreds of runs a day it adds up.

bitprophet commented on September 24, 2024

Yea, the hostkeys related stuff is a known thing (#437 I believe). So this may just be another facet of that same issue.

Dreamsorcerer commented on September 24, 2024

I'm encountering serious performance issues too. Downloading a directory of 313 items totalling 4.2 MB through my file browser takes about 10 seconds; with Paramiko it takes in the region of an hour.

Dreamsorcerer commented on September 24, 2024

Printing out the files as they are downloaded, I can see that it has downloaded a bunch of text files in the region of 60 KB each without a problem, but when it gets to a .png that is only 5.5 KB it hangs for at least several minutes.

Dreamsorcerer commented on September 24, 2024

Oh, there are other .pngs before that one, which it also downloaded fine. But it always gets stuck at the same file.

mynameiscfed commented on September 24, 2024

Hi - I am experiencing similar performance issues using the Paramiko and pysftp libraries for SFTP downloads of files in the 30-40 MB range. Using the 6.6p1-2ubuntu2 SFTP client package in Ubuntu 14.04 gives me about 4x better download speed.

I have tried to actively debug this issue and tune some parameters but I cannot figure it out. Curious if anyone has solved this issue or knows of some resources I may be able to use to debug this performance problem.

manuelep commented on September 24, 2024

Is this thread related to this other one marked as closed?
#175

dioni21 commented on September 24, 2024

There are (were) two different problems. #175 solved a transfer speed problem, but I still have speed problems during key exchange.

mukpat commented on September 24, 2024

We are facing a similar problem: a 500 MB file SFTP transfer is taking 90+ minutes. We tried changing window_size to 2147483647, but it did not help. Is there any other parameter which needs to be changed?

An 8 MB file takes 50 seconds.
An 8 KB file takes 0.3 seconds.

romanl-g commented on September 24, 2024

@mukpat, on large files sending data in chunks may help; see #892.

bitprophet commented on September 24, 2024

Gonna be real ecstatic if that change fixes it in some/many/most cases. (FTR the hardcoded level 9 compression looks to be an arbitrary choice from literally the start of the project, 12 years ago. 😮) Thanks for doing the detailed digging, @DrNeutron!

bitprophet commented on September 24, 2024

Quick note about this: while testing #1037, it took me an embarrassingly long time to notice that no compression occurs by default - even though debug logs imply negotiation is occurring, Paramiko disables compression (by negotiating with an option list of just ["none"]) unless you explicitly say Client.connect(compress=True) (or manually call Transport.use_compression(True), if not using Client).

So while I'm still hoping #1037 fixes the issue for folks egressing files and explicitly using compression, I wonder if that actually covers all the various "SFTP is slow" reports we've gotten over time - presumably the average user isn't remembering (or wanting) to enable compression.
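
For anyone who does want compression, explicitly opting in looks like this (host and paths are placeholders):

import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()
# Compression stays off unless requested explicitly:
client.connect("example.com", username="user", compress=True)
sftp = client.open_sftp()
sftp.put("local.csv", "/remote/path.csv")
client.close()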

jkpubsrc commented on September 24, 2024

I came across this thread after I did a speed test of my implementation and was very surprised: via loopback I only got about 1.1 MByte/s of data transfer rate. This is a show-stopper. As this obviously has not been resolved yet, I have to reject the use of paramiko :-(

ramei commented on September 24, 2024

Does anyone have a solution for slow paramiko performance? I am transferring 15 GB of data, which takes around 45 minutes; normal scp takes only around 7 minutes.
Any help would be great. Thanks.

svanscho commented on September 24, 2024

Same here - is this a recent regression?

BedivereZero commented on September 24, 2024

Same problem: Paramiko gets only 4 MB/s, but command-line scp and sftp get nearly 100 MB/s.

I tried changing window_size, max_packet_size and compress; no change.

I set window_size and max_packet_size to 4 MB and got this log:

DEB 2020-02-24 13:53:14,571 paramiko.transport: starting thread (client mode): 0xe47d510L
DEB 2020-02-24 13:53:14,572 paramiko.transport: Local version/idstring: SSH-2.0-paramiko_2.6.0
DEB 2020-02-24 13:53:14,588 paramiko.transport: Remote version/idstring: SSH-2.0-OpenSSH_7.4
INF 2020-02-24 13:53:14,588 paramiko.transport: Connected (version 2.0, client OpenSSH_7.4)
DEB 2020-02-24 13:53:14,594 paramiko.transport: kex algos:[u'curve25519-sha256', u'curve25519-sha256@libssh.org', u'ecdh-sha2-nistp256', u'ecdh-sha2-nistp384', u'ecdh-sha2-nistp521', u'diffie-hellman-group-exchange-sha256', u'diffie-hellman-group16-sha512', u'diffie-hellman-group18-sha512', u'diffie-hellman-group-exchange-sha1', u'diffie-hellman-group14-sha256', u'diffie-hellman-group14-sha1', u'diffie-hellman-group1-sha1'] server key:[u'ssh-rsa', u'rsa-sha2-512', u'rsa-sha2-256', u'ecdsa-sha2-nistp256', u'ssh-ed25519'] client encrypt:[u'chacha20-poly1305@openssh.com', u'aes128-ctr', u'aes192-ctr', u'aes256-ctr', u'aes128-gcm@openssh.com', u'aes256-gcm@openssh.com', u'aes128-cbc', u'aes192-cbc', u'aes256-cbc', u'blowfish-cbc', u'cast128-cbc', u'3des-cbc'] server encrypt:[u'chacha20-poly1305@openssh.com', u'aes128-ctr', u'aes192-ctr', u'aes256-ctr', u'aes128-gcm@openssh.com', u'aes256-gcm@openssh.com', u'aes128-cbc', u'aes192-cbc', u'aes256-cbc', u'blowfish-cbc', u'cast128-cbc', u'3des-cbc'] client mac:[u'umac-64-etm@openssh.com', u'umac-128-etm@openssh.com', u'hmac-sha2-256-etm@openssh.com', u'hmac-sha2-512-etm@openssh.com', u'hmac-sha1-etm@openssh.com', u'umac-64@openssh.com', u'umac-128@openssh.com', u'hmac-sha2-256', u'hmac-sha2-512', u'hmac-sha1'] server mac:[u'umac-64-etm@openssh.com', u'umac-128-etm@openssh.com', u'hmac-sha2-256-etm@openssh.com', u'hmac-sha2-512-etm@openssh.com', u'hmac-sha1-etm@openssh.com', u'umac-64@openssh.com', u'umac-128@openssh.com', u'hmac-sha2-256', u'hmac-sha2-512', u'hmac-sha1'] client compress:[u'none', u'zlib@openssh.com'] server compress:[u'none', u'zlib@openssh.com'] client lang:[u''] server lang:[u''] kex follows?False
DEB 2020-02-24 13:53:14,594 paramiko.transport: Kex agreed: curve25519-sha256@libssh.org
DEB 2020-02-24 13:53:14,594 paramiko.transport: HostKey agreed: ssh-ed25519
DEB 2020-02-24 13:53:14,595 paramiko.transport: Cipher agreed: aes128-ctr
DEB 2020-02-24 13:53:14,595 paramiko.transport: MAC agreed: hmac-sha2-256
DEB 2020-02-24 13:53:14,595 paramiko.transport: Compression agreed: none
DEB 2020-02-24 13:53:14,612 paramiko.transport: kex engine KexCurve25519 specified hash_algo <built-in function openssl_sha256>
DEB 2020-02-24 13:53:14,613 paramiko.transport: Switch to new keys ...
DEB 2020-02-24 13:53:14,635 paramiko.transport: Attempting password auth...
DEB 2020-02-24 13:53:14,653 paramiko.transport: userauth is OK
INF 2020-02-24 13:53:14,688 paramiko.transport: Authentication (password) successful!
DEB 2020-02-24 13:53:14,700 paramiko.transport: [chan 0] Max packet in: 4194304 bytes
DEB 2020-02-24 13:53:15,002 paramiko.transport: Received global request "hostkeys-00@openssh.com"
DEB 2020-02-24 13:53:15,002 paramiko.transport: Rejecting "hostkeys-00@openssh.com" global request from server.
DEB 2020-02-24 13:53:15,042 paramiko.transport: [chan 0] Max packet out: 32768 bytes
DEB 2020-02-24 13:53:15,042 paramiko.transport: Secsh channel 0 opened.
DEB 2020-02-24 13:53:15,071 paramiko.transport: [chan 0] Sesch channel 0 request ok
INF 2020-02-24 13:53:15,095 paramiko.transport.sftp: [chan 0] Opened sftp connection (server version 3)
DEB 2020-02-24 13:53:15,096 paramiko.transport.sftp: [chan 0] open('/usr/local/bin/calicoctl', 'wb')
DEB 2020-02-24 13:53:15,103 paramiko.transport.sftp: [chan 0] open('/usr/local/bin/calicoctl', 'wb') -> 00000000
DEB 2020-02-24 13:53:25,195 paramiko.transport.sftp: [chan 0] close(00000000)
DEB 2020-02-24 13:53:25,201 paramiko.transport.sftp: [chan 0] stat('/usr/local/bin/calicoctl')
INF 2020-02-24 13:53:25,203 paramiko.transport.sftp: [chan 0] sftp session closed.
DEB 2020-02-24 13:53:25,204 paramiko.transport: [chan 0] EOF sent (0)
DEB 2020-02-24 13:53:25,206 paramiko.transport: EOF in transport thread

I notice that Max packet in is 4 MB, but Max packet out is still 32 KB. Is that the reason?
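
For reference, a sketch of how those limits are set on the client side (host and credentials are placeholders). Note that these constructor arguments only raise what the client advertises to the server; the "Max packet out" value is whatever maximum the server itself advertised when the channel opened (32 KB for OpenSSH), so it cannot be raised from the client side:

import paramiko

# Sketch: raising the limits we advertise to the server. "Max packet out"
# remains whatever the *server* advertises on channel open.
transport = paramiko.Transport(
    ("example.com", 22),                      # placeholder host
    default_window_size=4 * 1024 * 1024,      # 4 MB receive window
    default_max_packet_size=4 * 1024 * 1024,  # 4 MB max incoming packet
)
transport.connect(username="user", password="secret")  # placeholder creds
sftp = paramiko.SFTPClient.from_transport(transport)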

sakisv commented on September 24, 2024

We had the same problem and we solved it with these two changes:

  1. We used set_pipelined() to reduce the time spent waiting to hear back from the server.
  2. We changed our data to memoryview objects (as sketched below) to take care of this bottleneck here:
    data = data[count:]

Locally, for a 100MB file this reduced the time to transfer from 254s to 3.4s.
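
Combined, the two changes look roughly like this (a sketch; fast_put and the chunk size are our own names, and it reads the whole file into memory for simplicity):

def fast_put(sftp, local_path, remote_path, chunk_size=32768):
    # Sketch of both changes: pipelined writes (no per-chunk ACK wait) and
    # memoryview slicing, which is O(1) instead of copying the whole
    # remaining buffer on every iteration.
    with open(local_path, "rb") as src, sftp.open(remote_path, "wb") as dst:
        dst.set_pipelined(True)
        data = memoryview(src.read())
        while len(data):
            dst.write(bytes(data[:chunk_size]))  # copy only this small chunk
            data = data[chunk_size:]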

jowparks commented on September 24, 2024

FWIW to any future reader, it looks like this was fixed here (haven't tested it myself):
#892

Dreamsorcerer commented on September 24, 2024

Also, more recently, I've found this issue too: pyinvoke/invoke#819, which also causes severe speed drops if you end up hitting it in your code.
