
txi2p's People

Contributors

exarkun, str4d, warner


txi2p's Issues

.dataReceived() is called once per byte, probably inefficient

While analyzing tahoe#2861, we identified a likely performance concern with the way txi2p delivers data on the server side of a connection.

This happens when you use a txi2p.sam.endpoints.SAMI2PStreamServerEndpoint, which is how you listen on an .i2p address. txi2p implements this by making an outbound connection to the local I2P daemon, writing a command that says "hey, I want to accept connections for (some .i2p address)", then waiting for a response. When some client connects, the daemon responds ("hey, someone connected, get ready to talk to them"), and then uses the same TCP connection for the subsequent tunneled data.

On the txi2p side, there is a parser/state machine (implemented with Parsley) that manages the initial command and response. Once the response is received, this state machine moves into State_readData, which matches arbitrary single bytes (the "anything:data" target) and delivers each one to receiver.dataReceived().

This is sound, but slow. The expected scenario is something like a Tahoe client uploading several megabytes of binary data to an I2P-based server, delivered through Foolscap and into the I2P connection. On the receiving (server) side, large buffers can be delivered in a single system call, up to the size of the kernel buffers (typically 64 kB). Each such buffer could be processed in a single .dataReceived() invocation. When txi2p instead breaks it up into many one-byte invocations, performance will suffer (in particular, CPU usage on the server will be higher than necessary). The worst case is probably a quadratic slowdown, if the next-higher protocol (e.g. Foolscap) does the lazy thing and appends the incoming bytes to a buffer until the expected number have been received:

  def dataReceived(self, data):
      self.buffer += data
      if len(self.buffer) == self.expected:
          self.messageReceived(self.buffer)
          self.buffer = b""

To fix this, txi2p will need to swap out the Parsley parser for a direct connection to the target protocol's .dataReceived, when it moves into State_readData. @washort suggested:

<dash> so what i'd do is, subclass the protocol, write a dataReceived for it
that checks self.currentRule
<dash> and if it's 'State_readdata' just invoke the appropriate dataReceived
directly and not the parser
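
A minimal sketch of that suggestion, using hypothetical stand-in classes rather than the real Parsley or txi2p ones. `PerByteParserProtocol` only imitates the per-byte delivery described above, and `BypassingProtocol` is an illustrative name; the `currentRule` attribute and the `State_readData` rule name are taken from this issue. The sketch ignores the case where one chunk contains both the end of the handshake and the start of the data.

```python
class RecordingProtocol(object):
    """Hypothetical stand-in for the wrapped application protocol."""

    def __init__(self):
        self.calls = []

    def dataReceived(self, data):
        # Record each delivery so the number of invocations can be counted.
        self.calls.append(data)


class PerByteParserProtocol(object):
    """Hypothetical stand-in for the parser-driven protocol.

    It imitates the behaviour described in this issue: every incoming
    byte is forwarded to the receiver in its own dataReceived() call.
    """

    def __init__(self, receiver):
        self.receiver = receiver
        self.currentRule = 'State_readData'

    def dataReceived(self, data):
        for i in range(len(data)):
            self.receiver.dataReceived(data[i:i + 1])


class BypassingProtocol(PerByteParserProtocol):
    """Skips the per-byte path once the state machine is reading data."""

    def dataReceived(self, data):
        if self.currentRule == 'State_readData':
            # Hand the whole chunk to the receiver in one call.
            self.receiver.dataReceived(data)
        else:
            PerByteParserProtocol.dataReceived(self, data)


payload = b"x" * 65536

slow = RecordingProtocol()
PerByteParserProtocol(slow).dataReceived(payload)

fast = RecordingProtocol()
BypassingProtocol(fast).dataReceived(payload)

print(len(slow.calls), len(fast.calls))
```

Both receivers end up with the same bytes; only the number of dataReceived() invocations differs.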

This might also affect the inbound side of I2P client connections (those created with SAMI2PStreamClientEndpoint), but I'm not sure. In the Tahoe context, this would be a Tahoe client downloading a file from an I2P-hosted server, and the additional CPU load would occur on the client side.

setup_requires is superseded by PEP 518

PEP 518 provides a mechanism for declaring "build-system" requirements. pip >= 10 supports PEP 518, so declaring build requirements that way lets pip install the things that "setup_requires" was previously used to install. The advantage of having pip install the dependencies is that pip is maintained and receives a lot of attention. setuptools, which makes "setup_requires" work, does not receive as much attention (for historical reasons if nothing else).

Specifically, this would fix an issue where pip uses a modern TLS client and configuration to download dependencies, while setuptools uses something older and less functional. On some platforms, this causes setuptools to fail with a spurious certificate validation failure where pip succeeds, e.g.:

$ python setup.py install
Download error on https://pypi.org/simple/vcversioner/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726) -- Some packages may not be found!
Couldn't find index page for 'vcversioner' (maybe misspelled?)
Download error on https://pypi.org/simple/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726) -- Some packages may not be found!
No local packages or working download links found for vcversioner>=1
Traceback (most recent call last):
  File "setup.py", line 45, in <module>
    'twisted.plugins',
  File "/tmp/scratch/env/lib/python2.7/site-packages/setuptools/__init__.py", line 128, in setup
    _install_setup_requires(attrs)
  File "/tmp/scratch/env/lib/python2.7/site-packages/setuptools/__init__.py", line 123, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/tmp/scratch/env/lib/python2.7/site-packages/setuptools/dist.py", line 514, in fetch_build_eggs
    replace_conflicting=True,
  File "/tmp/scratch/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 770, in resolve
    replace_conflicting=replace_conflicting
  File "/tmp/scratch/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1053, in best_match
    return self.obtain(req, installer)
  File "/tmp/scratch/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1065, in obtain
    return installer(requirement)
  File "/tmp/scratch/env/lib/python2.7/site-packages/setuptools/dist.py", line 581, in fetch_build_egg
    return cmd.easy_install(req)
  File "/tmp/scratch/env/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 667, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('vcversioner>=1')
$ pip install vcversioner
Collecting vcversioner
  Using cached https://files.pythonhosted.org/packages/5a/6b/6f5da157648cadbaf83f625c395cd23ff6be3421268b7bf54523b8d9aaab/vcversioner-2.16.0.0-py2-none-any.whl
Installing collected packages: vcversioner
Successfully installed vcversioner-2.16.0.0
$
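
For illustration, a PEP 518 declaration for this case might look roughly like the following pyproject.toml. This is a sketch, not the project's actual file: the "vcversioner>=1" requirement is taken from the error output above, while the "setuptools" and "wheel" entries are assumptions about what the build needs.

```toml
# pyproject.toml (hypothetical sketch)
[build-system]
# Build-time requirements that pip installs before running setup.py.
requires = ["setuptools", "wheel", "vcversioner>=1"]
```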

inbound data may be delayed until next chunk is received

While analyzing tahoe#2861, we noticed a performance concern with the way txi2p delivers the first few chunks of inbound data.

txi2p.sam.stream.StreamAcceptReceiver.dataReceived has two jobs. First, it waits until a complete newline-terminated SAM handshake response is received. At this point it can notify the higher-level protocol that a connection has been established. Second, once established, all subsequent (application-layer) bytes that arrive at .dataReceived() should be passed directly to the higher-level protocol's .dataReceived().

    def dataReceived(self, data):
        if self.peer:
            # Pass all other data to the wrapped Protocol.
            if self.initialData:
                data = self.initialData + data
                self.initialData = None
            self.wrappedProto.dataReceived(data)
        else:
            self.initialData += data
            if '\n' in self.initialData:
                # First line is the peer's Destination.
                data, self.initialData = self.initialData.split('\n', 1)
                self.peer = peerSAM(data)
                self.factory.streamAcceptIncoming(self)

However, if the chunk of data that completes the SAM handshake response also includes some application-layer bytes, those bytes will be trapped in a buffer until the next call to .dataReceived().

This is not a problem for HTTP-like protocols, where the client speaks first: in those cases, the expected order of events is:

  • SAM.dataReceived(handshake response)
  • -> app.connectionMade()
  • -> app.transport.write(client request)
  • SAM.dataReceived(server response)
  • -> app.dataReceived(server response)

However, it will cause a loss of progress for SMTP-like protocols, where the server speaks first, and the client waits for the server to speak before making a request. This could happen if the server sends its initial message very quickly, and if the local I2P daemon chooses to include it in the same TCP segment as the SAM response:

  • .dataReceived(handshake response + initial server message)
  • -> app.connectionMade()
  • (but no call to app.dataReceived())

In this case, the initial server message will be stuck in self.initialData until the server sends a second segment. Only then is .dataReceived() invoked a second time, so that the "if self.peer:" branch runs and the stored data is finally delivered. If the client strictly waits for the server's message before speaking, the client will never progress, and the protocol will deadlock.

To fix this, .dataReceived() should do something more like this:

        else:
            self.initialData += data
            if '\n' in self.initialData:
                # First line is the peer's Destination.
                data, self.initialData = self.initialData.split('\n', 1)
                self.peer = peerSAM(data)
                self.factory.streamAcceptIncoming(self)
                data, self.initialData = self.initialData, None
                self.wrappedProto.dataReceived(data)

Alternatively, it could do self.dataReceived("") immediately after the call to streamAcceptIncoming(self), to reduce code duplication slightly. The general hazard to remain aware of is reentrancy: if the call to streamAcceptIncoming() could somehow cause the reactor to call .dataReceived(), then we must be sure that self.initialData is in a safe state before allowing the reentrant call to occur.
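
The difference between the current and the proposed behaviour can be reproduced with a small self-contained sketch. Everything here is a hypothetical simplification: `Receiver` only mirrors the control flow of the dataReceived() quoted above, `App` stands in for the wrapped protocol, the call to app.connectionMade() stands in for factory.streamAcceptIncoming(self), and the `flush_leftover` flag and the "if leftover:" guard are additions for this demonstration only.

```python
class App(object):
    """Hypothetical stand-in for the wrapped application protocol."""

    def __init__(self):
        self.events = []

    def connectionMade(self):
        self.events.append(('connectionMade',))

    def dataReceived(self, data):
        self.events.append(('dataReceived', data))


class Receiver(object):
    """Hypothetical simplification of the handshake-then-forward logic."""

    def __init__(self, app, flush_leftover):
        self.app = app
        self.flush_leftover = flush_leftover
        self.peer = None
        self.initialData = b''

    def dataReceived(self, data):
        if self.peer:
            # Established: forward everything, including stored bytes.
            if self.initialData:
                data = self.initialData + data
                self.initialData = None
            self.app.dataReceived(data)
        else:
            self.initialData += data
            if b'\n' in self.initialData:
                # First line is the peer's Destination.
                line, self.initialData = self.initialData.split(b'\n', 1)
                self.peer = line
                # Stand-in for self.factory.streamAcceptIncoming(self).
                self.app.connectionMade()
                if self.flush_leftover:
                    # Proposed fix: deliver bytes that arrived in the same
                    # chunk as the handshake response right away.
                    leftover, self.initialData = self.initialData, None
                    if leftover:
                        self.app.dataReceived(leftover)


def run(flush_leftover):
    app = App()
    receiver = Receiver(app, flush_leftover)
    # Handshake response and the server's first message in one chunk.
    receiver.dataReceived(b'peerdest\n220 ready\r\n')
    return app.events


print(run(flush_leftover=False))
print(run(flush_leftover=True))
```

Without the flush, the application only sees connectionMade(); with it, the server's initial message is delivered in the same call. The "if leftover:" guard also avoids passing None to the application if a reentrant call has already drained self.initialData.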

Note that this problem might currently be masked by #3, if that causes .dataReceived() to be fed one byte at a time. In that case, self.initialData will never hold any leftover application bytes after the handshake line. However, once #3 is fixed, this problem could be exposed.

Time for a new release?

I'm working on a Python 3 port for Tahoe-LAFS, which depends via foolscap on txi2p. I see that Python 3 support was added in #9. It's unclear to me whether foolscap will maintain i2p support, but in any case I thought I'd echo @sajith's question: Any chance of cutting a new release so txi2p's Python 3 support is more readily available to the world? :)
