whtsky / bencoder.pyx
A fast bencode implementation in Cython
License: BSD 3-Clause "New" or "Revised" License
Hi,
I have a torrent file decoded, and now I need to bencode it again after having modified a few things. However, I get an error:
Traceback (most recent call last):
File "test.py", line 61, in <module>
print (bencode(data['info']))
File "bencoder.pyx", line 159, in bencoder.bencode (bencoder.c:4217)
File "bencoder.pyx", line 100, in bencoder.encode (bencoder.c:2787)
File "bencoder.pyx", line 141, in bencoder.encode_dict (bencoder.c:4084)
File "bencoder.pyx", line 100, in bencoder.encode (bencoder.c:2787)
File "bencoder.pyx", line 109, in bencoder.encode_int (bencoder.c:3015)
OverflowError: value too large to convert to int
Here is a pprint of the data I'm trying to encode:
https://gist.github.com/vinz243/9071201b4057f6f838e97936595572fd
Thanks in advance
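The traceback ends in encode_int with "value too large to convert to int", which suggests that an integer in the data does not fit the C integer type the encoder uses. A minimal sketch for locating such values before encoding follows. It assumes a signed 32-bit limit, which the traceback does not confirm, and find_large_ints is a hypothetical helper, not part of bencoder.pyx.

```python
# Assumed limits of a signed 32-bit C int; the real width depends on the build.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def find_large_ints(obj, path=()):
    """Yield (path, value) for every int outside the assumed C int range."""
    if isinstance(obj, bool):
        # bool is a subclass of int but can never overflow.
        return
    if isinstance(obj, int):
        if not INT_MIN <= obj <= INT_MAX:
            yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_large_ints(value, path + (key,))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_large_ints(value, path + (index,))


# Made-up sample data: a file length above 2 GiB is a typical candidate.
sample = {b"length": 5_000_000_000, b"files": [{b"length": 10}]}
for path, value in find_large_ints(sample):
    print(path, value)
```

Running this over data['info'] would show which keys hold the values that trigger the error, if the assumption about the integer width holds.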
Not 100% sure, but at first glance it looks like you didn't build for Python 3.7.
I'm running macOS 10.13.2:
(tester--unpHGAN) ➜ tester pip install bencoder.pyx==1.2.1
Collecting bencoder.pyx==1.2.1
Could not find a version that satisfies the requirement bencoder.pyx==1.2.1 (from versions: 1.0.0, 1.1.0, 1.1.1, 1.1.2, 1.1.3)
No matching distribution found for bencoder.pyx==1.2.1
I am uncertain if this is a bug or not, but I noticed that when decoding bencoded data, any decoded strings are left as byte strings even though they're not really binary data, but actual human-readable strings.
eg.
bdecode(b'8:a string')
# b'a string'
If you wanted to use this as a string, you would also have to run it through decode('utf-8').
From what I understand, the bencode standard does not make a distinction between strings and binary data. Would it be a good idea to try to decode any byte strings to regular strings and, if that fails, leave them as-is, assuming they are actually binary data? I just know that having to append decode('utf-8') onto every decoded dict key gets very repetitive.
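Until something like this exists in the library, the "try UTF-8, otherwise leave as bytes" idea can be applied to the decoded result. This is only a sketch: decode_text is a hypothetical helper, not part of bencoder.pyx, and it assumes the input is already the output of bdecode (nested dicts, lists, ints and bytes).

```python
def decode_text(value, encoding="utf-8"):
    """Recursively turn bytes into str where they decode cleanly."""
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # Not valid text in this encoding: keep the raw bytes.
            return value
    if isinstance(value, list):
        return [decode_text(item, encoding) for item in value]
    if isinstance(value, dict):
        return {
            decode_text(key, encoding): decode_text(item, encoding)
            for key, item in value.items()
        }
    return value


decoded = {b"name": b"a string", b"pieces": b"\xff\xfe", b"n": [1, b"x"]}
print(decode_text(decoded))
```

One caveat with this approach: binary data that happens to be valid UTF-8 gets decoded as well, so fields such as piece hashes can end up as str in some torrents and bytes in others.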
Doing some hacking on @boramalper's magnetico.
It relies on data present at the end of a bencoded bytestring. In order to use bencoder.pyx I had to patch out https://github.com/whtsky/bencoder.pyx/blob/master/bencoder.pyx#L96 and return both r and l.
See https://github.com/boramalper/magnetico/blob/master/magneticod/magneticod/bencode.py#L53 for the function for which I needed to add a replacement.
Would this project accept a PR adding a bdecode2 returning both r and l, and not erroring on excess data?
Thanks!
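For illustration, here is a pure-Python sketch of what such a function could look like: it decodes one value and returns it together with the number of bytes consumed, so the caller can deal with trailing data. bdecode_prefix is a hypothetical name and this is not the library's implementation.

```python
def bdecode_prefix(data, pos=0):
    """Decode one bencoded value starting at data[pos].

    Returns (value, end), where end is the index just past the value.
    Anything after end is left for the caller to interpret.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        items = []
        pos += 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode_prefix(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        result = {}
        pos += 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode_prefix(data, pos)
            value, pos = bdecode_prefix(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("string length exceeds available data")
        return data[start:end], end
    raise ValueError("invalid bencoded data at offset %d" % pos)


value, end = bdecode_prefix(b"d3:fooi42ee" + b"trailing")
print(value, end)
```

Truncated or malformed input raises ValueError, while excess data after the first complete value is simply reported through the returned offset.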
Hi,
I've discovered that my pure Python implementation of bencode encode is faster than your Cython one. Depending on what is being encoded it can range from a few percent faster to several orders of magnitude faster. Not sure why that is the case, I would expect the Cython version to be faster in all cases.
Here's my bencode encode function.
from operator import itemgetter


class EncodeError(Exception):
    """Raised when an object cannot be represented in bencode."""


def encode(obj):
    """
    Encode data into bencode, return bytes.

    The following objects may be encoded: int, bytes, str, memoryview,
    list, tuple and dict. Dict keys must be bytes, and unicode strings
    will be encoded into utf-8.
    """
    binary = []
    append = binary.append  # local alias avoids repeated attribute lookups

    def add_encode(obj):
        """Encode an object, appending bytes to `binary` list."""
        if isinstance(obj, bytes):
            append(b'%i:%b' % (len(obj), obj))
        elif isinstance(obj, memoryview):
            append(b'%i:%b' % (len(obj), obj.tobytes()))
        elif isinstance(obj, str):
            obj_bytes = obj.encode('utf-8')
            append(b"%i:%b" % (len(obj_bytes), obj_bytes))
        elif isinstance(obj, int):
            append(b"i%ie" % obj)
        elif isinstance(obj, (list, tuple)):
            append(b"l")
            for item in obj:
                add_encode(item)
            append(b'e')
        elif isinstance(obj, dict):
            append(b'd')
            try:
                # bencode requires dict keys in sorted order
                for key, value in sorted(obj.items(), key=itemgetter(0)):
                    append(b"%i:%b" % (len(key), key))
                    add_encode(value)
            except TypeError:
                raise EncodeError('dict keys must be bytes')
            append(b'e')
        else:
            raise EncodeError(
                'value {!r} can not be encoded in Bencode'.format(obj)
            )

    add_encode(obj)
    return b''.join(binary)
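To make the comparison reproducible, both encoders can be timed on the same input with the standard timeit module. The helper below is a minimal sketch; time_encoder is a hypothetical name, and the repeat and number values are arbitrary.

```python
import timeit


def time_encoder(encode_func, data, repeat=5, number=1000):
    """Return the best observed time in seconds for one call of encode_func(data)."""
    timer = timeit.Timer(lambda: encode_func(data))
    # Taking the minimum over several runs reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Possible usage, assuming both encoders are importable:
#   from bencoder import bencode
#   print(time_encoder(encode, data), time_encoder(bencode, data))
```

Timing several shapes of input (large byte strings, deeply nested lists, dicts with many keys) would show where the two implementations actually differ.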
https://docs.python.org/3.11/whatsnew/3.11.html documents that longintrepr.h can no longer be included, but just patching this out is insufficient to fix compilation failures.
bencoder.pyx fails to build on 3.11 for this exact reason. Can this be fixed somehow?
> [builder 3/3] RUN pip wheel --no-cache --no-deps bencoder.pyx:
#0 24.14 Building wheel for bencoder.pyx (pyproject.toml): started
#0 24.50 Building wheel for bencoder.pyx (pyproject.toml): finished with status 'error'
#0 24.51 error: subprocess-exited-with-error
#0 24.51
#0 24.51 × Building wheel for bencoder.pyx (pyproject.toml) did not run successfully.
#0 24.51 │ exit code: 1
#0 24.51 ╰─> [14 lines of output]
#0 24.51 Compiling bencoder.pyx because it changed.
#0 24.51 [1/1] Cythonizing bencoder.pyx
#0 24.51 running bdist_wheel
#0 24.51 running build
#0 24.51 running build_ext
#0 24.51 building 'bencoder' extension
#0 24.51 creating build
#0 24.51 creating build/temp.linux-x86_64-cpython-311
#0 24.51 gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/local/include/python3.11 -c bencoder.c -o build/temp.linux-x86_64-cpython-311/bencoder.o -O3
#0 24.51 bencoder.c:211:12: fatal error: longintrepr.h: No such file or directory
#0 24.51 211 | #include "longintrepr.h"
#0 24.51 | ^~~~~~~~~~~~~~~
#0 24.51 compilation terminated.
#0 24.51 error: command '/usr/bin/gcc' failed with exit code 1
#0 24.51 [end of output]
#0 24.51
#0 24.51 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 24.51 ERROR: Failed building wheel for bencoder.pyx
The source distribution for 2.0.1 is missing from PyPI; only wheels are published. Previous versions had both wheels and source distributions.
This means that if you try to install the package on a platform where wheels are not available, such as a Raspberry Pi (ARM), it will most likely fail, because pip won't find a supported distribution.
This is due to a stray line in the PyPI package in the bencoder.pyx.egg-info/SOURCES.TXT:
/Users/whtsky/Documents/codes/bencoder.pyx/bencoder.c
Removing this allows the build to complete successfully.