
Safihre commented on June 8, 2024

Hi, me again!

I have finished work on a new C-module that we will use in upcoming SABnzbd releases: SABYenc.
Instead of requiring any pre-processing of the yEnc data, it takes the raw chunks of data that come from the socket (in a list) and decodes them directly, avoiding a costly join, filtering, etc.
This gives SAB amazing performance increases.
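To make the "no join" idea concrete, here is a pure-Python sketch (not SABYenc's code, and far slower than C) that decodes a list of raw socket chunks as-is. The one subtlety is that the yEnc escape character `=` can be the last byte of one chunk, so the escape state has to carry over into the next chunk. It assumes the chunks hold only the encoded body (the `=ybegin`/`=yend` lines and NNTP dot-stuffing already handled), and `decode_yenc_chunks` is a hypothetical name.

```python
def decode_yenc_chunks(chunks):
    """Decode a yEnc body given as a list of raw byte chunks, without joining them."""
    out = bytearray()
    escaped = False  # True if the previous byte (possibly in the previous chunk) was '='
    for chunk in chunks:
        for byte in chunk:
            if escaped:
                # Escaped bytes carry an extra +64 shift on top of the usual +42.
                out.append((byte - 64 - 42) % 256)
                escaped = False
            elif byte == 0x3D:  # '=' starts an escape sequence
                escaped = True
            elif byte in (0x0A, 0x0D):  # CR/LF are line breaks, not data
                continue
            else:
                out.append((byte - 42) % 256)
    return bytes(out)
```

Because the escape flag lives outside the per-chunk loop, a chunk boundary falling between `=` and the escaped byte decodes the same as if the data had been joined first.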

But I found that the biggest boost was that in this new C-module we release Python's GIL during decoding, basically making it multi-core! Doing this plus the other optimizations in SABYenc has doubled SAB's performance (we will release it to the public in a new 2.0.0 release, coming soon).
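On the calling side, releasing the GIL pays off when several articles are decoded from different threads at once. The sketch below shows that calling pattern with the standard `concurrent.futures` thread pool; `decode_articles` and its `decode` parameter are hypothetical. The threads only run truly in parallel while the decoder has released the GIL (as the C module described above does); with a pure-Python decoder they simply take turns.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_articles(articles, decode, workers=4):
    """Decode each article with `decode` on a thread pool, keeping input order.

    A real multi-core speedup needs `decode` to release the GIL (e.g. a C
    extension); pure-Python work is still serialized by the GIL.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input articles.
        return list(pool.map(decode, articles))
```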

I also found that adding the following 2 lines to the _yenc.c source of the original _yenc module already boosts performance by ~50-60%, especially when using SSL (because of the smaller chunks).
[image: screenshot of the two lines added to _yenc.c]

Thought you might find it useful :)

from newsreap.

Safihre commented on June 8, 2024

A few other small issues:
https://github.com/caronc/newsreap/blob/master/setup.py#L27
README.rst is now called README.md (setup.py still refers to the old name).

https://github.com/caronc/newsreap/blob/master/newsreap/lib/SocketBase.py#L63
PROTOCOL_SSLv2 no longer exists in recent Python 2.7.x builds on many platforms (OS X/Windows), so I needed to remove it to get things working.

Also, PROTOCOL_SSLv23 doesn't actually mean SSL v2 or v3 at all; weirdly enough, to Python it means "negotiate the highest protocol version both sides support". So if you set that one, Python will always use the best one available, probably either TLSv1.2 or TLSv1. We found that out recently in SABnzbd.
Weird, but that's Python :P
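Both SSL points can be handled without naming version constants directly. The thread is about Python 2.7, but this sketch targets the Python 3 `ssl` module; the helper names `has_protocol` and `make_client_context` are hypothetical. `hasattr` avoids an AttributeError on builds without SSLv2, and `ssl.create_default_context()` uses the same "negotiate the best version" mode that PROTOCOL_SSLv23 stands for, while also turning on certificate and hostname checks.

```python
import ssl


def has_protocol(name):
    """Return True if this Python's ssl module defines the constant `name`.

    Safer than referencing ssl.PROTOCOL_SSLv2 directly: builds whose
    OpenSSL lacks SSLv2 do not define that constant at all.
    """
    return hasattr(ssl, name)


def make_client_context():
    """Client context that negotiates the highest version both ends support.

    Uses the negotiating mode that PROTOCOL_SSLv23 (later renamed
    PROTOCOL_TLS / PROTOCOL_TLS_CLIENT) stands for, and also enables
    certificate and hostname verification.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
```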


caronc commented on June 8, 2024

Unfortunately this code is under heavy development. The READMEs say what it should do :), but I'm definitely not there yet. That said, all of the framework works: I can call xover, and I can decode yEnc and uuencoded files. I JUST got all of the unit tests working tonight (finally... sigh).

So... in short, I can't tell you when this will work on Windows. It still needs a lot of testing! If you've got pip installed, though, you might be able to just pass it the requirements.txt file (pip install -r requirements.txt) and get most of what you need. On Windows, I think you need a small tool that lets you compile things (gevent requires some compiling, for example). Sorry :(

Thanks for the info on PROTOCOL_SSLv2; I'll update that.
Secondly, I'll try to eliminate the yEnc dependency (TBH, I thought I already had; I have tests that compare the manual Python way against the C libraries). I'll look into that next too (if not tomorrow; it's getting late here).
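One way to test "the manual Python way" is a round trip: encode bytes, decode them, and compare. The pure-Python pair below is a minimal sketch that assumes the standard yEnc rule (add 42 modulo 256, and escape NUL, LF, CR and `=` by emitting `=` followed by the value plus 64). It skips line wrapping and the `=ybegin`/`=yend` lines, and `yenc_encode`/`yenc_decode` are hypothetical names, not newsreap's functions.

```python
# Bytes that must be escaped after the +42 shift: NUL, LF, CR and '='.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def yenc_encode(data):
    out = bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            # Escape: '=' followed by the value shifted by another 64.
            out += bytes((0x3D, (value + 64) % 256))
        else:
            out.append(value)
    return bytes(out)


def yenc_decode(data):
    out = bytearray()
    escaped = False
    for byte in data:
        if escaped:
            out.append((byte - 64 - 42) % 256)
            escaped = False
        elif byte == 0x3D:
            escaped = True
        elif byte not in (0x0A, 0x0D):  # line breaks carry no data
            out.append((byte - 42) % 256)
    return bytes(out)
```

Running every byte value through `yenc_encode` and back exercises all four escape cases; a C decoder given the same encoded input should produce identical output.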


Safihre commented on June 8, 2024

I did install using the requirements, and that worked fine. Just the curses module is the problem,
especially since I already have _yenc.
Maybe you can add an if-statement so that curses is only imported if _yenc is not available.
Then I can try the software ;)


caronc commented on June 8, 2024

Good call (sorry, I didn't notice I didn't already have that in place). You're certainly going to put this to the test, because I've never even tried any of this on Windows. I may be using a lot of Linux-only calls.

Again, let me reiterate how new this project is. It's maybe 50% complete :). I don't want to set any high expectations at this point.

The framework is my focus right now which still might be useful to you. An example of how it might work would be:

from os.path import abspath, dirname, join

from newsreap.NNTPSettings import NNTPSettings
from newsreap.NNTPManager import NNTPManager

# You can also just manually populate an NNTPSettings object from your
# own configuration file and pass it in. I'm just pulling in a temporary
# file I have lying around with this line. The configuration file controls
# things such as # of threads, backup usenet servers, primary usenet
# servers, timeouts, etc.
cfg_file = join(dirname(dirname(dirname(abspath(__file__)))), 'config.yaml')

# Create our Settings Object and load our config file. 
settings = NNTPSettings(
    cfg_file=cfg_file,
    cfg_path=dirname(cfg_file),
    # Make sure the existing database is reset
    reset=True,
)

# Create Manager
mgr = NNTPManager(settings)
mgr.connect()

article = mgr.get(
    '[email protected]',
    work_dir='/tmp',
    group='alt.binaries.newsreap',
)

# Print content retrieved
print(article.body)
print(article.header)

print(mgr.stat(
    '[email protected]',
    group='alt.binaries.newsreap',
))

One note of concern: importing from lib.* is bad, so I'll have to rework that to import newsreap.NNTPSettings so it can be an installable pip module (again, this is merely code checked in from a test directory I started). So expect all of this to change in the near future. Done (commit).

I also have the lower level calls if you don't want to use the threading maintained by the NNTPManager (you can do it yourself or just run single threaded):

from newsreap.NNTPConnection import NNTPConnection

## NON SSL (toggle secure=True for SSL)
sock = NNTPConnection(
    # 119 is the standard NNTP port (563 is the usual one for NNTP over SSL)
    host='nntp.server', port=119,
    username='mylogin', password='mypass',
    secure=False,
    # If your Usenet server doesn't need it, don't use it (slightly faster
    # that way); most providers don't require it.
    join_group=False,
)

# Make your connection
sock.connect()

# Fetch a list of groups (all of 'em)
sock.groups()

# Fetch a filtered list
sock.groups(filters='alt.binaries')

# Switch to a group of interest
sock.group('alt.binaries.newsreap')

# find out your current index location
sock.tell()

# go to a specific location (partial whence support; still on the todo list)
index = 1000  # an article number of your choosing (illustrative value)
sock.seek(index)

# retrieve a batch of 500 article headers (index moves):
headers = sock.next(500)

# XOVER articles manually if you don't want to use next() and prev()
sock.xover(start=300, end=500)

# Know the Article ID of something you want to download:
fileptr = sock.get("crazy.cool.articleidname", "temporary/directory")

# this will pull the article down as a temporary (file) object. if the object goes out of
# scope then the file it's associated with gets lost (deleted) too.

# The fileptr returned is actually a whole other object with a save() function allowing
# you to write the file to disk as the name you want it to be.  If you just call save()
# without parameters, it uses the name parsed from the yEnc tag or uuencode tag,
# or however it was detected.
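The "temporary file that disappears when it goes out of scope, with a save() to keep it" behavior described above can be sketched with the standard library. This is not newsreap's class: `TempArticleFile`, its methods and the `detected_name` parameter are hypothetical, and `weakref.finalize` is used to delete the temporary file when the object is garbage-collected.

```python
import os
import shutil
import tempfile
import weakref


def _remove_quietly(path):
    """Delete `path`, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


class TempArticleFile:
    """Hypothetical stand-in for the file object a download returns."""

    def __init__(self, work_dir=None, detected_name=None):
        # mkstemp creates the file and returns an open descriptor; close it
        # now and reopen by path when writing.
        fd, self.path = tempfile.mkstemp(dir=work_dir)
        os.close(fd)
        # Name parsed from the yEnc/uuencode header, if any.
        self.detected_name = detected_name
        # Delete the temporary file once this object is garbage-collected.
        self._cleanup = weakref.finalize(self, _remove_quietly, self.path)

    def write(self, data):
        """Append decoded bytes to the temporary file."""
        with open(self.path, "ab") as handle:
            handle.write(data)

    def save(self, filename=None):
        """Copy the data to `filename`, or to the detected name if none is given."""
        target = filename or self.detected_name
        if not target:
            raise ValueError("no filename given and none was detected")
        shutil.copyfile(self.path, target)
        return target
```

Because `save()` copies rather than moves, the saved file survives when the temporary one is cleaned up.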

The nr.py file I intend to complete (eventually) will allow for plugins (white lists, black lists, etc.), and it will let you control all of the functions above from the command line. It will also act as a full indexer, since it ties in with any database you like (thanks to the beauty of SQLAlchemy). By default it uses an SQLite database, since just about anyone can run one of those (they are, however, brutally slow with lots of data unless you put them on a RAM drive).
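As a rough picture of the indexing side, here is a sketch that stores parsed article headers in SQLite using only Python's built-in `sqlite3` module (newsreap itself goes through SQLAlchemy, which is not shown). The table layout and the function names are hypothetical. Passing `":memory:"` keeps the whole database in RAM, which is one alternative to a RAM drive for small experiments.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a tiny article-header index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        " message_id TEXT PRIMARY KEY,"
        " grp TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " poster TEXT NOT NULL)"
    )
    return conn


def add_article(conn, message_id, group, subject, poster):
    # INSERT OR IGNORE skips headers already indexed by an earlier scan.
    conn.execute(
        "INSERT OR IGNORE INTO article VALUES (?, ?, ?, ?)",
        (message_id, group, subject, poster),
    )
    conn.commit()


def search(conn, group, text):
    """Return message-ids in `group` whose subject contains `text`."""
    rows = conn.execute(
        "SELECT message_id FROM article"
        " WHERE grp = ? AND subject LIKE ? ORDER BY message_id",
        (group, f"%{text}%"),
    )
    return [row[0] for row in rows]
```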

I'm hoping this framework will let anyone write an indexer and focus entirely on the website instead of the backend. Or maybe they just want to write a Usenet tool that checks a specific group for new posts from time to time, etc.
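A "check a group for new posts" tool mostly needs two pieces of NNTP: the GROUP response, which RFC 3977 defines as `211 count low high group`, and the tab-separated OVER/XOVER lines whose leading fields are the article number, Subject, From, Date and Message-ID (followed by References, byte count and line count). The sketch below parses both and works out which article numbers are new since the last check; the function names and the `last_seen` bookkeeping are assumptions, not newsreap's API.

```python
from typing import NamedTuple


class Overview(NamedTuple):
    number: int
    subject: str
    poster: str
    date: str
    message_id: str


def parse_group_response(line):
    """Parse '211 count low high group' into (low, high)."""
    code, _count, low, high, _name = line.split()[:5]
    if code != "211":
        raise ValueError("unexpected GROUP response: " + line)
    return int(low), int(high)


def new_article_range(line, last_seen):
    """Article numbers to fetch with XOVER since `last_seen` (empty if none are new)."""
    low, high = parse_group_response(line)
    return range(max(last_seen + 1, low), high + 1)


def parse_overview(line):
    """Split one OVER/XOVER line into its leading fields."""
    fields = line.rstrip("\r\n").split("\t")
    return Overview(int(fields[0]), fields[1], fields[2], fields[3], fields[4])
```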

I'm also hoping to make this library/framework general enough that anyone who wanted to could build a web page around it, so I'd be overjoyed if it ever became the backend of SABnzbd! :)

Edit: Fixed typos and added clarifications. Also, a big push I did last night now handles the Python module aspect (I fixed up setup.py, so it should work too).


Safihre commented on June 8, 2024

I tried, but got more errors on Windows :/
Maybe I should be a bit more patient until you feel it's more ready, and maybe you've even had time to test it on Windows yourself :)
(And until I can just give it an NZB and have it fetch all those articles and write them to files; no verification etc. yet.)

One more note: I see you send the GROUP command before requesting an article, but this isn't necessary nowadays. I know it's probably in the Usenet spec, but none of the Usenet servers actually need it. SABnzbd never sends it unless you force it in the server config.


caronc commented on June 8, 2024

The group() command is mainly used for indexing and article tracking (a huge part of an indexer). It's needed when parsing articles with xover(): you parse the headers and, depending on how clever your regular expressions are at keying on certain things, you extract enough information to build an NZBFile (most obfuscated content can't be handled this way, though). SABnzbd would never use this portion or these functions. You'd already have all the Article-IDs ahead of time (in the NZBFile itself), so you'd be doing something more like:

# Open nzbfile (NNTPManager will definitely support this eventually).
# The code below is greatly simplified (no error checking, for
# readability and explanatory reasons); nzbfile is assumed to map each
# filename to the Article-IDs of its segments.
files = {}
for filename, article_ids in nzbfile.items():
    files[filename] = []
    for article_id in article_ids:
        files[filename].append(sock.get(article_id, "SABnzbd/directory"))

# merge/process here
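NNTPnzb isn't written yet, so as a stand-in here is a minimal sketch of reading an NZB file with the standard library's ElementTree to get the per-file segment Article-IDs the loop above works through. It assumes the usual NZB layout (`<file subject="...">` elements in the `http://www.newzbin.com/DTD/2003/nzb` namespace, each with numbered `<segment>` children whose text is the Article-ID) and keys files by their subject line, since pulling a clean filename out of the subject is a heuristic of its own. `parse_nzb` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by NZB files.
NZB_NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}


def parse_nzb(xml_text):
    """Map each <file> subject to its Article-IDs, ordered by segment number."""
    root = ET.fromstring(xml_text)
    files = {}
    for file_elem in root.findall("nzb:file", NZB_NS):
        segments = file_elem.findall("nzb:segments/nzb:segment", NZB_NS)
        # Order by the number attribute rather than relying on document order.
        segments.sort(key=lambda seg: int(seg.get("number", "0")))
        files[file_elem.get("subject", "")] = [seg.text.strip() for seg in segments]
    return files
```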

Eventually, NNTPManager would handle the above for you automatically with:

from os.path import abspath, dirname, join

from newsreap.NNTPSettings import NNTPSettings
from newsreap.NNTPManager import NNTPManager
# Not written yet
from newsreap.NNTPnzb import NNTPnzb

# You can also just manually populate an NNTPSettings object from your
# own configuration file and pass it in. I'm just pulling in a temporary
# file I have lying around with this line. The configuration file controls
# things such as # of threads, backup usenet servers, primary usenet
# servers, timeouts, etc.
cfg_file = join(dirname(dirname(dirname(abspath(__file__)))), 'config.yaml')

# Create our Settings Object and load our config file. 
settings = NNTPSettings(
    cfg_file=cfg_file,
    cfg_path=dirname(cfg_file),
    # Make sure the existing database is reset
    reset=True,
)

# Create Manager
mgr = NNTPManager(settings)
mgr.connect()

# Parse an NZB file (NNTPnzb not written yet)
# This is its own class because when you post, you'll be able to call its
# write() function to produce an NZBFile for distribution. It'll be used
# both for creating NZB files and for extracting information from them.
nzb = NNTPnzb('path/to/nzbfile')

# I'm still deciding what the results will be;
# probably a list of NNTPArticles()
results = mgr.get(nzb, '/path/to/download')

I like your idea, though; maybe it's better if you hold off for a bit. Windows support isn't high on my priority list at the moment, unfortunately. :)

Downloading NZB files is next in line for support (for sure). The other thing I'd like to support is monitoring the download transfers themselves and automatically retrying stale/stalled ones. We all have those Usenet providers that stall out on us (network throughput dies) every now and then. This is a common gripe in the indexing world (it was with newznab, anyway).
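Automatically retrying stalled transfers usually comes down to watching for progress and giving up on an attempt after a quiet period. The sketch below is one way to do that under stated assumptions: `StallDetector` and `fetch_with_retries` are hypothetical names, the clock is injectable so the logic can be tested without waiting, and a stalled fetch is assumed to signal itself by raising `TimeoutError`.

```python
import time


class StallDetector:
    """Reports a transfer as stalled after `timeout` seconds without new data."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.received = 0
        self.last_progress = clock()

    def update(self, nbytes):
        """Record `nbytes` just received; only non-empty reads count as progress."""
        if nbytes > 0:
            self.received += nbytes
            self.last_progress = self.clock()

    def stalled(self):
        return self.clock() - self.last_progress >= self.timeout


def fetch_with_retries(fetch, attempts=3):
    """Call `fetch()` up to `attempts` times, retrying when it raises TimeoutError."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
```

A download loop would call `update()` after every read and abandon the attempt (raising `TimeoutError`) once `stalled()` turns true, letting `fetch_with_retries` start a fresh one.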


caronc commented on June 8, 2024

That's great, man! Good for you for porting some of the slow code to C! Dealing with all the different platforms out there can be challenging, but it's so rewarding in the end if you pull it off!

I appreciate you sharing your finds too! Thank you!

