
minet's Issues

pain to install thoughts

  • From my experience on Linux with python3.5, python3-dev seems to be required by the dragnet install process. Not 100% sure though.

  • dragnet does not install its pip dependencies. While waiting for them to accept the PR that fixes this, adding those deps to minet's requirements would help many users.

  • Finally, since dragnet is a large piece to install, could it be installed only optionally? For instance, the dragnet option could catch the specific dragnet import error and display the command to install it, plus a pointer to the documentation.
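A minimal sketch of the optional-dependency idea from the last point, assuming a hypothetical helper named load_optional (not part of minet) and assuming that "pip install dragnet" is the right install command:

```python
import importlib


def load_optional(module_name, install_hint):
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper: the name and the error wording are illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        # Re-raise with the install command so the user knows what to do next.
        raise RuntimeError(
            "The '%s' module is required for this option but is not installed.\n"
            "Install it with: %s" % (module_name, install_hint)
        ) from error


# Possible usage inside the code path that needs dragnet (assumed command):
#     dragnet = load_optional('dragnet', 'pip install dragnet')
```

With something like this, dragnet would only be imported when the corresponding option is actually used, so the rest of the tool keeps working without it.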

Add adaptive throttling strategies

Using concepts such as exponential backoff. Might be a bit fun/tricky to implement in a multithreaded fashion with limited memory consumption.
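To make the idea concrete, here is a rough sketch of per-domain exponential backoff shared between threads. The class name AdaptiveThrottle and its parameters are hypothetical and not an existing minet or quenouille API. Memory stays bounded by only storing domains whose delay currently differs from the base one.

```python
import threading


class AdaptiveThrottle:
    """Thread-safe per-domain delay that doubles on failure (hypothetical sketch)."""

    def __init__(self, base=0.5, factor=2.0, cap=60.0):
        self.base = base      # delay, in seconds, for a healthy domain
        self.factor = factor  # multiplier applied after each failure
        self.cap = cap        # upper bound on the delay
        self._delays = {}     # only domains currently backing off are stored
        self._lock = threading.Lock()

    def delay_for(self, domain):
        with self._lock:
            return self._delays.get(domain, self.base)

    def record_failure(self, domain):
        # Exponential backoff: multiply the current delay, up to the cap.
        with self._lock:
            current = self._delays.get(domain, self.base)
            self._delays[domain] = min(current * self.factor, self.cap)
            return self._delays[domain]

    def record_success(self, domain):
        # Forget the domain so it falls back to the base delay and frees memory.
        with self._lock:
            self._delays.pop(domain, None)
```

A worker would call delay_for(domain) before each request, then record_failure or record_success depending on the outcome. Resetting on the first success is only one possible policy; decaying the delay gradually would be another.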

weird idna encoding error when resolving some urls

(note: It might be more appropriate to move this issue to quenouille)

When trying to resolve this url http://www.outremersbeyou.com/talent-de-la-semaine-la-designer-comorienne-aisha-wadaane-je-suis-fiere-de-mes-origines/ we end up with the surprising stack trace below.

This is weird, as this url is indeed a redirection, but to a normally encoded (though malformed) url: Location: http://www.outremers360.comtalent-de-la-semaine-la-designer-comorienne-aisha-wadaane-je-suis-fiere-de-mes-origines/

I guess that, because of the missing slash, it treats "comtalent-de-la-semaine-la-designer-comorienne-aisha-wadaane-je-suis-fiere-de-mes-origines" as the TLD and, since there are dashes inside, tries to interpret it as punycode...

Traceback (most recent call last):
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/encodings/idna.py", line 167, in encode
    raise UnicodeError("label too long")
UnicodeError: label too long
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "bin/complete_links_resolving_v2.py", line 100, in <module>
    resolve()
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "bin/complete_links_resolving_v2.py", line 57, in resolve
    for res in multithreaded_resolve(urls_to_clear, threads=50, throttle=0.5, max_redirects=15):
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/quenouille/imap.py", line 353, in output
    raise e.with_traceback(trace)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/quenouille/imap.py", line 303, in worker
    result = func(data)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/minet/fetch.py", line 237, in worker
    error, stack = resolve(http, url, max=max_redirects, **kwargs)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/minet/utils.py", line 187, in resolve
    headers_only=True
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/minet/utils.py", line 167, in request
    redirect=redirect
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/request.py", line 68, in request
    **urlopen_kw)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/poolmanager.py", line 326, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/connection.py", line 183, in connect
    conn = self._new_conn()
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/home/boo/.pyenv/versions/quenouille/lib/python3.6/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/boo/.pyenv/versions/3.6.9/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label too long)
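For what it's worth, the "label too long" message is what Python's idna codec raises when a single dot-separated label reaches 64 characters or more, so the length of the bogus last label may matter more than the dashes. The snippet below is a small reproduction using only the standard library. The helper names are hypothetical, and catching the error this way is just one possible guard, not necessarily how minet or quenouille should handle it.

```python
from urllib.parse import urlsplit

MAX_LABEL_LENGTH = 63  # DNS limit for a single label


def oversized_labels(url):
    """Return the hostname labels of `url` that exceed the DNS length limit."""
    host = urlsplit(url).hostname or ''
    return [label for label in host.split('.') if len(label) > MAX_LABEL_LENGTH]


def has_encodable_hostname(url):
    """Check that the hostname survives the same idna encoding used by socket."""
    host = urlsplit(url).hostname or ''
    try:
        host.encode('idna')
    except UnicodeError:
        # Same error family as the one at the bottom of the trace above.
        return False
    return bool(host)


# The Location header reported in this issue.
location = (
    'http://www.outremers360.comtalent-de-la-semaine-la-designer-comorienne-'
    'aisha-wadaane-je-suis-fiere-de-mes-origines/'
)

print(oversized_labels(location))        # the fused "comtalent-..." label
print(has_encodable_hostname(location))  # False
```

A resolver could run a check like has_encodable_hostname on each redirect target and report an invalid-URL error for that row instead of letting the exception propagate out of the worker.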

Output file error

Running the command without the -o output.csv option results in a TypeError: expected str, bytes or os.PathLike object, not NoneType. It is raised on line 142 of fetch.py -> output_file = open(namespace.output, 'w')

I had other issues but you broke paris.demosphere.net so I can't do any further testing.
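One possible fix for the TypeError above is to fall back to stdout when no output path is given. This is only a sketch: open_output is a hypothetical helper, and whether stdout is the desired default is an assumption.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_output(path=None):
    """Yield a writable file: stdout when `path` is None, else the opened file."""
    if path is None:
        # Do not close stdout on exit, just hand it over.
        yield sys.stdout
    else:
        with open(path, 'w') as output_file:
            yield output_file


# Possible usage, with `namespace.output` being None when -o is omitted:
#     with open_output(namespace.output) as output_file:
#         output_file.write('...')
```

The alternative would be to make -o mandatory in the argument parser, so the user gets a clear usage error instead of a traceback.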
