schlabber's People

Contributors

blickfeldkurier, locke, oe1rfc, scottytm

schlabber's Issues

Dealing with failed downloads

Sometimes image files get created without any content.
For now I have to weed out the corrupted files (<1KB) and re-run the entire process to fill in the failed downloads.
It would be nice to have a post-download check that the file size is reasonable (e.g. >1KB) and, if it is not, to retry the download.
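
One way the requested check could look is sketched below. It is not taken from schlabber.py: the function name `save_with_size_check`, the `fetch` callable, and the retry count are all assumptions made for illustration, and the 1 KB threshold simply mirrors the suggestion above.

```python
MIN_BYTES = 1024  # assumed threshold, mirroring the ">1KB" suggestion above


def save_with_size_check(fetch, path, min_bytes=MIN_BYTES, max_attempts=3):
    """Write fetch() to path, retrying while the body is suspiciously small.

    fetch is any zero-argument callable that returns the response body as
    bytes (hypothetical; e.g. a small wrapper around an HTTP GET).
    Returns True once a large-enough body was written, False otherwise.
    """
    for _ in range(max_attempts):
        data = fetch()
        if len(data) >= min_bytes:
            with open(path, "wb") as fh:
                fh.write(data)
            return True
    # Nothing is written on failure, so a later run does not see an empty
    # file and skip it as already downloaded.
    return False
```

Because nothing is written until a body passes the check, a failed download leaves no file behind, so the existing "File exists" skip would not hide it on the next run.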

Failed to establish a new connection: [Errno -2] Name or service not known'

The URL in line 5 of the log below looks wrong to me: the base URL is doubled.

Get: http://fritzoid.soup.io/since/5405703?mode=own
Process Posts
Looking for next Page
no next found. retry 1 of 9999
Get: http://fritzoid.soup.iohttp://fritzoid.soup.io/since/5405703?mode=own
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1134, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1179, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1130, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 946, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 889, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 283, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='fritzoid.soup.iohttp', port=80): Max retries exceeded with url: //fritzoid.soup.io/since/5405703?mode=own (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./schlabber.py", line 400, in <module>
    main(args.soups, args.dir, args.continue_from, args.retries)
  File "./schlabber.py", line 391, in main
    soup.backup(cont_from, max_retries)
  File "./schlabber.py", line 370, in backup
    dl = requests.get(dlurl)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 588, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='fritzoid.soup.iohttp', port=80): Max retries exceeded with url: //fritzoid.soup.io/since/5405703?mode=own (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
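
The doubled host in the log suggests that the base URL was prepended to a link that was already absolute. A minimal sketch of a more robust way to build the URL, assuming the link is available as a string, is to resolve it with the standard library's `urljoin`. The helper name `next_page_url` is hypothetical and not part of schlabber.py.

```python
from urllib.parse import urljoin


def next_page_url(base_url, href):
    """Resolve a 'next page' link against the soup's base URL.

    urljoin keeps an absolute href as it is and resolves a relative one,
    so the base URL is never prepended twice.
    """
    return urljoin(base_url, href)


base = "http://fritzoid.soup.io"
# Both calls print http://fritzoid.soup.io/since/5405703?mode=own
print(next_page_url(base, "/since/5405703?mode=own"))
print(next_page_url(base, "http://fritzoid.soup.io/since/5405703?mode=own"))
```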

Video breaks stuff

Pretty sure this video is the culprit:
https://volldost.soup.io/post/624265635/Pharaoh-cuttlefish-pretending-to-be-a-hermit

Not sure what's different with this particular video… Any help appreciated!

Get: http://xxxxxxxx.soup.io/since/624753327?mode=own
Process Posts
                Image:
                        Skip https://asset.soup.io/asset/13645/1804_6a84_500.jpeg: File exists
                Video:
Traceback (most recent call last):
  File "./schlabber.py", line 400, in <module>
    main(args.soups, args.dir, args.continue_from, args.retries)
  File "./schlabber.py", line 391, in main
    soup.backup(cont_from, max_retries)
  File "./schlabber.py", line 375, in backup
    self.process_posts(page, dlurl)
  File "./schlabber.py", line 349, in process_posts
    self.process_video(post, dlurl)
  File "./schlabber.py", line 152, in process_video
    filename = self.get_asset_name(meta['soup_url'])
  File "./schlabber.py", line 45, in get_asset_name
    return name.split('/')[-1].split('.')[0]
AttributeError: 'NoneType' object has no attribute 'split'
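
The traceback shows `get_asset_name` being called with `None`, which suggests that `meta['soup_url']` was never found for this video post. A defensive variant might look like the sketch below. It keeps the return expression from the traceback, while the `None` check and the docstring are assumptions about how the function could be hardened.

```python
def get_asset_name(name):
    """Return the file stem of an asset URL, or None when there is no URL."""
    if not name:
        # Assumed behaviour: signal "no asset" instead of raising.
        return None
    return name.split('/')[-1].split('.')[0]
```

The caller would then need to skip the post (or log it) when `None` comes back, rather than trying to download anything.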

TypeError: getresponse() got an unexpected keyword argument 'buffering'

Get: http://fritzoid.soup.io/since/94376386?mode=own
{'cookies_enabled': '1',
 'soup_pool': 'B',
 'soup_session_id': '0c3995cd91089f38f59c83cd87037008'}
Failed with Status Code: 503
no next found. retry 1 of 9999
Get: http://fritzoid.soup.io/since/94376386?mode=own
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 377, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1225, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 257, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1225, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./schlabber.py", line 428, in <module>
    main(args.soups, args.dir, args.continue_from, args.retries)
  File "./schlabber.py", line 419, in main
    soup.backup(cont_from, max_retries)
  File "./schlabber.py", line 393, in backup
    dl = self.session.get(dlurl, allow_redirects=True, cookies=self.session.cookies)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 492, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 588, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 426, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
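
Reading the chained tracebacks, the `TypeError` in the title looks like urllib3 probing for an older `getresponse(buffering=True)` signature and then falling back to a plain `getresponse()`. The exception that actually ends the run is the final `ConnectionError`, raised after the server closed the connection. A generic retry helper along the following lines could keep the backup going. It is a sketch under that reading, and `call_with_retries` with all of its parameters is hypothetical.

```python
import time


def call_with_retries(func, retry_on, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call func(), retrying on the given exception types.

    The delay doubles after every failed attempt (exponential backoff).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With requests, the call on line 393 of the traceback could be wrapped as `call_with_retries(lambda: self.session.get(dlurl), retry_on=(requests.exceptions.ConnectionError,))`.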

caption used in description

Get: http://athalis.soup.io/since/280148032?mode=own
{'cookies_enabled': '1',
 'soup_pool': 'B',
 'soup_session_id': 'f610bb6de72062e91cb9ce580c93ed27'}
Process Posts
		Image:
			Skip https://asset.soup.io/asset/3774/1985_bda6_500.jpeg: File exists
		Image:
			Skip https://asset.soup.io/asset/3778/1675_796d.jpeg: File exists
		Image:
			Skip https://asset.soup.io/asset/3777/9764_4631_684.jpeg: File exists
		Image:
Traceback (most recent call last):
  File "./schlabber.py", line 428, in <module>
    main(args.soups, args.dir, args.continue_from, args.retries)
  File "./schlabber.py", line 419, in main
    soup.backup(cont_from, max_retries)
  File "./schlabber.py", line 399, in backup
    self.process_posts(page, dlurl)
  File "./schlabber.py", line 354, in process_posts
    self.process_image(post, dlurl)
  File "./schlabber.py", line 58, in process_image
    meta['source'] = caption.find('a').get("href")
AttributeError: 'NoneType' object has no attribute 'get'

It's this post: https://athalis.soup.io/post/280147923/WeeeSOURCE-domics-me

There are two captions in that post, one as the source caption and one within the description.
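
The crash happens because `caption.find('a')` returns `None` for a caption without a link. A tolerant lookup could check every caption and take the first link it finds, as in the sketch below. The function `extract_source` is hypothetical, and it assumes the captions are objects offering `find` and `get` the way the code in the traceback uses them.

```python
def extract_source(captions):
    """Return the href of the first link found in any caption, else None.

    captions is any iterable of elements that offer .find('a'), which
    returns either None or an element with .get('href').
    """
    for caption in captions:
        anchor = caption.find('a')
        if anchor is None:
            continue  # e.g. a caption inside the description, without a link
        href = anchor.get('href')
        if href:
            return href
    return None
```

`process_image` would then set `meta['source']` only when the result is not `None`.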

It looks like this is resetting the url to the beginning. I uploaded a log here: https://gist.github.com/Locke/e6fd4d1a1966980e27ce0e453b9436c2

It looks like this is resetting the url to the beginning. I uploaded a log here: https://gist.github.com/Locke/e6fd4d1a1966980e27ce0e453b9436c2

Small excerpt:

Get: http://athalis.soup.io/since/613560173?mode=own
Looking for next Page
Process Posts
no next found. retry 1 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 2 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 3 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 4 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 5 of 10
Get: http://athalis.soup.io
Looking for next Page
	...found script
Process Posts
		Image:
			Skip https://asset.soup.io/asset/14977/4363_42f4_600.jpeg: File exists
[...]
Get: http://athalis.soup.io/since/696310839?mode=own
Looking for next Page
Process Posts
no next found. retry 1 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 2 of 10
Get: http://athalis.soup.io
Looking for next Page
	...found script
Process Posts
		Image:
			Skip https://asset.soup.io/asset/14977/4363_42f4_600.jpeg: File exists
[...]

Originally posted by @Locke in #7 (comment)
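
The excerpt shows each retry requesting the bare base URL instead of the `/since/...` page that failed. A retry loop that keeps the current URL could be sketched as follows. `crawl` and `fetch_next` are hypothetical names, and how schlabber.py structures its loop is not known from this page.

```python
def crawl(fetch_next, start_url, max_retries=10):
    """Follow 'next page' links, retrying the same page when none is found.

    fetch_next(url) is a hypothetical callable that downloads url and
    returns the URL of the next page, or None if no next link was found.
    Returns every URL that was requested, in order.
    """
    requested = []
    url = start_url
    retries = 0
    while retries <= max_retries:
        requested.append(url)
        next_url = fetch_next(url)
        if next_url is None:
            retries += 1  # try the same URL again, never the start page
        else:
            url = next_url
            retries = 0  # progress was made, so reset the counter
    return requested
```

The point of the design is that `url` only changes when a next link was actually found, so a failed attempt can never send the crawl back to the beginning.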

UnicodeEncodeError: 'charmap' codec can't encode character '\u2764' in position 129: character maps to <undefined>

(Windows) Python 3.8, how invoked:
python schlabber.py -d d:\download\astrid astrid

Traceback (most recent call last):
  File "schlabber.py", line 361, in <module>
    main(args.soups, args.dir, args.continue_from)
  File "schlabber.py", line 353, in main
    soup.backup(cont_from)
  File "schlabber.py", line 345, in backup
    self.process_posts(page)
  File "schlabber.py", line 332, in process_posts
    self.process_regular(post)
  File "schlabber.py", line 288, in process_regular
    rf.write(content)
  File "C:\Users\pulaski\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2764' in position 129: character maps to <undefined>

It seems that when a character is not in the default Windows character set (cp1252 here), the write fails because the character cannot be converted from Unicode. This should be handled.
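
A common fix for this class of error is to open the output file with an explicit encoding instead of relying on the platform default. The sketch below assumes the file is written in text mode; `write_text` is a hypothetical helper, not a function from schlabber.py.

```python
def write_text(path, content):
    """Write content as UTF-8, independent of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as rf:
        rf.write(content)
```

With `encoding="utf-8"`, characters such as '\u2764' are encoded directly and the cp1252 codec is never involved.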
