Below are a few things I might be working on.
blickfeldkurier / schlabber
Archives Soup.io pages
Sometimes image files get created without any content.
For now I had to weed out the corrupted files (< 1 KB) and re-run the entire process to fill in the failed downloads.
It would be sweet to have a post-download check that the file size is reasonable (e.g. > 1 KB);
if not, retry the download.
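A minimal sketch of such a check (the helper name and the 1 KB threshold are assumptions from this issue; schlabber itself uses requests, but the sketch sticks to the standard library):

```python
import os
import time
import urllib.request

# Assumption from the issue: anything under 1 KB is treated as a corrupt download.
MIN_SIZE = 1024

def download_with_size_check(url, path, retries=3, delay=2):
    """Download url to path; retry when the result looks truncated."""
    for attempt in range(1, retries + 1):
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
        # Write the payload, then verify it is at least MIN_SIZE bytes on disk.
        with open(path, 'wb') as fp:
            fp.write(data)
        if os.path.getsize(path) >= MIN_SIZE:
            return True
        print("%s smaller than %d bytes, retry %d of %d" % (path, MIN_SIZE, attempt, retries))
        time.sleep(delay)
    return False
```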
The URL in line 5 of the log below looks wrong to me. The base URL is doubled:
Get: http://fritzoid.soup.io/since/5405703?mode=own
Process Posts
Looking for next Page
no next found. retry 1 of 9999
Get: http://fritzoid.soup.iohttp://fritzoid.soup.io/since/5405703?mode=own
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1134, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1179, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1130, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 946, in _send_output
self.send(msg)
File "/usr/lib/python3.5/http/client.py", line 889, in send
self.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 283, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='fritzoid.soup.iohttp', port=80): Max retries exceeded with url: //fritzoid.soup.io/since/5405703?mode=own (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./schlabber.py", line 400, in <module>
main(args.soups, args.dir, args.continue_from, args.retries)
File "./schlabber.py", line 391, in main
soup.backup(cont_from, max_retries)
File "./schlabber.py", line 370, in backup
dl = requests.get(dlurl)
File "/usr/lib/python3/dist-packages/requests/api.py", line 67, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 588, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='fritzoid.soup.iohttp', port=80): Max retries exceeded with url: //fritzoid.soup.io/since/5405703?mode=own (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f820ce6f5c0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
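The doubled host (http://fritzoid.soup.iohttp://fritzoid.soup.io/...) suggests the base URL is being prepended to an href that is already absolute. A hedged sketch of a fix using urljoin (the function name is hypothetical):

```python
from urllib.parse import urljoin

def make_next_url(base_url, next_href):
    """Build the next page URL without doubling the base.

    urljoin leaves absolute hrefs untouched and only resolves
    relative ones against base_url.
    """
    return urljoin(base_url, next_href)
```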
Pretty sure this video is the culprit:
https://volldost.soup.io/post/624265635/Pharaoh-cuttlefish-pretending-to-be-a-hermit
Not sure what's different with this particular video… Any help appreciated!
Get: http://xxxxxxxx.soup.io/since/624753327?mode=own
Process Posts
Image:
Skip https://asset.soup.io/asset/13645/1804_6a84_500.jpeg: File exists
Video:
Traceback (most recent call last):
File "./schlabber.py", line 400, in <module>
main(args.soups, args.dir, args.continue_from, args.retries)
File "./schlabber.py", line 391, in main
soup.backup(cont_from, max_retries)
File "./schlabber.py", line 375, in backup
self.process_posts(page, dlurl)
File "./schlabber.py", line 349, in process_posts
self.process_video(post, dlurl)
File "./schlabber.py", line 152, in process_video
filename = self.get_asset_name(meta['soup_url'])
File "./schlabber.py", line 45, in get_asset_name
return name.split('/')[-1].split('.')[0]
AttributeError: 'NoneType' object has no attribute 'split'
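Judging by the traceback, meta['soup_url'] is None for this video post, so get_asset_name crashes on .split. A defensive sketch, returning None so the caller can skip the post (this is an assumption, not the project's actual fix):

```python
def get_asset_name(name):
    """Derive a local filename from an asset URL.

    The original crashes when the URL is None (some video posts seem
    to carry no soup_url); returning None lets the caller skip them.
    """
    if not name:
        return None
    return name.split('/')[-1].split('.')[0]
```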
Get: http://fritzoid.soup.io/since/94376386?mode=own
{'cookies_enabled': '1',
'soup_pool': 'B',
'soup_session_id': '0c3995cd91089f38f59c83cd87037008'}
Failed with Status Code: 503
no next found. retry 1 of 9999
Get: http://fritzoid.soup.io/since/94376386?mode=own
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 377, in _make_request
httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.5/http/client.py", line 1225, in getresponse
response.begin()
File "/usr/lib/python3.5/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.5/http/client.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 257, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.5/http/client.py", line 1225, in getresponse
response.begin()
File "/usr/lib/python3.5/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.5/http/client.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./schlabber.py", line 428, in <module>
main(args.soups, args.dir, args.continue_from, args.retries)
File "./schlabber.py", line 419, in main
soup.backup(cont_from, max_retries)
File "./schlabber.py", line 393, in backup
dl = self.session.get(dlurl, allow_redirects=True, cookies=self.session.cookies)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 492, in get
return self.request('GET', url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 480, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 588, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 426, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
Get: http://athalis.soup.io/since/280148032?mode=own
{'cookies_enabled': '1',
'soup_pool': 'B',
'soup_session_id': 'f610bb6de72062e91cb9ce580c93ed27'}
Process Posts
Image:
Skip https://asset.soup.io/asset/3774/1985_bda6_500.jpeg: File exists
Image:
Skip https://asset.soup.io/asset/3778/1675_796d.jpeg: File exists
Image:
Skip https://asset.soup.io/asset/3777/9764_4631_684.jpeg: File exists
Image:
Traceback (most recent call last):
File "./schlabber.py", line 428, in <module>
main(args.soups, args.dir, args.continue_from, args.retries)
File "./schlabber.py", line 419, in main
soup.backup(cont_from, max_retries)
File "./schlabber.py", line 399, in backup
self.process_posts(page, dlurl)
File "./schlabber.py", line 354, in process_posts
self.process_image(post, dlurl)
File "./schlabber.py", line 58, in process_image
meta['source'] = caption.find('a').get("href")
AttributeError: 'NoneType' object has no attribute 'get'
It's this post: https://athalis.soup.io/post/280147923/WeeeSOURCE-domics-me
There are two
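Based on the traceback, the post's caption has no <a> child, so caption.find('a') returns None and the .get('href') call crashes. A None-tolerant sketch (hypothetical helper; caption is a BeautifulSoup element or None):

```python
def extract_source(caption):
    """Return the caption's link href, or None when there is no link.

    Mirrors the failing line in process_image():
        meta['source'] = caption.find('a').get("href")
    but tolerates captions without an <a> tag.
    """
    if caption is None:
        return None
    link = caption.find('a')
    return link.get('href') if link is not None else None
```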
It looks like this is resetting the URL to the beginning. I uploaded a log here: https://gist.github.com/Locke/e6fd4d1a1966980e27ce0e453b9436c2
Small excerpt:
Get: http://athalis.soup.io/since/613560173?mode=own
Looking for next Page
Process Posts
no next found. retry 1 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 2 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 3 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 4 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 5 of 10
Get: http://athalis.soup.io
Looking for next Page
...found script
Process Posts
Image:
Skip https://asset.soup.io/asset/14977/4363_42f4_600.jpeg: File exists
[...]
Get: http://athalis.soup.io/since/696310839?mode=own
Looking for next Page
Process Posts
no next found. retry 1 of 10
Get: http://athalis.soup.io
Looking for next Page
Process Posts
no next found. retry 2 of 10
Get: http://athalis.soup.io
Looking for next Page
...found script
Process Posts
Image:
Skip https://asset.soup.io/asset/14977/4363_42f4_600.jpeg: File exists
[...]
Originally posted by @Locke in #7 (comment)
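A sketch of the retry shape the log suggests is missing: keep re-requesting the same /since/&lt;id&gt; URL instead of falling back to the bare base URL, which restarts the backup from the beginning (fetch is a hypothetical callable returning (page, next_url or None)):

```python
import time

def retry_page(fetch, url, max_retries=10, delay=5):
    """Re-fetch the SAME url until a 'next' link is found or retries run out.

    The key point: on a failed 'next' lookup we retry `url` itself,
    never a reset base URL.
    """
    for attempt in range(1, max_retries + 1):
        page, next_url = fetch(url)
        if next_url is not None:
            return page, next_url
        print("no next found. retry %d of %d" % (attempt, max_retries))
        time.sleep(delay)
    return None, None
```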
(Windows) Python 3.8, invoked as:
python schlabber.py -d d:\download\astrid astrid
Traceback (most recent call last):
File "schlabber.py", line 361, in <module>
main(args.soups, args.dir, args.continue_from)
File "schlabber.py", line 353, in main
soup.backup(cont_from)
File "schlabber.py", line 345, in backup
self.process_posts(page)
File "schlabber.py", line 332, in process_posts
self.process_regular(post)
File "schlabber.py", line 288, in process_regular
rf.write(content)
File "C:\Users\pulaski\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2764' in position 129: character maps to <undefined>
It seems that if a character is not in the default Windows character set (cp1252), the script falls flat on its face because the text cannot be encoded from Unicode; this should be taken care of.
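One common fix (an assumption, not necessarily the project's eventual patch) is to open the output file with an explicit UTF-8 encoding, so the write no longer depends on the platform's default codec:

```python
def write_post_text(path, content):
    """Write post text as UTF-8 regardless of the platform default encoding.

    On Windows the default is often cp1252, which cannot represent
    characters like '\u2764' and raises UnicodeEncodeError on write.
    """
    with open(path, 'w', encoding='utf-8') as rf:
        rf.write(content)
```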
File "./schlabber.py", line 189, in process_file
meta["soup_url"] = linkelem.find('a').get('href')
AttributeError: 'NoneType' object has no attribute 'get'