Comments (4)
Hi Olegario,
I tried to just run the `scrapy crawl quotes` command in the `/bin/ash` shell, and got this error:
```
Kurts-MacBook-Pro:tutorial kurtpeek$ docker run -it scraper-compose_scraper /bin/ash
/scraper/tutorial # scrapy crawl quotes
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: tutorial)
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.7.0 (default, Sep 12 2018, 02:07:16) - [GCC 6.4.0], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o 27 Mar 2018), cryptography 2.3.1, Platform Linux-4.9.93-linuxkit-aufs-x86_64-with
2018-09-14 03:19:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
    self.extensions = ExtensionManager.from_crawler(self)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
    from twisted.conch import manhole, telnet
  File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax
```
It seems from scrapy/scrapy#3143 that this is an issue with Scrapy itself on Python 3.7, in which `async` is a reserved keyword. You might want to try a different base image to downgrade the Python version; feel free to submit a PR if that works!
By the way, this is a fairly 'special' implementation of anonymous scraping, which uses the Tor control port to periodically change your apparent IP address. If you don't need that functionality, you could use a simpler image like docker-tor-privoxy-alpine.
from scraper-compose.
How can I downgrade to version 3.6?
I managed to change the Python version using this Dockerfile:
```dockerfile
# Adapted from trcook/docker-scrapy
FROM python:3.6-alpine

# Note: the base image already provides Python 3.6, so there is no need to
# "apk add python3" (Alpine's python3 package would pull in a second, newer
# interpreter) or to alias python to python3.6 in ~/.bashrc.

# Build dependencies for lxml, cryptography, and friends
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev curl

RUN pip install scrapy scrapy-fake-useragent stem pyparsing python-dateutil requests

COPY tutorial /scraper/tutorial
COPY wait-for/wait-for /scraper/tutorial

WORKDIR /scraper/tutorial
CMD ["./wait-for", "tor:9050", "--", "scrapy", "crawl", "quotes"]
```
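As a quick sanity check (this snippet is my own sketch, not from the repo), you can probe the interpreter inside the rebuilt image to confirm it is below 3.7, since that is the only thing the telnet-extension crash depends on:

```python
import sys

def affected_by_async_keyword(version_info=sys.version_info):
    """Return True if this interpreter treats "async" as a reserved keyword.

    "async" became a keyword in Python 3.7, which is what makes
    twisted.conch.manhole (imported by Scrapy's telnet extension) fail.
    """
    return tuple(version_info[:2]) >= (3, 7)

print("running Python %d.%d" % tuple(sys.version_info[:2]))
print("affected by the async keyword:", affected_by_async_keyword())
```

Run it inside the container (e.g. via `docker run --rm <your-image> python`); on a correctly built python:3.6-alpine image it should report that the interpreter is not affected.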
But the problem continues. I removed the `--silent` flag from the `curl` command, and now it says:

```
Received HTTP code 500 from proxy after CONNECT
```
The torrc file needs to be updated for this to work. Add this line so that Tor listens on all interfaces (other containers on the compose network cannot reach a SOCKS port bound only to localhost):

```
SocksPort 0.0.0.0:9050
```
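For context, a minimal torrc for this kind of compose setup might look like the fragment below. Only the SocksPort line comes from this thread; the control-port lines are assumptions based on the IP-rotation feature described earlier, so check them against the repo's actual torrc before use.

```
# Listen on all interfaces so other compose services can reach the SOCKS proxy
SocksPort 0.0.0.0:9050
# Control port used to request new circuits (NEWNYM), e.g. via stem (assumed)
ControlPort 0.0.0.0:9051
CookieAuthentication 1
```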