Comments (4)
Hi Olegario,
I tried to just run the `scrapy crawl quotes` command in the `/bin/ash` shell, and got this error:
```
Kurts-MacBook-Pro:tutorial kurtpeek$ docker run -it scraper-compose_scraper /bin/ash
/scraper/tutorial # scrapy crawl quotes
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: tutorial)
2018-09-14 03:19:58 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.7.0 (default, Sep 12 2018, 02:07:16) - [GCC 6.4.0], pyOpenSSL 18.0.0 (OpenSSL 1.0.2o 27 Mar 2018), cryptography 2.3.1, Platform Linux-4.9.93-linuxkit-aufs-x86_64-with
2018-09-14 03:19:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
    self.extensions = ExtensionManager.from_crawler(self)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
    from twisted.conch import manhole, telnet
  File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax
```
It seems from scrapy/scrapy#3143 that this is an issue with Scrapy itself on Python 3.7, in which `async` is a reserved keyword. You might want to try a different base image to downgrade the Python version; feel free to submit a PR if that works!
By the way, this is a fairly 'special' implementation of anonymous scraping, which uses the Tor control port to periodically change your apparent IP address. If you don't need that functionality, you could use a simpler image like docker-tor-privoxy-alpine.
from scraper-compose.
How can I downgrade to version 3.6?
I managed to change the Python version using this Dockerfile:
```dockerfile
# Adapted from trcook/docker-scrapy
FROM python:3.6-alpine

# Note: the base image already provides Python 3.6, so there is no need to
# "apk add python3" (Alpine's python3 package would pull in a second, newer
# interpreter) or to alias python to python3.6 in ~/.bashrc.

# Build dependencies for lxml, cryptography, and friends
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev curl

RUN pip install scrapy scrapy-fake-useragent stem pyparsing python-dateutil requests

COPY tutorial /scraper/tutorial
COPY wait-for/wait-for /scraper/tutorial

WORKDIR /scraper/tutorial
CMD ["./wait-for", "tor:9050", "--", "scrapy", "crawl", "quotes"]
```
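As a quick sanity check (this snippet is my own sketch, not from the repo), you can probe the interpreter inside the rebuilt image to confirm it is below 3.7, since that is the only thing the telnet-extension crash depends on:

```python
import sys

def affected_by_async_keyword(version_info=sys.version_info):
    """Return True if this interpreter treats "async" as a reserved keyword.

    "async" became a keyword in Python 3.7, which is what makes
    twisted.conch.manhole (imported by Scrapy's telnet extension) fail.
    """
    return tuple(version_info[:2]) >= (3, 7)

print("running Python %d.%d" % tuple(sys.version_info[:2]))
print("affected by the async keyword:", affected_by_async_keyword())
```

Run it inside the container (e.g. via `docker run --rm <your-image> python`); on a correctly built python:3.6-alpine image it should report that the interpreter is not affected.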
But the problem continues. I removed the `--silent` flag from the `curl` command, and now it says:

```
Received HTTP code 500 from proxy after CONNECT
```
The torrc file needs to be updated for this to work. Add this line so that Tor listens on all interfaces (other containers on the compose network cannot reach a SOCKS port bound only to localhost):

```
SocksPort 0.0.0.0:9050
```
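For context, a minimal torrc for this kind of compose setup might look like the fragment below. Only the SocksPort line comes from this thread; the control-port lines are assumptions based on the IP-rotation feature described earlier, so check them against the repo's actual torrc before use.

```
# Listen on all interfaces so other compose services can reach the SOCKS proxy
SocksPort 0.0.0.0:9050
# Control port used to request new circuits (NEWNYM), e.g. via stem (assumed)
ControlPort 0.0.0.0:9051
CookieAuthentication 1
```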