Comments (5)
I don't understand why Splash is needed in order to support phpbb3-style cookies. If autologin requires Splash, then it is no longer really a Python module and needs a larger architecture to function. While I am not well versed in phpbb3-style cookies, I do not see why we cannot fake a request with all of the proper header information, which is pretty easy in Scrapy.
We have been very happy with integrating autologin in our scraping architecture, and I think the best use of the module will be to make it standalone as much as possible.
from autologin.
Thanks for the feedback, @madisonb! Do you use autologin as a library to get the request data and then send it with Scrapy?
Splash support is helpful when we use autologin as a service, perhaps even on a different host, and also crawl via a separate Splash instance. In this case, by using the same Splash instance in both autologin and the crawler, we get the same IP and the same user agent, and can also log in on sites that are hard to handle without Splash (JS-heavy or Tor).
Just to clarify: Splash support is intended to be optional, not a requirement.
Precisely, we use autologin/formasaurus in library form but could switch over to autologin as a service if needed, and then use the generated cookies within Scrapy. We don't use Splash instances to crawl the open web, and for Tor we have our spiders configured to work with the network.
Most sites in the past have not cared whether the cookie comes from a different IP, but the phpbb3 sites may, and we may need extra engineering to work with that.
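For illustration, here is a minimal sketch of replaying login cookies on a later request with a fixed user agent, using only the standard library. The cookie shape (a list of dicts with `name` and `value` keys), the cookie names, the URL, and the helper `build_authenticated_request` are all assumptions made up for the example; the data autologin actually returns may look different.

```python
from urllib.request import Request


def build_authenticated_request(url, cookies, user_agent):
    """Return a Request that replays login cookies with a fixed User-Agent.

    `cookies` is assumed to be a list of {"name": ..., "value": ...} dicts;
    this shape is an assumption, not necessarily what autologin returns.
    """
    # Join the cookies into a single Cookie header value.
    cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in cookies)
    return Request(
        url,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


# Hypothetical values, for illustration only.
login_cookies = [
    {"name": "phpbb3_example_sid", "value": "abc123"},
    {"name": "phpbb3_example_u", "value": "42"},
]
request = build_authenticated_request(
    "http://forum.example.com/index.php",
    login_cookies,
    user_agent="Mozilla/5.0 (X11; Linux x86_64)",
)

# The request is only built here, not sent.
print(request.get_header("Cookie"))
```

In Scrapy the same data would go into `scrapy.Request(url, cookies=..., headers=...)`. This only covers the cookie and user agent; if a site also ties the session to the client IP, the crawl would still have to come from the same address as the login.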
This is done in #8 by using scrapy and scrapy-splash.