
Comments (11)

alirezamika commented on August 15, 2024

Yes. You should pass the authorization headers/cookies via the request_args parameter.

from autoscraper.

Narasimha1997 commented on August 15, 2024

Yes, it is possible, but not directly.

  1. First, use a web automation tool like Selenium to simulate the login.
  2. Read the HTML content of the private page you want to scrape.
  3. Pass that HTML content to build() along with the list of targets you are interested in, for example: scraper.build(html=your_private_content, wanted_list=[...])

Alternatively, if you have a session object that maintains the login state or auth headers, you can pass its headers and cookies to the build() method via the request_args parameter. Check the examples for how to do it.

Authentication itself is outside AutoScraper's scope, but that will not stop you from achieving your use case.
Hope this helps.

NickGoto commented on August 15, 2024

Thank you.

I am new to authentication and to request_args. If my login is 'Login' and my password is '123', how should I structure request_args?
Thank you.

Narasimha1997 commented on August 15, 2024

Try looking for code snippets online. You can find many Selenium code snippets for popular websites like Facebook, Twitter, etc. Once you have the HTML content of the web page, use AutoScraper.

NickGoto commented on August 15, 2024

OK, thanks. I read some articles that use:

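import requests

# user, pwd and url are defined elsewhere
access = {
    'email': user,
    'pass': pwd,
}
s = requests.Session()

# Post the login form; the session keeps any cookies the server sets
p = s.post(url, data=access)
print(p.status_code)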

I got access, but how can I use it with request_args?

Thanks

alirezamika commented on August 15, 2024

Pass the headers and the cookies when calling the build and get_result methods:

scraper.build(url, wanted_list, request_args=dict(headers=s.headers, cookies=s.cookies.get_dict()))
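
A minimal sketch of how the login snippet and the build call above might fit together. It assumes the site accepts a plain form POST and keeps the login state in cookies. The helper names session_request_args and login_and_build and their parameters are hypothetical and are not part of AutoScraper. requests and autoscraper are imported inside the second helper so that the first one works on any session-like object.

```python
def session_request_args(session):
    """Copy the headers and cookies of a requests.Session into a request_args dict."""
    return {
        "headers": dict(session.headers),
        "cookies": session.cookies.get_dict(),
    }


def login_and_build(login_url, page_url, credentials, wanted_list):
    """Log in with requests, then let AutoScraper fetch page_url with the same cookies.

    Assumption: the site accepts a plain form POST of `credentials` and keeps
    the login state in cookies. Many sites need more (CSRF tokens, JavaScript).
    """
    import requests
    from autoscraper import AutoScraper

    session = requests.Session()
    response = session.post(login_url, data=credentials)
    response.raise_for_status()  # stop early if the login request itself failed

    scraper = AutoScraper()
    result = scraper.build(
        page_url,
        wanted_list,
        request_args=session_request_args(session),
    )
    return scraper, result
```

With the field names from the earlier snippet, credentials would be something like {'email': user, 'pass': pwd}. The same request_args dict can be passed again when the built scraper is later run against other pages behind the login.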

NickGoto commented on August 15, 2024

I did that, but I always get None as the result.
;(

alirezamika commented on August 15, 2024

You need to check whether the data is actually present in the HTML content returned by the requests module. That depends on several things, such as whether JavaScript is needed to render it and which headers are required. I don't have enough information to help you here. Maybe you can share some details so we can check the code or the page.
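
One rough way to run that check is to fetch the page with the same request_args and look for each wanted string in the raw response text. The sketch below is only a first-pass diagnostic under that assumption. The helper names missing_targets and fetch_html are hypothetical, and a verbatim substring test can miss values that differ in whitespace or HTML entities.

```python
def missing_targets(html, wanted_list):
    """Return the wanted strings that do not occur verbatim in the HTML text."""
    return [wanted for wanted in wanted_list if wanted not in html]


def fetch_html(url, request_args=None):
    """Fetch a page with requests, forwarding the same request_args given to build()."""
    import requests

    response = requests.get(url, **(request_args or {}))
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Stand-in HTML; in practice use fetch_html(url, request_args).
    sample_html = "<html><body><span id='order'>375146505</span></body></html>"
    print(missing_targets(sample_html, ["375146505", "rendered-by-js"]))
```

If a target shows up as missing here but is visible in the browser, the page is probably filling it in with JavaScript or serving different content to unauthenticated requests.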

NickGoto commented on August 15, 2024

Sure. I changed my approach: I copied the HTML of the page and tried to use that content instead of the URL.

wanted_list = ['375146505']

scraper = AutoScraper()

result = scraper.build(wanted_list, html = html)

print(result)

I got this response:

TypeError Traceback (most recent call last)
in
3 scraper = AutoScraper()
4
----> 5 result = scraper.build(wanted_list, html = html)
6
7 print(result)

~\Anaconda3\envs\autoscraper\lib\site-packages\autoscraper\auto_scraper.py in build(self, url, wanted_list, html, request_args, update)
176 self.stack_list = []
177
--> 178 for wanted in wanted_list:
179 children = self._get_children(soup, wanted, url)
180

TypeError: 'NoneType' object is not iterable

alirezamika commented on August 15, 2024

Change your build line to result = scraper.build(wanted_list=wanted_list, html=html). The first positional parameter of build is url, so your call bound wanted_list to url and left wanted_list as None.
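
The difference between the two calls can be seen with a stand-in function that only mirrors the parameter order shown in the traceback. This is an illustrative sketch, not AutoScraper's code, and the None defaults are an assumption consistent with the error message.

```python
def build(url=None, wanted_list=None, html=None, request_args=None, update=False):
    """Stand-in with the parameter order shown in the traceback; only reports bindings."""
    return {"url": url, "wanted_list": wanted_list, "html": html}


wanted_list = ["375146505"]
html = "<span>375146505</span>"

positional = build(wanted_list, html=html)           # the list lands in url
keyword = build(wanted_list=wanted_list, html=html)  # the list lands in wanted_list

print(positional["url"], positional["wanted_list"])  # ['375146505'] None
print(keyword["url"], keyword["wanted_list"])        # None ['375146505']
```

In the positional call wanted_list stays None, which is what the loop over wanted_list then fails on.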

NickGoto commented on August 15, 2024

Thanks, it worked.
