loginform's Introduction

Scrapy

Overview

Scrapy is a BSD-licensed fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.

Check the Scrapy homepage at https://scrapy.org for more information, including a list of features.

Requirements

  • Python 3.8+
  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy

See the install section in the documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.

Documentation

Documentation is available online at https://docs.scrapy.org/ and in the docs directory.

Releases

You can check https://docs.scrapy.org/en/latest/news.html for the release notes.

Community (blog, twitter, mail list, IRC)

See https://scrapy.org/community/ for details.

Contributing

See https://docs.scrapy.org/en/master/contributing.html for details.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.

By participating in this project you agree to abide by its terms. Please report unacceptable behavior to [email protected].

Companies using Scrapy

See https://scrapy.org/companies/ for a list.

Commercial Support

See https://scrapy.org/support/ for details.

loginform's People

Contributors

dangra, david-tsc, orthographic-pedant, pablohoffman, ruairif, sagelliv

loginform's Issues

One PyPI version

Hi, I couldn't find in the README whether loginform has a release on PyPI.

If not, is there anything I can do to help release one?

If a version has already been released, I could open a PR that adds instructions for installing loginform with pip.

Thanks

Fails for https://www.worldremit.com/en/account/login

% python -m loginform -u ***** -p ***** https://www.worldremit.com/en/account/login
Traceback (most recent call last):
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/umairashraf/.local/share/virtualenvs/base/lib/python3.7/site-packages/loginform.py", line 105, in <module>
    sys.exit(main())
  File "/Users/umairashraf/.local/share/virtualenvs/base/lib/python3.7/site-packages/loginform.py", line 98, in main
    values, action, method = fill_login_form(args.url, r.text, args.username, args.password)
  File "/Users/umairashraf/.local/share/virtualenvs/base/lib/python3.7/site-packages/loginform.py", line 77, in fill_login_form
    form = _pick_form(doc.xpath('//form'))
  File "/Users/umairashraf/.local/share/virtualenvs/base/lib/python3.7/site-packages/loginform.py", line 42, in _pick_form
    return sorted(forms, key=_form_score, reverse=True)[0]
IndexError: list index out of range
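
The traceback ends in _pick_form indexing an empty list, which suggests that doc.xpath('//form') found no form element on the page. A minimal sketch of a guard that turns this into a descriptive error follows. The score argument is a hypothetical stand-in for loginform's _form_score, and this is not the library's actual fix.

```python
def pick_form(forms, score):
    """Return the highest-scoring form, or raise a descriptive error.

    `score` is a hypothetical stand-in for loginform's _form_score.
    """
    # max() with default=None avoids indexing into an empty sorted list.
    best = max(forms, key=score, default=None)
    if best is None:
        raise ValueError("No <form> element found on the page")
    return best
```

A caller can then catch ValueError and report that the page has no form, instead of surfacing an IndexError from inside the library.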

add examples of how to use with scrapy or beautifulsoup to readme

Can we add some examples of how to use loginform with web crawlers like Scrapy?

Maybe I'm just new to web crawling but I haven't found many working examples on how to use this with current frameworks.

This repo would benefit immensely from such examples.
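
As a starting point for such examples, here is a minimal sketch of turning the (values, action, method) triple returned by fill_login_form (the call shape visible in the tracebacks on this page) into an HTTP request using only the Python standard library. The triple below is hard-coded, hypothetical data standing in for a real fill_login_form result, build_login_request is an illustrative name, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(values, action, method):
    """Build (but do not send) a request from a fill_login_form-style triple."""
    encoded = urlencode(values)
    if method.upper() == "POST":
        # Form fields go in the URL-encoded request body.
        return Request(
            action,
            data=encoded.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    # For GET forms the fields go in the query string instead.
    separator = "&" if "?" in action else "?"
    return Request(action + separator + encoded, method="GET")


# Hypothetical stand-in for:
#   values, action, method = fill_login_form(url, body, "USER", "PASS")
values = [("email", "USER"), ("passWord", "PASS")]
request = build_login_request(values, "https://example.com/sign-in", "POST")
```

With Scrapy, the same triple would typically be passed to scrapy.FormRequest(action, method=method, formdata=values, callback=...) from the callback that parsed the login page.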

About authenticity_token

Hi, where do you get the authenticity_token? It seems you obtained it before the example code so that you could use it in the example.

Sorry if this is a dumb question. I can get the authenticity_token from the source of the "https://github.com/login" page, but it seems to become invalid soon after I get it.
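
Tokens like authenticity_token are typically hidden form fields issued with each page load and tied to the session, so a value copied earlier tends to go stale. The usual approach is to read the token from the same freshly fetched response that is about to be submitted, using the same cookies. A minimal sketch with the standard library's html.parser follows. The HTML snippet and token value are made up for illustration, and this is not loginform's implementation (which, judging by the tracebacks on this page, works on an lxml document).

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            name = attributes.get("name")
            if name:
                self.hidden[name] = attributes.get("value") or ""


# Made-up page body; in practice this is the freshly fetched login page.
page = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""

collector = HiddenInputCollector()
collector.feed(page)
token = collector.hidden["authenticity_token"]
```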

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 6: ordinal not in range(128)

I get this error when running python -m loginform -u myusername -p mypassword https://armenningar.felog.is/UsersLogin.aspx

url: https://armenningar.felog.is/UsersLogin.aspx
method: POST
payload:
- __VIEWSTATE: ZNDD7u5s1fSp3w0n0GJ90WQUHDDPrxriAQ8+zk0GjcbyUmz/1zkwW+6yC/R1hLWYHXAC41a+6b9vxtdYAPhaZypXhbWLPDGCHDzJjdSSej4RnaOGDsUODpeadYNhRF7PAXdMW05l+/eAEaGRsfJViDWKeMi1IOOEqjMbumIE5y59vxI0P+4elY61LK/W1PfM6adno7Ks3FJVYP1D1sThwPQCJs8ngbHVl2FkkC4Qs24Ci1SLUzNw9sC1J8xCU9oawRdhidR5iLmJx4IIzIH+D7HjZrS2bcHY8Lo1bzeY8GKhKx0pDJOu233u1QBBslJgNLCvcH8rQbhLLEk6ZJdkDOBIg3stHxIQT3BNhG24dv4xPkSpDvxzzIEYQxnQCmYPsSWr7ZBmq4AI5Y4+mprApAmRAU0nGk5YN67SHJAQoi9CIA/jarhWB7YM/CpcjQt52iBsbYMzz02zOBFl040D6By2PJSCKN7R+eUaUYBEouj39dFtiWo+C20dzg9BSdsii7FYpDfUPDBAiMUZfrPsIS02dyguxMfwVmI+3YnVNz6XUu+dgcqegBE6c7DQsaP3TKKMT81FDQbBt3AOy8pmpCK4KlB5M0Bi+be1UfyyCy+uty//wZ4pr7lCZ7Shyj8fFqyzKGJ/9zAwBa1fo3NRge9QEBo=
- __VIEWSTATEENCRYPTED: 
- __EVENTVALIDATION: QhZVGQrl99DLoBzHMjoJ+BlHDknr/D8N11RYWQvoLWV+EIBnYlr/Bs2TtzuCpOol5aAn5/M4mE0DstB3OEffO5c+7YB/DqIMT15WQqADdcOAJ3waUAq+rMXFGidE54HUeM8RfA==
- ctl00$ContentPlaceHolder1$textUsername: myusername
- ctl00$ContentPlaceHolder1$textPassword: mypassword
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/Users/nonni/Code/norix/norix/loginform/loginform.py", line 102, in <module>
    sys.exit(main())
  File "/Users/nonni/Code/norix/norix/loginform/loginform.py", line 98, in main
    print '- {0}: {1}'.format(k, v)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 6: ordinal not in range(128)

Changing line 99 in loginform.py to print(k + ': ' + v) solves the error.
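
The error arises when text containing a non-ASCII character (here u'\xe1') is written to a stream whose encoding is ASCII. A minimal Python 3 sketch that reproduces the failure and shows one way around it, writing through a stream with an explicit UTF-8 encoding, is given below. The function name and the sample field are hypothetical.

```python
import io


def write_payload(values, stream):
    """Write '- key: value' lines for each form field to a text stream."""
    for key, value in values:
        stream.write("- {0}: {1}\n".format(key, value))
    stream.flush()


# Hypothetical field whose value contains non-ASCII characters.
values = [("city", "Reykjav\xedk \xe1")]

# An ASCII-only stream fails as in the report above.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
try:
    write_payload(values, ascii_out)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 stream accepts the same text.
raw = io.BytesIO()
utf8_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
write_payload(values, utf8_out)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") or the PYTHONIOENCODING=utf-8 environment variable has a similar effect on the real standard output.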

Unrecognized loginform

loginform fails to detect this valid form:

$ python -m loginform https://www.osibatteries.com/signin.aspx
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/daniel/src/loginform/loginform.py", line 106, in <module>
    sys.exit(main())
  File "/home/daniel/src/loginform/loginform.py", line 99, in main
    values, action, method = fill_login_form(args.url, r.text, args.username, args.password)
  File "/home/daniel/src/loginform/loginform.py", line 79, in fill_login_form
    userfield, passfield = _pick_fields(form)
  File "/home/daniel/src/loginform/loginform.py", line 58, in _pick_fields
    raise ValueError("Unrecognized login form: %s" % dict(form.inputs))
ValueError: Unrecognized login form: {'__VIEWSTATE': <InputElement 12e82b8 name='__VIEWSTATE' type='hidden'>, 'ctl00$PageContent$ctl00$ctrlRecoverPassword$UserNameContainerID$UserName': <InputElement 12e8208 name='ctl00$PageContent$ctl00$ctrlRecoverPassword$UserNameContainerID$UserName' type='text'>, 'ctl00$PageContent$ctl00$ctrlLogin$Password': <InputElement 12e8100 name='ctl00$PageContent$ctl00$ctrlLogin$Password' type='password'>, 'ctl00$PageContent$ctl00$ctrlLogin$LoginButton': <InputElement 12e81b0 name='ctl00$PageContent$ctl00$ctrlLogin$LoginButton' type='submit'>, 'ctl00$PageContent$ctl00$ctrlLogin$RememberMe': <InputElement 12e8158 name='ctl00$PageContent$ctl00$ctrlLogin$RememberMe' type='checkbox'>, '__EVENTVALIDATION': <InputElement 12e8470 name='__EVENTVALIDATION' type='hidden'>, 'ctl00$PageContent$ctl00$ctrlLogin$UserName': <InputElement 12e80a8 name='ctl00$PageContent$ctl00$ctrlLogin$UserName' type='text'>, 'ctl00$PageContent$ctl00$ctrlRecoverPassword$UserNameContainerID$btnRequestNewPassword': <InputElement 12e8260 name='ctl00$PageContent$ctl00$ctrlRecoverPassword$UserNameContainerID$btnRequestNewPassword' type='submit'>}
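
The dump above lists two text inputs (one under ctrlRecoverPassword, one under ctrlLogin) alongside a single password input, which may be why the form is rejected as ambiguous. One possible heuristic, sketched here on plain (name, type) pairs rather than lxml elements, is to prefer the text field whose name shares the longest prefix with the password field. This is an illustration under that assumption, not loginform's actual _pick_fields logic.

```python
import os.path


def pick_fields(inputs):
    """Pick (username_field, password_field) from (name, type) pairs.

    Returns None when no password field or no text/email field exists.
    """
    passwords = [name for name, type_ in inputs if type_ == "password"]
    texts = [name for name, type_ in inputs if type_ in ("text", "email")]
    if not passwords or not texts:
        return None
    passfield = passwords[0]
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on file paths.
    userfield = max(
        texts, key=lambda name: len(os.path.commonprefix([name, passfield]))
    )
    return userfield, passfield


# Field names and types copied from the error message above.
inputs = [
    ("__VIEWSTATE", "hidden"),
    ("ctl00$PageContent$ctl00$ctrlRecoverPassword$UserNameContainerID$UserName", "text"),
    ("ctl00$PageContent$ctl00$ctrlLogin$Password", "password"),
    ("ctl00$PageContent$ctl00$ctrlLogin$UserName", "text"),
]
result = pick_fields(inputs)
```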

add support for form selector

When the form selector is known, it should be possible to supply an xpath selector to the fill_login_form method.

Incorrect login form match for https://www.meed.com/sign-in/

I get this result while running test_sample.py for https://www.meed.com/sign-in/:

[
   "http://www.meed.com/sign-in/", 
   [
      [
         [
            "referrer", 
            ""
         ], 
         [
            "security_text", 
            "USER"
         ], 
         [
            "repostcheck", 
            "I77"
         ], 
         [
            "formID", 
            "ID7797"
         ], 
         [
            "submitcheck", 
            "submitform"
         ], 
         [
            "passWord", 
            "PASS"
         ], 
         [
            "remember_pword", 
            "on"
         ]
      ], 
      "http://www.meed.com/sign-in", 
      "POST"
   ]
]

Whereas the correct result should be:

[
   "http://www.meed.com/sign-in/", 
   [
      [
         [
            "referrer", 
            ""
         ], 
         [
            "repostcheck", 
            "I22"
         ], 
         [
            "formID", 
            "ID2186"
         ], 
         [
            "submitcheck", 
            "submitform"
         ], 
         [
            "email", 
            "USER"
         ], 
         [
            "passWord", 
            "PASS"
         ], 
         [
            "remember_pword", 
            "on"
         ]
      ], 
      "http://www.meed.com/sign-in", 
      "POST"
   ]
]

yooli.com: Loginform is not working for a given website

It looks like the problem is that this website does not have a form element. Do you think there is a way to handle such cases?

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1178, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 800, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/lib/pymodules/python2.7/slybot/spider.py", line 119, in parse_login_page
    args, url, method = fill_login_form(response.url, response.body, username, password)
  File "/usr/lib/python2.7/dist-packages/loginform.py", line 74, in fill_login_form
    form = _pick_form(doc.xpath('//form'))
  File "/usr/lib/python2.7/dist-packages/loginform.py", line 42, in _pick_form
    return sorted(forms, key=_form_score, reverse=True)[0]
exceptions.IndexError: list index out of range

lxml.html does not find form

Page: https://www.gilt.com/login

A simple XPath query "//form" using selectors finds the form, but for some reason the forms property of an lxml.html document does not.

I think we should use different code for finding forms, or perhaps better, since it is a simple task, write our own, so that the fix is under our control.

Not all form fields are included in the posting

In this login page, for example,

https://www.beyondtherack.com/auth/login

there is a submit button element named _submit inside the login form. It is not included in the posted data, but it is required for the login to succeed (tested).

The problem is that the lxml method form.form_values() alone does not include all the fields involved in the form. It seems to include only the input elements.
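
One way to work around this is to look up the form's named submit control and append it to the values before posting. A minimal sketch with the standard library's html.parser is shown below. The markup is a made-up approximation of such a form (only the _submit name comes from the report), and the values list is a hypothetical stand-in for the fields already gathered from the form.

```python
from html.parser import HTMLParser


class SubmitControlCollector(HTMLParser):
    """Collect (name, value) of named submit controls in a document."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        type_ = (attributes.get("type") or "").lower()
        is_submit_input = tag == "input" and type_ == "submit"
        # A <button> without a type attribute defaults to type="submit".
        is_submit_button = tag == "button" and type_ in ("", "submit")
        if name and (is_submit_input or is_submit_button):
            self.controls.append((name, attributes.get("value") or ""))


# Made-up markup; only the "_submit" name is taken from the report.
page = """
<form action="/auth/login" method="post">
  <input type="text" name="email">
  <input type="password" name="password">
  <button type="submit" name="_submit" value="1">Log in</button>
</form>
"""

collector = SubmitControlCollector()
collector.feed(page)

# Hypothetical values already gathered from the form's other fields.
values = [("email", "USER"), ("password", "PASS")]
# Browsers send only the control that was activated, so add just one.
values.extend(collector.controls[:1])
```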
