user_agent's People

Contributors

cclauss, decaz, immerrr, jmb0z, lorien, pawelmhm

user_agent's Issues

Question: what is build_id?

It is calculated in get_firefox_build(), but I don't understand why we need it:

def get_firefox_build():
    build_ver, date_from = randomizer.choice(FIREFOX_VERSION)
    try:
        idx = FIREFOX_VERSION.index((build_ver, date_from))
        _, date_to = FIREFOX_VERSION[idx + 1]
    except IndexError:
        date_to = date_from + timedelta(days=1)
    sec_range = (date_to - date_from).total_seconds() - 1
    build_rnd_time = (
        date_from + timedelta(seconds=randomizer.randint(0, int(sec_range)))
    )
    return build_ver, build_rnd_time.strftime('%Y%m%d%H%M%S')

The Firefox template doesn't mention build_id:

'firefox': (
    'Mozilla/5.0'
    ' ({system[ua_platform]}; rv:{app[build_version]})'
    ' Gecko/{app[geckotrail]}'
    ' Firefox/{app[build_version]}'
),

CHROME_BUILD is outdated

Can you update CHROME_BUILD?

I think it is outdated; some websites refuse the Chrome versions that the library currently generates.

Add an argument to generate_user_agent() to generate only recent browser versions

Some websites seem to reject outdated user agents, e.g.

curl -vk "https://www.jdsports.co.uk/" -H "User-Agent:  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1910.0 Safari/537.36" -H "Accept-Language: en-US,en;q=0.8,pl;q=0.6" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Connection: keep-alive" --compressed > /dev/null

results in

> GET / HTTP/1.1
> Host: www.jdsports.co.uk
> Accept-Encoding: deflate, gzip
> User-Agent:  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1910.0 Safari/537.36
> Accept-Language: en-US,en;q=0.8,pl;q=0.6
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
> Connection: keep-alive
> 
< HTTP/1.1 403 Forbidden
* Server AkamaiGHost is not blacklisted

This uses a very old user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1910.0 Safari/537.36.

If I "upgrade" my user agent to the most recent Chrome, I get 200 OK:

curl -vk "https://www.jdsports.co.uk/" -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" -H "Accept-Language: en-US,en;q=0.8,pl;q=0.6" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Connection: keep-alive" --compressed > /dev/null

The response is:

> GET / HTTP/1.1
> Host: www.jdsports.co.uk
> Accept-Encoding: deflate, gzip
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
> Accept-Language: en-US,en;q=0.8,pl;q=0.6
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
> Connection: keep-alive
> 
< HTTP/1.1 200 OK
< Content-Encoding: gzip

This is not the first time I have seen behavior like this. Some websites also check the user agent on the server side and show irritating messages telling users to upgrade their browsers. They should do this with JavaScript, but sometimes they do it on the server and return a different response, which can confuse a spider.

My idea is to either limit the generated versions to the most recent ones by default, or add an argument that specifies how many versions to go back, e.g. range(5) would return only the 5 most recent versions of Chrome.
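
The second option could look roughly like the sketch below. It assumes a CHROME_BUILD-style table of (major version, min build, max build) tuples sorted oldest first. The function name pick_recent_chrome_version, the last_n parameter, and the 0-99 patch range are hypothetical and are not part of user_agent's API.

```python
import random

# Sample table in the (major, min_build, max_build) layout, oldest first.
CHROME_BUILD = (
    (52, 2743, 2784),
    (53, 2785, 2839),
    (54, 2840, 2882),
    (55, 2883, 2923),
    (56, 2924, 2986),
)


def pick_recent_chrome_version(builds, last_n=None, rng=random):
    """Return a 'major.0.build.patch' string drawn from the newest entries.

    last_n=None keeps every entry; last_n=5 keeps only the five newest.
    """
    if last_n is not None:
        if last_n < 1:
            raise ValueError("last_n must be at least 1")
        builds = builds[-last_n:]
    major, min_build, max_build = rng.choice(builds)
    build = rng.randint(min_build, max_build)
    patch = rng.randint(0, 99)  # assumed patch range, for illustration only
    return "%d.0.%d.%d" % (major, build, patch)


version = pick_recent_chrome_version(CHROME_BUILD, last_n=2)
print("Chrome/" + version)
```

With last_n=2 the major version is always 55 or 56 for the sample table, so older entries can stay in the table without ever being emitted.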

Updating base.py

I'm trying to update the Chrome build version in base.py,
but I'm getting code errors. Can you please tell me what this number means?

image

Does not work on Windows


SMARTPHONE_DEV_IDS = json.load(open(os.path.join(
    PACKAGE_DIR, 'data/smartphone_dev_id.json')))

This path does not exist on Windows.

Correct way:

SMARTPHONE_DEV_IDS = json.load(open(os.path.join(
    PACKAGE_DIR, 'data','smartphone_dev_id.json')))
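
To illustrate the difference between the two calls, here is a small sketch that uses the standard-library ntpath and posixpath modules, so the Windows behavior can be inspected on any platform. The directory names are hypothetical install locations, not paths taken from the library.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

PACKAGE_DIR_WIN = r"C:\pkg\user_agent"  # hypothetical install location

# Embedded forward slash: the Windows result mixes both separators.
mixed = ntpath.join(PACKAGE_DIR_WIN, "data/smartphone_dev_id.json")

# Separate components: each platform's join inserts its own separator.
clean_win = ntpath.join(PACKAGE_DIR_WIN, "data", "smartphone_dev_id.json")
clean_posix = posixpath.join("/pkg/user_agent", "data", "smartphone_dev_id.json")

print(mixed)
print(clean_win)
print(clean_posix)
```

Passing each path component separately keeps the separator choice inside os.path.join, which is the portable form regardless of where the package is installed.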

Deprecation warning

randint() calls randrange(), and passing non-integer arguments to randrange() has been deprecated since Python 3.10:

DeprecationWarning: non-integer arguments to randrange() have been deprecated since Python 3.10 
and will be removed in a subsequent version

So the sec_range variable here should be cast to int:

sec_range = (date_to - date_from).total_seconds() - 1
build_rnd_time = (
    date_from + timedelta(seconds=randomizer.randint(0, sec_range))
)
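
A minimal sketch of the cast, written as a standalone helper so it can be run on its own. The name random_build_time is hypothetical, and the standard random module stands in for the library's randomizer object.

```python
import random
from datetime import datetime, timedelta


def random_build_time(date_from, date_to, rng=random):
    """Pick a random moment in [date_from, date_to), at one-second resolution."""
    # total_seconds() returns a float, so convert to int before randint().
    sec_range = int((date_to - date_from).total_seconds()) - 1
    offset = rng.randint(0, sec_range)
    return date_from + timedelta(seconds=offset)


build_time = random_build_time(datetime(2016, 3, 8), datetime(2016, 4, 26))
print(build_time.strftime("%Y%m%d%H%M%S"))
```

Because both bounds passed to randint() are now integers, the call no longer goes through the deprecated non-integer path.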

Broken `platform`

Compare how it works in the initial commit (my first version) with how it works now: it returns incorrect results.

ua -e -n chrome -o linux

{
   ...
  "platform": "X11; Linux x86_64", 
   ....
}

It should be Linux x86_64. The same problem occurs with other operating systems.
Currently, the platform key of the JS navigator object returned by this library is wrong.
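
As a rough sketch of the expected behavior, the helper below derives a navigator.platform-style value from the platform token used in the User-Agent string. The function name is hypothetical, and the "Win32" and "MacIntel" values are assumptions based on what desktop browsers commonly report, not values taken from this library.

```python
def navigator_platform(ua_platform):
    """Map a User-Agent platform token to a navigator.platform-style value."""
    if ua_platform.startswith("X11;"):
        # "X11; Linux x86_64" -> "Linux x86_64": keep only the last segment.
        return ua_platform.split("; ")[-1]
    if ua_platform.startswith("Windows"):
        return "Win32"     # assumed value, commonly reported on Windows
    if ua_platform.startswith("Macintosh"):
        return "MacIntel"  # assumed value, commonly reported on Intel Macs
    return ua_platform     # unknown token: pass it through unchanged


print(navigator_platform("X11; Linux x86_64"))
```

The point is only that the User-Agent token and the navigator value are two different strings and need separate handling.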

Chrome versions outdated

Since the package doesn't pull in outside data, how often will the version numbers be updated? I understand forking may be a valid strategy, but maybe not. For example, in base.py, I see these tuples:

FIREFOX_VERSION = (
    ('45.0', datetime(2016, 3, 8)),
    ('46.0', datetime(2016, 4, 26)),
    ('47.0', datetime(2016, 6, 7)),
    ('48.0', datetime(2016, 8, 2)),
    ('49.0', datetime(2016, 9, 20)),
    ('50.0', datetime(2016, 11, 15)),
    ('51.0', datetime(2017, 1, 24)),
)
CHROME_BUILD = (
    (49, 2623, 2660), # 2016-03-02
    (50, 2661, 2703), # 2016-04-13
    (51, 2704, 2742), # 2016-05-25
    (52, 2743, 2784), # 2016-07-20
    (53, 2785, 2839), # 2016-08-31
    (54, 2840, 2882), # 2016-10-12
    (55, 2883, 2923), # 2016-12-01
    (56, 2924, 2986), # 2016-12-01
)
IE_VERSION = (
    # (numeric ver, string ver, trident ver) # release year
    (8, 'MSIE 8.0', '4.0'), # 2009
    (9, 'MSIE 9.0', '5.0'), # 2011
    (10, 'MSIE 10.0', '6.0'), # 2012
    (11, 'MSIE 11.0', '7.0'), # 2013
)
