woothee-java's People

Contributors

dependabot[bot], making, ryukobayashi, tagomoris, tell-k

woothee-java's Issues

iOS devices detected as Mac OS X devices

Woothee-java wrongly detects iOS devices as Mac OS X because of case sensitivity in the user agent parsing.
The example below is taken from a real web service that receives requests from mobile devices. Both user agents are valid, but only one is detected as an iOS device.

Current behaviour:

  • Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15G77 -> iPhone
  • Mozilla/5.0 (iPhone; CPU IPhone OS 11_4_1 Like Mac OS X) AppleWebKit/605.1.15 (KHTML, Like Gecko) Mobile/15G77 -> Mac OS X

Expected behaviour:

  • Mozilla/5.0 (iPhone; CPU iPhone OS 11_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15G77 -> iPhone
  • Mozilla/5.0 (iPhone; CPU IPhone OS 11_4_1 Like Mac OS X) AppleWebKit/605.1.15 (KHTML, Like Gecko) Mobile/15G77 -> iPhone
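A minimal sketch of the expected behaviour, assuming the misdetection comes from case-sensitive substring checks such as `like Mac OS X`. The class `IosDetectionSketch` and its methods are hypothetical and are not part of woothee-java; they only illustrate case-insensitive matching.

```java
import java.util.Locale;

// Hypothetical sketch, not the woothee-java implementation.
class IosDetectionSketch {
    // Case-insensitive substring test using a locale-independent lowercase form.
    static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT)
                       .contains(needle.toLowerCase(Locale.ROOT));
    }

    // Returns a coarse OS label for Apple user agents, "UNKNOWN" otherwise.
    static String detectOs(String userAgent) {
        if (!containsIgnoreCase(userAgent, "Mac OS X")) {
            return "UNKNOWN";
        }
        // iOS devices announce themselves as "like Mac OS X"; the check
        // must not depend on the capitalisation of "like".
        if (containsIgnoreCase(userAgent, "like Mac OS X")) {
            if (containsIgnoreCase(userAgent, "iPhone;")) return "iPhone";
            if (containsIgnoreCase(userAgent, "iPad;")) return "iPad";
            if (containsIgnoreCase(userAgent, "iPod")) return "iPod";
        }
        return "Mac OS X";
    }
}
```

With this approach both user agents from the report map to iPhone, while a desktop Safari user agent still maps to Mac OS X.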

Android 9 version not recognized

We noticed that Woothee does not parse the OS version of Android 9 user agents properly. This is because the user agent contains "Android 9" and not "Android 9.0".

For example, the version for the following user agent would be null instead of 9: Mozilla/5.0 (Linux; Android 9; SM-N960F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.105 Mobile Safari/537.36

We tested it with our own phones so we are sure that these are valid user agents for Android 9 devices (and not some fake user agents :) )

(A similar PR is open in the Ruby version of Woothee)
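One way to accept both forms is a version pattern in which the minor part is optional. The sketch below is an illustration only: `AndroidVersionSketch` and its pattern are assumptions and are not taken from woothee-java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the woothee-java implementation.
class AndroidVersionSketch {
    // One or more digits, followed by zero or more ".digits" groups,
    // so "9", "9.0" and "4.4.2" are all accepted.
    private static final Pattern ANDROID_VERSION =
        Pattern.compile("Android[- ](\\d+(?:\\.\\d+)*)");

    // Returns the Android version found in the user agent, if any.
    static Optional<String> osVersion(String userAgent) {
        Matcher m = ANDROID_VERSION.matcher(userAgent);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For the user agent quoted above, `osVersion` yields `9` instead of an empty result.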

Android version is empty for Firefox browser user agents

I cannot get the Android version when the browser is Firefox.

I parsed the user agent "Mozilla/5.0 (Android 7.0; Mobile; rv:50.0) Gecko/50.0 Firefox/50.0".

Result:
{name=Firefox, category=smartphone, os=Android, version=50.0, vendor=Mozilla}

The "os_version" parameter is missing from the result.

Unrecognized IE 11

We find about 1.6% of incoming traffic to be identified as Internet Explorer UNKNOWN.

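The UA strings look like these (the query, followed by its top results; the columns are the user agent, the request count, and the number of distinct users):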

select p.useragent, count(*) as cnt, count(distinct user_id)
from php_logs p
join woothee_useragent_dim w
   on p.useragent = w.useragent
where dt >= '20150401' and dt <= '20150430'
   and name = 'Internet Explorer' and version = 'UNKNOWN'
group by p.useragent
order by cnt desc
limit 30;

Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; LCJB; rv:11.0) like Gecko      12691436        13036
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MASMJS; rv:11.0) like Gecko    5470585 4994
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; ASU2JS; rv:11.0) like Gecko    5052566 4996
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MAARJS; rv:11.0) like Gecko    4481126 4056
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MALNJS; rv:11.0) like Gecko    4194491 4110
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; NP06; rv:11.0) like Gecko      3969474 4023
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MAAU; rv:11.0) like Gecko      3438683 3266
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MATM; rv:11.0) like Gecko      3085392 3057
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko     2188720 2821
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; LCJB; rv:11.0) like Gecko 1695819 1555
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MASEJS; rv:11.0) like Gecko    1460236 1615
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MATMJS; rv:11.0) like Gecko    1447890 1167
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MDDCJS; rv:11.0) like Gecko    963311  620
Mozilla/5.0 (Windows NT 6.3; Trident/7.0; Touch; rv:11.0) like Gecko    927900  1599
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; ASU2JS; rv:11.0) like Gecko       772250  821
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko       751545  855
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MAARJS; rv:11.0) like Gecko       746599  617
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; rv:11.0) like Gecko        677468  714
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MATBJS; rv:11.0) like Gecko    522275  515
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MALNJS; rv:11.0) like Gecko       463573  458
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; MASMJS; rv:11.0) like Gecko       462232  571
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; ASU2JS; rv:11.0) like Gecko     457776  848
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; NP07; NP07; rv:11.0) like Gecko       385464  544
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MALCJS; rv:11.0) like Gecko    369572  293
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; MAMD; rv:11.0) like Gecko     352423  239
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; TNJB; rv:11.0) like Gecko      293674  342
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MANM; rv:11.0) like Gecko      291762  218
Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; MAPBJS; rv:11.0) like Gecko    286097  234
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; MDDRJS; rv:11.0) like Gecko    245578  299
Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko  226963  277

It seems to me that the problem is that the regular expression used in
https://github.com/woothee/woothee-java/blob/5a7de46936f5f6e4b0e76620bc4b991e344ff53d/src/main/java/is/tagomor/woothee/browser/MSIE.java does not allow for tokens between "Trident/7.0;" and "rv:11.0", such as "Touch;" or "MASMJS;".
The latter seem to be manufacturer codes: http://www.whatismybrowser.com/developers/unknown-user-agent-fragments
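A sketch of a pattern that tolerates such tokens is shown below. It is not the expression from MSIE.java; the class `TridentVersionSketch` and its pattern are assumptions made for illustration, allowing any number of semicolon-terminated tokens between the Trident token and `rv:`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the pattern used in woothee-java.
class TridentVersionSketch {
    // "Trident/x.y;" then zero or more " TOKEN;" fragments, then " rv:version".
    private static final Pattern TRIDENT_RV =
        Pattern.compile("Trident/[.0-9]+;(?: [^;]+;)* rv:([.0-9]+)");

    // Returns the Internet Explorer version from the rv: token, if present.
    static Optional<String> ieVersion(String userAgent) {
        Matcher m = TRIDENT_RV.matcher(userAgent);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Applied to the user agents listed above, this returns `11.0` whether zero, one, or two extra tokens precede `rv:11.0`.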

Does anybody need the Hive UDF?

Honestly, I want to stop supporting the build option for the Hive UDF. The dependency problems it causes in pom.xml are huge.
I don't use it in my own workloads, and it's not so difficult to build your own UDF with woothee-java if you need it.

Does anyone have objections?

Classifier.isCrawler misses on many bots

I tried out some crawler user agents described here:
https://www.keycdn.com/blog/web-crawlers/

Most of them were not classified as crawlers. Examples:

Mozilla/5.0 (compatible; Bingbot/2.0; +http://www.bing.com/bingbot.htm)
DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Sogou Pic Spider/3.0( http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou head spider/3.0( http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou Orion spider/3.0( http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou-Test-Spider/4.0 (compatible; MSIE 5.5; Windows 98)
Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)
ia_archiver (+http://www.alexa.com/site/help/webmasters; [email protected])
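As a stopgap, callers could apply a generic keyword check before or after the library call. The sketch below is a rough heuristic under stated assumptions: `CrawlerHeuristicSketch` and its keyword list are hypothetical, are not part of woothee-java, and can misfire on a legitimate device name that happens to contain "bot".

```java
import java.util.regex.Pattern;

// Hypothetical fallback heuristic, not part of woothee-java.
class CrawlerHeuristicSketch {
    // Keywords chosen to cover the examples above; case-insensitive.
    private static final Pattern CRAWLER_HINT =
        Pattern.compile("bot|spider|crawler|ia_archiver", Pattern.CASE_INSENSITIVE);

    // Returns true when the user agent contains a typical crawler keyword.
    static boolean looksLikeCrawler(String userAgent) {
        return userAgent != null && CRAWLER_HINT.matcher(userAgent).find();
    }
}
```

A keyword list like this trades precision for recall, so it is better suited as a secondary signal than as a replacement for per-crawler rules.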

Issues generating jar file (with Hive UDF)

I get the following error while generating the jar using the two steps listed below.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project woothee-java: Compilation failure
[ERROR] /home/kakn01/wootheejar-test/woothee-java-master/src/main/java/is/tagomor/woothee/DataSet.java:[43,106] reached end of file while parsing
[ERROR] -> [Help 1]

  1. Install git, JDK and Maven
  2. Do mvn -P hiveudf with two -D options for hadoop-version and hive-version
    mvn package -P hiveudf -Dhadoop-version=0.23.11 -Dhive-version=0.13.0

I am assuming there is another step before the two steps above: download the source code from master, then run the "mvn" command from the root directory of the extracted source. We need a pom.xml file to run the mvn command, right? Please let me know if I am wrong.
