
lua-resty-woothee's Introduction

Project Woothee

Project Woothee is a set of user-agent string parsers implemented in multiple languages.

You can try it on the demo site: http://woothee.github.io/

Why new project?

We need exactly the same logic in two or more programming languages, for use in various frameworks, middleware, and environments.

The most important data in this project is a single set of return values and a set of test cases, which ensure that the implementations in different languages produce equal results.
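
The idea of one shared set of test cases can be sketched in Python as below. This is only an illustration: the case layout (a "target" key holding the input string, the other keys holding expected values), the sample values, and the stub parser are assumptions made for the example, not the project's actual test data or API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test cases: "target" is the input user-agent string,
# every other key is an expected return value.
CASES: List[Dict[str, str]] = [
    {"target": "ExampleBot/1.0", "name": "ExampleBot", "category": "crawler"},
    {"target": "ExampleBrowser/2.0", "name": "ExampleBrowser", "category": "pc"},
]


def stub_parse(user_agent: str) -> Dict[str, str]:
    """Stand-in for a real implementation's parse function (assumption)."""
    if "Bot" in user_agent:
        return {"name": "ExampleBot", "category": "crawler"}
    return {"name": "ExampleBrowser", "category": "pc"}


def find_mismatches(
    parse: Callable[[str], Dict[str, str]],
    cases: List[Dict[str, str]],
) -> List[Tuple[str, str, str, object]]:
    """Return (target, key, expected, actual) for every value that differs."""
    mismatches = []
    for case in cases:
        result = parse(case["target"])
        for key, expected in case.items():
            if key == "target":
                continue
            actual = result.get(key)
            if actual != expected:
                mismatches.append((case["target"], key, expected, actual))
    return mismatches


print(find_mismatches(stub_parse, CASES))  # [] when every case passes
```

Running the same cases against each language's parser, and requiring an empty mismatch list from all of them, is what keeps the implementations in agreement.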

Implementations:

Versions

  • v1.13.0
    • Add Android OS version (Android 11~)
  • v1.12.0
    • Add Samsung Browser
    • Add Google's AdsBot
  • v1.11.1
    • Update the GSA version in testsets
  • v1.11.0
    • Add Google Search App
  • v1.10.0
    • Add Microsoft Edge (based on Chromium), Edge for iOS/Android
  • v1.9.0
    • Add Android 9
  • v1.8.0
    • Add Yandex Browser
  • v1.7.0
    • Add trendictionbot crawler
    • Add Yeti 1.1 crawler user agent tests
  • v1.6.0
    • Add Android Webview
    • Add curl HTTP library
  • v1.5.0
    • Add BingPreview crawler
  • v1.4.0
    • Add Vivaldi
  • v1.3.0
    • Add Firefox for iOS
  • v1.2.1
    • Fix testset bug for BlackBerry 10
  • v1.2.0
    • Add Microsoft Edge, Webview of mobile phones
    • Add Windows 10, BlackBerry 10
    • Add Twitterbot
  • v1.1.0
    • Add specs for blank input data
  • v1.0.1
    • Add variations of MSIE11
  • v1.0.0
    • Some implementations are not at v1 yet.

Version numbers are used as:

  • vX.Y.Z
    • X and Y are the major/minor numbers that control the specs of the dataset/testsets
    • Z is the patch number used for release control of each language implementation
    • Z is also used for bug-fix control of the testsets (the version of each language implementation should be bumped too)
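
As a small illustration of this numbering scheme, the Python sketch below splits a version string and checks whether two versions target the same dataset/testset spec (the same X and Y). The names `WootheeVersion`, `parse_version`, and `same_spec` are hypothetical helpers for this example and are not part of any Woothee implementation.

```python
from typing import NamedTuple


class WootheeVersion(NamedTuple):
    major: int  # X: dataset/testset spec, major number
    minor: int  # Y: dataset/testset spec, minor number
    patch: int  # Z: per-implementation release or testset bug fix


def parse_version(text: str) -> WootheeVersion:
    """Parse a string such as 'v1.11.1' into its three numeric parts."""
    parts = text.lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected vX.Y.Z, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return WootheeVersion(major, minor, patch)


def same_spec(a: str, b: str) -> bool:
    """True when both versions share the same X.Y, ignoring the patch number."""
    va, vb = parse_version(a), parse_version(b)
    return (va.major, va.minor) == (vb.major, vb.minor)


print(same_spec("v1.11.0", "v1.11.1"))  # True
print(same_spec("v1.11.1", "v1.12.0"))  # False
```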

Implementations

SYNOPSIS

in Java: (use java/woothee.jar)

// import is.tagomor.woothee.Classifier;
// import is.tagomor.woothee.DataSet;
Map r = Classifier.parse("user agent string");

r.get("name");
// => name of browser (or string like name of user-agent)

r.get("category");
// => "pc", "smartphone", "mobilephone", "appliance", "crawler", "misc", "unknown"

r.get("os");
// => os from user-agent, or carrier name of mobile phones

r.get("version");
// => version of browser, or terminal type name of mobile phones

r.get("os_version");
// => "NT 6.3" (for Windows), "10.8.3" (for OSX), "8.0.1" (for iOS), ....

in Hive: (copy woothee.jar into your CLASSPATH, and create function)

-- add jar to classpath
add jar woothee.jar;
-- create function
CREATE TEMPORARY FUNCTION parse_agent as 'is.tagomor.woothee.hive.ParseAgent';
-- count visits of bots
SELECT parsed_agent['name'] AS botname, COUNT(*) AS cnt
FROM (
  SELECT parse_agent(user_agent) AS parsed_agent
  FROM table_name
  WHERE date='today'
) x
WHERE parsed_agent['category'] = 'crawler'
GROUP BY parsed_agent['name']
ORDER BY cnt DESC LIMIT 1000;

in Perl: (cpanm Woothee)

use Woothee;
Woothee::parse("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)");
# => {'name'=>"Internet Explorer", 'category'=>"pc", 'os'=>"Windows 7", 'version'=>"8.0", 'vendor'=>"Microsoft", 'os_version'=>"NT 6.1"}

in Ruby: (gem install woothee)

require 'woothee'
Woothee.parse("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
# => {:name=>"Internet Explorer", :category=>:pc, :os=>"Windows 7", :version=>"8.0", :vendor=>"Microsoft", :os_version=>"NT 6.1"}

in Python:

import woothee
woothee.parse("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
# => {'name': 'Internet Explorer', 'category': 'pc', 'os': 'Windows 7', 'version': '8.0', 'vendor': 'Microsoft'}
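
For comparison with the Hive query above, a rough Python equivalent for counting crawler visits might look like the following. It is a sketch under assumptions: the function name `count_crawlers` is made up for this example, and the parser is passed in as an argument so the snippet does not depend on any particular installation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def count_crawlers(
    user_agents: Iterable[str],
    parse: Callable[[str], Dict[str, str]],
) -> List[Tuple[str, int]]:
    """Count parsed results whose category is 'crawler', most frequent first."""
    counts: Counter = Counter()
    for user_agent in user_agents:
        result = parse(user_agent)
        if result.get("category") == "crawler":
            counts[result.get("name")] += 1
    return counts.most_common()
```

With the woothee package installed, this would be called as `count_crawlers(lines, woothee.parse)`, where `lines` is any iterable of user-agent strings.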

in JavaScript (HTML, copy from release/woothee.js):

<script src="./your/own/path/woothee.js"></script>
<script>
woothee.parse('Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)')
// => {name: 'Internet Explorer', category: 'pc', os: 'Windows 7', version: '8.0', vendor: 'Microsoft', os_version: 'NT 6.1'}
</script>

in Node.js (npm install woothee)

var woothee = require('woothee');
woothee.parse('Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)')
// => {name: 'Internet Explorer', category: 'pc', os: 'Windows 7', version: '8.0', vendor: 'Microsoft', os_version: 'NT 6.1'}

in PHP (composer require woothee/woothee:*)

<?php
include __DIR__ . '/vendor/autoload.php';
$classifier = new \Woothee\Classifier;
$classifier->parse('Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)');
// => ['name' => 'Internet Explorer', 'category' => 'pc', 'os' => 'Windows 7', 'version' => '8.0', 'vendor' => 'Microsoft']

in Rust:

extern crate woothee;

use woothee::parser::Parser;

fn main() {
    let parser = Parser::new();
    let result = parser.parse("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)");
    println!("{:?}", result);
}
// => Some(WootheeResult { name: "Internet Explorer", category: "pc", os: "Windows 7", os_version: "NT 6.1", browser_type: "UNKNOWN", version: "8.0", vendor: "Microsoft" })

Todo

  • 'mobilephone' currently means the Japanese mobile phone groups only
    • For multi-region support, a domestic pattern specifier (or another mechanism) is needed

FAQ


Authors

License

Copyright 2012- TAGOMORI Satoshi (tagomoris)

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

lua-resty-woothee's People

Contributors

tell-k, toritori0318


lua-resty-woothee's Issues

Parse error on UA string

On a heavily loaded system I am repeatedly, though intermittently, seeing the error below.

This is on OpenResty, release 1.7.10.2.

Is this an issue with the Woothee data being used?

2015/07/22 12:53:25 [error] 9783#0: *2702384 lua entry thread aborted: runtime error: ...ocal/openresty/luajit/share/lua/5.1/resty/woothee/os.lua:55: attempt to index local 'match' (a nil value)
stack traceback:
coroutine 0:
...ocal/openresty/luajit/share/lua/5.1/resty/woothee/os.lua: in function 'challenge_windows'
/usr/local/openresty/luajit/share/lua/5.1/resty/woothee.lua:45: in function 'try_os'
/usr/local/openresty/luajit/share/lua/5.1/resty/woothee.lua:165: in function 'exec_parse'
/usr/local/openresty/luajit/share/lua/5.1/resty/woothee.lua:222: in function 'parse'
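
The message "attempt to index local 'match' (a nil value)" suggests that the result of a pattern match was indexed without first checking whether anything matched. The usual guard is illustrated below in Python, where `re.search` returns `None` on no match. This is an analogy only, not the code in os.lua: the pattern and the function name `windows_version` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; not the one used in os.lua.
WINDOWS_VERSION = re.compile(r"Windows ([ .a-zA-Z0-9]+)[;\)]")


def windows_version(user_agent: str) -> Optional[str]:
    """Return the captured Windows version, or None when nothing matches."""
    match = WINDOWS_VERSION.search(user_agent)
    if match is None:
        # Without this check, match.group(1) would raise AttributeError,
        # the Python counterpart of indexing a nil value in Lua.
        return None
    return match.group(1)


print(windows_version("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"))
# NT 6.1
```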

global lua variable

With the latest OpenResty (1.15.8.1) the following warning is logged:
2019/06/15 13:08:51 [warn] 25052#0: *23886 [lua] _G write guard:12: __newindex(): writing a global lua variable ('taget') which may lead to race conditions between concurrent requests, so prefer the use of 'local' variables
stack traceback:
/usr/local/share/lua/5.1/resty/woothee/util.lua:6: in function 'update_map'
/usr/local/share/lua/5.1/resty/woothee/browser.lua:101: in function 'challenge_firefox'
/usr/local/share/lua/5.1/resty/woothee.lua:41: in function 'try_browser'
/usr/local/share/lua/5.1/resty/woothee.lua:176: in function 'exec_parse'
/usr/local/share/lua/5.1/resty/woothee.lua:234: in function 'parse'

I think there is a typo in util.lua: "taget" instead of "target".

another global variable

Hi, I caught another global variable:

2019/06/21 19:45:45 [warn] 21777#0: *93664160 [lua] _G write guard:12: __newindex(): writing a global lua variable ('version') which may lead to race conditions between concurrent requests, so prefer the use of 'local' variables
/usr/local/share/lua/5.1/resty/woothee/browser.lua:47: in function 'challenge_safari_chrome'
/usr/local/share/lua/5.1/resty/woothee.lua:37: in function 'try_browser'
/usr/local/share/lua/5.1/resty/woothee.lua:176: in function 'exec_parse'
/usr/local/share/lua/5.1/resty/woothee.lua:234: in function 'parse'
/opt/ngx-openresty/nginx/conf/proxy/log.lua:3: in main chunk while logging request, client: 1.2.3.4, server: , request: "GET / HTTP/1.1", upstream: "http://10.1.1.1/", host: "xxx"

Wrong browser detected for user agent

Hi,
For this Android user agent: Mozilla/5.0 (Linux; Android 7.0; SM-G950U Build/NRD90M; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/60.0.3112.107 Mobile Safari/537.36 Snapchat/10.17.0.0 (SM-G950U; Android 7.0#G950USQU1AQG4#24; gzip), the browser is detected as Webview instead of Chrome.
The demo at http://woothee.github.io/ reports the browser as Chrome.
