
uap-go's Issues

Regexes sorting is incorrect

According to specification:

The list of regular-expressions regex shall be evaluated for a given user-agent string beginning with the first regex-item in the list to the last item. The first matching regex stops processing the list. Regex-matching shall be case sensitive.

Here is the proof that sorting of regexes will cause wrong detection results:

package main

import (
    "fmt"
    "log"

    "github.com/ua-parser/uap-go/uaparser"
)

const (
    // specificUA is matched by X and Y; X precedes Y in the regex list
    specificUA = "Opera/9.80 (VRE; Opera Mini/4.2/28.2794; U; en) Presto/2.8.119 Version/11.10"
    // broadUA is matched by Y
    broadUA    = "Opera/9.80 (Windows NT 5.1; U; ru) Presto/2.5.24 Version/10.53"
)

func main() {
    sortThreshold := 100001
    parser, err := uaparser.NewWithOptions("./uap-core/regexes.yaml", uaparser.EUserAgentLookUpMode, sortThreshold, 0, true, true)
    if err != nil {
        log.Fatal(err)
    }

    // specificUA is matched by X, everything is fine
    beforeSort := parser.Parse(specificUA).UserAgent

    // cause regexes sort by parsing broadUA many times: it will cause bubbling up of regex Y
    for i := 0; i < sortThreshold; i++ {
        parser.Parse(broadUA)
    }

    // specificUA is now matched by Y which bubbled up after sort. This causes wrong parsing results
    afterSort := parser.Parse(specificUA).UserAgent
    fmt.Printf("before sort:\t %#v\n after sort:\t %#v\n", beforeSort, afterSort)
}

Result:

$ go run test.go 
2016-08-10 18:21:37.284461949 +0300 MSK Sorting UserAgents slice
before sort:     &uaparser.UserAgent{Family:"Opera Mini", Major:"4", Minor:"2", Patch:""}
 after sort:     &uaparser.UserAgent{Family:"Opera", Major:"11", Minor:"10", Patch:""}
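The effect can be reproduced without the library. The following is a minimal sketch using only the standard library; the two patterns are simplified stand-ins rather than the real entries from regexes.yaml, and the `rule`, `firstMatch`, and `demo` names are hypothetical. It shows that first-match evaluation gives a different answer once the list is reordered by hit count.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// rule pairs a pattern with the family it reports. The patterns used below
// are simplified stand-ins, not the real entries from regexes.yaml.
type rule struct {
	family string
	re     *regexp.Regexp
	hits   int
}

// firstMatch returns the family of the first rule that matches ua, following
// the "first matching regex stops processing the list" wording of the spec.
func firstMatch(rules []*rule, ua string) string {
	for _, r := range rules {
		if r.re.MatchString(ua) {
			r.hits++
			return r.family
		}
	}
	return "Other"
}

// demo parses the specific UA before and after a hit-count based reorder.
func demo() (before, after string) {
	specific := "Opera/9.80 (VRE; Opera Mini/4.2/28.2794; U; en) Presto/2.8.119 Version/11.10"
	broad := "Opera/9.80 (Windows NT 5.1; U; ru) Presto/2.5.24 Version/10.53"

	rules := []*rule{
		{family: "Opera Mini", re: regexp.MustCompile(`Opera Mini/(\d+)`)}, // specific rule first
		{family: "Opera", re: regexp.MustCompile(`Opera/(\d+)`)},           // broad rule second
	}

	before = firstMatch(rules, specific)

	// Many hits on the broad rule, then a reorder by popularity.
	for i := 0; i < 10; i++ {
		firstMatch(rules, broad)
	}
	sort.SliceStable(rules, func(i, j int) bool { return rules[i].hits > rules[j].hits })

	after = firstMatch(rules, specific)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Printf("before sort: %q\nafter sort:  %q\n", before, after)
}
```

Because the broad pattern also matches the specific user agent, any reordering that moves it ahead of the specific pattern changes the result.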

Incorrect UserAgent.Minor

I have the following user-agent:
Mozilla/5.0 (iPhone; CPU iPhone OS 11_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Coast/5.04.110603 Mobile/15F79 Safari/7534.48.3
BrowserMinorVersion (UserAgent.Minor) is parsed incorrectly: it comes back as "04" instead of "4".
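Until the parser itself changes, a caller who wants "4" can normalize the field after parsing. The sketch below is a caller-side workaround only, not part of uap-go; `trimLeadingZeros` is a hypothetical helper, and it assumes that stripping zeros in front of another digit is the desired behavior.

```go
package main

import "fmt"

// trimLeadingZeros removes zeros that are directly followed by another digit,
// so "04" becomes "4" while "0", "" and "0-80" are left untouched.
// Hypothetical helper for illustration; it is not part of uap-go.
func trimLeadingZeros(s string) string {
	for len(s) > 1 && s[0] == '0' && s[1] >= '0' && s[1] <= '9' {
		s = s[1:]
	}
	return s
}

func main() {
	for _, v := range []string{"04", "4", "0", "", "0-80"} {
		fmt.Printf("%q -> %q\n", v, trimLeadingZeros(v))
	}
}
```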

Test crashes.

Used as-is, it crashes. Are there any prerequisites that are not mentioned in the README?

regexes.yaml appears to be broken at HEAD with the go implementation

panic: regexp: Compile((008|Altresium|Argus|BaiduMobaider|BoardReader|DNSGroup|DataparkSearch|EDI|Goodzer|Grub|INGRID|Infohelfer|LinkedInBot|LOOQ|Nutch|PathDefender|Peew|PostPost|Steeler|Twitterbot|VSE|WebCrunch|WebZIP|Y!J-BR[A-Z]|YahooSeeker|envolk|sproose|wminer)/(\d+)(?:\.(\d+)(?:\.(\d+))?)?(?:(?!CFNetwork).)*$): error parsing regexp: invalid or unsupported Perl syntax: (?!

I will also file this on uap-core

Benchmarks

In #26, it shows that each op takes ~26ms. Is that still accurate?

I would like to see a version set up.

Thank you for providing us with a great library.

Could you please set up an official version for this wonderful program?

It would be great if you could publish versions as GitHub releases.

A version tag would also shorten the module line written to go.mod, and some vulnerability detection systems seem to misreport modules that have no version tag.
I would be very happy if you could support this.

Language support

Trying to get familiar with this tool. Is there currently a way to get the language code when available?

Stop using MustCompile, and instead return an error if any regex fails to compile

The use of regexp.MustCompile (see:

uap-go/uaparser/parser.go

Lines 354 to 360 in daf92ba

func compileRegex(flags, expr string) *regexp.Regexp {
    if flags == "" {
        return regexp.MustCompile(expr)
    } else {
        return regexp.MustCompile(fmt.Sprintf("(?%s)%s", flags, expr))
    }
}
) means that if any regex fails to compile, there is a panic.

We pull down the master regex URL daily, so that our regexes are always up to date, and load it right into our servers. The panic caused all our servers to crash last night, until we pinned the master regex URL to a working version.

NewFromBytes already has the option to return an error, so why not return the error there?

uap-go/uaparser/parser.go

Lines 209 to 219 in daf92ba

func NewFromBytes(data []byte) (*Parser, error) {
    var definitions RegexesDefinitions
    if err := yaml.Unmarshal(data, &definitions); err != nil {
        return nil, err
    }
    parser := &Parser{definitions, 0, 0, 0, (EOsLookUpMode|EUserAgentLookUpMode|EDeviceLookUpMode), false, false}
    parser.mustCompile()
    return parser, nil
}
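A rough sketch of what an error-returning variant could look like, using only the standard library. `compileRegexErr` and `compileAll` are hypothetical names and this is not the library's actual code; the point is only that regexp.Compile reports the failure as a value that a constructor can pass back to the caller.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileRegexErr mirrors the shape of compileRegex quoted above, but uses
// regexp.Compile so that a bad pattern yields an error instead of a panic.
// Hypothetical name; not part of uap-go.
func compileRegexErr(flags, expr string) (*regexp.Regexp, error) {
	if flags != "" {
		expr = fmt.Sprintf("(?%s)%s", flags, expr)
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, fmt.Errorf("compiling %q: %w", expr, err)
	}
	return re, nil
}

// compileAll compiles every pattern and stops at the first failure, so the
// caller can keep using the previously loaded definitions.
func compileAll(exprs []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		re, err := compileRegexErr("", e)
		if err != nil {
			return nil, err
		}
		out = append(out, re)
	}
	return out, nil
}

func main() {
	// The second pattern uses a negative lookahead, which Go's regexp rejects.
	patterns := []string{`Opera/(\d+)`, `(?:(?!CFNetwork).)*$`}
	if _, err := compileAll(patterns); err != nil {
		fmt.Println("refusing to load definitions:", err)
	}
}
```

With this shape, a daily refresh job could log the error and keep serving with the last definitions that compiled.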

32 Bit systems

Go does not automatically guarantee 64-bit alignment of 64-bit values on 32-bit systems.
In sync/atomic/asm_386.s we see that the 64-bit operations have the following preamble:

MOVL    addr+0(FP), BP
TESTL   $7, BP
JZ      2(PC)
MOVL    0, AX // crash with nil ptr deref

This means that your code can crash on ARM and other 32-bit systems whenever the uint64 it loads is not 64-bit aligned, because it uses

atomic.LoadUint64

We can use a lock, or manual alignment if you insist on using LoadUint64.
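Two ways to get the alignment right can be sketched as follows. The struct and field names are hypothetical and do not reflect the actual Parser layout. The first variant relies on the sync/atomic documentation's guarantee that the first word of an allocated struct is 64-bit aligned; the second uses the atomic.Uint64 type (Go 1.19 and later), which is aligned automatically.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// legacyCounter keeps the uint64 as the FIRST field. The first word of an
// allocated struct can be relied upon to be 64-bit aligned, so the primitive
// atomic functions are safe on it even on 32-bit platforms.
type legacyCounter struct {
	hits uint64 // must stay first
	name string
}

// typedCounter uses atomic.Uint64, which is automatically aligned regardless
// of where the field sits in the struct (requires Go 1.19+).
type typedCounter struct {
	name string
	hits atomic.Uint64
}

// countConcurrently increments both counters from several goroutines and
// returns the final values.
func countConcurrently(workers, perWorker int) (uint64, uint64) {
	legacy := &legacyCounter{name: "legacy"}
	typed := &typedCounter{name: "typed"}

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddUint64(&legacy.hits, 1)
				typed.hits.Add(1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&legacy.hits), typed.hits.Load()
}

func main() {
	a, b := countConcurrently(8, 1000)
	fmt.Println(a, b)
}
```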

Submodule is using SSH instead of HTTPS

Since the submodule is using SSH, any box where you run go get needs an SSH key registered with a GitHub account. This is problematic on our build server.
If you used https://github.com/ua-parser/uap-go.git instead, it would be fetched over HTTPS and not require any keys.

Why is the OS info Android in the sample code?

func main() {
  uagent := "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us; Silk/1.1.0-80) AppleWebKit/533.16 (KHTML, like Gecko) Version/5.0 Safari/533.16 Silk-Accelerated=true"

  parser, err := uaparser.New("./regexes.yaml")
  if err != nil {
    log.Fatal(err)
  }

  client := parser.Parse(uagent)

  fmt.Println(client.UserAgent.Family)  // "Amazon Silk"
  fmt.Println(client.UserAgent.Major)   // "1"
  fmt.Println(client.UserAgent.Minor)   // "1"
  fmt.Println(client.UserAgent.Patch)   // "0-80"
  fmt.Println(client.Os.Family)         // "Android"
  fmt.Println(client.Os.Major)          // ""
  fmt.Println(client.Os.Minor)          // ""
  fmt.Println(client.Os.Patch)          // ""
  fmt.Println(client.Os.PatchMinor)     // ""
  fmt.Println(client.Device.Family)     // "Kindle Fire"
}

Shouldn't the OS family be Mac?

Unable to recognize iPad OS

@elsigh The following user-agent strings never return "iPad OS" via client.Os.Family:

"Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1"

"Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1"

"Mozilla/5.0 (iPad; CPU OS 11_2_5 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15D60 Safari/604.1"

"Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/66.0.3359.122 Mobile/15E148 Safari/605.1"

"Mozilla/5.0 (iPad; CPU OS 13_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/15.0 Mobile/15E148 Safari/605.1.15"

Parser is very slow

We have it in production and it consistently takes ~30-40ms on a t3.micro AWS instance.
That's the slowest component in a request. Is there a way to speed it up?
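One caller-side option, if the same user-agent strings recur, is to memoize results. The sketch below is generic and makes no claim about uap-go's internals; `memoParser` is a hypothetical wrapper, the `parse` function stands in for whatever call is slow, and the eviction policy is deliberately crude.

```go
package main

import (
	"fmt"
	"sync"
)

// memoParser caches the result of an expensive parse function, keyed by the
// raw user-agent string. Hypothetical wrapper; not part of uap-go.
type memoParser struct {
	mu      sync.RWMutex
	results map[string]string
	parse   func(ua string) string
	maxSize int
}

func newMemoParser(parse func(string) string, maxSize int) *memoParser {
	return &memoParser{results: make(map[string]string), parse: parse, maxSize: maxSize}
}

// Parse returns the cached result when present and otherwise runs the slow
// path once and stores what it returned.
func (m *memoParser) Parse(ua string) string {
	m.mu.RLock()
	res, ok := m.results[ua]
	m.mu.RUnlock()
	if ok {
		return res
	}

	res = m.parse(ua)

	m.mu.Lock()
	if len(m.results) >= m.maxSize {
		// Crude eviction: drop everything once the bound is reached.
		m.results = make(map[string]string)
	}
	m.results[ua] = res
	m.mu.Unlock()
	return res
}

func main() {
	calls := 0
	p := newMemoParser(func(ua string) string {
		calls++ // counts how often the slow path runs
		return "parsed:" + ua
	}, 1024)

	p.Parse("Opera/9.80")
	p.Parse("Opera/9.80")
	fmt.Println("slow-path calls:", calls)
}
```

Whether this helps depends on how often identical user-agent strings repeat in the traffic; it does nothing for the cost of the first parse.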

Missing license file

This project lacks a LICENSE file (aside from the one in uap-core). Adding one would let people use this package legally 😄

cache.go causing issue

The newly introduced cache.go is giving the following errors:

ua-parser/uap-go/uaparser/cache.go:10:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:11:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:12:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:23:22: undefined: lru.NewARC
ua-parser/uap-go/uaparser/cache.go:27:18: undefined: lru.NewARC
ua-parser/uap-go/uaparser/cache.go:31:25: undefined: lru.NewARC

Comparing to the previous working copy, this file is the only difference. Is it included by accident?

Parser panics on the latest version of `regexes.yaml` from `uap-core` (v0.7.0)

When parsing the newly-released version of regexes.yaml, from https://raw.githubusercontent.com/ua-parser/uap-core/master/regexes.yaml, uap-go panics, by way of regexp.MustCompile:

panic: regexp: Compile(`Android \d+?(?:\.\d+|)(?:\.\d+|); ([^;]+?)(?: Build|\) AppleWebKit).+?(?<! Mobile) Safari`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`
	goroutine 1 [running]:
	regexp.MustCompile(0xc0000277a0, 0x59, 0xc00095ebe0)
	/usr/local/go/src/regexp/regexp.go:311 +0x152
	github.com/ua-parser/uap-go/uaparser.compileRegex(0x0, 0x0, 0xc0000277a0, 0x59, 0xc00095ebe0)
	...

This happens because Go's regexp package doesn't support lookarounds.

I see that this project has specifically uap-core v0.6.9 as a submodule. Is this on purpose, and is v0.7.0 not supported? (Or conversely, is uap-core going to have changes which don't work with uap-go from now on?)

bump uap-core to get newer detections

uap-core has a few updates since March which I was hoping to get by updating uap-go; namely the change to detect the new Edge Chromium version which was made in April. Mind updating to the newer version?

Parser fails to parse device_parsers.brand_replacement/model_replacement

The Go parser fails to parse the current version of the regexes.yaml file in ua-parser/uap-core. Specifically, there are brand_replacement and model_replacement fields in the current yaml that have no equivalent in the DevicePattern struct.

Attempting to parse the yaml yields the following stack trace snippet:

panic: reflect: call of reflect.Value.SetString on zero Value

goroutine 11 [running]:
reflect.flag.mustBeAssignable(0x0)
        /usr/local/Cellar/go/1.4.1/libexec/src/reflect/value.go:215 +0xa1
reflect.Value.SetString(0x0, 0x0, 0x0, 0xc208108bfa, 0x1)
        /usr/local/Cellar/go/1.4.1/libexec/src/reflect/value.go:1476 +0x28
github.com/ua-parser/uap-go/uaparser.ToStruct(0xc208248000, 0x23b, 0x23b, 0x418000, 0xc208165290, 0xc208257f00)
        <redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:41 +0x2b9
github.com/ua-parser/uap-go/uaparser.func·003()
        <redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:100 +0x141
created by github.com/ua-parser/uap-go/uaparser.New
        <redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:107 +0x694

I added the following log line at line 42 in parser.go:

log.Printf("%v,%v GetExportedName(%v)=%v FieldByName=%v", key, value, key, GetExportedName(key), structVal.FieldByName(GetExportedName(key)))

and that yielded a bunch of output, the last of which indicates the missing field in the DevicePattern struct:

2015/02/03 14:26:36 regex_flag,i GetExportedName(regex_flag)=RegexFlag FieldByName=<invalid Value>

So at a minimum the fix is to add a filter for these two fields to unchoke the parser. Even better would be to extend support for these two additional fields in device_parsers.
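The filter can be sketched with the reflect package alone. The struct below is a stand-in rather than the real DevicePattern, and `setField` is a hypothetical helper. FieldByName returns a zero Value for unknown names, and calling SetString on that is what panics, so checking IsValid first avoids the crash.

```go
package main

import (
	"fmt"
	"reflect"
)

// devicePattern is a stand-in for the real struct; it deliberately lacks
// fields for brand_replacement and model_replacement.
type devicePattern struct {
	Regex             string
	DeviceReplacement string
}

// setField sets the named string field on the struct that dst points to.
// It returns false, instead of panicking, when the field does not exist,
// cannot be set, or is not a string. Hypothetical helper for illustration.
func setField(dst interface{}, name, value string) bool {
	f := reflect.ValueOf(dst).Elem().FieldByName(name)
	if !f.IsValid() || !f.CanSet() || f.Kind() != reflect.String {
		return false
	}
	f.SetString(value)
	return true
}

func main() {
	var p devicePattern
	fmt.Println(setField(&p, "Regex", `Kindle Fire`))         // known field
	fmt.Println(setField(&p, "BrandReplacement", "Amazon")) // unknown field is skipped
	fmt.Printf("%+v\n", p)
}
```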
