ua-parser / uap-go
Go implementation of ua-parser
License: Other
According to specification:
The list of regular-expressions regex shall be evaluated for a given user-agent string beginning with the first regex-item in the list to the last item. The first matching regex stops processing the list. Regex-matching shall be case sensitive.
Here is the proof that sorting of regexes will cause wrong detection results:
package main

import (
	"fmt"
	"log"

	"github.com/ua-parser/uap-go/uaparser"
)

const (
	// specificUA is matched by X and Y; X precedes Y in the regex list
	specificUA = "Opera/9.80 (VRE; Opera Mini/4.2/28.2794; U; en) Presto/2.8.119 Version/11.10"
	// broadUA is matched by Y only
	broadUA = "Opera/9.80 (Windows NT 5.1; U; ru) Presto/2.5.24 Version/10.53"
)

func main() {
	sortThreshold := 100001
	parser, err := uaparser.NewWithOptions("./uap-core/regexes.yaml", uaparser.EUserAgentLookUpMode, sortThreshold, 0, true, true)
	if err != nil {
		log.Fatal(err)
	}
	// specificUA is matched by X, everything is fine
	beforeSort := parser.Parse(specificUA).UserAgent
	// trigger the regex sort by parsing broadUA many times: regex Y bubbles up
	for i := 0; i < sortThreshold; i++ {
		parser.Parse(broadUA)
	}
	// specificUA is now matched by Y, which bubbled up after the sort. This causes wrong parsing results
	afterSort := parser.Parse(specificUA).UserAgent
	fmt.Printf("before sort:\t %#v\n after sort:\t %#v\n", beforeSort, afterSort)
}
Result:
$ go run test.go
2016-08-10 18:21:37.284461949 +0300 MSK Sorting UserAgents slice
before sort: &uaparser.UserAgent{Family:"Opera Mini", Major:"4", Minor:"2", Patch:""}
after sort: &uaparser.UserAgent{Family:"Opera", Major:"11", Minor:"10", Patch:""}
I have the following user-agent:
Mozilla/5.0 (iPhone; CPU iPhone OS 11_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Coast/5.04.110603 Mobile/15F79 Safari/7534.48.3
BrowserMinorVersion (UserAgent.Minor) is parsed incorrectly, it is parsed as "04" instead of "4"
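Until this is addressed in the parser or the regexes, a caller-side workaround could look like the sketch below. trimLeadingZeros is a hypothetical helper, not part of uap-go, and it assumes you want purely numeric components normalized and everything else left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// trimLeadingZeros removes leading zeros from a purely numeric version
// component ("04" -> "4"). Empty or non-numeric input is returned unchanged,
// and an all-zero component collapses to "0".
func trimLeadingZeros(s string) string {
	if s == "" {
		return s
	}
	for _, c := range s {
		if c < '0' || c > '9' {
			return s
		}
	}
	t := strings.TrimLeft(s, "0")
	if t == "" {
		return "0"
	}
	return t
}

func main() {
	for _, minor := range []string{"04", "4", "00", "0-80", ""} {
		fmt.Printf("%q -> %q\n", minor, trimLeadingZeros(minor))
	}
}
```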
Used as-is, it crashes. Are there any prerequisites that are not mentioned in the README?
panic: regexp: Compile((008|Altresium|Argus|BaiduMobaider|BoardReader|DNSGroup|DataparkSearch|EDI|Goodzer|Grub|INGRID|Infohelfer|LinkedInBot|LOOQ|Nutch|PathDefender|Peew|PostPost|Steeler|Twitterbot|VSE|WebCrunch|WebZIP|Y!J-BR[A-Z]|YahooSeeker|envolk|sproose|wminer)/(\d+)(?:\.(\d+)(?:\.(\d+))?)?(?:(?!CFNetwork).)*$
): error parsing regexp: invalid or unsupported Perl syntax: (?!
I will also file this on uap-core
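For anyone who wants to check a pattern list before loading it, here is a small sketch with the standard regexp package. checkPatterns is a hypothetical helper and the two patterns are shortened versions of the one in the panic above; Go's regexp uses RE2 syntax, which has no lookahead.

```go
package main

import (
	"fmt"
	"regexp"
)

// checkPatterns tries to compile every pattern and separates the ones Go's
// regexp package accepts from the ones it rejects.
func checkPatterns(patterns []string) (ok []*regexp.Regexp, bad []error) {
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			bad = append(bad, err)
			continue
		}
		ok = append(ok, re)
	}
	return ok, bad
}

func main() {
	patterns := []string{
		`(Twitterbot)/(\d+)`,
		`(Twitterbot)/(\d+)(?:(?!CFNetwork).)*$`, // negative lookahead, not valid RE2
	}
	ok, bad := checkPatterns(patterns)
	fmt.Println(len(ok), "compiled,", len(bad), "rejected")
	for _, err := range bad {
		fmt.Println(err)
	}
}
```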
In #26, it shows that each op takes ~26ms. Is that still accurate?
The current definition is:
type Client struct {
	UserAgent *UserAgent
	Os        *Os
	Device    *Device
}
I think a more accurate way would be:
type Client struct {
	Browser *Browser
	Os      *Os
	Device  *Device
}
This is also how it's renamed in other uap implementations like uap-c++:
How can I get "Brand" and "Model" from a user-agent? The Device struct only contains a "Family" field:
type Device struct {
	Family string
}
Thank you for providing us with a great library.
Could you please publish an official version of this wonderful program?
It would be great if you could tag the version in GitHub releases.
It would shorten the version string written to go.mod, and some vulnerability detection systems seem to mishandle modules without a version tag.
I would be very happy if you could support this.
Trying to get familiar with this tool. Is there currently a way to get the language code when available?
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Related to PR #59: the regexp package is very slow.
I suggest creating an interface and changing it so that processing related to regular expressions can be injected externally as a dependency.
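A rough sketch of what such an interface could look like follows. Matcher, Compiler, stdCompiler, and firstGroup are all hypothetical names and nothing here exists in uap-go today; the only assumption is that the parser needs submatch extraction from whatever engine is plugged in.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is the minimal surface assumed to be needed from a regex engine.
// *regexp.Regexp from the standard library already satisfies it.
type Matcher interface {
	FindStringSubmatch(s string) []string
}

// Compiler builds a Matcher from a pattern, so another engine can be injected.
type Compiler func(pattern string) (Matcher, error)

// stdCompiler adapts the standard library's regexp package.
func stdCompiler(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// firstGroup compiles pattern with the injected compiler and returns the
// first capture group found in ua, or "" when there is no match.
func firstGroup(compile Compiler, pattern, ua string) (string, error) {
	m, err := compile(pattern)
	if err != nil {
		return "", err
	}
	groups := m.FindStringSubmatch(ua)
	if len(groups) < 2 {
		return "", nil
	}
	return groups[1], nil
}

func main() {
	ua := "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
	family, err := firstGroup(stdCompiler, `(Chrome)/(\d+)`, ua)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	fmt.Println(family) // prints "Chrome"
}
```

A faster engine would only need to provide its own Compiler; the matching code would not change.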
The use of regexp.MustCompile (see lines 354 to 360 in daf92ba) panics when a regex cannot be compiled.
We pull down the master regex URL daily, so that our regexes are always up to date, and load it right into our servers. The panic caused all our servers to crash last night, until we pinned the master regex URL to a working version.
NewFromBytes already has the option to return an error (lines 209 to 219 in daf92ba), so why not return the error there?
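As an illustration of the error-returning approach, here is a minimal sketch. compileAll is a hypothetical function, not the actual uap-go code, and the failing pattern is a shortened form of a lookbehind regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles every pattern and returns the first failure as an
// error, instead of panicking the way regexp.MustCompile does.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for i, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("pattern %d: %w", i, err)
		}
		out = append(out, re)
	}
	return out, nil
}

func main() {
	_, err := compileAll([]string{
		`Safari/(\d+)`,
		`.+?(?<! Mobile) Safari`, // lookbehind, rejected by Go's regexp
	})
	if err != nil {
		// A server could log this and keep serving with the previous regexes.
		fmt.Println("reload failed:", err)
	}
}
```

With an error return, a bad daily download becomes a failed reload rather than a crashed process.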
Go does not automatically guarantee 64-bit alignment on 32-bit systems.
In sync/atomic/asm_386.s we see that the 64-bit operations have the following preamble:
	MOVL addr+0(FP), BP
	TESTL $7, BP
	JZ 2(PC)
	MOVL 0, AX // crash with nil ptr deref
This means your code will crash on ARM and other 32-bit systems, because it uses atomic.LoadUint64 on a value that may not be 64-bit aligned.
We could use a lock, or manual alignment if you insist on using LoadUint64.
PR #12 is going to allow these in the structs but not yet implement them.
Since the submodule uses SSH, any machine where you run go get needs an SSH public key registered with a GitHub account. This is problematic on our build server.
If you used https://github.com/ua-parser/uap-go.git instead, it would be fetched over HTTPS and not require any keys.
func main() {
	uagent := "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us; Silk/1.1.0-80) AppleWebKit/533.16 (KHTML, like Gecko) Version/5.0 Safari/533.16 Silk-Accelerated=true"
	parser, err := uaparser.New("./regexes.yaml")
	if err != nil {
		log.Fatal(err)
	}
	client := parser.Parse(uagent)
	fmt.Println(client.UserAgent.Family) // "Amazon Silk"
	fmt.Println(client.UserAgent.Major)  // "1"
	fmt.Println(client.UserAgent.Minor)  // "1"
	fmt.Println(client.UserAgent.Patch)  // "0-80"
	fmt.Println(client.Os.Family)        // "Android"
	fmt.Println(client.Os.Major)         // ""
	fmt.Println(client.Os.Minor)         // ""
	fmt.Println(client.Os.Patch)         // ""
	fmt.Println(client.Os.PatchMinor)    // ""
	fmt.Println(client.Device.Family)    // "Kindle Fire"
}
Shouldn't the OS family be Mac?
@elsigh The following user-agent strings never return "iPad OS" via client.Os.Family:
"Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1"
"Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1"
"Mozilla/5.0 (iPad; CPU OS 11_2_5 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15D60 Safari/604.1"
"Mozilla/5.0 (iPad; CPU OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/66.0.3359.122 Mobile/15E148 Safari/605.1"
"Mozilla/5.0 (iPad; CPU OS 13_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/15.0 Mobile/15E148 Safari/605.1.15"
Is it possible to categorise device into Tablet, Phone and Desktop? I guess the regex doesn't really support it at this point?
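In the meantime, a crude heuristic on the raw string is possible. The sketch below is only an assumption about which tokens to look for (iPad, Tablet, Mobi, iPhone); classify is a hypothetical function, it is not backed by regexes.yaml, and it will misclassify plenty of real-world user agents.

```go
package main

import (
	"fmt"
	"strings"
)

// classify guesses a coarse device category from substrings of the user agent.
// Tablets are checked first because iPad user agents also contain "Mobile".
func classify(ua string) string {
	switch {
	case strings.Contains(ua, "iPad"), strings.Contains(ua, "Tablet"):
		return "Tablet"
	case strings.Contains(ua, "Mobi"), strings.Contains(ua, "iPhone"):
		return "Phone"
	default:
		return "Desktop"
	}
}

func main() {
	uas := []string{
		"Mozilla/5.0 (iPad; CPU OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1",
		"Mozilla/5.0 (iPhone; CPU iPhone OS 11_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Coast/5.04.110603 Mobile/15F79 Safari/7534.48.3",
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
	}
	for _, ua := range uas {
		fmt.Println(classify(ua))
	}
}
```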
We have it in production and it consistently takes ~30-40ms on a t3.micro AWS instance.
That's the slowest component in a request. Is there a way to speed it up?
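One option that does not depend on the parser internals is to memoize results per user-agent string, since production traffic tends to repeat the same strings. The sketch below is a generic wrapper: cachedParser and its parse callback are hypothetical, the stand-in parse function is a stub, and the map is unbounded, so a real deployment would want an eviction policy.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedParser memoizes the result of a parse function, keyed by the raw
// user-agent string.
type cachedParser struct {
	mu    sync.Mutex
	cache map[string]string
	parse func(ua string) string
	calls int // how many times parse actually ran
}

func newCachedParser(parse func(string) string) *cachedParser {
	return &cachedParser{cache: make(map[string]string), parse: parse}
}

// Parse returns the cached result for ua, calling the wrapped function only
// on a miss. The lock is held during parsing, which serializes misses.
func (c *cachedParser) Parse(ua string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.cache[ua]; ok {
		return v
	}
	v := c.parse(ua)
	c.calls++
	c.cache[ua] = v
	return v
}

func main() {
	// Stand-in for an expensive parse; a real caller would wrap the actual parser.
	p := newCachedParser(func(ua string) string { return "parsed:" + ua })
	p.Parse("Mozilla/5.0 Chrome/120.0.0.0")
	p.Parse("Mozilla/5.0 Chrome/120.0.0.0")
	fmt.Println("underlying parse calls:", p.calls) // prints "underlying parse calls: 1"
}
```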
Hi @elsigh
The user agent is:
Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights)
which is clearly a bot. But it is recognized by uap-go as:
Can we fix it? Thanks.
This project lacks a LICENSE file (aside from the one in uap-core). Adding one would let people use this package legally 😄
The newly introduced cache.go is giving the following errors:
ua-parser/uap-go/uaparser/cache.go:10:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:11:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:12:17: undefined: lru.ARCCache
ua-parser/uap-go/uaparser/cache.go:23:22: undefined: lru.NewARC
ua-parser/uap-go/uaparser/cache.go:27:18: undefined: lru.NewARC
ua-parser/uap-go/uaparser/cache.go:31:25: undefined: lru.NewARC
Comparing to the previous working copy, this file is the only difference. Is it included by accident?
When parsing the newly released version of regexes.yaml, from https://raw.githubusercontent.com/ua-parser/uap-core/master/regexes.yaml, uap-go panics, by way of regexp.MustCompile:
panic: regexp: Compile(`Android \d+?(?:\.\d+|)(?:\.\d+|); ([^;]+?)(?: Build|\) AppleWebKit).+?(?<! Mobile) Safari`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`
goroutine 1 [running]:
regexp.MustCompile(0xc0000277a0, 0x59, 0xc00095ebe0)
/usr/local/go/src/regexp/regexp.go:311 +0x152
github.com/ua-parser/uap-go/uaparser.compileRegex(0x0, 0x0, 0xc0000277a0, 0x59, 0xc00095ebe0)
...
This is because regexp in Go doesn't support lookarounds of that form.
I see that this project has specifically uap-core v0.6.9 as a submodule. Is this on purpose, and is v0.7.0 not supported? (Or conversely, is uap-core going to have changes which don't work with uap-go from now on?)
uap-core has a few updates since March which I was hoping to get by updating uap-go; namely the change to detect the new Edge Chromium version which was made in April. Mind updating to the newer version?
The Go parser fails to parse the current version of the regexes.yaml file in ua-parser/uap-core. Specifically, there are brand_replacement and model_replacement fields in the current yaml that have no equivalent in the DevicePattern struct.
Attempting to parse the yaml yields the following stack trace snippet:
panic: reflect: call of reflect.Value.SetString on zero Value
goroutine 11 [running]:
reflect.flag.mustBeAssignable(0x0)
/usr/local/Cellar/go/1.4.1/libexec/src/reflect/value.go:215 +0xa1
reflect.Value.SetString(0x0, 0x0, 0x0, 0xc208108bfa, 0x1)
/usr/local/Cellar/go/1.4.1/libexec/src/reflect/value.go:1476 +0x28
github.com/ua-parser/uap-go/uaparser.ToStruct(0xc208248000, 0x23b, 0x23b, 0x418000, 0xc208165290, 0xc208257f00)
<redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:41 +0x2b9
github.com/ua-parser/uap-go/uaparser.func·003()
<redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:100 +0x141
created by github.com/ua-parser/uap-go/uaparser.New
<redacted>/src/github.com/ua-parser/uap-go/uaparser/parser.go:107 +0x694
I added the following log line at line 42 in parser.go:
log.Printf("%v,%v GetExportedName(%v)=%v FieldByName=%v", key, value, key, GetExportedName(key), structVal.FieldByName(GetExportedName(key)))
and that yielded a bunch of output, the last of which indicates the missing field in the DevicePattern struct:
2015/02/03 14:26:36 regex_flag,i GetExportedName(regex_flag)=RegexFlag FieldByName=<invalid Value>
So at a minimum the fix is to add a filter for these two fields to unchoke the parser. Even better would be to extend support for these two additional fields in device_parsers.
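The filter could be as small as checking the reflected field before setting it. The sketch below is self-contained and uses a cut-down stand-in struct; devicePattern and setKnownFields are hypothetical names and this is not the actual ToStruct code from parser.go.

```go
package main

import (
	"fmt"
	"reflect"
)

// devicePattern is a cut-down stand-in for the parser's pattern struct.
type devicePattern struct {
	Regex             string
	DeviceReplacement string
}

// setKnownFields copies values into the exported string fields of the struct
// that dst points to. Keys without a matching settable string field are
// returned instead of causing a panic. dst must be a pointer to a struct.
func setKnownFields(dst interface{}, fields map[string]string) []string {
	v := reflect.ValueOf(dst).Elem()
	var skipped []string
	for name, value := range fields {
		f := v.FieldByName(name)
		if !f.IsValid() || !f.CanSet() || f.Kind() != reflect.String {
			skipped = append(skipped, name)
			continue
		}
		f.SetString(value)
	}
	return skipped
}

func main() {
	var p devicePattern
	skipped := setKnownFields(&p, map[string]string{
		"Regex":     `(Kindle)`,
		"RegexFlag": "i", // no such field on devicePattern
	})
	fmt.Printf("%+v skipped=%v\n", p, skipped)
}
```

FieldByName returns the zero Value for an unknown name, and calling SetString on it is what triggers the panic shown above, so the IsValid check is the part that matters.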