virustotal / gyp
gyp: A pure Go YARA parser
License: Apache License 2.0
While attempting to parse a valid yara ruleset with the latest version of master, the following error was returned by gyp.Parse():
syntax error: unexpected _HEX_NUMBER_, expecting _NUMBER_
The relevant snippet of the line in the ruleset where the error occurred is as follows:
xor(0x01-0xff)
I've verified that no error is returned in v0.9.0, so I believe the bug was introduced in the latest commit, when the new _HEX_NUMBER_ and _OCT_NUMBER_ tokens introduced an inconsistency with the official libyara grammar, and the definition of the xor string modifier was not updated.
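To illustrate what accepting both notations in the xor modifier involves, here is a minimal standalone sketch. It is not gyp's grammar or lexer: `parseXorRange` is a hypothetical helper that works on the text between the parentheses and relies on `strconv.ParseUint` with base 0 to infer the base from the prefix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseXorRange is a hypothetical helper (not part of gyp). It parses the
// argument of a YARA xor modifier, such as "0x01-0xff", "1-255" or "0x10",
// into its lower and upper key bytes.
func parseXorRange(arg string) (lo, hi uint8, err error) {
	loText, hiText, isRange := strings.Cut(arg, "-")
	if !isRange {
		hiText = loText // a single key such as xor(0x10)
	}
	// Base 0 makes ParseUint infer the base from the prefix, so decimal
	// ("1") and hexadecimal ("0x01") literals are both accepted.
	a, err := strconv.ParseUint(strings.TrimSpace(loText), 0, 8)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid xor lower bound %q: %w", loText, err)
	}
	b, err := strconv.ParseUint(strings.TrimSpace(hiText), 0, 8)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid xor upper bound %q: %w", hiText, err)
	}
	if a > b {
		return 0, 0, fmt.Errorf("xor lower bound %d exceeds upper bound %d", a, b)
	}
	return uint8(a), uint8(b), nil
}

func main() {
	for _, arg := range []string{"0x01-0xff", "1-255", "0x10"} {
		lo, hi, err := parseXorRange(arg)
		fmt.Println(arg, lo, hi, err)
	}
}
```

In a generated parser the equivalent change would presumably be made in the grammar rule for the xor modifier rather than in a helper like this one.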
When using y2j to generate JSON from a yara file and then j2y to convert the JSON back to a yara file, a panic is raised.
This happens if a rule contains either filesize or entrypoint in the condition. It seems to be a problem in the protobuf (de)marshalling of the condition.
panic: unexpected node "*pb.Expression_Keyword"
goroutine 1 [running]:
github.com/VirusTotal/gyp/ast.expressionFromProto(0xc000108dc0, 0x2, 0x2)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:475 +0x11cb
github.com/VirusTotal/gyp/ast.expressionsFromProto(0xc000163bd8, 0x2, 0x2, 0x400, 0x7f4b75086e00, 0x20300000000000)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:381 +0x75
github.com/VirusTotal/gyp/ast.createOperationExpression(0x6b9cce, 0x1, 0xc000163bd8, 0x2, 0x2, 0xc000000180, 0xc000163ba8)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:113 +0x4c
github.com/VirusTotal/gyp/ast.binaryExpressionFromProto(0xc000108c80, 0xc0000662a0, 0x90d7c0)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:374 +0x4b4
github.com/VirusTotal/gyp/ast.expressionFromProto(0xc000108c00, 0x0, 0x0)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:469 +0xe7b
github.com/VirusTotal/gyp/ast.RuleFromProto(0xc0001983f0, 0x1)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:44 +0x25d
github.com/VirusTotal/gyp/ast.RuleSetFromProto(0xc000140c40, 0xd)
/home/wayrick/code/external/VirusTotal/gyp/ast/serialization.go:16 +0x94
main.main()
/home/wayrick/code/external/VirusTotal/gyp/cmd/j2y/main.go:49 +0x251
Either of these two minimal rules will generate the error:
rule test_entrypoint {
  condition:
    entrypoint > 0
}
rule test_filesize {
  condition:
    filesize > 0
}
It seems to only be tied to filesize and entrypoint.
The parser is handling multi-line comments inside hex strings incorrectly. Let's use the following rule as an example:
rule TEST {
  strings:
    $ = {
      01 [5] /* comment 1 */
      02 /* comment 2 */
    }
  condition:
    all of them
}
The rule is correct, but the parser is returning the following error:
unexpected RBRACE, expecting BYTE or MASKED_BYTE or LBRACKET or LPARENS
This is because once the parser finds the opening /* of the first comment, it greedily consumes all the characters up to the closing */ of the second comment. This causes the 02 byte to fall inside the comment, so the parser actually sees the string as { 01 [5] }, which is syntactically invalid.
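The difference between greedy and non-greedy comment matching can be reproduced outside the parser. The sketch below uses Go's `regexp` package purely as a stand-in for the lexer rule; the `stripComments` helper and both patterns are illustrative assumptions, not gyp code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// greedyComment mimics the faulty behaviour: ".*" runs to the LAST "*/".
// lazyComment stops at the FIRST "*/" after each "/*".
var (
	greedyComment = regexp.MustCompile(`(?s)/\*.*\*/`)
	lazyComment   = regexp.MustCompile(`(?s)/\*.*?\*/`)
)

// stripComments is a hypothetical helper: it removes every match of re from
// the body of a hex string and returns the remaining tokens joined by spaces.
func stripComments(re *regexp.Regexp, body string) string {
	return strings.Join(strings.Fields(re.ReplaceAllString(body, "")), " ")
}

func main() {
	body := "01 [5] /* comment 1 */\n02 /* comment 2 */\n"
	fmt.Printf("greedy: %q\n", stripComments(greedyComment, body))
	fmt.Printf("lazy:   %q\n", stripComments(lazyComment, body))
}
```

With the greedy pattern the 02 byte disappears along with both comments, which matches the error reported; the non-greedy pattern keeps it.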
Heya, parsing and then writing yara rules with multiple imports is broken. There is no newline emitted at the end of an import, leading to outputs such as this:
import "pe"import "elf"
Pull request is incoming. :)
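For illustration only, this is roughly what emitting one import per line looks like. `renderImports` is a hypothetical function operating on plain module names; it is not the serializer code touched by the pull request.

```go
package main

import (
	"fmt"
	"strings"
)

// renderImports is a hypothetical helper (not gyp's serializer). It writes
// one import statement per module and terminates each with a newline.
func renderImports(modules []string) string {
	var b strings.Builder
	for _, m := range modules {
		b.WriteString(`import "`)
		b.WriteString(m)
		b.WriteString("\"\n") // the newline is what the reported output lacks
	}
	return b.String()
}

func main() {
	fmt.Print(renderImports([]string{"pe", "elf"}))
}
```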
After parsing a Yara ruleset into an ast.RuleSet, is there a way to manipulate a rule's condition?
Take this sample ruleset:
rule rule_1 {
  strings:
    $header_v2 = {50 02}
    $header_v3 = {50 03}
    $header_v4 = {50 04}
    $header_v5 = {50 05}
  condition: (
    for any of them: ($ at 0)
  )
}
rule rule_2 {
  strings:
    $pattern_v4_4 = {8c 05 65 23 61 6c [0-32] 90}
    $pattern_v4_5 = {8c 06 65 52 65 63 [0-32] 90}
  condition:
    any of them
}
I would like to make rule_2's condition rule_1 and any of them, however I can't find an ast.Condition or any similar struct to do this. Here is the code I have so far for getting the rule's condition. Now that I have it, is there a way to modify it?
r2 := r.Rules[1]
for _, cond := range r2.Condition.Children() {
	fmt.Println("Condition", cond)
}
Are there plans to add additional functionality to the lib, so that eventually a full golang implementation exists?
When this rule is parsed with Gyp, it returns a rule with a nil condition; no syntax error is returned, although one should be.
rule foo {
  strings:
    $a = "foo"
  condition:
    for all i in (0..(filesize - 10) : ($a at i))
}
Something is wrong with escape sequences. When transforming a yara rule via gyp (i.e. test.yara -> y2j -> test.json -> j2y -> test_out.yara) a TextString containing escape sequences is changed and can become broken.
Most basic example:
$ cat test.yara
rule test {
strings:
$s = "\""
condition:
$s
}
$ ./y2j -o test.json test.yara
$ ./j2y -o test_out.yara test.json
$ cat test_out.yara
rule test {
strings:
$s = "\"
condition:
$s
}%
In the output yara rule the string is modified and broken (no ending quotation mark).
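A round trip like this needs the writer to re-escape whatever the parser unescaped. The sketch below assumes the string value is held in memory in unescaped form and shows one way to escape it again on output; `escapeYaraString` is a hypothetical helper, not the function used by j2y.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeYaraString is a hypothetical helper (not gyp code). It assumes s is
// the unescaped value of a text string and returns the form that can be
// placed between double quotes in a YARA rule.
func escapeYaraString(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case '"':
			b.WriteString(`\"`)
		case '\\':
			b.WriteString(`\\`)
		case '\n':
			b.WriteString(`\n`)
		case '\t':
			b.WriteString(`\t`)
		case '\r':
			b.WriteString(`\r`)
		default:
			if c < 0x20 || c > 0x7e {
				// Any other non-printable byte is written as \xNN.
				fmt.Fprintf(&b, `\x%02x`, c)
			} else {
				b.WriteByte(c)
			}
		}
	}
	return b.String()
}

func main() {
	// The value of $s in the example above is a single double quote.
	fmt.Printf("$s = \"%s\"\n", escapeYaraString(`"`))
}
```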
Over the past year I’ve been using GYP fairly heavily to help manage a large YARA ruleset. One of the things I needed to do was to:
So I wrote code to accomplish both these tasks. I believe this code can be useful to others (I’ve been asked about it by multiple people) so I’d like to open source it. I think it would make a good addition to GYP, but I’d like to get feedback and see if the community thinks it should be added to GYP. Is this something that makes sense to move into GYP?
At a high level my implementation plan (very open to feedback) is to create a new dir at the root of GYP called utils. Within this dir I’d add yara_diff.go and yara_dependency_walker.go. I’d also write two simple CLI Go programs to serve as examples of how to use them (maybe throw them in an examples dir).
The purpose of this code is to return a list of dependent rules for a given rule. This code takes in a ruleset and a list of rule identifiers to get dependencies for. For each identifier the code will recursively get the rules that it depends on.
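A recursive walk of this kind could look roughly like the sketch below. It is not the proposed yara_dependency_walker.go: the `deps` map (rule identifier to the rules its condition references) and the `dependenciesOf` function are assumptions made for illustration, and extracting that map from a parsed ruleset is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// dependenciesOf is a hypothetical helper. deps maps a rule identifier to
// the identifiers of the rules referenced directly in its condition. The
// result is the sorted set of rules reachable from roots, not counting the
// roots themselves unless another rule in the walk depends on them.
func dependenciesOf(deps map[string][]string, roots []string) []string {
	seen := map[string]bool{}
	var visit func(name string)
	visit = func(name string) {
		for _, d := range deps[name] {
			if !seen[d] { // the seen set also guards against cycles
				seen[d] = true
				visit(d)
			}
		}
	}
	for _, r := range roots {
		visit(r)
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	deps := map[string][]string{
		"rule_c": {"rule_b"},
		"rule_b": {"rule_a"},
		"rule_a": nil,
		"rule_d": nil,
	}
	fmt.Println(dependenciesOf(deps, []string{"rule_c"}))
}
```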
This code will take in two YARA rulesets and return a data structure that shows if a rule was modified, deleted, added, or is the same.
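As a rough sketch of the classification step, assuming each ruleset has already been reduced to a map from rule identifier to some canonical text of the rule (how that text is produced is not covered here), the comparison could be written as follows. The `Status` type and `diffRulesets` function are hypothetical names, not the proposed yara_diff.go.

```go
package main

import (
	"fmt"
	"sort"
)

// Status describes what happened to a rule between two rulesets.
type Status string

const (
	Added     Status = "added"
	Deleted   Status = "deleted"
	Modified  Status = "modified"
	Unchanged Status = "unchanged"
)

// diffRulesets is a hypothetical helper. before and after map a rule
// identifier to a canonical text form of the rule.
func diffRulesets(before, after map[string]string) map[string]Status {
	result := make(map[string]Status, len(before)+len(after))
	for name, oldText := range before {
		newText, ok := after[name]
		switch {
		case !ok:
			result[name] = Deleted
		case newText != oldText:
			result[name] = Modified
		default:
			result[name] = Unchanged
		}
	}
	for name := range after {
		if _, ok := before[name]; !ok {
			result[name] = Added
		}
	}
	return result
}

func main() {
	before := map[string]string{"a": "rule a { condition: true }", "b": "rule b { condition: true }"}
	after := map[string]string{"a": "rule a { condition: false }", "c": "rule c { condition: true }"}
	diff := diffRulesets(before, after)
	names := make([]string, 0, len(diff))
	for name := range diff {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, diff[name])
	}
}
```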
I had this same issue with yara-parser, in which serialization to JSON and then deserialization back to YARA produces a different ruleset.
Consider this rule:
rule test {
condition:
true or (false and true)
}
It serializes to this:
{
  "rules": [
    {
      "modifiers": {
      },
      "identifier": "test",
      "tags": [
      ],
      "meta": [
      ],
      "strings": [
      ],
      "condition": {
        "orExpression": {
          "terms": [
            {
              "boolValue": true
            },
            {
              "andExpression": {
                "terms": [
                  {
                    "boolValue": false
                  },
                  {
                    "boolValue": true
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
It deserializes back to this:
rule test {
condition:
true or false and true
}
The groupings on the primary expressions seem to get lost. The easiest way to handle all of it seemed to be to throw parentheses around everything, but that makes formatting kind of ugly.
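One middle ground between dropping all parentheses and wrapping everything is to parenthesize a sub-expression only when its operator binds less tightly than its parent's. The sketch below shows that idea on a tiny stand-in AST; the `Expr`, `Bool` and `Binary` types and the `render` function are hypothetical and unrelated to gyp's own types. Note that this keeps the meaning of the condition but does not restore redundant parentheses from the source, so the rule above would still be written as true or false and true.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a stand-in for an expression node (hypothetical, not gyp's AST).
type Expr interface{}

// Bool is a boolean literal.
type Bool bool

// Binary is an n-ary "and" or "or" expression.
type Binary struct {
	Op    string
	Terms []Expr
}

// precedence: a higher value binds tighter ("and" binds tighter than "or").
var precedence = map[string]int{"or": 1, "and": 2}

// render writes e as text, adding parentheses only when e's operator binds
// less tightly than the enclosing operator (parentPrec).
func render(e Expr, parentPrec int) string {
	switch v := e.(type) {
	case Bool:
		if v {
			return "true"
		}
		return "false"
	case Binary:
		p := precedence[v.Op]
		parts := make([]string, len(v.Terms))
		for i, t := range v.Terms {
			parts[i] = render(t, p)
		}
		s := strings.Join(parts, " "+v.Op+" ")
		if p < parentPrec {
			return "(" + s + ")"
		}
		return s
	}
	return ""
}

func main() {
	a := Binary{Op: "or", Terms: []Expr{Bool(true), Binary{Op: "and", Terms: []Expr{Bool(false), Bool(true)}}}}
	b := Binary{Op: "and", Terms: []Expr{Binary{Op: "or", Terms: []Expr{Bool(true), Bool(false)}}, Bool(true)}}
	fmt.Println(render(a, 0))
	fmt.Println(render(b, 0))
}
```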
In the process of working with your library I found out that when you parse a ruleset from a string and then, at some point, want to serialize one of the parsed rules back to a string, numbers in the condition section will be converted to decimal form, e.g.:
rule := ruleset.Rules[0]
newRuleset := &pb.RuleSet{
Rules: []*pb.Rule{rule.AsProto()},
}
buf := &bytes.Buffer{}
serializer := gyp.NewSerializer(buf)
serializer.Serialize(newRuleset)
As a result, for a condition like uint16 ( 0 ) == 0x5a4d and filesize < 40KB, I got uint16(0) == 23117 and filesize < 40960.
Is it possible to add some kind of flag or something to save condition section as string rather than as Expression? Or maybe there's any other way to get parsed rule as a string?
Thanks!
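One way a parser can make this possible is to keep the original spelling of each numeric literal next to its value. The sketch below is an illustration of that idea only: `IntLiteral` and `parseIntLiteral` are hypothetical and do not describe how gyp stores numbers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IntLiteral is a hypothetical node that remembers how a number was written.
type IntLiteral struct {
	Value int64  // numeric value used for evaluation
	Raw   string // original source text, e.g. "0x5a4d" or "40KB"
}

// parseIntLiteral handles decimal and 0x-prefixed literals with an optional
// KB or MB size suffix, and records the source text unchanged.
func parseIntLiteral(text string) (IntLiteral, error) {
	digits, multiplier := text, int64(1)
	switch {
	case strings.HasSuffix(text, "KB"):
		digits, multiplier = strings.TrimSuffix(text, "KB"), 1024
	case strings.HasSuffix(text, "MB"):
		digits, multiplier = strings.TrimSuffix(text, "MB"), 1024*1024
	}
	n, err := strconv.ParseInt(digits, 0, 64) // base 0 accepts the 0x prefix
	if err != nil {
		return IntLiteral{}, err
	}
	return IntLiteral{Value: n * multiplier, Raw: text}, nil
}

// String prefers the original spelling and falls back to decimal for
// literals that were built programmatically.
func (l IntLiteral) String() string {
	if l.Raw != "" {
		return l.Raw
	}
	return strconv.FormatInt(l.Value, 10)
}

func main() {
	for _, text := range []string{"0x5a4d", "40KB"} {
		lit, err := parseIntLiteral(text)
		fmt.Println(lit.Value, lit, err)
	}
}
```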