Comments (8)
I can't read the subject string.
If the subject string is "## in expression ##", the pattern matches it.
kosako@VirtualBox:~/tmp$ gcc -Wall a.c -o a -lonig
kosako@VirtualBox:~/tmp$ ./a
match at 0
0: (0-3)
1: (0-2)
#include <stdio.h>
#include <string.h>
#include "oniguruma.h"

extern int main(int argc, char* argv[])
{
  int r;
  unsigned char *start, *range, *end;
  regex_t* reg;
  OnigErrorInfo einfo;
  OnigRegion *region;

  static UChar* pattern = (UChar* )"(?:^|\\G)(#{1,6})\\s*(?=[\\S[^#]])";
  static UChar* str = (UChar* )"## in expression ##";
  //static UChar* str = (UChar* )"##";

  r = onig_new(&reg, pattern, pattern + strlen((char* )pattern),
        ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, &einfo);
  if (r != ONIG_NORMAL) {
    char s[ONIG_MAX_ERROR_MESSAGE_LEN];
    onig_error_code_to_str(s, r, &einfo);
    fprintf(stderr, "ERROR: %s\n", s);
    return -1;
  }

  region = onig_region_new();

  end = str + strlen((char* )str);
  start = str;
  range = end;
  r = onig_search(reg, str, end, start, range, region, ONIG_OPTION_NONE);
  if (r >= 0) {
    int i;

    fprintf(stderr, "match at %d\n", r);
    for (i = 0; i < region->num_regs; i++) {
      fprintf(stderr, "%d: (%d-%d)\n", i, region->beg[i], region->end[i]);
    }
  }
  else if (r == ONIG_MISMATCH) {
    fprintf(stderr, "search fail\n");
  }
  else { /* error */
    char s[ONIG_MAX_ERROR_MESSAGE_LEN];
    onig_error_code_to_str(s, r);
    fprintf(stderr, "ERROR: %s\n", s);
    return -1;
  }

  onig_region_free(region, 1 /* 1:free self, 0:free contents only */);
  onig_free(reg);
  onig_end();
  return 0;
}
from oniguruma.
Hi, thanks for the quick response.
The subject string is
"## This is *great* stuff"
but without the quotes.
I am running version 6.1.2 with ONIG_SYNTAX_DEFAULT and ONIG_OPTION_CAPTURE_GROUP and am unable to get a match, though the same pattern and subject string match on Atom Node Oniguruma and on www.rubular.com.
Is there perhaps a difference in default options between the two?
from oniguruma.
ONIG_OPTION_CAPTURE_GROUP has no effect on this pattern because it doesn't include a named group.
At least in my environment, the pattern matches the string with ONIG_OPTION_CAPTURE_GROUP.
static UChar* pattern = (UChar* )"(?:^|\\G)(#{1,6})\\s*(?=[\\S[^#]])";
static UChar* str = (UChar* )"## This is *great* stuff";

r = onig_new(&reg, pattern, pattern + strlen((char* )pattern),
      ONIG_OPTION_CAPTURE_GROUP,
      ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, &einfo);
from oniguruma.
Since it matches for you in your UTF-8 example but not for me in UTF-16LE, could this be an encoding-specific issue? I'll try converting my code to UTF-8 and see if that mitigates the issue. By the way, if I remove the nested negated character class [^#] from the regex, it starts matching.
Let me see if I can do a better job of isolating my repro.
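For anyone trying to reproduce this, here is a minimal sketch of how the pattern and subject could be turned into UTF-16LE buffers, assuming both are ASCII-only. The helper name ascii_to_utf16le is hypothetical and not part of Oniguruma; it only prepares the bytes and does not call the library.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Oniguruma): widen an ASCII string to
 * UTF-16LE by writing each byte followed by a zero high byte.
 * Returns the number of bytes written, or 0 if dst is too small or the
 * input contains a non-ASCII byte (an empty input also yields 0).
 * No terminator is written. */
static size_t ascii_to_utf16le(const char *src, unsigned char *dst, size_t dst_size)
{
    size_t n = 0;
    for (; *src != '\0'; src++) {
        if ((unsigned char)*src > 0x7F || n + 2 > dst_size)
            return 0;
        dst[n++] = (unsigned char)*src; /* low byte first (little-endian) */
        dst[n++] = 0x00;                /* high byte is zero for ASCII */
    }
    return n;
}
```

The resulting buffer and its end pointer (buf and buf + n) would then be passed to onig_new and onig_search with ONIG_ENCODING_UTF16_LE in place of ONIG_ENCODING_UTF8. The end pointer has to come from the returned byte count rather than strlen, because strlen stops at the first zero byte of a UTF-16LE buffer.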
from oniguruma.
Maybe the following change fixes the problem:
k-takata/Onigmo@609d0ea
from oniguruma.
That's a great suggestion, but it didn't seem to do the trick. This commit, however, looks related and appeared to fix the bug for me: k-takata/Onigmo@b5fec0b
from oniguruma.
It is a bug for UTF-16.
I want to fix it, but I don't have the time now.
from oniguruma.
Thank you.
Fixed by PR #39.
from oniguruma.