javaverbalexpressions's Introduction

JavaVerbalExpressions

VerbalExpressions is a Java library that helps to construct difficult regular expressions.

Getting Started

Maven Dependency:

<dependency>
  <groupId>ru.lanwen.verbalregex</groupId>
  <artifactId>java-verbal-expressions</artifactId>
  <version>1.8</version>
</dependency>

To use a SNAPSHOT version, add the following repository to your pom.xml:

<repositories>
  <repository>
    <id>ossrh</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  </repository>
</repositories>

Examples

VerbalExpression testRegex = VerbalExpression.regex()
        .startOfLine().then("http").maybe("s")
        .then("://")
        .maybe("www.").anythingBut(" ")
        .endOfLine()
        .build();

// Create an example URL
String url = "https://www.google.com";

// Use VerbalExpression's testExact() method to test if the entire string matches the regex
testRegex.testExact(url); // true

testRegex.toString(); // Outputs the regex used:
                      // ^(?:http)(?:s)?(?:\:\/\/)(?:www\.)?(?:[^\ ]*)$

VerbalExpression testRegex = VerbalExpression.regex()
        .startOfLine().then("abc").or("def")
        .build();

String testString = "defzzz";

// Use VerbalExpression's test() method to test if part of the string matches the regex
testRegex.test(testString);       // true
testRegex.testExact(testString);  // false
testRegex.getText(testString);    // returns: def

Builder can be cloned:

VerbalExpression regex = regex(regex().anything().addModifier('i')).endOfLine().build();

Or can be used in another regex:

VerbalExpression.Builder digits = regex().capt().digit().oneOrMore().endCapt().tab();
VerbalExpression regex2 = regex().add(digits).add(digits).build();

Feel free to use any predefined char groups:

regex().wordChar().nonWordChar()
   .space().nonSpace()
   .digit().nonDigit()

Define captures:

String text = "aaabcd";
VerbalExpression regex = regex()
                .find("a")
                .capture().find("b").anything().endCapture().then("cd").build();

regex.getText(text);     // returns "abcd"
regex.getText(text, 1);  // returns "b"

More complex examples

Other implementations

You can view all implementations on VerbalExpressions.github.io

[ Javascript - PHP - Python - C# - Objective-C - Ruby - Groovy - Haskell - C++ - ... (moarr) ]

Project released with Travis

With the help of this tutorial: https://dracoblue.net/dev/uploading-snapshots-and-releases-to-maven-central-with-travis/

javaverbalexpressions's People

Contributors

bowbahdoe, bryant1410, dependabot[bot], devnied, dukky, es, go-oleg, gufengwyx8, jcputney, joe-mojo, jvimr, lanwen, masih, maurice-betzel, maurodec, mgramin, mihai-vlc, msknapp, myjian, pathikrit, seblm, simschla, spencercjh, tavio, tdmitriy, vyacheslav-pushkin, zawataki

javaverbalexpressions's Issues

Inefficient implementation

The current implementation (2289a9b) is really inefficient as the pattern is recompiled after each chained method call.

This is partly due to a flaw in the original JS API, which does not properly follow the builder pattern. In Java/Scala, a proper builder class should be designed instead, and it should return a java.util.regex.Pattern object.
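The builder idea described in this issue can be sketched roughly as follows. This is a hypothetical illustration, not the library's code: the class name LazyRegexBuilder and its method set are assumptions. Each chained call only appends text, and java.util.regex.Pattern is compiled once, in build().

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's actual code: accumulate the
// expression as text and compile it exactly once in build().
final class LazyRegexBuilder {
    private final StringBuilder source = new StringBuilder();
    private int flags = 0;

    LazyRegexBuilder startOfLine() {
        source.append('^');
        return this;
    }

    LazyRegexBuilder then(String literal) {
        // Pattern.quote escapes the literal so it is matched verbatim.
        source.append("(?:").append(Pattern.quote(literal)).append(')');
        return this;
    }

    LazyRegexBuilder maybe(String literal) {
        source.append("(?:").append(Pattern.quote(literal)).append(")?");
        return this;
    }

    LazyRegexBuilder withAnyCase() {
        flags |= Pattern.CASE_INSENSITIVE;
        return this;
    }

    // The only place where the pattern is compiled.
    Pattern build() {
        return Pattern.compile(source.toString(), flags);
    }
}
```

With this shape, new LazyRegexBuilder().startOfLine().then("http").maybe("s").then("://").build() compiles a single Pattern no matter how many calls are chained.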

Javadoc?

Is there some documentation (Javadoc) available somewhere? I do realize that the API is probably pretty obvious and self-documenting, but some docs would be nice.

What Testing Framework to use?

I'm planning on porting the tests from the JS implementation to this one, and I'm thinking we should use JUnit just to stay within the same family of testing frameworks. What's everyone's opinion on that?

zeroOrMore() and oneOrMore() do not generate correct results

The expression:

regex().add("Java").zeroOrMore().build();

should generate the regex: /(?:Java)*/, just as the expression:

regex().maybe("Java").build();

generates the regex: /(?:Java)?/. However, the first expression actually generates the regex: /Java*/, which is not correct. The oneOrMore() method suffers from a similar problem.

I propose to change both zeroOrMore() and oneOrMore() to accept a string, much as the maybe() method does so it can return the correct result.
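The difference between the two generated forms can be checked with plain java.util.regex. The sketch below does not use the library; the zeroOrMore(String) helper is a hypothetical stand-in for the proposed signature.

```java
import java.util.regex.Pattern;

final class QuantifierGrouping {
    // Hypothetical helper: wrap a literal in a non-capturing group before
    // applying '*', so the quantifier covers the whole literal.
    static String zeroOrMore(String literal) {
        return "(?:" + Pattern.quote(literal) + ")*";
    }

    public static void main(String[] args) {
        // Ungrouped: '*' binds only to the final 'a'.
        System.out.println(Pattern.matches("Java*", "JavaJava"));             // false
        System.out.println(Pattern.matches("Java*", "Javaaaa"));              // true
        // Grouped: '*' repeats the whole word.
        System.out.println(Pattern.matches(zeroOrMore("Java"), "JavaJava"));  // true
    }
}
```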

lineBreak() has a capturing group

Hello,
The method lineBreak() contains a capturing group, which results in shifting of group numbers; the following test fails:

    final VerbalExpression expression = VerbalExpression.regex()
            .lineBreak()
            .capture().find("some").endCapture().then(" text")
            .build();
    assertEquals("some", expression.getText("\nsome text", 1));

Cheers
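The group shift can be reproduced with plain java.util.regex. The two patterns below are assumptions made for illustration: the issue does not show the regex that lineBreak() actually emits, only that it contains a capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupShift {
    // Returns the text of the given group for the first match, or null if nothing matches.
    static String group(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        // Assumed shape of a line break written with a capturing group.
        String capturing = "(\\n|\\r\\n)(some) text";
        // The same expression with a non-capturing group.
        String nonCapturing = "(?:\\n|\\r\\n)(some) text";

        // Non-capturing: group 1 is the user's capture.
        System.out.println(group(nonCapturing, "\nsome text", 1));            // some
        // Capturing: group 1 is the line break, the user's capture moved to group 2.
        System.out.println("\n".equals(group(capturing, "\nsome text", 1)));  // true
        System.out.println(group(capturing, "\nsome text", 2));               // some
    }
}
```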

What is the range function supposed to do?

Currently the range function seems to only allow Integer ranges, though the JavaScript version seems to allow characters: https://github.com/VerbalExpressions/JSVerbalExpressions/wiki/.range()

In order to add the regex for the characters "0-9A-Za-z." I had to use add("[0-9A-Za-z.]+"), but should this be possible with the range function if it worked as the JavaScript one does? I also understand that the class currently has a large pull request pending, so it might already be fixed there?
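One way a character-based range could look is sketched below. The helper name and its pairs-of-bounds signature are assumptions for illustration only; they are not taken from this library, and the sketch assumes each bound is a single plain alphanumeric character that needs no escaping.

```java
final class CharRanges {
    // Hypothetical helper: consecutive (from, to) pairs become one character class.
    // Assumes every bound is a single alphanumeric character, so no escaping is done.
    static String range(String... bounds) {
        if (bounds.length % 2 != 0) {
            throw new IllegalArgumentException("bounds must come in (from, to) pairs");
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bounds.length; i += 2) {
            sb.append(bounds[i]).append('-').append(bounds[i + 1]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(range("0", "9", "A", "Z", "a", "z")); // [0-9A-Za-z]
    }
}
```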

or() with capture()

It seems to me that the following expression:

VerbalExpression testRegex = regex()
                .capture()
                .find("abc")
                .or("def")
                .build();

should evaluate to the regex: /(?:((?:abc)|(?:(?:def))))/, but instead it evaluates to: /(?:((?:abc))|(?:(?:def)))/. As a result, if we test this expression against the string "def" it produces a match but no group is captured. This happens because the implementation of or() closes any opening parenthesis it finds in the string that precedes it.

I don't know why or() was implemented this way, but I believe the implementation of oneOf() I propose in #31 may fix this problem.
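The two regexes quoted above can be compared directly with java.util.regex, without the library. The class and method names below are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrCaptureDemo {
    // Returns group 1 of the first match, or null if there is no match
    // or the group did not take part in it.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String expected = "(?:((?:abc)|(?:(?:def))))"; // capture spans both alternatives
        String actual = "(?:((?:abc))|(?:(?:def)))";   // capture closed before the '|'

        System.out.println(firstGroup(expected, "def")); // def
        System.out.println(firstGroup(actual, "def"));   // null
    }
}
```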

removeModifier() may switch on the flag

The public Builder removeModifier(final char pModifier) method in class Builder uses an XOR operation such as modifiers ^= Pattern.CASE_INSENSITIVE;, which may accidentally switch the flag on and produce the opposite result.

This issue has not been discovered in the tests.

E.g. the testWithAnyCaseIsFalse() method in class BasicFunctionalityUnitTest calls the withAnyCase(false) after withAnyCase(), therefore the modifier is accidentally reset.

@Test
public void testWithAnyCaseIsFalse() {
    VerbalExpression testRegex = regex()
            .withAnyCase()
            .startOfLine()
            .then("a")
            .withAnyCase(false)
            .build();

    assertThat(testRegex, not(matchesTo("A")));
}

It would be more robust to use a bitwise AND with NOT. For example,

modifiers &= ~Pattern.CASE_INSENSITIVE;
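The difference between the two operators shows up when the flag is not set to begin with. A minimal sketch using only the Pattern constants (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

final class FlagToggle {
    // XOR toggles the bit: it clears a set flag but also sets a cleared one.
    static int removeWithXor(int modifiers, int flag) {
        return modifiers ^ flag;
    }

    // AND with NOT always clears the bit, whatever its previous state.
    static int removeWithAndNot(int modifiers, int flag) {
        return modifiers & ~flag;
    }

    public static void main(String[] args) {
        int flag = Pattern.CASE_INSENSITIVE;
        int notSet = 0;

        // Removing a flag that was never set:
        System.out.println((removeWithXor(notSet, flag) & flag) != 0);    // true, flag switched on
        System.out.println((removeWithAndNot(notSet, flag) & flag) != 0); // false, flag stays off
    }
}
```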

Missing Tests

The following tests are missing:

  • range method
  • word method
  • multiply method

Binary release

Why not just release a binary (a .jar file)?
I got it all working with maven, but it's such a pain in the ass.

anyOf does not meet requirements

How can JavaVerbalExpressions build a simple regex like this: [\d,]+

VerbalExpression.regex().anyOf("0123456789,").oneOrMore().build()

It's too ugly! How about this one: [a-k\s\d,]+ ?

It would be nice to add an anyOf method that can be used like this:

VerbalExpression.regex().anyOf(DigitCharSet, WordCharSet, SpaceCharSet ,";:,\\").oneOrMore().build()

or for no string

Please clarify, can I build regex like "(\d{1} Aaa)|(\d{2} Bbb)" using JavaVerbalExpressions?

Word boundary(\b) support

As far as I can see, word boundary (\b) isn't supported?
Or how do I write an expression that finds just an exact word?
For example:

String text = "Test about me";
String regex = "\\babout\\b";
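For reference, this is how the word-boundary pattern from the question behaves in plain java.util.regex. The containsWord helper is a hypothetical name; whether the library offers an equivalent builder method is not stated here.

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    // True if 'word' occurs in 'text' as a whole word, delimited by \b on both sides.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Test about me", "about")); // true
        System.out.println(containsWord("Test aboutme", "about"));  // false
    }
}
```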

Wrong code for build target

The POM has the build source and target set to 1.6, but the code uses an API that does not exist until 1.7.

Specifically, the code:
VerbalExpression::addModifier(char):161

contains:
Pattern.UNICODE_CHARACTER_CLASS;

which doesn't exist until 1.7, but the compiler plugin is specifying 1.6.

How to get the startIndex and endIndex of match results?

I have to find all occurrences of dates in an email: each occurrence and the startIndex and endIndex of the occurrence. I would like to use a human-readable regex builder like JavaVerbalExpressions, but looking at the API I see no way to extract the match details like startIndex and endIndex, I only see test and testExact returning a boolean.

Is it possible to loop over all matches and get the start and end indexes and all match details (group names etc)? Thanks!
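With plain java.util.regex, looping over matches and reading their offsets looks like the sketch below. It does not use the library's API. Since the README shows toString() returning the regex text, that string could presumably be passed to Pattern.compile, but that is an assumption. The date format in the example is also just an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchSpans {
    // Collects {start, end} for every match; end is exclusive, as in Matcher.end().
    static List<int[]> findSpans(Pattern pattern, String input) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        // Illustrative ISO-style date pattern, not taken from the question.
        Pattern date = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
        String mail = "Sent 2015-03-01, due 2015-04-15.";
        for (int[] span : findSpans(date, mail)) {
            System.out.println(mail.substring(span[0], span[1]) + " at [" + span[0] + ", " + span[1] + ")");
        }
        // prints:
        // 2015-03-01 at [5, 15)
        // 2015-04-15 at [21, 31)
    }
}
```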

Modifier constants

I think it would be better to use constants for specifying modifiers. Inside the code, constants are already being used. I can send a PR if you think this would be good.
