ICalParser

An iCalendar (RFC 5545) parser backed by an ANTLR v4 grammar.

Introduction

Although ANTLR v4 currently has only one target implemented (Java), the grammar contains no target-specific code. Therefore, the Ant target responsible for generating the parser prepends a package declaration to the .java source files after the ANTLR tool has generated them.
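The real step is an Ant target in the repository's build file, which is not reproduced here. As a rough illustration of what that target does, here is a hypothetical Java equivalent: it prepends `package <name>;` to every generated .java file in a directory and skips files that already declare a package. The class name, its arguments and the package name are assumptions, not part of this project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical Java equivalent of the "prepend a package declaration" build step. */
public final class PackagePrepender {

    // Matches an existing package declaration at the start of any line.
    private static final Pattern PACKAGE_DECL =
            Pattern.compile("^\\s*package\\s+[\\w.]+\\s*;", Pattern.MULTILINE);

    /** Returns the source with "package pkg;" prepended, unless it already declares a package. */
    static String withPackage(String source, String pkg) {
        if (PACKAGE_DECL.matcher(source).find()) {
            return source; // already processed: keeps the step idempotent
        }
        return "package " + pkg + ";\n\n" + source;
    }

    /** Rewrites every .java file directly inside dir, in place. */
    static void prependAll(Path dir, String pkg) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.java")) {
            for (Path file : files) {
                String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                Files.write(file, withPackage(source, pkg).getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PackagePrepender <generated-sources-dir> <package-name>");
            return;
        }
        prependAll(Paths.get(args[0]), args[1]);
    }
}
```

Because the check looks for an existing declaration, running the step twice leaves the files unchanged.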

Note that this should be considered a pre-alpha release. Although most production rules in the parser are tested, much more thorough tests are needed. My main reason for writing this grammar was to get acquainted with ANTLR v4, not to use the parser in production! However, feel free to send pull requests with fixes, or to report any issues you encounter. I am willing to put in some extra effort to make things more robust.

Getting started

Make sure you have Ant installed.

Do the following:

  • clone this repository: git clone https://github.com/bkiers/ICalParser.git
  • generate the parser source files: ant generate
  • run the Main class: ant run

The run target will parse src/ics/test.ics and will print some information about this iCalendar file. If you'd like to parse another file, provide it on the command line like this: ant -Dics=path/to/other/file.ics run

Walking the parse tree

After generating the parser, ANTLR also creates listener files. With these, a tree walker traverses the parse tree that the parser builds from the input and lets you "listen" for the enter and exit events it fires whenever it enters or exits a production rule.
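To make the enter/exit mechanism concrete without any generated code, here is a small, hypothetical, ANTLR-free sketch. `Node`, `Listener` and `walk` are illustrative stand-ins for ANTLR's parse-tree nodes, generated base listener and `ParseTreeWalker`, and the tree only loosely mirrors the grammar's rule names. The walk is depth-first: `enter` fires before a node's children are visited and `exit` fires after them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of how a tree walker fires enter/exit events. */
public final class MiniWalker {

    /** A parse-tree node labelled with the production rule it matched. */
    static final class Node {
        final String rule;
        final List<Node> children;

        Node(String rule, Node... children) {
            this.rule = rule;
            this.children = Arrays.asList(children);
        }
    }

    /** No-op defaults, like a generated *BaseListener: override only what you need. */
    interface Listener {
        default void enter(Node node) {}
        default void exit(Node node) {}
    }

    /** Depth-first walk: enter before the children, exit after them. */
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Walks a simplified calendar-shaped tree and records every event. */
    static List<String> demo() {
        Node tree = new Node("parse",
                new Node("todoc",
                        new Node("todoprop", new Node("dtstamp")),
                        new Node("todoprop", new Node("organizer"))));
        List<String> events = new ArrayList<>();
        walk(new Listener() {
            @Override public void enter(Node n) { events.add("enter " + n.rule); }
            @Override public void exit(Node n) { events.add("exit " + n.rule); }
        }, tree);
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

A listener interested only in TODO components would override `enter` and ignore every node whose rule is not `todoc`, which is exactly what the generated base listener makes convenient.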

This might sound a bit vague, so I'll give a small demo. Let's say you're only interested in TODO components of an iCalendar file, and you'd like to know the DTSTAMP and ORGANIZER properties of this component.

Start by creating a class, TodoListener, that extends the generated ICalendarBaseListener. ICalendarBaseListener has empty methods for all production rules in the grammar, so we only need to override the method that is invoked whenever the tree walker enters the todoc parser rule:

class TodoListener extends ICalendarBaseListener {

    /*
     * The production of a `todoc` in the ANTLR grammar looks like this:
     *
     *   // 3.6.2 - To-Do Component
     *   todoc
     *    : k_begin COL k_vtodo CRLF
     *      todoprop*?
     *      alarmc*?
     *      k_end COL k_vtodo CRLF
     *    ;
     */
    @Override
    public void enterTodoc(ICalendarParser.TodocContext ctx) {

        // the first property is DTSTAMP
        String dtstamp = ctx.todoprop(0).dtstamp().date_time().getText();

        // the 4th property is ORGANIZER
        String organizer = ctx.todoprop(3).organizer().cal_address().getText();

        System.out.println("dtstamp   -> " + dtstamp);
        System.out.println("organizer -> " + organizer);
    }
}

To try out this listener, run the following class:

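import java.io.FileInputStream;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class TodoDemo {

    public static void main(String[] args) throws Exception {

        // Lex and parse the sample file, starting at the grammar's `parse` rule.
        try (FileInputStream fis = new FileInputStream("src/ics/test.ics")) {
            ICalendarLexer lexer = new ICalendarLexer(new ANTLRInputStream(fis));
            ICalendarParser parser = new ICalendarParser(new CommonTokenStream(lexer));
            ParseTree tree = parser.parse();

            // Walk the tree; TodoListener prints DTSTAMP and ORGANIZER when it enters a todoc.
            ParseTreeWalker walker = new ParseTreeWalker();
            walker.walk(new TodoListener(), tree);
        }
    }
}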

The file src/ics/test.ics contains the following:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//ABC Corporation//NONSGML My Product//EN
BEGIN:VTODO
DTSTAMP:19980130T134500Z
SEQUENCE:2
UID:[email protected]
ORGANIZER:mailto:[email protected]
ATTENDEE;PARTSTAT=ACCEPTED:mailto:[email protected]
DUE:19980415T000000
STATUS:NEEDS-ACTION
SUMMARY:Submit Income Taxes
BEGIN:VALARM
ACTION:AUDIO
TRIGGER:19980403T120000Z
ATTACH;FMTTYPE=audio/basic:http://example.com/pub/audio-
 files/ssbanner.aud
REPEAT:4
DURATION:PT1H
END:VALARM
END:VTODO
END:VCALENDAR
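Note that the ATTACH property in this file is folded over two lines: under RFC 5545, section 3.1, a line break followed by a single space or tab marks a continuation line. How this grammar deals with folding is not described here. As a standalone sketch, unfolding could also be done as a preprocessing step; accepting a bare "\n" as well mirrors the grammar's leniency about CRLF, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Standalone sketch of RFC 5545 line unfolding (section 3.1). */
public final class Unfold {

    // A line break followed by exactly one space or tab marks a folded line.
    // A bare "\n" is accepted as well as "\r\n".
    private static final Pattern FOLD = Pattern.compile("\r?\n[ \t]");

    /** Removes each such line break together with the single whitespace character after it. */
    static String unfold(String text) {
        return FOLD.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String folded = "ATTACH;FMTTYPE=audio/basic:http://example.com/pub/audio-\r\n"
                + " files/ssbanner.aud\r\n"
                + "REPEAT:4\r\n";
        // Prints the ATTACH value joined back into ".../pub/audio-files/ssbanner.aud".
        System.out.print(unfold(folded));
    }
}
```

Only one whitespace character is removed per fold, so any further leading whitespace on a continuation line is kept as part of the value.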

If you now run TodoDemo, you will see the following being printed to your console:

dtstamp   -> 19980130T134500Z
organizer -> mailto:[email protected]

See src/main/ical/TodoDemo.java for a working version of the demo above.
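The TodoListener above reads DTSTAMP and ORGANIZER by position (`todoprop(0)` and `todoprop(3)`), which only works for this particular file: the grammar's `todoprop*?` accepts the properties in any order. A position-independent alternative is to take the first property for which an accessor returns a non-null value. The helper below is a hypothetical sketch; it assumes the generated `TodocContext` also offers a list-returning `todoprop()` (ANTLR v4 normally generates one next to `todoprop(int)` for a rule referenced more than once) and that accessors such as `organizer()` return null when a `todoprop` holds some other property. Under those assumptions the listener could call `firstNonNull(ctx.todoprop(), p -> p.organizer())`. The demo uses plain strings as stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical position-independent lookup helper. */
public final class PropertyLookup {

    /** Returns the first non-null result of applying accessor to the items, in order. */
    static <T, R> Optional<R> firstNonNull(List<T> items, Function<? super T, R> accessor) {
        return items.stream()
                .map(accessor)            // null means "not the property we are after"
                .filter(Objects::nonNull)
                .findFirst();
    }

    /** Stand-in for a generated accessor: the property value, or null for other properties. */
    private static String valueOf(String line, String name) {
        return line.startsWith(name + ":") ? line.substring(name.length() + 1) : null;
    }

    public static void main(String[] args) {
        // Stand-in data: some of the VTODO property lines from src/ics/test.ics.
        List<String> props = Arrays.asList(
                "DTSTAMP:19980130T134500Z", "SEQUENCE:2", "DUE:19980415T000000", "STATUS:NEEDS-ACTION");
        Optional<String> due = firstNonNull(props, p -> valueOf(p, "DUE"));
        System.out.println("due -> " + due.orElse("(none)"));
    }
}
```

Returning an `Optional` also makes the case where a VTODO lacks the property (ORGANIZER is optional in a VTODO) explicit instead of failing with an exception on a missing index.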

Grammar

I tried to follow the naming conventions and grammar rules used in RFC 5545 as closely as possible. I did deviate from the spec in places: for example, a lone '\n' is also accepted as a CRLF.

The grammar defines very few lexer rules, because it is hard to properly define keyword and identifier tokens in iCalendar. To define keywords at the lexical level, you'd also need to create a rule that matches an identifier. For example, if you create lexer rules like this:

VERSION : 'VERSION';
...
ALPHA : [a-zA-Z];

you don't want input like "VERSIONs" to be tokenized as a VERSION token followed by an ALPHA. So there should also be a rule that matches identifiers:

VERSION : 'VERSION';
...
IDENTIFIER : [a-zA-Z-]+;

causing "VERSIONs" to be properly tokenized as an IDENTIFIER. However, the iCalendar RFC defines a dur-value as follows:

dur-value = (["+"] / "-") "P" (dur-date / dur-time / dur-week)
dur-time  = "T" (dur-hour / dur-minute / dur-second)
dur-hour  = 1*DIGIT "H" [dur-minute]

meaning that the input "-PT12H" is a valid dur-value. However, the substring "PT" would already be tokenized as an IDENTIFIER token, which makes it messy to match such corner cases properly in the production rule(s), and there are many more of them! Therefore, the lexer tokenizes single characters and the parser matches keywords (and identifiers), which makes it behave much like a PEG.
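The character-level approach can be illustrated outside ANTLR. Below is a hypothetical, hand-written recognizer for dur-value that reads one character at a time and tries alternatives in order, rewinding on failure as a PEG does. Besides the rules quoted above it uses dur-date, dur-day, dur-week, dur-minute and dur-second from RFC 5545, section 3.3.6. It is not generated from, or part of, the ICalendar grammar.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical character-level, PEG-style recognizer for the RFC 5545 dur-value rule. */
public final class DurValueRecognizer {

    private final String in;
    private int pos;

    private DurValueRecognizer(String in) { this.in = in; }

    /** True if the whole input is a dur-value. */
    public static boolean matches(String input) {
        DurValueRecognizer r = new DurValueRecognizer(input);
        return r.durValue() && r.pos == input.length();
    }

    // Ordered choice: try one alternative and rewind if it fails.
    private boolean attempt(BooleanSupplier alternative) {
        int start = pos;
        if (alternative.getAsBoolean()) return true;
        pos = start;
        return false;
    }

    // A single literal character: "keywords" like "P" or "T" are matched here, not by a lexer.
    private boolean ch(char c) {
        if (pos < in.length() && in.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // 1*DIGIT (ASCII digits only, as in ABNF)
    private boolean digits() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') pos++;
        return pos > start;
    }

    // dur-value = (["+"] / "-") "P" (dur-date / dur-time / dur-week)
    private boolean durValue() {
        if (!ch('+')) ch('-');
        return ch('P') && (attempt(this::durDate) || attempt(this::durTime) || attempt(this::durWeek));
    }

    // dur-date = dur-day [dur-time]   with   dur-day = 1*DIGIT "D"
    private boolean durDate() {
        if (!(digits() && ch('D'))) return false;
        attempt(this::durTime); // optional part
        return true;
    }

    // dur-time = "T" (dur-hour / dur-minute / dur-second)
    private boolean durTime() {
        return ch('T') && (attempt(this::durHour) || attempt(this::durMinute) || attempt(this::durSecond));
    }

    // dur-week = 1*DIGIT "W"
    private boolean durWeek() { return digits() && ch('W'); }

    // dur-hour = 1*DIGIT "H" [dur-minute]
    private boolean durHour() {
        if (!(digits() && ch('H'))) return false;
        attempt(this::durMinute);
        return true;
    }

    // dur-minute = 1*DIGIT "M" [dur-second]
    private boolean durMinute() {
        if (!(digits() && ch('M'))) return false;
        attempt(this::durSecond);
        return true;
    }

    // dur-second = 1*DIGIT "S"
    private boolean durSecond() { return digits() && ch('S'); }

    public static void main(String[] args) {
        for (String s : new String[] {"-PT12H", "PT1H", "P15DT5H0M20S", "P7W", "PT", "P1H"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Since every alternative is tried in order and rewound on failure, "P7W" falls through dur-date and dur-time before dur-week matches, and no token boundary such as "PT" ever has to be decided up front.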
