In my uk_address_parser I've used a custom parser to pull out the address parts. The elements are identified by a fairly ugly case statement that relies on regular expressions. I'd like to improve this, and I think replacing it with a dedicated parser would be a worthwhile step forward. It will also give me a chance to gain some parser-building experience.
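The original case statement isn't reproduced here. The following is a hypothetical sketch of the general style: each comma-separated element is tested against a series of regular expressions in a `case` statement, and the first matching branch wins. The patterns, labels and the `classify`/`parse` method names are assumptions for illustration, not code from uk_address_parser.

```ruby
# Hypothetical illustration of the case/regex style of address parsing.
# The patterns are simplified assumptions, not uk_address_parser's rules.

# Simplified UK postcode: outward code, optional space, inward code.
POSTCODE = /\A[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\z/i

# Label one address element. `case` uses Regexp#===, so the first
# pattern that matches decides the label; branch order matters.
def classify(part)
  case part
  when POSTCODE then :postcode
  when /\A\d+[A-Z]? \S/ then :street                  # e.g. "10 Downing Street"
  when /\A(flat|apartment|unit) /i then :sub_building # e.g. "Flat 2"
  else :locality                                      # fallback for anything else
  end
end

# Split on commas, drop blanks, and label each remaining element.
def parse(address)
  address.split(',').map(&:strip).reject(&:empty?).map { |part| [classify(part), part] }
end

p parse("Flat 2, 10 Downing Street, London, SW1A 2AA")
```

Because the result depends on branch order and every new address format needs another pattern, this style tends to get harder to maintain as the rules grow.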
I am using Treetop to build the address parser.
- Treetop Documentation
- A quick intro to writing a parser with Treetop
- Treetop Grammar Line Continuation
- Getting started with Treetop
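Treetop generates parsing expression grammar (PEG) parsers from named rules. To show the idea without needing the treetop gem, here is a hand-written stand-in using Ruby's standard `StringScanner`. Each rule is written in PEG notation in a comment, and the code consumes input in that order, reporting the offset where a rule fails. It is not Treetop's API or generated code. The class name, the rules and the fixed "number street, town, postcode" layout are hypothetical simplifications.

```ruby
require 'strscan'

# Hand-written stand-in for a grammar-based parser (NOT Treetop itself).
# The rules below are hypothetical and deliberately simplified.
class TinyAddressParser
  ParseError = Class.new(StandardError)

  def self.parse(input)
    new(input).address
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # address <- number " " street ", " town ", " postcode !.
  def address
    number = token(/\d+[A-Z]?/, 'building number')
    literal(' ')
    street = token(/[A-Za-z ]+/, 'street name')
    literal(', ')
    town = token(/[A-Za-z ]+/, 'town')
    literal(', ')
    code = postcode
    fail_at('end of input') unless @scanner.eos?
    { number: number, street: street, town: town, postcode: code }
  end

  # postcode <- outward " " inward
  def postcode
    outward = token(/[A-Z]{1,2}[0-9][A-Z0-9]?/, 'outward code')
    literal(' ')
    inward = token(/[0-9][A-Z]{2}/, 'inward code')
    "#{outward} #{inward}"
  end

  private

  # Match a pattern at the current position or raise with the offset.
  def token(pattern, name)
    @scanner.scan(pattern) || fail_at(name)
  end

  # Match an exact piece of text (escaped so it is treated literally).
  def literal(text)
    token(Regexp.new(Regexp.escape(text)), text.inspect)
  end

  def fail_at(expected)
    raise ParseError, "expected #{expected} at offset #{@scanner.pos}"
  end
end

p TinyAddressParser.parse("10 Downing Street, London, SW1A 2AA")
```

Unlike the case statement, a failure points at the rule and the position where the input stopped matching, which is one of the practical gains a grammar offers.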
To run the parsing script:
ruby parse_address.rb