probableparsing's Introduction

probableparsing

Common methods for probable parsers

probableparsing's People

Contributors

fgregg


probableparsing's Issues

Unicode objects with non-ASCII characters throw a UnicodeEncodeError exception (instead of RepeatedLabelError)

This code is only a couple dozen lines long, so how is it possible it has a bug? Unicode, that's how.

When a RepeatedLabelError should be raised and the input is a unicode object containing non-ASCII characters, the code instead throws something like this:

Error
Traceback (most recent call last):
  File "/home/mlissner/Programming/intellij/courtlistener/cl/lib/tests.py", line 554, in test_normalize_atty_contact
    result = normalize_attorney_contact(pair['q'])
  File "/home/mlissner/Programming/intellij/courtlistener/cl/lib/pacer.py", line 439, in normalize_attorney_contact
    tag_mapping=mapping,
  File "/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/usaddress/__init__.py", line 178, in tag
    label)
  File "/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/probableparsing/__init__.py", line 22, in __init__
    repo_url=self.REPO_URL)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 55: ordinal not in range(128)

This is unfortunate because my calling code is designed to catch RepeatedLabelErrors, not UnicodeEncodeErrors (and ideally it'd stay that way).

I guess the solution is to assume that people are going to use unicode objects in their code instead of byte strings, and to tweak the code by adding a u prefix to the definitions of MESSAGE and DOCS_MESSAGE. When I do that, it fixes the bug.

What I haven't thought through is what happens if people are using strings in the parser instead of using unicode objects. In that case, do things get better or worse?
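
One way to reason about both cases is to coerce every argument to text before formatting, so that neither an implicit ASCII encode (unicode argument into a byte-string template) nor an implicit ASCII decode (non-ASCII byte string into a unicode template) can happen on Python 2. The following is only a minimal sketch under stated assumptions: SketchRepeatedLabelError, to_text, the shortened MESSAGE template, and the choice of UTF-8 for decoding byte strings are all hypothetical and are not taken from the library's actual code.

```python
# -*- coding: utf-8 -*-
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return `value` as text.

    Byte strings are decoded with `encoding` (UTF-8 is an assumption here);
    undecodable bytes are replaced rather than raising.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return text_type(value)


class SketchRepeatedLabelError(Exception):
    """Hypothetical stand-in for the library's exception, for illustration only."""

    # The u prefix makes the template text on Python 2 as well as Python 3.
    MESSAGE = u"ORIGINAL STRING: {original_string}\nUNCERTAIN LABEL: {repeated_label}"

    def __init__(self, original_string, repeated_label):
        # Coerce both arguments to text so formatting never mixes bytes and text.
        self.message = self.MESSAGE.format(
            original_string=to_text(original_string),
            repeated_label=to_text(repeated_label),
        )
        super(SketchRepeatedLabelError, self).__init__(self.message)


# Both a unicode object and UTF-8 encoded bytes produce the same message.
from_text = SketchRepeatedLabelError(u"Caf\xe9 Street", "StreetName")
from_bytes = SketchRepeatedLabelError(u"Caf\xe9 Street".encode("utf-8"), "StreetName")
```

With only the u prefix and no coercion, a caller passing non-ASCII byte strings on Python 2 would likely trade the UnicodeEncodeError for a UnicodeDecodeError, which is why the sketch decodes bytes explicitly.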
