statedecoded's People

Contributors

cmbirk, dtrebbien, gsingers, rtsio, waldoj

statedecoded's Issues

Clean up invalid cross references at the end of parsing

Cross reference matching is now quite a bit looser, to avoid under-matches. As a result, at the end of the load-in process, we need to sweep through and remove any cross reference that doesn't match an actual, valid section.
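A minimal sketch of that sweep, in Python for brevity (the project itself is PHP). The record layout (`source` and `target` keys) and the function name are assumptions for illustration, not the parser's actual data model.

```python
def prune_cross_references(cross_references, valid_sections):
    """Keep only cross references whose target is a known section number."""
    valid = set(valid_sections)  # set lookup keeps the sweep linear
    return [ref for ref in cross_references if ref["target"] in valid]


# Hypothetical data: the second reference came from a loose match and
# points at a section number that does not exist.
refs = [
    {"source": "15.2-2201", "target": "15.2-2200"},
    {"source": "15.2-2201", "target": "15.2-9999"},
]
sections = ["15.2-2200", "15.2-2201"]
kept = prune_cross_references(refs, sections)
```

In a database-backed install the same idea would more likely be a single DELETE with an anti-join against the laws table.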

Strip out past "Effective" dates from catch lines

Some catch lines contain text like "Effective October 1, 2011" in them. This is not useful as of October 1, 2011, yet the text remains in the catch lines even in subsequent years' codes. Figure out how to remove this without eliminating useful, actionable information.

On the other hand, some catch lines contain text like "Effective until January 1, 2014." This is really very useful.

This is basically a case of storing metadata in titles that properly belongs elsewhere. Perhaps a "notes" field in laws_meta is the correct place to store this material?
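One way this could work, sketched in Python: move an "Effective <date>" phrase out of the catch line once the date has passed, and leave "Effective until ..." phrases alone. The function name, the returned note (which could feed a "notes" field), and the assumption that dates are always written as "Month D, YYYY" are all illustrative.

```python
import re
from datetime import date, datetime

# Matches "Effective October 1, 2011" but not "Effective until January 1, 2014".
EFFECTIVE = re.compile(r"\s*\bEffective (?!until\b)([A-Z][a-z]+ \d{1,2}, \d{4})\.?")


def split_catch_line(catch_line, today):
    """Return (catch_line, note); a past "Effective <date>" moves into note."""
    match = EFFECTIVE.search(catch_line)
    if match is None:
        return catch_line, None
    try:
        effective = datetime.strptime(match.group(1), "%B %d, %Y").date()
    except ValueError:
        return catch_line, None  # not a real date; leave the text alone
    if effective > today:
        return catch_line, None  # still in the future, so still useful
    cleaned = (catch_line[:match.start()] + catch_line[match.end():]).strip()
    return cleaned, "Effective " + match.group(1)


past = split_catch_line("Definitions. Effective October 1, 2011", date(2013, 1, 1))
until = split_catch_line("Powers of board. Effective until January 1, 2014.", date(2013, 1, 1))
```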

Include Disqus comments in search

Disqus provides great functionality to allow comments to be stored locally and synchronized periodically. Do so, and include those comments in the indexing process. Some great keywords and phrases are liable to be found in those comments.

Provide an option in settings for the path prefix to the site

Obviously, a lot more than putting this in the settings will be necessary—it'll need to be employed throughout the code.

The reason for this is that it's faster in PHP to provide absolute paths than to have to look them up over and over again.

There might be some sense in setting this as a constant, automatically, which could be stored in APC. That provides no benefit for installations that lack APC (other than that it doesn't require a user to set it), but it would be both speedy and zero-configuration for others.

Overhaul how definition case is dealt with

The replace_definitions() function has to attempt all reasonable lettercase scenarios in an effort to track down a definition for a word. While this works (mostly), it's a lousy approach. Rethink it.
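One possible rethink, sketched in Python rather than PHP: normalize the case of every stored term once, then normalize each candidate word the same way at lookup time, so no lettercase permutations need to be tried. The function names here are hypothetical and are not part of replace_definitions().

```python
def build_definition_index(definitions):
    """Map each case-folded term to its definition text."""
    return {term.casefold(): text for term, text in definitions.items()}


def find_definition(index, word):
    """Look a word up regardless of how it is capitalized in the law text."""
    return index.get(word.casefold())


index = build_definition_index({"Zoning": "the process of classifying land"})
```

The trade-off is that terms that differ only by case would collide in the index, so a code that relies on that distinction would need a case-sensitive fallback.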

Create the functionality to add laws to a portfolio

Let people maintain a sort of a shopping cart of laws (and chapters, titles, etc?).

  • create JavaScript functions to add, remove, and display the status of an individual law in local storage
  • create a JavaScript function to list every law in local storage
  • implement the add/remove/status functionality on the law page
  • create a new page to list every law in local storage
  • devise a name for this feature
  • display the menu item for this new page only if the user has 1+ laws in local storage

Provide a link to return to search results

When somebody goes to a section of the code via a search, on that section, display a link to return to the search results. Yes, people have back buttons, but some people aren't that smart.

Per Vivian P.

Record synonyms within definitions

From § 15.2-2201, “Zoning” is stored, but not “to zone,” which occurs in the same definition. Both of them should be matched, but they’re not.

Four months after filing this ticket, I can't find any place where "to zone" actually appears. I'm sure it does somewhere, but I haven't turned it up yet.

Optimize get_court_decisions() function

It's a quick modification. It's found in functions.inc.php. Right now this query is selecting from court_decision_laws based on law_section. That's because, as of its creation, the laws table hadn't been populated. But it's a much less efficient query than selecting based on law_id.

Store the ID of the element to which a definition applies

Right now the definition parser stores the scope (global, title, chapter, section, etc.) of a definition, requiring repeated determinations of where to apply a definition. Instead, store the ID of the thing (title, chapter, section, etc.) that is the scope of the definition. That will be more efficient, and more flexible, given the various structures of codes.

We'll have duplicate IDs between sections and containers (titles, chapters, articles, parts, etc.), so definitions will need an indicator as to whether they apply to just one section or to something broader, and then the applicability ID can be understood to refer to one of two tables.
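A rough Python sketch of that data model, with hypothetical names throughout: each definition stores a scope ID plus an indicator of which table that ID refers to, so a section and a container can share the same ID without ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    text: str
    scope_type: str  # "section" or "structure" (title, chapter, article, ...)
    scope_id: int    # ID in whichever table scope_type points at


def applicable_definitions(definitions, section_id, ancestor_ids):
    """Definitions scoped to this section or to any container above it."""
    ancestors = set(ancestor_ids)
    return [
        d for d in definitions
        if (d.scope_type == "section" and d.scope_id == section_id)
        or (d.scope_type == "structure" and d.scope_id in ancestors)
    ]


defs = [
    Definition("county", "includes a city", "structure", 7),  # a chapter
    Definition("board", "the local board", "section", 7),     # same ID, other table
    Definition("fund", "the trust fund", "section", 9),
]
found = applicable_definitions(defs, section_id=9, ancestor_ids=[1, 7])
```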

Atomize brief definition listings

For example, § 21-112.22:

Whenever the words “circuit court” are used in this chapter, they shall also be construed to mean “circuit or corporation court” of a city; whenever the word “county” appears in this chapter, it shall also be construed to mean “city,” and whenever the words “governing body of a county” shall appear, they shall also be construed to mean “city council.”

The sentences probably can be broken up by semicolon. Look at the example: there are three definitions (the first two separated by a semicolon, the last two by a comma), but there's a clear pattern of saying "whenever," then some words in quotation marks, then some other words in quotation marks, and then the next use of a comma, period, or semicolon. We can break up the text on that basis.
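The "whenever ... quoted term ... quoted meaning" pattern can be sketched as a regular expression. This is a Python illustration under the assumption that every clause contains exactly two quoted phrases; a clause with only one would throw the pairing off, so real input would need more checks.

```python
import re

# "whenever", then the first quoted phrase (the term), then the second
# quoted phrase (the meaning). Accepts curly or straight double quotes.
PAIR = re.compile(
    r'whenever\b.*?[“"](.+?)[”"].*?[“"](.+?)[”"]',
    re.IGNORECASE | re.DOTALL,
)


def atomize(text):
    """Return (term, meaning) pairs, trimming punctuation inside the quotes."""
    return [
        (m.group(1).strip(" ,.;"), m.group(2).strip(" ,.;"))
        for m in PAIR.finditer(text)
    ]


SAMPLE = (
    "Whenever the words “circuit court” are used in this chapter, they shall "
    "also be construed to mean “circuit or corporation court” of a city; "
    "whenever the word “county” appears in this chapter, it shall also be "
    "construed to mean “city,” and whenever the words “governing body of a "
    "county” shall appear, they shall also be construed to mean “city council.”"
)
pairs = atomize(SAMPLE)
```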

Standardize method of providing links to official text

We’re going to need a custom function for each state (which means an entire custom function file for each state) that will return the URL for that section on the state's official website. For some states it’s easy, but for others (Florida), it’s tricky.

Handle PDF-based court opinions

The Supreme Court of Virginia has long provided only PDF-based opinions, and now it looks like the text-based opinions of the Court of Appeals have been abandoned, without any updates for 14 months, while the PDF-based opinions are current.

The trick here is that those decisions are PDFs, and the text needs to be scraped out.

See Juriscraper, CALI's Free Law Reporter, pdfminer, and the especially promising pdfextract.

(Strictly speaking, this is a feature needed on Open Virginia, not The State Decoded, but having a solution in place for this will be helpful to the many other states who will want to implement this.)

Eliminate overlapping definitions

Do not define a term if it is already contained within definition tags. This will necessitate modifying $definition_word_list within section.php, specifically the PCRE, to get it to ignore any text within span class="definition" tags.

We're using a PCRE for this, but it's not working. The reason is that it's looking for an individual word that's within tags. We're not going to encounter those, because we apply definitions from longest to shortest. So if we first apply the definition for "natural-born person," then "person" will be matched again, because "person" is followed by a tag, but it is not preceded immediately by one.

The solution to this is going to require more thought. The fix is not to apply definitions from the shortest to the longest, because the longer the term, the more specific it tends to be towards the section, chapter, and title.
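One alternative worth considering, sketched in Python: rather than applying each term in its own pass, build a single alternation with the longest terms first and substitute once. Text consumed by a longer match is never rescanned, so "person" cannot be wrapped again inside "natural-born person." This is a different technique from the PCRE described above, and the function name is hypothetical.

```python
import re


def mark_definitions(text, terms):
    """Wrap each defined term in a span, preferring the longest match."""
    if not terms:
        return text
    ordered = sorted(terms, key=len, reverse=True)  # longest alternative first
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    # A single pass: matched text is skipped over, so matches never nest.
    return pattern.sub(r'<span class="definition">\1</span>', text)


marked = mark_definitions(
    "Any natural-born person may apply.",
    ["person", "natural-born person"],
)
```

Longer terms still win, which preserves the preference for more specific definitions.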

Pluralized terms lack citations

For instance, “supervisors” in 15.2-2000. In the database, it’s recorded as “supervisor.” I notice the same problem with “funds”/“fund.” Determine what the cause of this problem is, and fix it.
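The cause is not yet known, but if it turns out that the lookup only tries the exact word, one stopgap would be to also try naive singular forms. A Python sketch with hypothetical names; the suffix rules are deliberately crude and will miss irregular plurals.

```python
def candidate_forms(word):
    """Yield the word itself, then naive guesses at its singular form."""
    lower = word.lower()
    yield lower
    if lower.endswith("ies") and len(lower) > 3:
        yield lower[:-3] + "y"   # agencies -> agency
    if lower.endswith("es") and len(lower) > 2:
        yield lower[:-2]         # taxes -> tax
    if lower.endswith("s") and len(lower) > 1:
        yield lower[:-1]         # supervisors -> supervisor


def lookup(index, word):
    """Return the first definition found for any candidate form."""
    for form in candidate_forms(word):
        if form in index:
            return index[form]
    return None


index = {"supervisor": "a member of the board", "fund": "the trust fund"}
```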

Add definitions from a legal dictionary

For instance, 2.2-315 says that 37.2-100 shall apply "mutatis mutandis" to the terms used in that article. WTF?

http://www.courts.state.va.us/ has a glossary, and searching the site for the word turns up PDFs of both criminal and civil glossaries, intended for clerks of court.

http://www.uscourts.gov/Common/Glossary.aspx
http://www.nycourts.gov/lawlibraries/glossary.shtml

Consider using the 1910 edition of Black's Law Dictionary, since it's in the public domain.

Nolo has such a guide. Perhaps there are some terms under which they'd be willing to license it?

There's also http://definitions.uslegal.com/, but the terms of (re)use are totally unclear.

Also, Wiktionary.

http://en.wiktionary.org/wiki/Appendix:Legal_terms
http://en.wiktionary.org/wiki/Category:en:Law

Provide additional data for amendment attempts

In Virginia, for instance, we know (via Richmond Sunlight) whether the bill passed or failed, we know what year it took place, and we know about the legislator. All of that data should be displayed, some in tooltip form.

Deal with too-long chapter names

The title of 38.2, chapter 13 is too long and ends with “N”. I speculate that the title is longer than the SGML can contain. The solution is probably to check whether the title is at that maximum limit and the last character is an “N” and, if so, replace it with an ellipsis.
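The check described above is small enough to sketch in Python. The maximum length is not known, so `max_length` is a parameter here, and the sample values are synthetic.

```python
def fix_truncated_name(name, max_length):
    """Replace a trailing "N" with an ellipsis when the name hit the limit."""
    if len(name) == max_length and name.endswith("N"):
        return name[:-1].rstrip() + "…"
    return name


fixed = fix_truncated_name("INSURANCE AND N", 15)
```

A name that legitimately ends in "N" at exactly the maximum length would be altered too, so the results are worth spot-checking.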

Create a glossary section

Calculate the MD5 hash of every definition (use a trigger, on insert or update?). List all of the places each term is defined, and every unique definition, with some sort of indicator of where in the code each is used.
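A sketch of the grouping in Python, done in application code rather than a database trigger. Normalizing whitespace and case before hashing is an assumption, made so that trivially different copies of a definition collapse together. The tuple layout and names are hypothetical.

```python
import hashlib
from collections import defaultdict


def definition_hash(text):
    """MD5 of the definition with whitespace and case normalized."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_glossary(definitions):
    """definitions: iterable of (term, text, section_number) tuples.

    Returns {term: {hash: [section_number, ...]}}.
    """
    glossary = defaultdict(lambda: defaultdict(list))
    for term, text, section in definitions:
        glossary[term.lower()][definition_hash(text)].append(section)
    return glossary


glossary = build_glossary([
    ("Person", "any individual or corporation", "1-1"),
    ("person", "any  individual or corporation", "2-1"),  # same text, extra space
    ("person", "a natural person only", "3-1"),
])
```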

Reindex search periodically

For any content that changes (court cases, tags, comments), it'll be necessary to run a delta reindexing in Solr periodically. Figure out a reasonable schedule and then figure out a way to schedule that. (Cron is clearly the simplest solution, though it complicates installation slightly.)

Parse all sections for definitions

The parser currently handles only sections entitled "Definitions." This prevents the parsing of definitions within the scope of a single section. Broaden the definition parser to examine all sections.

Clean up the URLs

API URLs are currently a mess—exposed .php extensions, a false "1.0" in the URL structure (do we need the version number in there at all?), etc. Get things cleaner.

Modify how the parser handles tables

It's important to preserve the line breaks in tables, but the text unwrapping functionality breaks that. Prevent carriage returns from being stripped out of tables.

Store history data in a separate table, atomized

Take the history data, currently stored as a single field, and break it down to its smallest units. For Virginia, it would be enough to store the year and the Acts of the General Assembly identifier, but for Florida we can see that it's rather more complicated. So the table storing this will need to be flexible.

Doing this will make it easier to perform bulk analysis—which laws were enacted in a given year, for instance.

Replace the title and chapter tables with a single, hierarchical, configurable table

Currently there is a title table and a chapter table. This obviously will not work for codes that use different structures. Flatten these into a single table with a parent/child relationship. Also, think through how to label each level in that hierarchy. A second table? One table, and actually insert the correct label ("part," "chapter," etc.) into the table for each entry? Keep it in an array in the config file?

Also, section.php should be renamed law.php, and chapter/title.php should be replaced with a single file to handle both (and more) layers of functionality.

It may be helpful to establish a view in the database that has one row for each structural endpoint, storing its entire pedigree. For instance, in the Virginia code there would be one row for each chapter, listing also its title.
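To make the idea concrete, here is a sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are all hypothetical, and this variant stores the label ("title," "chapter," etc.) on each row. The pedigree is computed with a recursive query rather than a stored view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES structure(id),  -- NULL at the top level
    label TEXT NOT NULL,                         -- "title", "chapter", ...
    identifier TEXT NOT NULL
);
INSERT INTO structure VALUES (1, NULL, 'title', '1');
INSERT INTO structure VALUES (2, 1, 'chapter', '2');
INSERT INTO structure VALUES (3, 2, 'article', '3');
""")


def pedigree(conn, structure_id):
    """Return (label, identifier) pairs from the top level down to this unit."""
    return conn.execute("""
        WITH RECURSIVE ancestry(id, parent_id, label, identifier, depth) AS (
            SELECT id, parent_id, label, identifier, 0
            FROM structure WHERE id = ?
            UNION ALL
            SELECT s.id, s.parent_id, s.label, s.identifier, a.depth + 1
            FROM structure s JOIN ancestry a ON s.id = a.parent_id
        )
        SELECT label, identifier FROM ancestry ORDER BY depth DESC
    """, (structure_id,)).fetchall()


path = pedigree(conn, 3)
```

Because depth is not fixed, the same table can hold codes with two levels or with five.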

Fix the history year iterator to allow for single updates

If a section was passed and then updated just once, it says "updated in and 1995." This is a problem resulting from section.php. This, of course, should say only "updated in 1995." Fix this pluralization problem, which is a result of using foreach to iterate through an object.
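The fix amounts to handling the one-year case separately before joining with "and." A Python sketch of the intended output (the real code lives in section.php; the function name is hypothetical):

```python
def describe_updates(years):
    """Build the "updated in ..." phrase for any number of years."""
    years = [str(year) for year in years]
    if not years:
        return ""
    if len(years) == 1:
        return "updated in " + years[0]
    if len(years) == 2:
        return "updated in " + years[0] + " and " + years[1]
    return "updated in " + ", ".join(years[:-1]) + ", and " + years[-1]
```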

DL listings of chapters and sections don't work correctly in all browsers

Some sections, under circumstances not yet understood, have no titles. The use of DT/DD tags on title/chapter/etc. listings results in DDs flowing up, resulting in all subsequent items in the list being mismatched. While it's necessary to fix the underlying problem of missing data, it would also be wise to have the HTML accommodate missing section names.

Document the API

Provide example requests and example responses. For the responses, straight up display the JSON (or whatever)—don't sugarcoat it. API documentation is for big boys & girls. (See issue #12 for guidance.)

  • Create a table for each endpoint with columns for name, type, description, and notes.
  • Provide an action-based ToC. As in: "list all definitions of a term," "retrieve all information about a given law," "restrict the fields that are returned," "register for a key," etc.

Limit the length of definition pop-ups

Embedded definitions are sometimes simply too long. They need to be abbreviated. Although a good stopgap solution would be to truncate them after a particular character or word count, it would be best to come up with something more intelligent. For instance, to pick some random numbers, it might be determined that 50 words is a good limit, but 70 could be acceptable. That would allow long definitions to be truncated intelligently to break cleanly at the end of sentences.
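A sketch of the soft-limit/hard-limit idea in Python, using the 50 and 70 word figures above as defaults. Sentence detection here is just "word ends in a period, exclamation point, or question mark," which is an assumption and will misfire on abbreviations such as "No."

```python
SENTENCE_END = (".", "!", "?")


def truncate_definition(text, soft_limit=50, hard_limit=70):
    """Cut at the first sentence end between the soft and hard word limits."""
    words = text.split()
    if len(words) <= soft_limit:
        return text
    # Prefer a clean break at a sentence boundary within the allowed range.
    for count in range(soft_limit, min(hard_limit, len(words)) + 1):
        if words[count - 1].endswith(SENTENCE_END):
            return " ".join(words[:count])
    # No boundary found: fall back to a hard cut at the soft limit.
    return " ".join(words[:soft_limit]) + "…"


sample = "One two three. Four five six seven. Eight nine ten eleven twelve."
clean_break = truncate_definition(sample, soft_limit=5, hard_limit=8)
hard_cut = truncate_definition(sample, soft_limit=4, hard_limit=6)
```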
