regulations-core's People

Contributors

anthonygarvan, ascott1, cmc333333, contolini, grapesmoker, gregoryfoster, jmcarp, khandelwal, marcesher, mustyoshi, noahkunin, tadhg-ohiggins, theresaanna, willbarton

regulations-core's Issues

Search always returns "no results"

Whether using elasticsearch (was able to get a successful import on 12 CFR 1008) or haystack/solr for storage, search always seems to return "not found." Am I missing something obvious?

Remove django-mptt

We've been using django-mptt to describe the schema of our nested set implementation. Unfortunately, it makes relatively strict demands on which versions of Django it supports (and doesn't encode those in its own dependencies). This required us to stop supporting Django 1.8 before its LTS expired, which is against our policy.

It should be pretty simple to replace the bits of schema that we use and remove mptt altogether.

Secure write access via hmac or other authentication

Right now, the only way to lock down write access is by including or not including Django projects. Let's beef that up so that the API can be public facing. Options include HMACing the message, HTTP Auth, or simpler API key sharing.
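
As a rough illustration of the HMAC option, here is a minimal sketch using only the Python standard library. The shared secret, the function names, and the sample payload are assumptions made for the example; nothing here reflects how regulations-core actually authenticates writes.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would come from settings or the
# environment, never from source control.
SECRET_KEY = b"example-shared-secret"


def sign(body, key=SECRET_KEY):
    """Return the hex HMAC-SHA256 digest of a request body (bytes)."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def is_authorized(body, signature, key=SECRET_KEY):
    """Check a client-supplied signature with a constant-time comparison."""
    return hmac.compare_digest(sign(body, key), signature)


# The writer signs the body and sends the digest alongside it (for example in
# a header); the server recomputes the digest before accepting the write.
payload = b'{"label": "1008"}'
signature = sign(payload)
```

A write endpoint could call something like `is_authorized` on the raw request body before saving anything, and reject the request otherwise.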

Import_docs sequencing error

There are some dependencies between document types (e.g. layers pointing to regulations) which should be accounted for in the import_docs script.

Current workaround: run the script twice.
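
One way to account for those dependencies is to sort the document types topologically before importing. The sketch below uses `graphlib` from the standard library (Python 3.9+). Only the layer-to-regulation dependency comes from this issue; the other entries in the map are placeholders, and `import_order` is a hypothetical helper rather than part of import_docs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each document type maps to the set of types it points to. Only
# "layer" -> "regulation" is taken from the issue; the rest is illustrative.
DEPENDS_ON = {
    "regulation": set(),
    "notice": set(),
    "layer": {"regulation"},
}


def import_order(depends_on):
    """Return the document types ordered so dependencies are imported first."""
    return list(TopologicalSorter(depends_on).static_order())


ORDER = import_order(DEPENDS_ON)
```

With an ordering like this, a single pass over the documents would be enough, since every regulation would exist before any layer that refers to it.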

Use pip-tools

Given that regulations-core is sometimes run independently (i.e. as an application rather than a library), we should be pinning its requirements. These will be ignored when included as a library.

Consider using Whoosh in the example settings

The setup instructions and example settings files all assume Solr or Elasticsearch. Can we use Whoosh instead? It seems to have fewer dependencies, which would make getting started much easier.
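
For reference, a Haystack connection that points at Whoosh might look like the settings fragment below. The engine path is Haystack's Whoosh backend; `BASE_DIR` and the `whoosh_index` directory name are assumptions made for this sketch, and Whoosh itself would still need to be installed.

```python
import os

# Assumed base directory for the sketch; a real settings file would likely
# derive this from its own location.
BASE_DIR = os.path.abspath(".")

HAYSTACK_CONNECTIONS = {
    "default": {
        # Whoosh is a pure-Python search library, so no external service runs.
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Directory where the index files are written (hypothetical name).
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    }
}
```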

Use the nested set model for storage

Currently, when storing the regulation tree, we store each subtree, keyed by label. While this makes processing very simple, it leads to a great deal of redundancy and can be quite slow when importing the tree (as the structure must be walked and each subtree inserted).

An alternative is the nested set model, which stores each node once but makes grabbing subtrees (equivalent to subsets) painless. There are even a few implementations for Django.
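
To make the idea concrete, here is a small pure-Python sketch of the nested set model. Each node is stored once with `left` and `right` bounds, and a subtree is every row whose bounds fall inside the root's. The tree, its labels, and the helper names are made up for the example; a Django implementation would keep the bounds in model fields and filter with a query instead.

```python
def assign_bounds(node, counter=1):
    """Depth-first walk that stamps each node with left/right values."""
    node["left"] = counter
    counter += 1
    for child in node.get("children", []):
        counter = assign_bounds(child, counter)
    node["right"] = counter
    return counter + 1


def flatten(node):
    """Turn the tree into flat rows, one per node, as a table would hold."""
    rows = [{"label": node["label"], "left": node["left"], "right": node["right"]}]
    for child in node.get("children", []):
        rows.extend(flatten(child))
    return rows


def subtree(rows, label):
    """Select a node and its descendants with a single range comparison."""
    root = next(row for row in rows if row["label"] == label)
    return [
        row for row in rows
        if root["left"] <= row["left"] and row["right"] <= root["right"]
    ]


# Hypothetical regulation tree keyed by label.
tree = {
    "label": "1008",
    "children": [
        {"label": "1008-1", "children": [{"label": "1008-1-a"}, {"label": "1008-1-b"}]},
        {"label": "1008-2"},
    ],
}
assign_bounds(tree)
rows = flatten(tree)
section = subtree(rows, "1008-1")
```

Here `section` holds the rows for `1008-1` and its two children, found without walking the tree again.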

Elastic Search 'Amendments' Model Parsing Failure

When parsing 37 CFR 42 with core configured to use Elasticsearch, every PUT to a notice URI fails the same way. Here are some snippets that don't make it to the console but provide a great deal of context, pulled from local variables in a paused client.py after the exception:

Can't merge a non object mapping [amendments.changes] with an object mapping [amendments.changes] [{'reason': '[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]', 'type': 'remote_transport_exception'}] '[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]'

As the request is made, the local variable notice at regulations-core/regcore/db/es.py line 115 has the following under the amendments key (i.e. notice['amendments']):

[
    {'authority': '35 U.S.C. 2(b)(2).', 'instruction': '1. The authority citation for 37 CFR part 1 continues to read as follows:', 'cfr_part': '1'},
    {'changes': [['1-301', [{'action': 'DELETE'}]]], 'instruction': '2. Section 1.301 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-302', [{'action': 'DELETE'}]]], 'instruction': '3. Section 1.302 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-303', [{'action': 'DELETE'}]]], 'instruction': '4. Section 1.303 is removed and reserved.', 'cfr_part': '1'},
    {'changes': [['1-304', [{'action': 'DELETE'}]]], 'instruction': '5. Section 1.304 is removed and reserved.', 'cfr_part': '1'},
    {'instruction': '6. Part 42 is added to read as follows:', 'cfr_part': '1'},
    {'instruction': '7. Part 90 is added to read as follows:', 'cfr_part': '90'}
]

With the debugger paused immediately after this failure, I attempted to pull what we already have there. There is no record:

$ curl 'http://localhost:9200/eregs/notice/2012-17900'
{"_index":"eregs","_type":"notice","_id":"2012-17900","found":false}

And pulling the schema didn't give me any hints about the preferred structure of amendments.

$ curl http://localhost:9200/eregs/_mapping/notice
{
  "eregs":{
    "mappings":{
      "notice":{
        "properties":{
          "cfr_parts":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "cfr_title":{
            "type":"long"
          },
          "dockets":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "document_number":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "effective_on":{
            "type":"date"
          },
          "footnotes":{
            "type":"object"
          },
          "fr_citation":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "fr_url":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "fr_volume":{
            "type":"long"
          },
          "meta":{
            "properties":{
              "start_page":{
                "type":"long"
              }
            }
          },
          "primary_agency":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "publication_date":{
            "type":"date"
          },
          "title":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "versions":{
            "properties":{
              "42":{
                "properties":{
                  "left":{
                    "type":"text",
                    "fields":{
                      "keyword":{
                        "type":"keyword",
                        "ignore_above":256
                      }
                    }
                  },
                  "right":{
                    "type":"text",
                    "fields":{
                      "keyword":{
                        "type":"keyword",
                        "ignore_above":256
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
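
One plausible reading of the error is that each entry in `changes` is a two-item list mixing a plain string (the label) with a list of objects (the actions), so Elasticsearch sees both a non-object and an object value for `amendments.changes`. If that is the cause, reshaping each pair into a single object before indexing might avoid the conflict. The sketch below is untested against the real index; `normalize_changes` and the `label` and `actions` field names are hypothetical, and any reader of the stored notice would need to accept the new shape.

```python
def normalize_changes(amendment):
    """Return a copy in which each [label, actions] pair becomes a dict."""
    normalized = dict(amendment)
    if "changes" in amendment:
        normalized["changes"] = [
            # Hypothetical field names; the original structure is a bare pair.
            {"label": label, "actions": actions}
            for label, actions in amendment["changes"]
        ]
    return normalized


# Two entries taken from the notice['amendments'] dump above.
amendments = [
    {
        "changes": [["1-301", [{"action": "DELETE"}]]],
        "instruction": "2. Section 1.301 is removed and reserved.",
        "cfr_part": "1",
    },
    {"instruction": "6. Part 42 is added to read as follows:", "cfr_part": "1"},
]
normalized = [normalize_changes(amendment) for amendment in amendments]
```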

Consider consolidating backends

We currently support writing data to sql via django, and to elastic. We also support a second elastic / solr index via haystack. I also see that we're talking about an additional search backend in #10. I'm guessing I'm missing some context here, but why is it useful to have all these backend options? Do we have sufficiently different use cases that some users would want postgres full-text search, others postgres + haystack, and others elastic?

Update mention of example_settings.py in README.md

Hi, I'm new to regulations-core and a relative Python newb, so my apologies in advance should I miss the obvious. Helpful pointers welcome!

In walking through the documentation on building regulations-core from source, there's mention of an example_settings.py file which I'm unable to find in the source repository. Should this be a reference to regcore/settings/base.py?

Thank you!

Wrap http requests in transactions

We don't wrap our requests in transactions, currently. While concurrent writes/reads during writes haven't been a use case we've cared much about, it's a good practice.
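
For what it's worth, Django can do this per database through the `ATOMIC_REQUESTS` setting, which wraps each view call in a transaction and rolls back if the view raises an exception. A sketch of the settings fragment follows; the SQLite engine and the file name are placeholders rather than the project's actual configuration.

```python
# Sketch of a Django settings fragment; the engine and file name are assumed.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "eregs.db",  # hypothetical database file
        # Wrap each request's view in a transaction for this database.
        "ATOMIC_REQUESTS": True,
    }
}
```

This only covers the SQL backend; writes that go to a search index would sit outside the transaction.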

import_reg command failing due to renamed import

Hello,
I'm new-ish to the project and was attempting to import the local output from the suggested regulations-parser example regulation (Title 27 Part 447) into my instance of regulations-core using the suggested import_reg command. I saw that Python 2.7 is required for that command because it uses urlparse, which may be worth documenting as a separate issue for when the project moves to depend only on Python 3. The command then presented the following error:

File "...regulations-core/regcore/management/commands/import_reg.py", line 13, in <module>
    from regcore_write.views import regulation, diff, layer, notice
ImportError: cannot import name regulation

Seeing there's no longer a regulation.py file in regcore_write.views I was able to identify a commit (2a5f8cd) which shows the object was renamed to Document. Changing the import plus the only reference I could see (line #201) resulted in the command running successfully and some rows added to the SQLite database - but no rows were added to regcore_document which seems suspicious to me. There are quite a few other variables in the command which reference "regulation" and perhaps some expectation of values named accordingly in the JSON files, so I wanted to check with the experts before submitting a minimal pull request.
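
On the urlparse point, a common pattern that works on both Python 2.7 and Python 3 is a guarded import like the sketch below. Whether import_reg.py needs anything beyond this is an assumption I have not verified, and the URL is only a made-up example.

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse  # Python 2.7 location

# Hypothetical API URL, used only to show the parsed pieces.
parts = urlparse("http://localhost:8000/regulation/447")
host = parts.netloc
path = parts.path
```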

Thank you for your important work on this project, more timely than ever!
