eregs / regulations-core
An engine that supplies the API that allows users to read regulations and their various layers.
License: Creative Commons Zero v1.0 Universal
Whether using Elasticsearch (I was able to get a successful import on 12 CFR 1008) or Haystack/Solr for storage, search always seems to return "not found." Am I missing something obvious?
Several flake8 plugins were added to the parser in eregs/regulations-parser#346. Add them here, too.
We've been using django-mptt to describe the schema of our nested set implementation. Unfortunately, it makes relatively strict demands on which versions of Django it supports (and doesn't encode those in its own dependencies). This required us to stop supporting Django 1.8 before its LTS expired, which is against our policy.
It should be pretty simple to replace the bits of schema that we use and remove mptt altogether.
Right now, the only way to lock down write access is via including or not including django projects. Let's beef that up so that the api can be public facing. Options include HMACing the message, HTTP Auth, or simpler API key sharing.
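As a rough illustration of the HMAC option, here is a minimal sketch using only the standard library. The names SECRET_KEY, sign, and is_authorized are hypothetical and not part of regulations-core. It assumes the client and server share a secret and that the client sends the signature alongside the request body (for example in a header).

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice this would come from settings.
SECRET_KEY = b"example-shared-secret"


def sign(body: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex-encoded HMAC-SHA256 signature for a request body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def is_authorized(body: bytes, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Check a client-supplied hex signature against the expected one."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sign(body, key), signature)
```

A write view could call is_authorized on the raw request body before saving anything. A real deployment would likely also sign a timestamp or nonce to limit replayed requests.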
In addition to the existing options of haystack and elastic search
(After #64)
There are some dependencies between document types (e.g. layers pointing to regulations) which should be accounted for in the import_docs script.
Current workaround: run the script twice.
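One way to avoid the second run would be to sort document types by their dependencies before importing. The sketch below uses graphlib from the standard library (Python 3.9+). The DEPENDS_ON table and import_order are hypothetical: only the layer-to-regulation dependency comes from the description above, and "notice" is included purely as an example of a type with no dependencies.

```python
from graphlib import TopologicalSorter

# Maps each document type to the types that must be imported first.
# Only "layer" -> "regulation" is taken from the issue; the rest is illustrative.
DEPENDS_ON = {
    "regulation": set(),
    "notice": set(),
    "layer": {"regulation"},
}


def import_order(depends_on=DEPENDS_ON):
    """Return document types in an order that satisfies every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


order = import_order()
```

The import script could then loop over `order` and upload each type in turn, so that regulations always exist before the layers that point at them.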
Right now, if you get a 500 while uploading, it's very difficult to figure out why.
Ideally we'd support both the long-term support (LTS) release and the current release of Django.
Given that regulations-core is sometimes run independently (i.e. as an application rather than a library), we should be pinning its requirements. These will be ignored when included as a library.
The setup instructions and example settings files all assume Solr or Elasticache. Can we use Whoosh instead? It seems like it has fewer dependencies, making getting started much easier.
Right now, we're vulnerable to timing attacks as we leak little bits of information about the auth string.
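A common mitigation is to compare the supplied and expected auth strings in constant time. The sketch below uses hmac.compare_digest from the standard library. The function name auth_matches is hypothetical, and how the expected value is obtained is left out.

```python
import hmac


def auth_matches(supplied: str, expected: str) -> bool:
    """Compare two auth strings without short-circuiting on the first mismatch."""
    # Encoding to bytes keeps non-ASCII input from raising TypeError.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Unlike `supplied == expected`, the running time here does not depend on how many leading characters match, although the length of the expected value may still be observable.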
Currently, when storing the regulation tree, we store each subtree, keyed by label. While this makes processing very simple, it leads to a great deal of redundancy and can be quite slow when importing the tree (as the structure must be walked and each subtree inserted).
An alternative is the nested set model, which stores each node once but makes grabbing subtrees (equivalent to subsets) painless. There are even a few implementations available for Django.
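To make the idea concrete, here is a small, framework-free sketch of the nested set model. It assumes each node is a dict with a "label" list and an optional "children" list. The function names and the sample tree are hypothetical. Each node is stored once with left/right bounds, and a subtree is every row whose bounds fall inside the root's.

```python
from typing import List


def assign_bounds(node: dict, rows: List[dict], counter: int = 1) -> int:
    """Depth-first walk that stores each node once with left/right bounds.

    Returns the next unused counter value.
    """
    row = {"label": "-".join(node["label"]), "left": counter}
    rows.append(row)
    counter += 1
    for child in node.get("children", []):
        counter = assign_bounds(child, rows, counter)
    row["right"] = counter
    return counter + 1


def subtree(rows: List[dict], label: str) -> List[dict]:
    """Return the node with this label plus all of its descendants."""
    root = next(r for r in rows if r["label"] == label)
    return [r for r in rows
            if root["left"] <= r["left"] and r["right"] <= root["right"]]


# Illustrative tree only; not real regulation content.
tree = {"label": ["1008"], "children": [
    {"label": ["1008", "1"], "children": [{"label": ["1008", "1", "a"]}]},
    {"label": ["1008", "2"]},
]}
rows: List[dict] = []
assign_bounds(tree, rows)
```

In a database the same subtree lookup becomes a single range query on the left/right columns, which is what removes the redundancy of storing every subtree separately.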
When parsing 37 CFR 42 with core configured to use Elasticsearch, every PUT to a notice URI fails the same way. Here are some snippets that don't make it to the console but provide a great deal of context, pulled from local variables in the paused client.py post-exception:
Can't merge a non object mapping [amendments.changes] with an object mapping [amendments.changes]
[{'reason': '[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]', 'type': 'remote_transport_exception'}]
'[YW_wNku][127.0.0.1:9300][indices:data/write/index[p]]'
As the request is made, the local variable notice at line 115 of regulations-core/regcore/db/es.py has the following under the amendments key (i.e. notice['amendments']):
[
{'authority': '35 U.S.C. 2(b)(2).', 'instruction': '1. The authority citation for 37 CFR part 1 continues to read as follows:', 'cfr_part': '1'},
{'changes': [['1-301', [{'action': 'DELETE'}]]], 'instruction': '2. Section 1.301 is removed and reserved.', 'cfr_part': '1'},
{'changes': [['1-302', [{'action': 'DELETE'}]]], 'instruction': '3. Section 1.302 is removed and reserved.', 'cfr_part': '1'},
{'changes': [['1-303', [{'action': 'DELETE'}]]], 'instruction': '4. Section 1.303 is removed and reserved.', 'cfr_part': '1'},
{'changes': [['1-304', [{'action': 'DELETE'}]]], 'instruction': '5. Section 1.304 is removed and reserved.', 'cfr_part': '1'},
{'instruction': '6. Part 42 is added to read as follows:', 'cfr_part': '1'},
{'instruction': '7. Part 90 is added to read as follows:', 'cfr_part': '90'}
]
With the debugger paused immediately after this failure, I attempted to pull what we already have there. There is no record:
$ curl 'http://localhost:9200/eregs/notice/2012-17900'
{"_index":"eregs","_type":"notice","_id":"2012-17900","found":false}
And pulling the schema didn't give me any hints about the preferred structure of amendments.
$ curl http://localhost:9200/eregs/_mapping/notice
{
"eregs":{
"mappings":{
"notice":{
"properties":{
"cfr_parts":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"cfr_title":{
"type":"long"
},
"dockets":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"document_number":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"effective_on":{
"type":"date"
},
"footnotes":{
"type":"object"
},
"fr_citation":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"fr_url":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"fr_volume":{
"type":"long"
},
"meta":{
"properties":{
"start_page":{
"type":"long"
}
}
},
"primary_agency":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"publication_date":{
"type":"date"
},
"title":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"versions":{
"properties":{
"42":{
"properties":{
"left":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
},
"right":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
}
}
}
}
}
}
}
}
}
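One possible reading of the mapping error: Elasticsearch flattens nested arrays and expects every value in an array field to share a type, so a changes entry such as ['1-301', [{'action': 'DELETE'}]] would present both a string and an object under amendments.changes. If that is what is happening here, reshaping each pair into an object before indexing might sidestep the conflict. The sketch below is an untested assumption, not the project's fix. The function normalize_amendments and the key names "label" and "actions" are hypothetical.

```python
import copy


def normalize_amendments(amendments):
    """Return a copy in which each [label, actions] pair becomes an object."""
    normalized = []
    for amendment in amendments:
        amendment = copy.deepcopy(amendment)  # leave the caller's data untouched
        if "changes" in amendment:
            amendment["changes"] = [
                {"label": label, "actions": actions}
                for label, actions in amendment["changes"]
            ]
        normalized.append(amendment)
    return normalized


sample = [
    {"changes": [["1-301", [{"action": "DELETE"}]]],
     "instruction": "2. Section 1.301 is removed and reserved.",
     "cfr_part": "1"},
    {"instruction": "7. Part 90 is added to read as follows:", "cfr_part": "90"},
]
result = normalize_amendments(sample)
```

Anything that reads notices back out would need the inverse transformation, so this changes the stored shape rather than just the write path.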
We currently support writing data to sql via django, and to elastic. We also support a second elastic / solr index via haystack. I also see that we're talking about an additional search backend in #10. I'm guessing I'm missing some context here, but why is it useful to have all these backend options? Do we have sufficiently different use cases that some users would want postgres full-text search, others postgres + haystack, and others elastic?
Hi, I'm new to regulations-core and a relative Python newb, so my apologies in advance should I miss the obvious. Helpful pointers welcome!
In walking through the documentation on building regulations-core from source, there's mention of an example_settings.py file which I'm unable to find in the source repository. Should this be a reference to regcore/settings/base.py?
Thank you!
We don't wrap our requests in transactions, currently. While concurrent writes/reads during writes haven't been a use case we've cared much about, it's a good practice.
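For the Django-backed storage, one low-effort option is Django's per-request transaction setting. The fragment below is a sketch of a settings entry. The engine and database name are placeholders, and it would only cover the SQL backend, not writes that go to Elasticsearch.

```python
# Settings fragment: wrap every request handled by a view in a transaction.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",  # placeholder engine
        "NAME": "eregs.db",                      # placeholder database name
        "ATOMIC_REQUESTS": True,                 # roll back if the view raises
    }
}
```

With this set, a write view that raises partway through would leave no partial rows behind. Views that need finer control can still use transaction.atomic explicitly.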
This will prevent the need for #10
Hello,
I'm new-ish to the project and was attempting to import the local output from the suggested regulations-parser example regulation (Title 27 Part 447) into my instance of regulations-core using the suggested import_reg command. I saw that Python 2.7 is required for that command due to its usage of urlparse, which may be worth documenting as a separate issue if/when the project moves to depend only on Python 3. The command then presented the following error:
File "...regulations-core/regcore/management/commands/import_reg.py", line 13, in <module>
from regcore_write.views import regulation, diff, layer, notice
ImportError: cannot import name regulation
Seeing there's no longer a regulation.py file in regcore_write.views, I was able to identify a commit (2a5f8cd) which shows the object was renamed to Document. Changing the import plus the only reference I could see (line #201) resulted in the command running successfully and some rows added to the SQLite database - but no rows were added to regcore_document, which seems suspicious to me. There are quite a few other variables in the command which reference "regulation" and perhaps some expectation of values named accordingly in the JSON files, so I wanted to check with the experts before submitting a minimal pull request.
Thank you for your important work on this project, more timely than ever!
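On the urlparse point raised above, a common way to keep a module importable on both Python 2.7 and Python 3 is a guarded import. This is a general sketch rather than a patch to import_reg.py, and the URL is a made-up example.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# Hypothetical URL, used only to show the call works the same either way.
parts = urlparse("http://localhost:8000/regulation/447/2012-17900")
```

The six library offers the same thing via six.moves.urllib.parse if the project already depends on it.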
We currently only support Django 1.8 and 1.9.
Often we've wanted to clean up single parts or delete a single part. Consider pulling in:
cfpb/regulations-core#63
cfpb/regulations-core#64