steiza / docstore
For any civics-minded organization that needs a simple place to host documents publicly
Home Page: http://a2docs.org/
I've looked at OS X and a few Linux distros, especially ones that support having both Python 2.x and 3.x installed. It seems that the executable python2.7 exists on all of them.
On my server I made webserver.py executable and added the following to the top:
#!/usr/bin/env python2.7
That looks up python2.7 on the PATH rather than using a hard-coded path. Maybe give it a try on your dev box and see if it causes issues. It seems to work wherever I put it.
This allows it to run as just ./webserver.py, which helps in writing an init.d script to control it. I need to polish up the init.d script and then I'll share that in a support scripts folder along with the nginx config.
Related to #3 -
The current a2docs has a number of docs where a single document id is associated with multiple files, e.g.
http://a2docs.org/doc/292/ "Ann Arbor Fire Department response times"
which is different from
https://a2docs.aadl.org/view/292 "Ann Arbor Golf Proposal for Huron Hills"
I'm not sure where the ID skew is coming from, but the goal is to preserve the old URLs so that Arborwiki doesn't require a bunch of updates.
See https://github.com/minio/minio - the opportunity is to allow people to download files directly from a2docs without going through the web interface by using a Minio server. Minio provides an Amazon S3 compatible interface layer.
Urgency: low. Interestingness: high.
If you upload a file that has a comma in the name, it goes boom. System is Chrome, running against localhost.
The localhost page isn’t working
localhost sent an invalid response.
ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION
_AAATA Board Packet November 19, 2015_Revised.pdf is the filename.
The Stack Overflow answer to this is here:
and that issue has something to do with the comma (",") character in the filename.
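A minimal sketch of one common way to avoid this error, assuming the cause is a Content-Disposition header whose filename is sent unquoted, so that the comma is read as a separator between two header values. The helper name content_disposition is hypothetical and is not taken from the docstore code.

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value that tolerates commas in filenames.

    Hypothetical helper for illustration; not part of docstore.
    """
    # Escape backslashes and double quotes so the quoted-string stays valid.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    # Plain ASCII fallback for clients that ignore the filename* parameter.
    ascii_fallback = escaped.encode("ascii", "replace").decode("ascii")
    quoted = '"' + ascii_fallback + '"'
    # Percent-encoded UTF-8 form (RFC 5987 style) for everything else.
    encoded = quote(filename, safe="")
    return f"{disposition}; filename={quoted}; filename*=UTF-8''{encoded}"


header = content_disposition("_AAATA Board Packet November 19, 2015_Revised.pdf")
print(header)
```

With the filename inside double quotes, the comma is part of a single quoted value instead of splitting the header in two.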
I attempted to upload a 10.6 megabyte site plan from a project in front of the Planning Commission, and got this error message. @eby - the logs should show this from June 6 at about 12:50 pm.
The docs I can find about nginx refer to a stanza declaring client_max_body_size as the thing to change.
It would be nice to have the Content-type response header set for attached files, which might make reading on e.g. Chrome, iOS webview, etc. more convenient. I'm not sure if setting content-disposition: attachment prevents the webview from displaying the document in the native PDF viewer, but I can experiment with that if needed.
$ curl -v https://a2docs.org/file/570/2760+Stanton+-+FOIA+Final.pdf
> GET /file/570/2760+Stanton+-+FOIA+Final.pdf HTTP/2
> Host: a2docs.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 200
< date: Fri, 11 Dec 2020 16:44:08 GMT
< content-type: application/octet-stream
< content-length: 167169
< server: TornadoServer/6.0.3
< content-disposition: attachment; filename="2760 Stanton - FOIA Final.pdf"
< etag: "770df252e24b5b9c39539ec2a8a459da19a45e1e"
< strict-transport-security: max-age=15768000
A link that does display inline correctly:
$ curl -v https://cdn.ballotpedia.org/images/c/cf/2020_Hawaii_sample_ballot_%28Hawaii_County%29.pdf
> Host: cdn.ballotpedia.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 200
< content-type: application/pdf
< content-length: 648057
< date: Fri, 11 Dec 2020 16:45:16 GMT
< last-modified: Tue, 20 Oct 2020 16:35:33 GMT
< etag: "bd9648313b96686eb357f26a728f7914"
< accept-ranges: bytes
< server: AmazonS3
< x-cache: Miss from cloudfront
< via: 1.1 63b9a4cda82206b6b34aab8f3e958cbe.cloudfront.net (CloudFront)
< x-amz-cf-pop: ORD52-C1
< x-amz-cf-id: l2t0ZfreqWrmhldPoyPu70kdH7JORaGyjK_ZIRpP_U6cOaLV2gyJTQ==
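The visible difference between the two responses above is content-type: application/octet-stream versus application/pdf (the second response also carries no content-disposition header). A minimal sketch of guessing the type from the filename with Python's standard mimetypes module follows; the function name guess_content_type is hypothetical, and where it would be called in docstore is an assumption.

```python
import mimetypes


def guess_content_type(filename: str) -> str:
    """Return a Content-Type for a stored file, based on its extension.

    Hypothetical helper for illustration; not part of docstore.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to the generic binary type when the extension is unknown.
    return content_type or "application/octet-stream"


print(guess_content_type("2760 Stanton - FOIA Final.pdf"))
```

Whether the PDF then opens in the native viewer may also depend on sending content-disposition: inline rather than attachment, which is the experiment mentioned above.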
Some notes on a transition:
We have an old URL (a2docs.org) and a new URL (a2docs.aadl.org). It would be good to have a plan to consolidate the two, and I think that the surviving URL is the .aadl.org domain.
I suspect that the long term answer is to transfer the a2docs.org domain handling to an nginx configuration which does whatever necessary domain mapping.
The main reason for wanting this is to ensure that all of the old links to a2docs.org that are in Arborwiki still work. An alternative plan is to identify all of those pages that have those links one by one and fix them, and then retire the old a2docs.org name entirely.
Looking at index.html, org.html, search.html and probably others, the title typically includes "A2":
{% block title %}
Search A2 Government Document Repository
{% end %}
I note that base.html uses {{region}}. I'm too tired to fix this now and verify it actually works on my machine (and I am not familiar with Tornado's templating so I'll need to test this locally), so filing this as a note for later.
This is just here for discussion and is likely a long term change. Auth helps keep things from getting deleted and helps prevent spam, and the admin user is a good fit for that.
It seems from glancing at some of the uploads and the fields that a use case is tracking requests that have been placed: putting in a stub record of what was requested and the date requested, then coming back later and uploading the doc when it is received. Correct me if I'm wrong @vielmetti
If that is the case then it might be worth discussing what user management might look like, along with views for managing requests.
Could probably do something external like basic webserver auth, where the app then just associates the file with the login name. That would prevent the need for user admin interfaces.
On the evening of 2022-06-11 I set up an upload of 3 files to a2docs (FOIA 1258).
The upload failed with an Internal Server Error.
No idea what went wrong.
Feature request from @vielmetti
Old system had an autocomplete on the source organization field. It also had a couple common ones.
In my test of the old autocomplete it was still showing quite a few slight spelling differences, so it obviously isn't foolproof, but it would aid in some of the browsing if most things are uniform.
Feature request from @vielmetti
Original a2docs.org has the text
Add any relevant details about the documents. What are the documents about? Were there any problems or revelations? If your request was denied, what reason was given? What is the larger issue?
Could use this text as the alt or discuss a different form of the text.
The styling of the box should probably be more fluid for browser size.
This is a feature not in the current system, and needs a little thought.
Any given request might have multiple tracking numbers; e.g. the tracking number assigned by the reader to their own request, the tracking number assigned by the institution for internal use, and the tracking number kept by a third party like a2civictech or seeclickfix for external review.
Sometimes these tracking numbers have URLs too.
I don't know how to represent this.
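One possible representation, offered only as a starting point for discussion: a request holds a list of tracking numbers, each tagged with who issued it and an optional URL. All class names, field names, and sample values below are hypothetical and do not come from the docstore schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackingNumber:
    issuer: str                # e.g. "requester", "institution", "third party"
    number: str                # the identifier as the issuer wrote it
    url: Optional[str] = None  # only some tracking numbers have a URL


@dataclass
class DocumentRequest:
    title: str
    tracking_numbers: List[TrackingNumber] = field(default_factory=list)

    def numbers_from(self, issuer: str) -> List[str]:
        """Return every tracking number assigned by the given issuer."""
        return [t.number for t in self.tracking_numbers if t.issuer == issuer]


# Placeholder values for illustration only.
request = DocumentRequest(title="Example request")
request.tracking_numbers.append(TrackingNumber("requester", "REQ-0001"))
request.tracking_numbers.append(
    TrackingNumber("institution", "INT-0002", url="https://example.org/INT-0002")
)
print(request.numbers_from("institution"))
```

Keeping the issuer as free text avoids deciding up front which kinds of tracking numbers exist; a fixed set of issuer types would be the stricter alternative.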
In #35 it was noted that there was a server error (now fixed) when uploading to a2docs.
There are a couple of documents stuck in the queue as a result. Review them, and when they are reviewed, close this.
Maybe some distros have it by default, but I had to install pyyaml. I'm not sure if there is a minimum version, so I'm just filing an issue instead of a pull request. I installed 3.11.
Just putting in a couple of things to meet @vielmetti's hope for parity with the old version.
In a2docs.org when viewing a doc the source organization goes to a search for other things for that organization.
Compare http://a2docs.org/doc/382/
Note that there are 4 documents in this document set, and that each of them has a detail page URL, e.g.
http://a2docs.org/doc/382/view/496/ "BlockbyBlock_Ann Arbor DDA - OperatingBudget - 436 hours.xlsx"
While I can think of all kinds of features that might be on this page, the minimum necessary for it is to have a compatible URL so that a deep link to that particular record continues to work.
As I was doing an upload this a.m. I noticed that there were two semi-identical names for agencies that came up in the popup - "Ann Arbor Area Transportation Authority" and "Ann Arbor Area Transit Authority". Only one of those is correct.
The hope would be for some administrative way to remedy this, not sure the precise best way yet.
This is here for my tracking. Need to create a directory (support-scripts ??) and provide the following sample docs:
Should probably also do a systemd script but will have to throw up a VM to test.
Haven't had time to dig but guessing maybe a python 3+ issue? Could also be nginx needs specific config for that path but looking at some other posts it sounds like behaviour changed in 3.x and things have to be encoded manually.
Traceback (most recent call last):
File "/usr/lib/python3.8/base64.py", line 510, in _input_type_check
m = memoryview(s)
TypeError: memoryview: a bytes-like object is required, not 'str'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1702, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/var/www/a2docs/docstore", line 490, in get
auth_decoded = base64.decodestring(auth_header[6:])
File "/usr/lib/python3.8/base64.py", line 554, in decodestring
return decodebytes(s)
File "/usr/lib/python3.8/base64.py", line 545, in decodebytes
_input_type_check(s)
File "/usr/lib/python3.8/base64.py", line 513, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
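The traceback shows base64.decodestring being handed a str, which Python 3 rejects; that function was also removed entirely in Python 3.9. A minimal sketch of decoding the header with base64.b64decode instead, assuming auth_header is the raw Authorization value and begins with "Basic " (which is what the [6:] slice in the traceback suggests). The function name parse_basic_auth is hypothetical.

```python
import base64


def parse_basic_auth(auth_header: str):
    """Return (username, password) from an HTTP Basic Authorization value.

    Hypothetical helper for illustration; not part of docstore.
    """
    if not auth_header.startswith("Basic "):
        raise ValueError("not a Basic auth header")
    # b64decode accepts an ASCII str (or bytes) on Python 3 and returns bytes.
    decoded = base64.b64decode(auth_header[6:]).decode("utf-8")
    # Split on the first colon only, since passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


example = "Basic " + base64.b64encode(b"admin:s3cret").decode("ascii")
print(parse_basic_auth(example))
```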
It'd be neat to have an RSS/Atom feed of new documents.
Cf https://a2docs.aadl.org/view/292
A date field displayed to the reader should include not only "date requested" and "date received" but also "date posted". This ensures that there's at least one date displayed.
The relevant bit from the old system here:
See https://a2docs.aadl.org/view/408, especially at the bottom:
Download 3-16-2016 HDC Minutes with Live Links.pdf
where I get "500: Internal Server Error" as a response.
The three docs had been uploaded as a batch in a single transaction from the "upload" function on my Mac running Chrome.
In #24 there have been ongoing problems with "413 Request Entity Too Large" errors. Not seeing any right now, but it is anticipated that by switching to the tornado.web.stream_request_body method this problem might go away entirely.
See e.g.
The current version of the code has autocomplete, but the aadl version doesn't have that yet.
Zach raised the question of whether his import script properly imported the entries that contain multiple documents, so a redeploy will need to track that issue too.