data-liberation-front / csvlint.io
Check that your CSV files are valid
Home Page: http://csvlint.io
License: MIT License
Part of theodi/shared#153
I just came across this in the schemas_controller:
schemas = Schema.all
@schemas = Kaminari.paginate_array(schemas).page(params[:page])
I think paginate_array
is for when you have a simple array object; with an AR query I think you could have
@schemas = Schema.all.page params[:page]
Which would have the advantage of calling limit/skip on the Schema query rather than bringing in all the items each time.
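To make the difference concrete, here is a minimal plain-Ruby sketch. FakeQuery is a hypothetical stand-in for the Schema query (it is not Kaminari or the ORM), and the page size of 25 is an assumption. It only counts how many records each approach loads:

```ruby
# Hypothetical stand-in for a database query; counts how many records get loaded.
class FakeQuery
  attr_reader :loaded_count

  def initialize(records)
    @records = records
    @skip = 0
    @limit = nil
    @loaded_count = 0
  end

  def skip(n)
    @skip = n
    self
  end

  def limit(n)
    @limit = n
    self
  end

  # Materialise only the rows the query asks for, as a database would.
  def to_a
    rows = @records.drop(@skip)
    rows = rows.first(@limit) if @limit
    @loaded_count += rows.size
    rows
  end
end

PER_PAGE = 25 # assumed page size

# Approach 1: load everything, then slice the array in memory.
def page_from_array(query, page)
  query.to_a.each_slice(PER_PAGE).to_a[page - 1] || []
end

# Approach 2: push skip/limit into the query so only one page is loaded.
def page_from_query(query, page)
  query.skip((page - 1) * PER_PAGE).limit(PER_PAGE).to_a
end
```

Both approaches return the same page; only the number of records pulled from the store differs.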
If a schema is entered, but then the checkbox is unticked, the schema is still used as the form is still filled in behind the scenes.
On homepage, and perhaps README on /about
e.g. take a CKAN url and validate all the CSVs in that dataset and then cross validate them.
Imagine you have a series of monthly CSVs, it would take a while to validate the whole lot. Further they may be valid individually, but as a collection the column titles are inconsistent etc.
Would be great to be able to detect errors in a dataset.
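One cross-file check could be comparing header rows across the collection. This is a rough sketch only: the helper name and the file names are hypothetical, and it assumes each file's first line is its header row.

```ruby
require "csv"

# Hypothetical helper: `files` maps a file name to its CSV text.
# Returns the names whose header row differs from the first file's header row.
def files_with_inconsistent_headers(files)
  headers = files.transform_values do |text|
    CSV.parse_line(text.lines.first.to_s.chomp) || []
  end
  reference = headers.values.first
  headers.reject { |_name, row| row == reference }.keys
end

monthly = {
  "jan.csv" => "date,amount\n2013-01-01,10\n",
  "feb.csv" => "date,amount\n2013-02-01,12\n",
  "mar.csv" => "Date,Amount (GBP)\n2013-03-01,9\n"
}
puts files_with_inconsistent_headers(monthly).inspect
```

Each file here is valid on its own; the check only flags that the third file names its columns differently.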
Because it's a pain in the bum.
CSVs that 404 currently produce a broken image in the list view. Perhaps a "not found" badge would be better?
Part of theodi/shared#155
I've had several occasions where the returned error complained about the header; I'm not sure it was always right.
An example might be:
https://directgovinnovate.s3.amazonaws.com/ext-search/200311.csv
404s are currently swallowed up: user redirected to homepage.
Found because of broken link here: http://data.gov.uk/dataset/bona-vacantia-estates-advertisements to:
I think the colours in the table have not been implemented quite right. Should be a grey background if the number is 0; otherwise the background should be the appropriate colour.
So in this example:
errors/context should be red background
warnings/structure should be grey background
warnings/schema should be grey background
messages/structure should be grey background
messages/schema should be grey background
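The rule above could be sketched as a small helper. The helper name is hypothetical, and the warning and message colours are assumptions (only red for errors and grey for zero come from this issue):

```ruby
# Assumed colour per severity; only "red" for errors is stated in the issue.
SEVERITY_COLOURS = { errors: "red", warnings: "yellow", messages: "blue" }.freeze

# Grey when the count is 0, otherwise the colour for that severity.
def cell_background(severity, count)
  return "grey" if count.zero?
  SEVERITY_COLOURS.fetch(severity)
end
```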
Check your CSV files with CSVLint
CSV looks easy, but it can be hard to make a CSV file that other people can read easily. CSVLint helps you to check that your CSV file is readable. And you can use it to check whether it contains the columns and types of values that it should.
Just enter the location of the file you want to check, or upload it. If you have a schema which describes the contents of the CSV file, you can also give its URL or upload it. Read more...
Validating http://certificate.theodi.org/status.csv causes an error with a forbidden redirect to the HTTPS version.
In order to cope with big files, we need to:
http://csvlint.io/validation/530b5aa263737633f8000000 gives duplicate column name errors, as there's a title line on the first row, then a blank row, followed by headers.
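The layout described above (a title line, then a blank row, then the real headers) could be detected before column names are checked. A minimal sketch, assuming a row with at most one populated cell counts as free text; the helper name and thresholds are hypothetical and not part of csvlint:

```ruby
require "csv"

# Hypothetical helper: returns how many leading lines come before the first
# row with at least `min_columns` populated cells. Returns 0 when no such row
# is found in the scanned lines, so a genuine single-column CSV is not flagged.
def leading_unstructured_lines(text, min_columns: 2, max_scan: 10)
  text.each_line.first(max_scan).each_with_index do |line, index|
    row = begin
      CSV.parse_line(line.chomp) || []
    rescue CSV::MalformedCSVError
      [] # free text with stray quotes is treated as unstructured
    end
    populated = row.compact.reject { |cell| cell.strip.empty? }
    return index if populated.size >= min_columns
  end
  0
end
```

A non-zero result would let the validator report the leading text once, instead of reporting duplicate or empty column names for a row that was never a header.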
It seems to be a common issue that people put information about the CSV in the first line (or few lines). Should we generate a separate error for this if the first few lines have one column? Something which in the front end would say:
"Your CSV seems to contain unstructured text at the beginning of the file, it is important that your CSV only contains structured data - any background information or metadata should be included on a referring web page or accompanying document"
If a server doesn't support if-modified-since, then it will be revalidated every time. We should make it only revalidate every few hours, perhaps, with a manual 'revalidate' button to override.
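A possible shape for that rule, as a sketch only: the three-hour window and the method name are assumptions, and `force` stands for the manual 'revalidate' button.

```ruby
REVALIDATE_AFTER = 3 * 60 * 60 # seconds; an assumed value for "every few hours"

# Revalidate when forced, when never checked, or when the last check is old enough.
def revalidate?(last_checked_at, now: Time.now, force: false)
  return true if force || last_checked_at.nil?
  now - last_checked_at >= REVALIDATE_AFTER
end
```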
And that they can upload files if they want them to remain private.
Why
http://csvlint.io/validation/list
rather than
and why does the latter URL just give you the home page?
Need some feedback to help user identify how to fix up some of the errors/warnings.
We might be able to qualify this based on some additional information in the headers. For example, I just came across an interesting one testing the Land Registry CSV data. This file reports wrong content-type:
It's served as application/octet-stream. Doing a HEAD request I can see the Server is being reported as AmazonS3. We could provide some specific guidance here, e.g. that application/octet-stream is the default for S3 unless you specify content type during upload.
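A sketch of how the guidance could be chosen from the response headers. The helper name and the message wording are hypothetical; the only facts taken from this issue are the application/octet-stream content type and the AmazonS3 Server value.

```ruby
# Hypothetical helper: `headers` is a hash of response header names to values.
# Returns nil when the content type is already text/csv.
def content_type_guidance(headers)
  normalised = headers.transform_keys { |key| key.to_s.downcase }
  content_type = normalised["content-type"].to_s.split(";").first.to_s.strip.downcase
  server = normalised["server"].to_s.strip

  return nil if content_type == "text/csv"

  if content_type == "application/octet-stream" && server.casecmp?("AmazonS3")
    "This file is served from Amazon S3 as application/octet-stream, the S3 default " \
      "when no content type is given at upload. Set the content type to text/csv when uploading."
  else
    label = content_type.empty? ? "an unknown type" : content_type
    "This file is served as #{label} rather than text/csv."
  end
end
```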
When viewing a validation, the revalidation form doesn't work.
The 'browse' button somehow takes over everything,
Possibly using this? https://github.com/Jahdrien/FileReader
Validating https://certificate.theodi.org/status.csv gives an invalid SSL cert error. The app probably needs updated public certs or something. I forget.
Nothing terrifying, but would be good to clean up.
http://csvlint.io/validation/53283a016373767dab810000
See screenshot – think we were going to add some kind of shortening magic so the urls would read:
www.gov.uk...Spend_over__25K__2013_-_CSV.csv
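The shortening could be as simple as keeping the host and the file name. A minimal sketch; the helper name and the directory part of the example URL are made up for illustration:

```ruby
require "uri"

# Hypothetical helper: "host...filename", or just the host when there is no file name.
def shorten_url(url)
  uri = URI.parse(url)
  filename = File.basename(uri.path.to_s)
  return uri.host.to_s if filename.empty? || filename == "/"
  "#{uri.host}...#{filename}"
end

puts shorten_url("http://www.gov.uk/example/path/Spend_over__25K__2013_-_CSV.csv")
```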
<span class="translation_missing" title="translation missing: en.wrong_content_type">wrong content type</span>
<span class="translation_missing" title="translation missing: en.quoting">quoting</span> on row 12
Need some short examples on how to write schemas.
This is literally a brain dump, so apologies for the unstructured form.
Warning flags would go up if I'd find anything of the following:
and my favourite one:
Here's a terrific example:
http://data.gov.uk/data/dumps/data.gov.uk-ckan-meta-data-latest.csv.zip
Part of theodi/shared#154
While validating CSV from http://www.who.int/tb/country/data/download/en/
I got a service unavailable message
Just playing with layouts for homepage. @JeniT we talked about making it clear that you only have to fill in 1 field: do you think either of these layouts are clearer?
(Note that the 2nd layout would need some tweaking as there isn't much room for the url / filename )
Part of theodi/shared#154
The download button displays an error for fa-cloud-download.
Following discussion on the CSV on the Web Working Group, I think we should downgrade issues with headers to warnings rather than errors, or at least in general. More specifically:
Content-Type header or the Content-Type header doesn't include the parameter header=absent
:empty_column_name and :duplicate_column_name should be warnings rather than errors
Tasks:
· current 'File URL' column to change functionality on-click to 'view report'
· current 'view-report' becomes 'download file'
· possible to support an optional 'file name' in current 'File URL' column? will need separate column? how often is this column likely to be filled? - future functionality - consider supporting in design
We can use DataPipes for this, a la http://okfnlabs.org/bad-data/ex/gla-spending/
@JeniT as discussed we are proposing pulling out key information into a dashboard style.
Errors will continue to be grouped by category (probably in the form of an accordion) and colours will continue to be used to indicate severity.
· include 'Messages' under errors & warnings in dashboard
· colour code the dashboard errors/warnings/messages in line with report
· change yellow of warnings to amber to bring further in line with 'traffic light' system
· create 'accordion' sections of report - structural/schema/context problem with breakdown of errors/warnings/messages
· include column confliction form under dashboard
· allow space for multiple 'badges' in dashboard render
They're a bit basic currently. Need to humanise it a bit.
I also have this problem in R: you can't read CSV files the normal way if they are served via an HTTPS link.
@JeniT As discussed we will remove the "Powered by the ODI" link in the header. Stephen will update this issue with a more high fidelity version soon.
Tasks:
· Remove ODI logo from top right of header nav
· Menu moves into header nav with same styling as ODI ( http://theodi.org )
· Add 'About' section to menu illustrated above
· Insert existing copy/edit of 'About' text
· remove 'Home' title
· Social links in footer?
· investigate possibility of cleaner solution to double tabbed 'upload/from URL'
· Correct wording of validation form