centraldedados / datacentral
Tools for generating portable data portals
I think this is a very worthy initiative, and am glad that work has continued over the summer. What are your future plans, besides the couple of ideas mentioned in the README? Every project could use a bit of refactoring, and I am wondering if you have a plan to be able to respond to open issues, or maybe even start afresh at some point - and if not, what kind of community contribution you would especially welcome.
These can be easily achieved by getting Rufus Pollock's scripts to convert data packages to SQLite and PostgreSQL.
Maybe in README.md?
I just read the blog post about Data Central and read the following sentence:
There is no API since it's all just HTML. This might be the most evident shortcoming of a static approach.
Wouldn't it be relatively easy to expose a static JSON file with information about all the available datasets as a kind of "static API"? This would probably just return the underlying data packages.
Would you be interested in such a pull request?
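A minimal sketch of what such a "static API" could look like, assuming each dataset lives in its own subdirectory with a datapackage.json at its root. The function name `build_static_api` and the `{"packages": [...]}` layout are illustrative assumptions, not the project's actual code or format.

```python
import json
import os


def build_static_api(repo_dir, output_path):
    """Collect every datapackage.json found one level below repo_dir
    and write them to a single JSON index file."""
    packages = []
    for name in sorted(os.listdir(repo_dir)):
        descriptor = os.path.join(repo_dir, name, "datapackage.json")
        if not os.path.isfile(descriptor):
            continue  # not a data package checkout, skip it
        with open(descriptor) as f:
            packages.append(json.load(f))
    # Hypothetical layout: one top-level key holding all descriptors.
    with open(output_path, "w") as f:
        json.dump({"packages": packages}, f, indent=2)
    return packages
```

The resulting file can be served next to the HTML pages, so the portal stays fully static.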
$ make html
. `pwd`/.env/bin/activate; python generate.py
i | Directory repos doesn't exist, creating it.
Traceback (most recent call last):
File "generate.py", line 245, in <module>
generate()
File "/tmp/datacentral/.env/local/lib/python2.7/site-packages/click/core.py", line 384, in __call__
return self.main(*args, **kwargs)
File "/tmp/datacentral/.env/local/lib/python2.7/site-packages/click/core.py", line 370, in main
self.invoke(ctx)
File "/tmp/datacentral/.env/local/lib/python2.7/site-packages/click/core.py", line 524, in invoke
ctx.invoke(self.callback, **ctx.params)
File "/tmp/datacentral/.env/local/lib/python2.7/site-packages/click/core.py", line 236, in invoke
return callback(*args, **kwargs)
File "generate.py", line 137, in generate
shutil.rmtree(os.path.join(output_dir, "css"))
File "/usr/lib/python2.7/shutil.py", line 239, in rmtree
onerror(os.listdir, path, sys.exc_info())
File "/usr/lib/python2.7/shutil.py", line 237, in rmtree
names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '_output/css'
make: *** [html] Error 1
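The traceback suggests that generate.py calls shutil.rmtree on _output/css before that directory exists, which fails on a fresh checkout. One possible guard is sketched below; the helper name `remove_if_present` is hypothetical and this is not necessarily how the project fixed it.

```python
import os
import shutil


def remove_if_present(path):
    """Delete a directory tree only if it exists.

    Returns True when something was removed, False otherwise."""
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False
```

Calling `remove_if_present(os.path.join(output_dir, "css"))` instead of the bare `shutil.rmtree` call would let the first run proceed.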
Add an image for the site preview when the URL is shared.
The files at _output are re-generated at every run. While there are checks to see if the Git repository of the data package has changed, currently we have no way to know if the files at _output are stale or not.
This is an issue in the case of big datasets, since it takes some time to copy the CSV files to the download dir.
The solution would be a cache file that registers the last commit from which a data package was generated and the checksum of each CSV data file, to determine if the files are identical (and if not, they should be overwritten).
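The cache idea could be sketched roughly as follows. The JSON cache layout, the SHA-256 choice, and all function names here are assumptions made for illustration; the commit hash is expected to be supplied by the caller.

```python
import hashlib
import json
import os


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_path):
    """Read the cache file, or return an empty cache if there is none."""
    if not os.path.isfile(cache_path):
        return {}
    with open(cache_path) as f:
        return json.load(f)


def save_cache(cache, cache_path):
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)


def checksums_for(csv_paths):
    return {os.path.basename(p): file_checksum(p) for p in csv_paths}


def is_stale(cache, package_name, commit, csv_paths):
    """True if the package output should be regenerated."""
    entry = cache.get(package_name)
    if entry is None or entry.get("commit") != commit:
        return True
    return checksums_for(csv_paths) != entry.get("checksums")


def record(cache, package_name, commit, csv_paths):
    """Store the commit and CSV checksums a package was generated from."""
    cache[package_name] = {
        "commit": commit,
        "checksums": checksums_for(csv_paths),
    }
```

With something like this in place, the generator could skip copying large CSV files whenever `is_stale` returns False.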
There's been the intention to use a static site generator library instead of our custom code (which nevertheless works -- for now).
After looking through many options, staticjinja with a custom build script looks like a good fit for this purpose.
It's important to have a ready-to-use contact form to become a call to action for users and institutions to contact the site and provide references to datasets.
It's much faster, according to these benchmarks:
https://medium.com/@jyotiska/json-vs-simplejson-vs-ujson-a115a63a9e26
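A common way to adopt ujson without making it a hard dependency is an import fallback, sketched here. It assumes the code only uses `loads` and `dumps`, which both libraries provide; the sample JSON string is made up.

```python
try:
    import ujson as json  # faster drop-in for loads/dumps, if installed
except ImportError:
    import json  # standard library fallback

# Either module parses and serializes the same way for simple data.
data = json.loads('{"name": "example", "resources": []}')
text = json.dumps(data)
```

Whitespace in the serialized output can differ between the two libraries, so code should not depend on the exact string.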
I am getting the following error when running make install (info from ~/.pip/pip.log):
...
Downloading async-0.6.1.tar.gz
Downloading from URL https://pypi.python.org/packages/source/a/async/async-0.6.1.tar.gz#md5=6f0e2ced1fe85f8410b9bde11be08587 (from https://pypi.python.org/simple/async/)
Running setup.py (path:/home/michal/project/datacentral/.env/build/async/setup.py) egg_info for package async
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/home/michal/project/datacentral/.env/build/async/setup.py", line 24
print "Ignored failure when building extensions, pure python modules will be used instead"
^
SyntaxError: invalid syntax
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/home/michal/project/datacentral/.env/build/async/setup.py", line 24
print "Ignored failure when building extensions, pure python modules will be used instead"
^
SyntaxError: invalid syntax
Cleaning up...
Removing temporary dir /home/michal/project/datacentral/.env/build...
Command python setup.py egg_info failed with error code 1 in /home/michal/project/datacentral/.env/build/async
Exception information:
Traceback (most recent call last):
File "/home/michal/project/datacentral/.env/lib/python3.4/site-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/home/michal/project/datacentral/.env/lib/python3.4/site-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/home/michal/project/datacentral/.env/lib/python3.4/site-packages/pip/req.py", line 1229, in prepare_files
req_to_install.run_egg_info()
File "/home/michal/project/datacentral/.env/lib/python3.4/site-packages/pip/req.py", line 325, in run_egg_info
command_desc='python setup.py egg_info')
File "/home/michal/project/datacentral/.env/lib/python3.4/site-packages/pip/util.py", line 697, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command python setup.py egg_info failed with error code 1 in /home/michal/project/datacentral/.env/build/async
Add a link to the dataset repository, in the dataset individual page.
I wonder if the current format for the api.json is a standard format, compatible with CKAN.
If not, what about using the JavaScript, "static" version of the Swagger API instead?
http://swagger.io/
It could give access to simple HTML documentation without much effort.
Noticed when working on Python 3: there's no way to automatically verify that everything is still working as expected. Could just be some simple smoke tests for now.
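A smoke test could be as small as checking that the expected output files exist and are not empty after a build. The sketch below assumes an index.html in the output directory; the file list and the function name `smoke_check` are hypothetical.

```python
import os


def smoke_check(output_dir, expected=("index.html",)):
    """Return a list of problems found in the generated output.

    An empty list means every expected file exists and is non-empty."""
    problems = []
    for name in expected:
        path = os.path.join(output_dir, name)
        if not os.path.isfile(path):
            problems.append("missing: " + name)
        elif os.path.getsize(path) == 0:
            problems.append("empty: " + name)
    return problems
```

Running this after `make html` and failing the build on a non-empty result would catch the most obvious regressions.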
It's not super hard to understand at the moment, but the config file ought to be self-documented through code comments.
We're editing our datapackage.json files with links to the dataset homepage and the dataset repository. At the moment the individual dataset template page shows the "homepage" link from datapackage.json as the dataset repository link. Instead, the template should show the "repository" link from datapackage.json.
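The lookup could be sketched like this, assuming the descriptor has already been loaded into a dict. Whether "repository" holds a plain URL string or an object with a "url" key is an assumption, so the hypothetical helper below accepts both.

```python
def repository_link(datapackage):
    """Return the repository URL from a datapackage dict, or None.

    Deliberately does not fall back to "homepage", so the template
    can hide the link when no repository is declared."""
    repo = datapackage.get("repository")
    if isinstance(repo, dict):
        return repo.get("url")
    return repo
```

The template context would then carry this value separately from the homepage link.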
Currently, the upload target is hardcoded into the Makefile, which is obviously not portable.
The config file ought to have entries for specifying (s)FTP credentials: server address, username, password. Sections like [ftp], [sftp] and/or [rsync] would fix this problem.
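Such a section could be read with the standard configparser module, as in the sketch below. It assumes Python 3 and an INI-style config file; the key names (`server`, `username`, `password`) and the sample values are illustrative only.

```python
import configparser

# Hypothetical config fragment; real credentials should stay out of Git.
SAMPLE = """
[ftp]
server = ftp.example.org
username = deploy
password = secret
"""


def read_upload_target(text, section="ftp"):
    """Return the settings of one upload section as a dict,
    or None if the section is not configured."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        return None
    return dict(parser.items(section))
```

The upload step could then try each supported section in turn and use the first one that is present, instead of relying on a target hardcoded in the Makefile.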
I was working on the templates for datacentral and was using 'make html' to see the layout changes. With 'make html' the individual dataset pages were not being updated (because the datasets themselves hadn't changed).
Using 'make html-offline' all pages were updated but it should work too with 'make html' (even if the datasets are unchanged).