Comments (10)
With, perhaps, a direct link to the corresponding page on the portal.
from metalicious.
Here's a simple, relatively low-maintenance way to flag and link:
The databases in the dictionary are the sources for all/most of the datasets on a city's data portal.
To get complete transparency, and to allow cities to keep track of what ETL scripts they have floating around that get data from these systems and put it on their portal, it would be excellent to be able to associate databases in metalicious with datasets on a data portal.
You could just associate a list of Socrata dataset IDs with a given database in metalicious, and fetch any additional data you might want about these datasets (their name, number of downloads, when they were last updated, whatever) from the Socrata Open Data API.
This would be similar to how we built the project repository for the open gov hack night website: we only enter GitHub repos and fetch all other data from the GitHub API. The code for that is here.
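To make the "store only the ID, fetch the rest" idea concrete, here is a minimal sketch. It assumes the portal exposes per-dataset metadata at `/api/views/<4x4>.json` and that the response carries `name`, `downloadCount`, and `rowsUpdatedAt` keys; the function names and the `abcd-1234` ID used in examples are hypothetical, not part of metalicious.

```python
import json
import re
from urllib.request import urlopen

# Socrata dataset IDs ("4x4s") look like "abcd-1234".
FOUR_BY_FOUR = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")


def metadata_url(domain, dataset_id):
    """Build the metadata URL for one dataset (endpoint path is an assumption)."""
    if not FOUR_BY_FOUR.match(dataset_id):
        raise ValueError(f"not a Socrata 4x4: {dataset_id!r}")
    return f"https://{domain}/api/views/{dataset_id}.json"


def summarize(view):
    """Keep only the fields metalicious would display; missing keys become None."""
    return {
        "name": view.get("name"),
        "downloads": view.get("downloadCount"),
        "rows_updated_at": view.get("rowsUpdatedAt"),
    }


def fetch_summary(domain, dataset_id):
    """Fetch live metadata for one dataset (performs a network request)."""
    with urlopen(metadata_url(domain, dataset_id), timeout=10) as resp:
        return summarize(json.load(resp))
```

With this shape, metalicious would only need to store the list of 4x4s per database and call something like `fetch_summary("data.cityofchicago.org", dataset_id)` at render time, or cache the result.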
Come to think of it, there's probably some easyish way of pulling databases, data portal datasets, and apps together. If developers added civic.json (h/t @ryanbriones) files to their repos that simply pointed to the Socrata datasets they use, then you could imagine metalicious easily listing downstream project repos for each database by hitting the GitHub API.
Thoughts @derekeder, @kfogel, @evz?
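A rough sketch of how the civic.json lookup might work. The thread doesn't define a civic.json format, so the `socrata_datasets` key is purely an assumption, as are the function names. The sketch relies on the GitHub repository contents endpoint, which returns file contents base64-encoded in a `content` field.

```python
import base64
import json
from collections import defaultdict


def contents_url(owner, repo, path="civic.json"):
    """URL of the GitHub contents endpoint for a file in a repo."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"


def parse_contents_payload(payload):
    """Decode the base64 'content' field of a contents response into a dict."""
    raw = base64.b64decode(payload["content"])
    return json.loads(raw)


def repos_by_dataset(civic_files):
    """Invert {repo: civic.json dict} into {socrata_id: [repos using it]}.

    Assumes a hypothetical 'socrata_datasets' list in each civic.json.
    """
    index = defaultdict(list)
    for repo, civic in civic_files.items():
        for dataset_id in civic.get("socrata_datasets", []):
            index[dataset_id].append(repo)
    return dict(index)
```

Given the inverted index, listing downstream repos for a database is just a union over the dataset IDs linked to it. Repos without a civic.json simply don't show up.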
Well, it's always the question of who is the "you" in "you could just associate a list of
But moving from that to the crowd of unassociated developers who write apps that use a particular database is a different matter. Some of those developers will get the memo and use civic.json to associate their app with the upstream DB, and some won't. It's not so much a technical question as an organizational one.
Sorry; not sure if that's really a constructive comment. But my point is that the first scenario is possibly something that could be relied on, whereas the second probably isn't.
@jpvelez I agree with the first comment and also agree with @kfogel regarding the latter thought.
Entering the Socrata 4x4 is possible. A difficulty to consider (considerable, though I don't think insurmountable) is that fields from multiple tables in a database may be combined to create one dataset on the data portal. I would consider this a hard-coded relationship.
I keep wondering if the better strategy is to match based on metadata elements exposed in Socrata's JSON, XML, or RDF output, having metalicious equate those fields with elements from its own API. Presuming we ever get around to proper RDF tagging, that could be powerful.
/cc: @ianjkalin
The simplest thing to do would be to have a many-to-many relationship between databases and Socrata datasets, and not get into the weeds of tables at all.
Matching columns to columns seems like a pain in the ass, unless there's some clever way to do it programmatically.
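For illustration, a many-to-many link between databases and Socrata datasets could be a single join table. This sketch uses an in-memory SQLite database so it runs standalone; the table and function names are hypothetical and do not reflect metalicious's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE databases (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE portal_datasets (socrata_id TEXT PRIMARY KEY);
-- Join table: one row per (database, dataset) pair.
CREATE TABLE database_datasets (
    database_id INTEGER NOT NULL REFERENCES databases(id),
    socrata_id  TEXT    NOT NULL REFERENCES portal_datasets(socrata_id),
    PRIMARY KEY (database_id, socrata_id)
);
"""


def new_catalog():
    """Create an empty in-memory catalog with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def link(conn, database_name, socrata_id):
    """Associate a database with a dataset; repeated calls are no-ops."""
    conn.execute("INSERT OR IGNORE INTO databases (name) VALUES (?)", (database_name,))
    conn.execute("INSERT OR IGNORE INTO portal_datasets (socrata_id) VALUES (?)", (socrata_id,))
    conn.execute(
        "INSERT OR IGNORE INTO database_datasets (database_id, socrata_id) "
        "SELECT id, ? FROM databases WHERE name = ?",
        (socrata_id, database_name),
    )


def datasets_for(conn, database_name):
    """Socrata IDs fed by the given database, sorted."""
    rows = conn.execute(
        "SELECT dd.socrata_id FROM database_datasets dd "
        "JOIN databases d ON d.id = dd.database_id "
        "WHERE d.name = ? ORDER BY dd.socrata_id",
        (database_name,),
    )
    return [r[0] for r in rows]


def databases_for(conn, socrata_id):
    """Names of databases that feed the given dataset, sorted."""
    rows = conn.execute(
        "SELECT d.name FROM database_datasets dd "
        "JOIN databases d ON d.id = dd.database_id "
        "WHERE dd.socrata_id = ? ORDER BY d.name",
        (socrata_id,),
    )
    return [r[0] for r in rows]
```

The same table answers both directions: which datasets a database feeds, and which databases sit behind a dataset.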
So is the proposition to equate a Socrata dataset with a database? For instance, our ERP, FMPS, has purchasing data in addition to a lot of other stuff. We have a purchasing dataset on the portal. So would we equate the FMPS database with that portal dataset?
In my initial assessment, I was thinking of equating the columns of Socrata with the columns in Metalicious. Pain-in-the-ass, yes, but very robust and useful (e.g., ETL management, solid accountability for gov't openness).
What would be the most useful for everyone else on the civic developer side? Could a database-to-dataset link be sufficient?
If you added column to column reckoning, you'd kinda end up with a way to relate databases to data sets, right? While it does seem like a pain, I'd say it's probably going to be the best way to ensure you're not engineering yourself into a corner.
On the Socrata side, I'd love to see the mapping from source columns to published dataset columns.
Tom, I think you are right that this could be very useful in terms of exposing the ETL needed and accountability.
Next step, though, would be to have tooling to hook up the originating columns to the Socrata columns at dataset import time.
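One way to picture the column-to-column mapping discussed in this thread: store each source column alongside the published column it feeds, and derive the coarser database-to-dataset links from that. This is only a sketch; the class, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One source column feeding one published dataset column."""
    database: str
    table: str
    column: str
    socrata_id: str
    dataset_column: str


def database_dataset_links(mappings):
    """Derive the coarse database-to-dataset pairs from column-level mappings."""
    return sorted({(m.database, m.socrata_id) for m in mappings})


def sources_of(mappings, socrata_id, dataset_column):
    """Source (database, table, column) triples behind one published column."""
    return [
        (m.database, m.table, m.column)
        for m in mappings
        if m.socrata_id == socrata_id and m.dataset_column == dataset_column
    ]
```

Because each mapping names its own table, a dataset built from fields in several tables is just several rows pointing at the same Socrata ID.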
Putting my 2c in here:
Start with linking each database in datadictionary.cityofchicago.org to the relevant datasets on data.cityofchicago.org and vice-versa on each dataset in the data portal.
This will go a long way towards figuring out what data is open/available and what each field means on datasets that have been released on the data portal. The current system of publishing one-off docs kinda works (lots of these seem to be missing now, btw), but the data dictionary is a clearly superior replacement for it.