Comments (10)

danxoneil commented on August 12, 2024

With, perhaps, a direct link to the corresponding page on the portal.

from metalicious.

jpvelez commented on August 12, 2024

Here's a simple, relatively low-maintenance way to flag and link:

The databases in the dictionary are the sources for all/most of the datasets on a city's data portal.

To get complete transparency, and to allow cities to keep track of what ETL scripts they have floating around that get data from these systems and put it on their portal, it would be excellent to be able to associate databases in metalicious with datasets on a data portal.

You could just associate a list of Socrata dataset IDs with a given database in metalicious, and fetch any additional data you might want about these datasets - their name, number of downloads, when they were last updated, whatever - from the Socrata Open Data API.

This would be a similar approach to how we built the project repository for the open gov hack night website - by only entering GitHub repos and fetching all other data from the GitHub API. The code for that is here.
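A minimal sketch of that lookup, under stated assumptions: it assumes the portal exposes per-dataset metadata at `/api/views/<id>.json` and that the response carries `name`, `downloadCount`, and `rowsUpdatedAt` fields. The function names, the `data.example.gov` domain, and the dataset ID used below are all made up for illustration; none of this is metalicious code.

```python
import json
import re
from urllib.request import urlopen

# Socrata dataset IDs ("4x4s") look like "abcd-1234".
FOUR_BY_FOUR = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")


def metadata_url(domain, dataset_id):
    """Build the metadata URL for one dataset (URL pattern is an assumption)."""
    if not FOUR_BY_FOUR.match(dataset_id):
        raise ValueError(f"not a Socrata 4x4 id: {dataset_id!r}")
    return f"https://{domain}/api/views/{dataset_id}.json"


def summarize(view):
    """Keep only the fields a database page might display (field names assumed)."""
    return {
        "name": view.get("name"),
        "downloads": view.get("downloadCount"),
        "last_updated": view.get("rowsUpdatedAt"),
    }


def fetch_linked_datasets(domain, dataset_ids, fetch=None):
    """Return {dataset id: summary} for the IDs stored against one database."""
    if fetch is None:
        # Default fetcher does a plain HTTP GET and parses the JSON body.
        def fetch(url):
            with urlopen(url) as response:
                return json.load(response)
    return {
        dataset_id: summarize(fetch(metadata_url(domain, dataset_id)))
        for dataset_id in dataset_ids
    }
```

With this shape, metalicious would only store the list of IDs per database; something like `fetch_linked_datasets("data.example.gov", ["abcd-1234"])` would pull everything else at display time.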

jpvelez commented on August 12, 2024

Come to think of it, there's probably some easyish way of pulling databases, data portal datasets, and apps together. If developers added civic.json (h/t @ryanbriones) files to their repos that simply pointed to the Socrata datasets they use, then you could imagine metalicious easily listing downstream project repos for each database by hitting the GitHub API.

Thoughts @derekeder, @kfogel, @evz?
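A rough sketch of the civic.json idea, assuming each file is a JSON object with a `socrata_datasets` list of dataset IDs. That key is hypothetical (the thread does not define a schema), and so are the repo names and IDs; retrieving the file text from the GitHub API is left out.

```python
import json
from collections import defaultdict


def datasets_from_civic_json(text):
    """Return the dataset IDs listed in one civic.json document.

    The "socrata_datasets" key is a hypothetical schema choice.
    Malformed or unexpected input yields an empty list.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(doc, dict):
        return []
    ids = doc.get("socrata_datasets", [])
    if not isinstance(ids, list):
        return []
    return [item for item in ids if isinstance(item, str)]


def downstream_repos(civic_files):
    """Invert {repo name: civic.json text} into {dataset id: sorted repo names}."""
    index = defaultdict(set)
    for repo, text in civic_files.items():
        for dataset_id in datasets_from_civic_json(text):
            index[dataset_id].add(repo)
    return {dataset_id: sorted(repos) for dataset_id, repos in index.items()}
```

Repos without a civic.json, or with one that does not parse, simply do not show up in the index, so the listing degrades quietly when developers have not opted in.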

kfogel commented on August 12, 2024

Well, it's always the question of who is the "you" in "you could just associate a list of ${THINGS} with ${OTHER_THINGS}", I think. It might make sense to expect someone maintaining a metalicious database to also include Socrata dataset IDs -- after all, that person or org is already maintaining the metalicious database in the first place, and this is just another bit of metadata (more or less).

But moving from that to the crowd of unassociated developers who write apps that use a particular database is a different matter. Some of those developers will get the memo and use civic.json to associate their app with the upstream DB, and some won't. It's not so much a technical question as an organizational one.

Sorry; not sure if that's really a constructive comment. But my point is that the first scenario is possibly something that could be relied on, whereas the second probably isn't.

tomschenkjr commented on August 12, 2024

@jpvelez I agree with the first comment and also agree with @kfogel regarding the latter thought.

Entering the Socrata 4x4 is possible. A difficulty to consider--which I don't think is insurmountable, but is considerable--is that fields from multiple tables in a database may be combined to create one dataset on the data portal. I would consider this a hard-coded relationship.

I keep wondering if the better strategy is to match based on metadata elements produced by Socrata's JSON, XML, or RDF output, having metalicious equate those fields with elements from its own API. Presuming we ever get around to proper RDF tagging, that could be powerful.

/cc: @ianjkalin

jpvelez commented on August 12, 2024

The simplest thing to do would be to have a many-to-many relationship between databases and Socrata datasets, and not get into the weeds of tables at all.

Matching columns to columns seems like a pain in the ass, unless there's some clever way to do it programmatically.
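For illustration, here is one way the many-to-many link could look, sketched with an in-memory SQLite store. The table and function names are invented for the example and do not reflect metalicious's actual schema; the dataset IDs are placeholders.

```python
import sqlite3

# Hypothetical schema: one join table links databases to portal datasets.
SCHEMA = """
CREATE TABLE source_database (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE portal_dataset (socrata_id TEXT PRIMARY KEY);
CREATE TABLE database_dataset (
    database_id INTEGER NOT NULL REFERENCES source_database(id),
    socrata_id TEXT NOT NULL REFERENCES portal_dataset(socrata_id),
    PRIMARY KEY (database_id, socrata_id)
);
"""


def make_store():
    """Create an empty in-memory store with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def link(conn, database_name, socrata_id):
    """Associate a database with a dataset; repeating a link is a no-op."""
    conn.execute("INSERT OR IGNORE INTO source_database (name) VALUES (?)", (database_name,))
    conn.execute("INSERT OR IGNORE INTO portal_dataset (socrata_id) VALUES (?)", (socrata_id,))
    conn.execute(
        "INSERT OR IGNORE INTO database_dataset (database_id, socrata_id) "
        "SELECT id, ? FROM source_database WHERE name = ?",
        (socrata_id, database_name),
    )


def datasets_for(conn, database_name):
    """List the dataset IDs linked to one database."""
    rows = conn.execute(
        "SELECT dd.socrata_id FROM database_dataset dd "
        "JOIN source_database d ON d.id = dd.database_id "
        "WHERE d.name = ? ORDER BY dd.socrata_id",
        (database_name,),
    )
    return [row[0] for row in rows]


def databases_for(conn, socrata_id):
    """List the database names linked to one dataset."""
    rows = conn.execute(
        "SELECT d.name FROM database_dataset dd "
        "JOIN source_database d ON d.id = dd.database_id "
        "WHERE dd.socrata_id = ? ORDER BY d.name",
        (socrata_id,),
    )
    return [row[0] for row in rows]
```

The join table answers both questions (which datasets come from this database, and which databases feed this dataset) without recording anything about tables or columns.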

tomschenkjr commented on August 12, 2024

So is the proposition to equate a Socrata dataset with a database? For instance, our ERP, FMPS, has purchasing data in addition to a lot of other stuff. We have a purchasing dataset on the portal. So we would be equating the FMPS database with the portal dataset?

In my initial assessment, I was thinking of equating the columns of Socrata with the columns in Metalicious. Pain-in-the-ass, yes, but very robust and useful (e.g., ETL management, solid accountability for gov't openness).

What would be the most useful for everyone else on the civic developer side? Could a database-to-dataset link be sufficient?

evz commented on August 12, 2024

If you added column to column reckoning, you'd kinda end up with a way to relate databases to data sets, right? While it does seem like a pain, I'd say it's probably going to be the best way to ensure you're not engineering yourself into a corner.
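To make that concrete, a small sketch of how column-level links could roll up into database-to-dataset links. The `ColumnLink` record, the table and column names, and the dataset IDs are all hypothetical; only the FMPS database name comes from this thread.

```python
from collections import namedtuple

# One record per source column that feeds a published dataset column.
ColumnLink = namedtuple(
    "ColumnLink", "database table column dataset_id dataset_column"
)


def database_dataset_pairs(links):
    """Derive the coarser database-to-dataset relation from column links."""
    return sorted({(link.database, link.dataset_id) for link in links})


def source_columns(links, dataset_id):
    """Map each published column of one dataset to the source columns behind it."""
    result = {}
    for link in links:
        if link.dataset_id == dataset_id:
            source = f"{link.database}.{link.table}.{link.column}"
            result.setdefault(link.dataset_column, []).append(source)
    return result
```

Because each record names its own table, a dataset built from fields in several tables is just several records with the same `dataset_id`, and the database-level view falls out as a derived query rather than a separately maintained list.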

willpugh commented on August 12, 2024

On the Socrata side, I'd love to see the mapping from source columns to published dataset columns.

Tom, I think you are right that this could be very useful in terms of exposing the ETL needed and accountability.

Next step, though, would be to have tooling to hook up the originating columns to the Socrata columns at dataset import time.

derekeder commented on August 12, 2024

Putting my 2c in here:

Start with linking each database in datadictionary.cityofchicago.org to the relevant datasets on data.cityofchicago.org, and vice versa from each dataset in the data portal.

This will go a long way towards figuring out what data is open/available and what each field means on datasets that have been released on the data portal. The current system of publishing one-off docs kinda works (lots of these seem to be missing now, btw), but the data dictionary is a clearly superior replacement for it.
