chicago / metalicious
An open source data dictionary which can be deployed to track the metadata of one or more databases.
License: Other
The field/variable element should include a flag of whether the data was published to an open data portal.
Data dictionaries are super useful, but a bit abstract. Showing a preview of the data itself would allow people to immediately grasp the issue. (Obviously, this would take a lot of effort.)
Here's an example: http://www.civicdata.com/dataset/generic_vrecord_1483/resource/27dcfbec-05a5-4993-bdc1-6e372bd764f3
While the data dictionary was designed to be driven primarily through a search box, we need to improve advanced search filtering.
The most basic aspects of the functionality would be to add filtering for 'database', 'table', or 'field' results.
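A minimal sketch of what such filtering could look like, in Python for illustration only (Metalicious itself appears to be PHP). The `SearchResult` record, its `kind` values, and `filter_results` are hypothetical names, not part of the existing code base.

```python
from dataclasses import dataclass

# Hypothetical result record; the real Metalicious schema may differ.
@dataclass(frozen=True)
class SearchResult:
    kind: str          # one of "database", "table", "field"
    name: str
    description: str = ""

VALID_KINDS = {"database", "table", "field"}

def filter_results(results, query, kinds=None):
    """Return results whose name or description contains `query`
    (case-insensitive), restricted to the requested result kinds."""
    wanted = VALID_KINDS if kinds is None else set(kinds)
    unknown = wanted - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown result kinds: {sorted(unknown)}")
    needle = query.lower()
    return [
        r for r in results
        if r.kind in wanted
        and (needle in r.name.lower() or needle in r.description.lower())
    ]

# Made-up sample data for illustration.
results = [
    SearchResult("database", "permits_db", "Building permits"),
    SearchResult("table", "permit", "One row per permit"),
    SearchResult("field", "permit_id", "Primary key"),
    SearchResult("field", "zoning_code"),
]
tables_and_fields = filter_results(results, "permit", kinds={"table", "field"})
```

In the UI, the `kinds` argument would map to three checkboxes next to the search box; leaving them all unchecked could fall back to searching everything.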
Website will not accept login attempts. Generates error:
The user specified as a definer ('citydata'@'hyperion.chapinhall.org') does not exist
Bootstrap icons do not appear when Metalicious is deployed to a subdirectory. It does work when deployed to the root directory.
Add some feature that allows for listing all data sources available without having to search or drill down by Business Function. I could imagine people getting frustrated because they were not sure what search term to use or what business function covered whatever they were trying to find. Also, there is a certain transparency value to a complete list, with which people can do as they like, even if it ends up being really long.
The procedures defined in the Metalicious_DB.sql file expect a user citydata@hyperion.chapinhall.org to be defined in the database. It would be nice to define this myself during installation, rather than using find/replace on the file to change the user/host.
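One possible approach is a small install-time script that rewrites the definer before the SQL is loaded. The sketch below assumes the file uses standard MySQL `DEFINER=user@host` clauses (the login error elsewhere in this tracker suggests it does); `set_definer` and the sample statement are hypothetical.

```python
import re

# Matches MySQL DEFINER clauses, with or without quoting, e.g.
#   DEFINER=`user`@`host`   DEFINER='user'@'host'   DEFINER=user@host
DEFINER_RE = re.compile(
    r"DEFINER\s*=\s*([`'\"]?)[^`'\"@\s]+\1@([`'\"]?)[^`'\"\s]+\2",
    re.IGNORECASE,
)

def set_definer(sql_text, user, host):
    """Return sql_text with every DEFINER clause pointing at user@host."""
    replacement = f"DEFINER=`{user}`@`{host}`"
    # A function is used as the replacement so that backslashes in the
    # user or host are never interpreted as regex escapes.
    return DEFINER_RE.sub(lambda match: replacement, sql_text)

# Illustrative statement, not copied from Metalicious_DB.sql.
sample = "CREATE DEFINER=`citydata`@`hyperion.chapinhall.org` PROCEDURE `foo`()"
rewritten = set_definer(sample, "metalicious", "localhost")
```

An installer could read Metalicious_DB.sql, pass it through `set_definer` with values supplied by the person installing, and pipe the result to the database client.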
The databases in the dictionary are the sources for all/most of the datasets on a city's data portal.
To get complete transparency, and to allow cities to keep track of what ETL scripts they have floating around that get data from these systems and put it on their portal, it would be excellent to be able to associate databases in metalicious with datasets on a data portal.
You could keep this pretty lightweight - just associate a list of Socrata dataset IDs with a given database in metalicious, and fetch any additional data you might want about these datasets - their name, number of downloads, when they were last updated, whatever - from the Socrata Open Data API.
This would be a similar approach to how we built the project repository for the open gov hack night website - by only entering in Github repos and fetching all other data from the Github API. The code for that is here.
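A rough sketch of the lightweight association described above. It assumes the portal serves dataset metadata as JSON at `/api/views/<dataset id>.json` and that the response contains `name`, `downloadCount`, and `rowsUpdatedAt` keys; check both against the portal's documentation before relying on them. The database name and dataset IDs are made up.

```python
import json
from urllib.request import urlopen

# Hypothetical mapping stored in Metalicious:
# database name -> Socrata dataset IDs published from it.
DATABASE_DATASETS = {
    "permits_db": ["abcd-1234", "wxyz-5678"],
}

def metadata_url(domain, dataset_id):
    # Assumption: dataset metadata lives at /api/views/<id>.json
    return f"https://{domain}/api/views/{dataset_id}.json"

def summarize(metadata):
    # Assumption: these key names match the portal's metadata response.
    return {
        "name": metadata.get("name"),
        "downloads": metadata.get("downloadCount"),
        "last_updated": metadata.get("rowsUpdatedAt"),
    }

def fetch_summary(domain, dataset_id, timeout=10):
    """Fetch one dataset's metadata over HTTP and reduce it to a summary."""
    with urlopen(metadata_url(domain, dataset_id), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Metalicious would only need to store the ID list; everything else is fetched (and possibly cached) at display time, e.g. `fetch_summary("data.cityofchicago.org", dataset_id)` for each ID.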
Allow users to add new users and create permission levels from the website through a UI.
The process of uploading a new database into the platform is rather complicated, especially if one hopes to document a large universe of databases. While the platform does a good job of displaying information, its import procedures are burdensome. Meanwhile, a number of popular software programs will reverse-engineer a database schema and provide a good first step.
It would be ideal if one could (1) reverse-engineer databases quickly using software; (2) export that reverse-engineered schema; (3) upload that extract to Metalicious; (4) have Metalicious parse and extract the relevant information.
We will begin to incorporate compatibility with Oracle Data Modeler and bulk-importing data obtained from the Data Modeler's "Export to CSV" option. Oracle Data Modeler is free and compatible with Oracle, MySQL, and Access databases. For City of Chicago operations, this will cover a large portion of the city portfolio.
Other platforms are also good candidates for export-to-upload functionality, such as ERwin or SQuirreL.
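To make the parsing step concrete, here is a minimal sketch of reading an exported CSV and grouping it by table. The column names `table_name`, `column_name`, and `data_type` are assumptions for illustration; the actual layout of a Data Modeler (or ERwin, SQuirreL) export would need to be checked and mapped.

```python
import csv
import io
from collections import defaultdict

def parse_schema_csv(text):
    """Group exported rows into {table: [{"field": ..., "type": ...}, ...]}.

    Assumes one row per column with the hypothetical headers
    table_name, column_name, data_type.
    """
    tables = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        tables[row["table_name"]].append(
            {"field": row["column_name"], "type": row["data_type"]}
        )
    return dict(tables)

# Made-up export for illustration.
sample = """table_name,column_name,data_type
permit,permit_id,NUMBER
permit,issue_date,DATE
inspection,inspection_id,NUMBER
"""
schema = parse_schema_csv(sample)
```

The resulting dictionary could then be inserted table by table, leaving descriptions blank for someone to fill in through the web interface.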
After creating a user, there needs to be a step to create the "Administrator" user type as well as assigning that user type to the newly created user.
The business function pages on metalicious do a great job listing the databases available within each corner of the city. But the tool overall is still oriented around search.
After logging in with administrator credentials, a non-functional dropdown menu labeled "Rod Howard" appears.
Such a system will need to provide database information, schema, and associated descriptions. An API functionality will add machine-readable capabilities to the portal to enable third-party access and avoid the inevitable screen scraper (please put down the Python script).
Often, APIs can be used to create separate applications that provide different functionality, such as: an application that allows you to "upvote" fields the public wishes to see on the data portal, help integrating the data portal and data dictionary (see #19), integration with other tools (e.g., enterprise asset management), or simply redisplaying the information.
One simple element is to display the schema of the associated DBF file or a GeoJSON representation.
Forgot to upload the CLA to the CONTRIBUTING file. Will upload ASAP.
Capture high-level classification of data types, e.g., PII, HIPAA, PCI, FOIA, etc. for each field.
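One way to model this is a set of combinable flags per field. The sketch below is only an illustration: the `Sensitivity` enum, the field names, and `fields_tagged` are hypothetical, and in Metalicious this would more likely be a column or join table in the database.

```python
from enum import Flag, auto

class Sensitivity(Flag):
    """Combinable classification tags for a single field."""
    NONE = 0
    PII = auto()
    HIPAA = auto()
    PCI = auto()
    FOIA = auto()

# Made-up fields for illustration; a field may carry several tags.
FIELD_TAGS = {
    "patient_name": Sensitivity.PII | Sensitivity.HIPAA,
    "card_number": Sensitivity.PCI,
    "ssn": Sensitivity.PII,
    "permit_id": Sensitivity.NONE,
}

def fields_tagged(field_tags, tag):
    """Return the names of fields carrying the given tag, in insertion order."""
    return [name for name, tags in field_tags.items() if tag in tags]

pii_fields = fields_tagged(FIELD_TAGS, Sensitivity.PII)
```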
Provide better documentation on the format for uploading files to Metalicious. Provide examples and document it within the README.
I think there might be more developer support in Chicago if this were written in Ruby. What do other contributors (existing and potential) think? Anybody reading this who would like to contribute and would prefer to work in Ruby?
Unable to login.
Apache logs show the following notice:
PHP Notice: Trying to get property of non-object in /var/www/metalicious/login.php on line 19...
The instructions in the README.md file are becoming rather long. There is a need to move to a more appropriate, structured instruction method. Preference is to use ReadTheDocs to document installation, configuration, and documentation instructions.
ETL scripts are programs that take data from a database/spreadsheet/data source, maybe transform it a bit, and upload it to a data portal for public consumption.
The City of Chicago has hundreds of these running all the time. That's how the data portal stays up to date - for the most part, people aren't manually transferring data.
Since this is City code, shouldn't it be open source? (Possible security issues here, but just spitballing.) You could imagine a repo that would have all the ETL scripts, and a little JSON file tying each ETL script to its data source on metalicious and its dataset on the portal.
This repo would help with ETL management. But there's more to it: if the ETL scripts were then linked to from metalicious, the data dictionary would provide complete transparency: here's what databases we have, here's where we make them public, and here's the code that does that. I imagine this would be most useful for other cities looking to start open data programs.
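As a sketch of what that little JSON file might contain, and how metalicious could read it to link scripts from a database page. The key names (`script`, `metalicious_database`, `portal_dataset_id`), paths, and IDs are all hypothetical.

```python
import json

# Hypothetical manifest: one entry per ETL script.
MANIFEST = """
[
  {"script": "etl/permits.py",
   "metalicious_database": "permits_db",
   "portal_dataset_id": "abcd-1234"},
  {"script": "etl/violations.py",
   "metalicious_database": "permits_db",
   "portal_dataset_id": "wxyz-5678"}
]
"""

REQUIRED_KEYS = {"script", "metalicious_database", "portal_dataset_id"}

def load_manifest(text):
    """Parse the manifest and check that every entry has the required keys."""
    entries = json.loads(text)
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    return entries

def scripts_for_database(entries, database):
    """List the ETL scripts that read from the given metalicious database."""
    return [e["script"] for e in entries if e["metalicious_database"] == database]

entries = load_manifest(MANIFEST)
permit_scripts = scripts_for_database(entries, "permits_db")
```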
Some items still to be included:
The code still cannot be deployed to an arbitrary location because it references $_SERVER[DOCUMENT_ROOT] in several places.
It would be extremely useful to understand the internals of City systems that touch on buildings and zoning. Specifically, whatever systems are powering the following websites:
There are lots of fields exposed across these systems that are not available in any of the relevant open datasets - building footprints, permits, violations, and the like - and it would be great to understand the data / software universe they come from.
A column called "Databases" is used, which creates issues when directly querying the table.
Fixing issue #10 created an error on the main homepage where the /include/dbconnopen could not be opened.
Could be used to provide Entity-Relationship Diagrams and other useful documentation.
Need to add SSL security for transmission of password at administrator login screen.
Settings drop-down menu not working on admin page. Does not execute any action. Works on other pages.
Importing data into the data dictionary is not ideal, with some substantial manual work.
Future iterations should be compatible with programs that reverse-engineer database schemas and export that information. One example is Oracle Data Modeler, which will reverse-engineer Oracle and MySQL databases and can then export to CSV. These exports should be able to be bulk-imported into the platform.