opendataservices / org-ids

Front end application for http://org-id.guide

License: Other

org-ids's Introduction

org-id.guide

We are creating a simple process, tool and codelist to enable data publishers and users to create and use joined up data that identifies organisations.

This involves:

Maintaining a list of organisation identifier lists;
Developing a methodology for updating the list;
Providing simple lookup tools, and guidance on choosing the best identifiers to use.

The register of organisation identifier lists

An organisation identifier list is any list that contains at least an identifier, and a name, for a collection of organisations.

Building on the IATI Organisation Registration Agency codelist we are creating an updated register of organisation identifier lists.

This list will contain detailed meta-data on the nature of the identifiers provided and on the coverage of each identifier list. It will provide a unique code to identify each list.

This code can be used as a prefix to create simple identifier strings, or can be used as the 'scheme' in a two-part identifier.

For example:

The code for the organisation identifier list provided by UK Companies House is 'GB-COH'. The identifier assigned to Open Data Services Co-operative Ltd in this list is '09506232'. Putting this together allows a dataset to unambiguously identify Open Data Services Co-operative Ltd as:

GB-COH-09506232

or in a table such as:

Organisation ID Scheme	Organisation ID
GB-COH	09506232
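A minimal sketch of the two representations above, in Python. The function names and the "scheme"/"id" key names are illustrative assumptions, not part of the register itself. The identifier is kept as a string so that leading zeros (as in '09506232') are not lost.

```python
def join_identifier(list_code: str, local_id: str) -> str:
    """Build a single identifier string: list code, hyphen, local identifier."""
    return f"{list_code}-{local_id}"


def as_two_part(list_code: str, local_id: str) -> dict:
    """Represent the same identifier as a two-part record.

    The key names "scheme" and "id" are hypothetical; use whatever
    column or field names your dataset already has.
    """
    return {"scheme": list_code, "id": local_id}


print(join_identifier("GB-COH", "09506232"))
print(as_two_part("GB-COH", "09506232"))
```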

Developing the list of lists

We are prototyping our updated register on GitHub: you can find codelists in the /codes/ directory.

These are structured based on the list-schema.json JSON Schema in the /schema/ directory.

We have imported codes from a range of sources, and have been updating these based on the process in our Researchers Handbook.

Only those entries with "confirmed":true have been reviewed and should be relied upon. All others should be treated as provisional.
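As a rough sketch, confirmed entries could be picked out of a local checkout like this. It assumes each .json file under the codes directory holds a single JSON object with a top-level "confirmed" key; check list-schema.json for the actual structure before relying on it. The function name load_confirmed is hypothetical.

```python
import json
from pathlib import Path


def load_confirmed(codes_dir):
    """Return entries under codes_dir whose "confirmed" flag is exactly true.

    Assumption: one JSON object per file, with "confirmed" at the top level.
    Entries without the flag are treated as provisional and skipped.
    """
    confirmed = []
    for path in sorted(Path(codes_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            entry = json.load(handle)
        if isinstance(entry, dict) and entry.get("confirmed") is True:
            confirmed.append(entry)
    return confirmed
```

Calling load_confirmed("codes") from the repository root would then return only the reviewed entries.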

Help us out

Pull requests to update any codes, or to suggest new codes, are welcome.

List Finder Django App

Installation

Installation steps:

Clone the repository
Change into the cloned repository
Create a virtual environment (note this application uses python3)
Activate the virtual environment
Install dependencies
Apply migrations
Run the development server

git clone https://github.com/OpenDataServices/org-ids.git
cd org-ids
virtualenv .ve --python=/usr/bin/python3
source .ve/bin/activate
pip install -r requirements_dev.txt
python manage.py migrate
python manage.py runserver

Tools

Setup

The scripts in tools/ have a number of requirements.

Set up a virtual environment to easily install these.

virtualenv --python=/usr/local/bin/python3 .ve
source .ve/bin/activate
pip install -r requirements.txt

org-ids's Issues

Retire airtable or update integration

The AirTable is now out of sync with the codelist schema and management approach.

We either need to retire it, or update it so that it reflects the list of lists appropriately.

Add a footer

I'd suggest we go with something similar to the one we have on CoVE for now. I'm happy to add that.

It would be useful to mention and link to partners. @timgdavies?

Create data sample

Deploy at new domain

Subject to final confirmation, we will be switching the branding of the project to 'org-id' by the end of this sprint.

I think this will require us to:

Register the new domain (after agreeing the exact domain name and tld to use)
Point it to the new site
Consolidate explanatory content from identify-org.net into the new site
Move the identify-org.net blog content to run at blog.NEWDOMAIN (still via Wordpress)

Demonstrator of search via an embeddable widget

We should have a demonstration of how information from the Org ID API can be integrated into third party websites through simple pop-up widgets that help users fill in organisation identifier fields.

Consolidated list in XML, JSON and CSV formats

It should be possible for users to download copies of all confirmed codes (including deprecated codes, suitably marked) in:

IATI XML Codelist format
CSV
JSON

Containing at least:

Title
Description
Code
URL

And if appropriate for file-size etc. also:

Finding the identifiers
Example Identifiers
Data access details (e.g. CSV, JSON etc.)
Data license
Data features (shareholders, address etc.)
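A sketch of how the minimum CSV and JSON downloads described above might be produced with the Python standard library. The key names in FIELDS (including "deprecated" as the marker for deprecated codes) are assumptions for illustration; the real field names come from the schema. The IATI XML Codelist format is not covered here.

```python
import csv
import io
import json

# Hypothetical key names for the minimum set of columns, plus a marker
# for deprecated codes.
FIELDS = ["title", "description", "code", "url", "deprecated"]


def _minimal(entry):
    """Keep only the export fields, filling gaps with an empty string."""
    return {key: entry.get(key, "") for key in FIELDS}


def to_csv(entries):
    """Serialise entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(_minimal(entry))
    return buffer.getvalue()


def to_json(entries):
    """Serialise entries as a JSON array of objects."""
    return json.dumps([_minimal(e) for e in entries], indent=2, ensure_ascii=False)
```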

Add terms and conditions

Needs a terms and conditions page

Regional Jurisdiction Codes

At the moment we allow national jurisdictions only (e.g. each list must either be specific to one or more countries, or have jurisdiction left blank).

However, PublicBodies.org has a list of European Union bodies (http://publicbodies.org/eu)

Should we have jurisdiction codes for regional blocks? Or some sort of multilateral code?

Consider using language of 'Non-profit' rather than 'Charity'

For international work, many organisations we are currently structuring as 'charity' might be better represented as 'NGOs'.

A language of 'Non-profit' at the top level, with Charity and NGO as categories under this, might help.

Suggestion: Canada Government Identifiers

Heard from Madrid conversations that Canada does have an organisation identifier for government agencies.

Managed in public accounts systems. Need to ask if this can be opened up.

Consider asking Open Gov Canada twitter accounts.

Set up sentry and piwik for deploys

Create Django project

Kenya duplicate

ke-cr.json
ke-rco.json

These appear to be duplicates, both referring to the Register of Companies.

Updated search and results interface

Because the algorithm defined in #37 needs to be shareable independently of this implementation, how we present the UI for org-id.WHATEVER is being considered separately.

Support for multiple languages in interface

All the codelists, and the schema for the codes, now have multi-language support (using language maps).

As we build the new interface we should be i18n-aware.
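For illustration, reading a value out of a language map with a fallback could look like the sketch below. It assumes a language map is a plain object keyed by language code (for example {"en": "...", "fr": "..."}); the function name and the fallback order are hypothetical choices, not something the schema prescribes.

```python
def localise(language_map, preferred, fallback="en"):
    """Pick the best available string from a language map.

    Tries the exact preferred tag (e.g. "fr-CA"), then its base language
    ("fr"), then the fallback language, then any value that exists.
    """
    if not language_map:
        return None
    for lang in (preferred, preferred.split("-")[0], fallback):
        if lang in language_map:
            return language_map[lang]
    # Last resort: show some translation rather than nothing.
    return next(iter(language_map.values()))
```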

Easily edited static pages

@timgdavies to supply copy

Access to the changelog for each list

Lists get updated over time. It would be useful from a list page for users to be able to see the history of changes to a list (and if any changes are pending).

One possible way to do this would be to access the GitHub Commit API for the relevant code and see what we can display to the user from that.

E.g. https://api.github.com/repos/opendataservices/org-ids/commits?path=codes/us/us-hi-cr.json
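A sketch of that idea: build the commits URL for a given code file, and reduce the JSON the GitHub API returns to something displayable. Fetching is left out so the example stays offline; the function names are hypothetical, and the response handling assumes the standard shape of GitHub's "list commits" response (each item carrying "sha" and a nested "commit" object).

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com/repos/opendataservices/org-ids/commits"


def commits_url(code_path):
    """URL listing the commits that touched one file in the repository."""
    return f"{API_ROOT}?{urlencode({'path': code_path}, safe='/')}"


def summarise_commits(commits):
    """Reduce a parsed commits response to short sha, date and summary line."""
    history = []
    for item in commits:
        commit = item["commit"]
        message = commit.get("message") or ""
        history.append({
            "sha": item["sha"][:7],
            "date": commit["author"]["date"],
            "summary": message.splitlines()[0] if message else "",
        })
    return history


print(commits_url("codes/us/us-hi-cr.json"))
```

Pending changes would need a separate source, such as open pull requests, which this sketch does not attempt.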

New theme

We just need to find something that looks pretty and makes static pages easy.

We need a favicon!

Malaysian Identifier Sources

From Swee Meng on Sinar Project.

Company Commission Number
Construction Industry Development Board - own company register and identifiers (secondary)
Government Contracts Vendor ID (found on some procurement lists)

Non-profits:

Registrar of Society (but not many registered...)

Government Agency

Check with Sinar about source of documents. Sinar open spending has a source.

Filtering for sub-categories (e.g. sub-jurisdictions or sub-structures)

This is a new dimension to the algorithm (impacting #37) and UI (impacting #38)

Individual list pages

It should be possible to visit a URL such as:

http://org-id.TLD/lists/GB-COH

and get a page which includes all the meta-data about this list and, ideally, an action-focussed UI that:

Links to where to search for IDs;
Provides details of how to suggest changes or updates;

README says to clone org-id, not org-ids

The address of the repo is wrong, and therefore so is the cd command.

Move away from terminology of 'prefix'

At IODC someone suggested it could be more understandable to talk about finding the right registration body, rather than initially about "prefixes". (Although, "prefix" is probably still a useful term to use in the specifics of how identifiers get concatenated).

Netherlands Overheid.nl Web Metadata Standard

Based on conversations with @rolfkleef at OGP I've added:

NL-OWMS to the codelist (visible more easily in AirTable here)

@rolfkleef Would you be able to check that I've described the service there right.

This has also raised two interesting issues around:

Legal characters for identifiers (e.g. 's-Graveland as an identifier, where it looks like the ' breaks even overheid.nl's own RDF and N3 download versions!)
The guidance we should give when URIs are available.

I'll open separate issues for these.

Output the codelist (investigation)

Eventually, this list will replace: http://iatistandard.org/202/codelists/OrganisationRegistrationAgency/

We should offer support for the formats and outputs that IATI currently provide - so providing the list accordingly will be a task

(however, there may be a number of related / pre tasks to this, but want to log this issue initially)

Establish user journeys for each of the user stories in the work plan

The end point that we discussed was a display of whether each ID was primary/secondary/tertiary and some guidance text about when a particular ID would be most appropriate (~50 words). Given the users, we think it's OK for them to have to do the final bit of filtering / decision making themselves, rather than just being given the answer

Add some tests

We should start to write some tests now.

Make favicon

Legal characters in a full identifier

We have agreed that prefixes should stick to a limited set of characters.

But what about the identifier portion itself?

For example, what if a source identifier contains extended unicode characters, or punctuation (e.g. ' or / or +), which might not be valid in some systems?

Should we propose a substitution approach for these cases? Or accept that such data is valid?
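To make the substitution option concrete, here is one possible approach (not an agreed one): percent-encode the identifier portion so that only unreserved URI characters remain, and decode it when the original value is needed. The function names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_local_id(local_id):
    """Percent-encode everything except letters, digits and "_.-~"."""
    return quote(local_id, safe="")


def decode_local_id(encoded):
    """Recover the source identifier exactly as the register issued it."""
    return unquote(encoded)


print(encode_local_id("'s-Graveland"))
```

The trade-off is readability: the encoded form is safe to embed in more systems, but no longer matches what users see in the source register.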

Add a CONTRIBUTING.md doc

As I'm doing this elsewhere at the moment, it's no hassle to add it in here now.

See all entries (table view)

At IODC it was suggested that a "see all" of the entries on the list would be useful, akin to the current code list published at http://iatistandard.org/202/codelists/OrganisationRegistrationAgency/

API Access to Search Results

Users should be able to send queries to an API, and get back a JSON representation of a list of lists, with ranking already applied.

Piwik

Including Need to Know information in the interface (and schemas)

In our early AirTable prototype we introduced the concept of 'Need to Know' content - which is bound to country, sector and organisation type in the same way as a list, and should be shown to users when relevant selections are made.

Need To Know content contains things like explanations of the history of organisation registration in a given country, or caveats that someone should be aware of around charity registers in general etc.

There is work to be done on:

Editing interfaces for Need to Know content
Presenting Need to Know content in search results
Validating the utility of this content

Our list search interface should explain why one list is ranked above another

When presenting users with a suggested source of organisation identifiers to use, we apply a logic to 'rank' lists.

For example:

If the organisation has a UK Company number, this is preferable to a UK Charity number;
If a list is available as open data, then prefer that to a closed data list.

We have now collected the meta-data to be able to provide these rankings (whether a list is primary or secondary; whether open data is available etc.) but it can be difficult for users to understand why one identifier is preferred over another. For example, in 360 Giving, where information is about charities, data publishers are confused as to why they should use a company, rather than charity number.

We should consider how our reference interface can explain the rankings to users. For example, I've mocked up a message (still a bit wordy) which could be shown between primary and secondary list entries:

If an identifier from 'Companies House' is available for this organisation, this should be used in preference to any of the following identifier sources.

This is because 'Companies House' provides a primary identifier: given to a legal entity at the moment it is created, and that persists for as long as the organisation exists.

This is different from a secondary identifier, which is about the status of an organisation (e.g. whether it is a registered charity; or is registered for tax). These identifiers are more likely to change over time, even when the organisation continues to exist.
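A sketch of how the two example rules above (primary before secondary, open data before closed) could drive both the ordering and a short explanation. This is not the algorithm from #37; the key names "list_type" and "open_data", and the code "EXAMPLE-CHARITY", are made up for illustration.

```python
TYPE_ORDER = {"primary": 0, "secondary": 1, "tertiary": 2}


def sort_key(entry):
    """Lower tuples rank higher: list type first, then open data."""
    type_rank = TYPE_ORDER.get(entry.get("list_type"), len(TYPE_ORDER))
    open_rank = 0 if entry.get("open_data") else 1
    return (type_rank, open_rank)


def rank_lists(entries):
    """Return entries ordered from most to least preferred."""
    return sorted(entries, key=sort_key)


def explain(higher, lower):
    """Say which rule placed one list above another."""
    (h_type, h_open), (l_type, l_open) = sort_key(higher), sort_key(lower)
    if h_type < l_type:
        return (f"{higher['code']} is a {higher.get('list_type')} list, "
                f"while {lower['code']} is {lower.get('list_type', 'unclassified')}.")
    if h_type == l_type and h_open < l_open:
        return f"{higher['code']} is available as open data, while {lower['code']} is not."
    return "These lists are ranked equally on the criteria considered here."


lists = [
    {"code": "EXAMPLE-CHARITY", "list_type": "secondary", "open_data": True},
    {"code": "GB-COH", "list_type": "primary", "open_data": True},
]
ranked = rank_lists(lists)
print(explain(ranked[0], ranked[1]))
```

Keeping the reason next to the rule that produced it means the interface text cannot drift away from the ranking logic.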

Recommending use of URIs when available

Our guidance on recording organisation identifiers should include information on including URIs whenever these are available, particularly when they are officially published URIs with structured data available from them.

See e.g. #30.

URI-Compatible Naming Scheme

From @rolfkleef on #28

Re: URIs

I'd be in favour of pushing towards a URI-compatible naming scheme... in IATI, there are various other codelists as well that contain a vocabulary identifier with a code within that vocabulary.

An "org id vocabulary"-ontology, where information about a particular registrar could be made available as properties in e.g. the joinedupdata.org store, seems useful.

I think this links to our schema approach, and documentation of how URIs should be handled within data as well (see #31)

Create JSON Schema for our format

We should create a JSON Schema to describe the format we are using for codelist entries.

This would:

(a) Ensure we have clearly defined the meaning of each field;
(b) Support third-parties to create valid information;
(c) Support testing of pull requests;
(d) Potentially help power future editing interfaces;

Questions to address:

How much do we embed the codelists into the schema? (e.g. organisation types and jurisdictions)

Update terminology from "Jurisdiction" to "Coverage"

We have been using the term 'Jurisdiction' to talk about the top-level country that a list applies within (as well as lists with XI (International) or XR (regional) scope).

This is not an entirely accurate use of the term, so I propose changing to talk of 'Geographic coverage' and to update the term in our data model to coverage also.

Interface for submitting new lists and edits to existing lists

From the list page it should be possible for a user to report changes required to a list record.

From the homepage there should be an option to submit a new list.

The user could be offered two routes:

Submit a comment - which would essentially be a public post/issue somewhere for us to action;
Submit new entry/updates in structured form - which would present an editing interface based on the schema for generating the JSON, and/or guidance on how to do this manually in GitHub.

Ideally these submissions would go directly into our GitHub workflows.

(I've done some testing with the JSON Schema editor here and put relevant properties in the new schema here, which gives us a possible out-of-the-box editing interface).

Including list<->list mappings in the interface and data

We have the concept of list mappings, to identify cases where the data may exist to translate one kind of list identifiers into another (e.g. where a lookup exists to convert from a charity number to a company number).

This would be useful information to display in the interface to users.

To get there we need to:

Improve how mapping information is captured within our data;
Find a way to display mapping information in the interface;

Reverse lookup

At IODC someone suggested it would be useful to be able to do the opposite of the lookup that the current tool provides, i.e. given that I have a particular identifier, what can you tell me about the prefix that it has?
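A sketch of such a reverse lookup, assuming identifiers are written as list code, hyphen, local identifier (as in GB-COH-09506232) and that the set of known list codes is available. Because list codes themselves contain hyphens, the longest matching code is chosen rather than splitting on the first hyphen. The function name is hypothetical.

```python
def find_list_code(identifier, known_codes):
    """Return (list_code, local_id) for an identifier, or None if no code matches."""
    matches = [code for code in known_codes if identifier.startswith(code + "-")]
    if not matches:
        return None
    code = max(matches, key=len)  # prefer the most specific (longest) code
    return code, identifier[len(code) + 1:]


print(find_list_code("GB-COH-09506232", ["GB-COH"]))
```

The returned list code can then be used to look up that list's meta-data.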

Decide on the SSOT for the list

As per #19, @Bjwebb says:

I think how we do this depends quite a bit on where the SSOT of the data is stored. Currently for our project this is airtable, but I'm not convinced that's going to be the correct choice going forward.
