org-ids's Introduction

org-id.guide

We are creating a simple process, tool and codelist to enable data publishers and users to create and use joined up data that identifies organisations.

This involves

  • Maintaining a list of organisation identifier lists;
  • Developing a methodology for updating the list;
  • Providing simple lookup tools, and guidance on choosing the best identifiers to use.

The register of organisation identifier lists

An organisation identifier list is any list that contains at least an identifier, and a name, for a collection of organisations.

Building on the IATI Organisation Registration Agency codelist we are creating an updated register of organisation identifier lists.

This list will contain detailed meta-data on the nature of the identifiers provided and the coverage of each identifier list. It will also provide a unique code to identify each list.

This code can be used as a prefix to create simple identifier strings, or can be used as the 'scheme' in a two-part identifier.

For example:

The code for the organisation identifier list provided by UK Companies House is 'GB-COH'. The identifier assigned to Open Data Services Co-operative Ltd in this list is '09506232'. Putting this together allows a dataset to unambiguously identify Open Data Services Co-operative Ltd as:

GB-COH-09506232

or in a table such as:

Organisation ID Scheme    Organisation ID
GB-COH                    09506232
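
As a minimal sketch of the two forms above, the list code and the identifier can either be joined into one string or kept as a two-part record. The function names and the 'scheme'/'id' keys are illustrative assumptions, not part of this project's code.

```python
def make_org_id(list_code: str, identifier: str) -> str:
    """Join a list code and an identifier into a single prefixed string."""
    return f"{list_code}-{identifier}"


def make_two_part_id(list_code: str, identifier: str) -> dict:
    """Keep the list code as a separate 'scheme' next to the identifier."""
    # The key names here are hypothetical; use whatever your data standard defines.
    return {"scheme": list_code, "id": identifier}


print(make_org_id("GB-COH", "09506232"))       # GB-COH-09506232
print(make_two_part_id("GB-COH", "09506232"))  # {'scheme': 'GB-COH', 'id': '09506232'}
```

Note that the identifier is handled as a string throughout, so the leading zero in '09506232' is preserved.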

Developing the list of lists

We are prototyping our updated register on GitHub: you can find codelists in the /codes/ directory.

These are structured based on the list-schema.json JSON Schema in the /schema/ directory.

We have imported codes from a range of sources, and have been updating these based on the process in our Researchers Handbook.

Only those entries with "confirmed":true have been reviewed and should be relied upon. All others should be treated as provisional.
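
A rough sketch of how a consumer might select only the reviewed entries is shown here. It assumes that each .json file under the codes directory holds a single list entry as a JSON object with a top-level "confirmed" field; check the schema before relying on that layout. The function name is hypothetical.

```python
import json
from pathlib import Path


def load_confirmed_lists(codes_dir):
    """Return the entries under codes_dir whose "confirmed" field is true.

    Assumption: every *.json file contains one list entry as a JSON object.
    """
    confirmed = []
    for path in sorted(Path(codes_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            entry = json.load(handle)
        # A missing "confirmed" key is treated as provisional, not confirmed.
        if isinstance(entry, dict) and entry.get("confirmed") is True:
            confirmed.append(entry)
    return confirmed
```

For example, `load_confirmed_lists("codes")` run from the repository root would return only the entries that have been reviewed.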

Help us out

Pull requests to update any codes, or to suggest new codes, are welcome.

List Finder Django App

Installation

Installation steps:

  • Clone the repository
  • Change into the cloned repository
  • Create a virtual environment (note this application uses python3)
  • Activate the virtual environment
  • Install dependencies
  • Apply migrations
  • Run the development server

These steps correspond to the following commands:
git clone https://github.com/OpenDataServices/org-ids.git
cd org-ids
virtualenv .ve --python=/usr/bin/python3
source .ve/bin/activate
pip install -r requirements_dev.txt
python manage.py migrate
python manage.py runserver

Tools

Setup

The scripts in tools/ have a number of requirements.

Set up a virtual environment to easily install these.

virtualenv --python=/usr/local/bin/python3 .ve
source .ve/bin/activate
pip install -r requirements.txt

org-ids's People

Contributors

andylolz, bjwebb, bobharper1, caprenter, dependabot[bot], edugomez, idlemoor, kd-ods, kindly, odscjames, rbika, rhiaro, robredpath, timgdavies

org-ids's Issues

Retire airtable or update integration

The AirTable is now out of sync with the codelist schema and management approach.

We either need to retire it, or update it so that it reflects the list of lists appropriately.

Add a footer

I'd suggest we go with something similar to the one we have on CoVE for now. I'm happy to add that.

Would it be useful to mention and link to partners, @timgdavies?

Deploy at new domain

Subject to final confirmation, we will be switching the branding of the project to 'org-id' by the end of this sprint.

I think this will require us to:

  • Register the new domain (after agreeing the exact domain name and tld to use)
  • Point it to the new site
  • Consolidate explanatory content from identify-org.net into the new site
  • Move the identify-org.net blog content to run at blog.NEWDOMAIN (still via Wordpress)

Demonstrator of search via an embeddable widget

We should have a demonstration of how information from the Org ID API can be integrated into third party websites through simple pop-up widgets that help users fill in organisation identifier fields.

Consolidated list in XML, JSON and CSV formats

It should be possible for users to download copies of all confirmed codes (including deprecated codes, suitably marked) in:

  • IATI XML Codelist format
  • CSV
  • JSON

Containing at least:

  • Title
  • Description
  • Code
  • URL

And if appropriate for file-size etc. also:

  • Finding the identifiers
  • Example Identifiers
  • Data access details (e.g. CSV, JSON etc.)
  • Data license
  • Data features (shareholders, address etc.)
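
As a hedged sketch of the CSV and JSON downloads described above (the IATI XML codelist format is not covered here), the core fields could be serialised as follows. The lowercase field names and the "deprecated" flag are assumptions for illustration; the real schema may name them differently.

```python
import csv
import io
import json

# Hypothetical field names for the core columns listed above.
CORE_FIELDS = ["code", "title", "description", "url"]


def core_record(entry):
    """Reduce a full list entry to the core fields plus a deprecated marker."""
    record = {field: entry.get(field, "") for field in CORE_FIELDS}
    # Deprecated codes stay in the download but are explicitly marked.
    record["deprecated"] = bool(entry.get("deprecated", False))
    return record


def to_csv(entries):
    """Serialise entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=CORE_FIELDS + ["deprecated"], lineterminator="\n"
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(core_record(entry))
    return buffer.getvalue()


def to_json(entries):
    """Serialise entries as a JSON array of core records."""
    return json.dumps([core_record(entry) for entry in entries], indent=2)


# Illustrative entry only; the URL is a placeholder, not the real register.
example = [{"code": "GB-COH", "title": "Companies House", "url": "https://example.org/"}]
print(to_csv(example))
print(to_json(example))
```

The optional fields (example identifiers, licence, data features and so on) could be added to the field list if file size allows.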

Regional Jurisdiction Codes

At the moment we allow national jurisdictions only (i.e. each list must either be specific to one or more countries, or have jurisdiction left blank).

However, PublicBodies.org has a list of European Union bodies (http://publicbodies.org/eu)

Should we have jurisdiction codes for regional blocks? Or some sort of multilateral code?

Suggestion: Canada Government Identifiers

Heard from Madrid conversations that Canada does have an organisation identifier for government agencies.

Managed in public accounts systems. Need to ask if this can be opened up.

Consider asking Open Gov Canada twitter accounts.

Kenya duplicate

ke-cr.json
ke-rco.json

These appear to be duplicates, both referring to the Register of Companies.

Updated search and results interface

Because the algorithm defined in #37 needs to be shareable independently of this implementation, how we present the UI for org-id.WHATEVER is being considered separately.

New theme

We just need to find something that looks pretty, makes static pages easy.

We need a favicon!

Malaysian Identifier Sources

From Swee Meng on Sinar Project.

  • Company Commission Number
  • Construction Industry Development Board - own company register and identifiers (secondary)
  • Government Contracts Vendor ID (found on some procurement lists)

Non-profits:

  • Registrar of Society (but not many registered...)

Government Agency

  • Check with Sinar about source of documents. Sinar open spending has a source.

Individual list pages

It should be possible to visit a URL such as:

http://org-id.TLD/lists/GB-COH

and get a page which includes all the meta-data about this list and, ideally, an action-focussed UI that:

  • Links to where to search for IDs;
  • Provides details of how to suggest changes or updates;

Move away from terminology of 'prefix'

At IODC someone suggested it could be more understandable to talk about finding the right registration body, rather than initially about "prefixes". (Although, "prefix" is probably still a useful term to use in the specifics of how identifiers get concatenated).

Netherlands Overheid.nl Web Metadata Standard

Based on conversations with @rolfkleef at OGP I've added:

NL-OWMS to the codelist (visible more easily in AirTable here)

@rolfkleef Would you be able to check that I've described the service there right.

This has also raised two interesting issues around:

  • Legal characters for identifiers (e.g. 's-Graveland as an identifier, where it looks like the ' breaks even overheid.nl's own RDF and N3 download versions!)
  • The guidance we should give when URIs are available.

I'll open separate issues for these.

Establish user journeys for each of the user stories in the work plan

The end point that we discussed was a display of whether each ID was primary/secondary/tertiary and some guidance text about when a particular ID would be most appropriate (~50 words). Given the users, we think it's OK for them to have to do the final bit of filtering / decision making themselves, rather than just being given the answer.

Legal characters in a full identifier

We have agreed that prefixes should stick to a limited set of characters.

But what about the identifier portion itself?

For example, what if a source identifier contains extended unicode characters or punctuation (e.g. ' or / or +) which might not be valid in some systems?

Should we propose a substitution approach for these cases? Or accept that such data is valid?
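
To make the substitution option concrete, here is one possible approach: percent-encoding the identifier portion using the Python standard library. This is only an illustration of what a substitution scheme could look like, not an agreed decision, and the helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_identifier(raw_identifier: str) -> str:
    """Percent-encode characters that may not be valid in some systems."""
    # safe="" also escapes "/"; letters, digits and "_.-~" pass through unchanged.
    return quote(raw_identifier, safe="")


def decode_identifier(encoded_identifier: str) -> str:
    """Recover the original identifier from its percent-encoded form."""
    return unquote(encoded_identifier)


print(encode_identifier("'s-Graveland"))  # %27s-Graveland
print(encode_identifier("12/34+5"))       # 12%2F34%2B5
```

The encoding is reversible, so the source identifier is never lost, but the encoded form no longer matches what a user would see in the original register. That trade-off is part of the open question.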

API Access to Search Results

Users should be able to send queries to an API, and get back a JSON representation of a list of lists, with ranking already applied.

Including Need to Know information in the interface (and schemas)

In our early AirTable prototype we introduced the concept of 'Need to Know' content - which is bound to country, sector and organisation type in the same way as a list, and should be shown to users when relevant selections are made.

Need To Know content contains things like explanations of the history of organisation registration in a given country, or caveats that someone should be aware of around charity registers in general etc.

There is work to be done on:

  • Editing interfaces for Need to Know content
  • Presenting Need to Know content in search results
  • Validating the utility of this content

Our list search interface should explain why one list is ranked above another

When presenting users with a suggested source of organisation identifiers to use, we apply a logic to 'rank' lists.

For example:

  • If the organisation has a UK Company number, this is preferable to a UK Charity number;
  • If a list is available as open data, then prefer that to a closed data list;

We have now collected the meta-data to be able to provide these rankings (whether a list is primary or secondary; whether open data is available etc.) but it can be difficult for users to understand why one identifier is preferred over another. For example, in 360 Giving, where information is about charities, data publishers are confused as to why they should use a company, rather than charity number.
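
A minimal sketch of this kind of ranking, with a plain-language reason attached, might look like the following. The field names ("name", "list_type", "open_data") and the example records are assumptions for illustration; they are not taken from the project's schema, and the algorithm actually used may differ.

```python
# Hypothetical ordering of list types: lower numbers rank higher.
TYPE_ORDER = {"primary": 0, "secondary": 1, "tertiary": 2}


def type_rank(entry):
    """Rank by list type; unknown types sort after all known ones."""
    return TYPE_ORDER.get(entry.get("list_type"), len(TYPE_ORDER))


def rank_lists(candidates):
    """Sort lists by type first, then prefer lists available as open data."""
    return sorted(
        candidates,
        key=lambda entry: (type_rank(entry), 0 if entry.get("open_data") else 1),
    )


def explain_ranking(higher, lower):
    """Give a short reason why `higher` is ranked above `lower`."""
    if type_rank(higher) < type_rank(lower):
        return (
            f"{higher['name']} is a {higher['list_type']} identifier source, "
            f"while {lower['name']} is {lower['list_type']}."
        )
    if higher.get("open_data") and not lower.get("open_data"):
        return f"{higher['name']} is available as open data; {lower['name']} is not."
    return "No ranking difference is recorded between these lists."


# Illustrative records only.
lists = [
    {"name": "Charity register", "list_type": "secondary", "open_data": True},
    {"name": "Companies House", "list_type": "primary", "open_data": True},
]
ranked = rank_lists(lists)
print([entry["name"] for entry in ranked])
print(explain_ranking(ranked[0], ranked[1]))
```

Keeping the explanation next to the sort key means the interface can show the reason without re-deriving the ranking logic.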

We should consider how our reference interface can explain the rankings to users. For example, I've mocked up a message (still a bit wordy) which could be shown between primary and secondary list entries:

If an identifier from 'Companies House' is available for this organisation, this should be used in preference to any of the following identifier sources.

This is because 'Companies House' provides a primary identifier: given to a legal entity at the moment it is created, and that persists for as long as the organisation exists.

This is different from a secondary identifier, which is about the status of an organisation (e.g. whether it is a registered charity; or is registered for tax). These identifiers are more likely to change over time, even when the organisation continues to exist.

URI-Compatible Naming Scheme

From @rolfkleef on #28

Re: URIs

I'd be in favour of pushing towards a URI-compatible naming scheme... in IATI, there are various other codelists as well that contain a vocabulary identifier with a code within that vocabulary.

An "org id vocabulary"-ontology, where information about a particular registrar could be made available as properties in e.g. the joinedupdata.org store, seems useful.

I think this links to our schema approach, and documentation of how URIs should be handled within data as well (see #31)

Create JSON Schema for our format

We should create a JSON Schema to describe the format we are using for codelist entries.

This would:

(a) Ensure we have clearly defined the meaning of each field;
(b) Support third-parties to create valid information;
(c) Support testing of pull requests;
(d) Potentially help power future editing interfaces;

Questions to address:

  • How much do we embed the codelists into the schema? (e.g. organisation types and jurisdictions)

Update terminology from "Jurisdiction" to "Coverage"

We have been using the term 'Jurisdiction' to talk about the top-level country that a list applies within (as well as lists with XI (International) or XR (regional) scope).

This is not an entirely accurate use of the term, so I propose changing to talk of 'Geographic coverage', and updating the term in our data model to 'coverage' as well.

Interface for submitting new lists and edits to existing lists

From the list page it should be possible for a user to report changes required to a list record.

From the homepage there should be an option to submit a new list.

The user could be offered two routes:

  • Submit a comment - which would essentially be a public post/issue somewhere for us to action;

  • Submit new entry/updates in structured form - which would present an editing interface based on the schema for generating the JSON, and/or guidance on how to do this manually in GitHub.

Ideally these submissions would go directly into our GitHub workflows.

(I've done some testing with the Json Schema editor here and put relevant properties in the new schema here that gives us a possible out-of-the-box editing interface).

Including list<->list mappings in the interface and data

We have the concept of list mappings, to identify cases where the data may exist to translate one kind of list identifiers into another (e.g. where a lookup exists to convert from a charity number to a company number).

This would be useful information to display in the interface to users.

To get there we need to:

  • Improve how mapping information is captured within our data;
  • Find a way to display mapping information in the interface;

Reverse lookup

At IODC someone suggested it would be useful to be able to do the opposite lookup to the one the current tool provides, i.e. given that I have a particular identifier, what can you tell me about the prefix that it has?
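
One way such a reverse lookup could work is sketched here: match the start of the identifier string against the known list codes. This assumes identifiers are written as the list code, a hyphen, then the source identifier (as in GB-COH-09506232); the function name is hypothetical.

```python
def find_list_code(identifier, known_codes):
    """Return the known list code that prefixes the identifier, or None.

    Codes are tried longest first, so that if one code happens to be the
    start of another, the more specific code wins.
    """
    for code in sorted(known_codes, key=len, reverse=True):
        if identifier.startswith(code + "-"):
            return code
    return None


known = ["GB-COH", "NL-OWMS"]
print(find_list_code("GB-COH-09506232", known))  # GB-COH
print(find_list_code("ZZ-UNKNOWN-123", known))   # None
```

Once the code is known, the matching register entry can be looked up to show the user the list's meta-data.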

Decide on the SSOT for the list

As per #19, @Bjwebb says:

I think how we do this depends quite a bit on where the SSOT of the data is stored. Currently for our project this is airtable, but I'm not convinced that's going to be the correct choice going forward.
