ckanext-iati's Introduction

International Aid Transparency Initiative (IATI) Registry Extension for CKAN

Installation

The current version of ckanext-iati has been developed and tested against CKAN 2.9.1. We assume a running CKAN 2.9.1 instance.

The installation has the following steps:

  1. Install the extension from its source repository:

    (pyenv) $ pip install -e git+https://github.com/IATI/ckanext-iati#egg=ckanext-iati
    
  2. Install dependencies:

    (pyenv) $ pip install -r ckanext-iati/pip-requirements.txt
    

Set up the configuration options as described in the Configuration section.

Migrating from the old Registry version

The previous version of the registry ran on CKAN 1.5.1. To upgrade the database, follow these steps:

  1. Back up the CKAN 1.5.1 database

  2. Run the normal update command:

    (pyenv) $ cd ckan
    (pyenv) $ paster db upgrade
    
  3. Run the SQL script to transform Groups to Organizations:

    sudo -u postgres psql -f ckanext-iati/scripts/groups_to_orgs.sql
    
  4. Edit the users_to_members.py script with a suitable API key and run it to create members for the migrated organizations:

    (pyenv) $ python ckanext-iati/scripts/users_to_members.py
    
  5. Run a final SQL script to clean up the database (may take a long time):

    sudo -u postgres psql -f ckanext-iati/scripts/cleanup_db.sql
    

Configuration

Create a sysadmin user called iati-archiver and note down its API key; you will need to add it to the ini file:

(pyenv) $ cd ckan
(pyenv) $ paster sysadmin add iati-archiver

These are the configuration options used by the extension (generic options like ckan.site_id, solr_url, etc are not included):

# Load only these four plugins
ckan.plugins = iati_publishers iati_datasets iati_theme iati_csv

# Needed for the search facets to be displayed properly until #599 is
# fixed on CKAN core
search.facets.default=1000

# File preview service URL and CSV export service URL.
# If these are commented out, the links won't appear in the frontend
iati.preview_service = http://tools.aidinfolabs.org/showmydata/index.php?url=%s
iati.csv_service = http://tools.aidinfolabs.org/csv/direct_from_registry/?xml=%s

# User name and API key for the iati-archiver sysadmin user
iati.admin_user.name=iati-archiver
iati.admin_user.api_key={api-key}

# Google Analytics id to be used when inserting the code
# If this option is commented out, the code won't be added to the frontend
iati.google_analytics.id=UA-XXXXXXX-XX

# Email settings
# Make sure smtp_server is properly set (normally to localhost); the rest
# of the defaults should be good enough:

# Address from where the email notifications are sent, default is '[email protected]'
#iati.email=

# Subject of the email sent to publishers when activated, default is 'IATI Registry Publisher Activation'
#iati.publisher_activation_email_subject=

# Allowed values for the IATI Standard Version (iati_version) field, default is '1.01 1.02 1.03 1.04 1.05 2.01 2.02 2.03'
#iati.standard_versions

# Set a user agent string used by ckanext-archiver when making requests
ckanext.archiver.user_agent_string = "IATI (CKAN)"
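
As a rough illustration of how the two service options behave, the sketch below fills the %s placeholder with a resource URL and returns no link when the option is missing. The helper name build_service_link, the plain-dict config and the URL-quoting step are assumptions made for this example, not code taken from the extension.

```python
from urllib.parse import quote


def build_service_link(config, option, resource_url):
    """Return the service link for a resource, or None if the option is unset.

    Hypothetical helper: the name and the URL-quoting are assumptions.
    """
    template = config.get(option)
    if not template:
        # Option commented out in the ini file: no link is shown.
        return None
    return template % quote(resource_url, safe="")


# Only the preview service is configured here; the CSV service is left out.
config = {
    "iati.preview_service": "http://tools.aidinfolabs.org/showmydata/index.php?url=%s",
}
preview = build_service_link(config, "iati.preview_service", "http://example.org/data.xml")
csv_link = build_service_link(config, "iati.csv_service", "http://example.org/data.xml")
```

Here preview holds the filled-in URL and csv_link is None, which mirrors the "commented out means no link" behaviour described in the comments above.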

To ensure that the logging for the archiver works fine and prevent permissions problems, use the following logging configuration:

## Logging configuration
[loggers]
keys = root, ckan, ckanext, iati_archiver

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console

[logger_ckan]
level = INFO
handlers = console
qualname = ckan
propagate = 0

[logger_ckanext]
level = INFO
handlers = console
qualname = ckanext
propagate = 0

[logger_iati_archiver]
level = DEBUG
handlers = console
qualname = iati_archiver
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
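
To check that a configuration like this one behaves as expected, it can be loaded with Python's standard logging.config.fileConfig. The sketch below uses a trimmed copy of the sections above (root and iati_archiver only) held in a string; in a real deployment CKAN reads these sections from the ini file itself.

```python
import io
import logging
import logging.config

# Trimmed copy of the configuration above (root and iati_archiver only).
LOGGING_INI = """
[loggers]
keys = root, iati_archiver

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console

[logger_iati_archiver]
level = DEBUG
handlers = console
qualname = iati_archiver
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
"""

# fileConfig accepts a file-like object as well as a file name.
logging.config.fileConfig(io.StringIO(LOGGING_INI), disable_existing_loggers=False)

archiver_log = logging.getLogger("iati_archiver")
archiver_log.debug("archiver logger is configured")  # written to stderr
```

Because propagate is 0, messages from iati_archiver are handled only by its own console handler and are not duplicated by the root logger.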

To set up the Daily archiver and issue checker, you need to create a cron job that calls the command once a day. See the dedicated section for details.

General workflow

The registry holds Datasets for aid spending data following the IATI Standard. Each CKAN dataset has a single resource, an IATI XML file, which can be of type 'activity' or 'organisation'.

Datasets are created by Publishers, implemented with Organizations in CKAN.

Anyone can register as a User on the registry and create a Publisher. When a Publisher is created, it is given a state of 'pending', and an email is sent to site administrators (all sysadmins).

Sysadmins can change the state of a Publisher to 'active' to approve it or to 'deleted' to reject it. Once the Publisher is activated, the user that created it gets an email notification and from that moment they can create datasets.

Datasets can be created or updated via:

  1. The web form
  2. The CSV Importer / Exporter
  3. Third party apps that use the API (eg AidStream)

Main customizations

All different plugins are located in ckanext/iati/plugins.py.

Theme

Custom theme based on a design provided by the client. The main changes are the organization listing page, the search facets as dropdown in the main search page, the dataset page and the datasets listings.

Custom Organizations schema

A number of fields are added to the default group schema in CKAN to store extra metadata about the publishers, using IGroupForm (see the IatiPublishers plugin).

Note that this is not as polished as IDatasetForm, so we still need for instance to manually set up the /publisher routes to point to the group controller. This causes problems sometimes, as the redirects lose the query parameters (or also see eg the publishers_pagination helper function).

Custom Dataset schema

Datasets also have custom fields, which are stored as extras (see the IatiDatasets plugin). Datasets also inherit fields from the Publisher they belong to (the ones starting with publisher_). This is done on the after_show hook.
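
The inheritance step can be pictured with a small sketch on plain dicts. The function name add_publisher_fields, the example field names and the idea of adding the publisher_ prefix at display time are all assumptions for illustration; the actual hook lives in the IatiDatasets plugin and may work differently.

```python
def add_publisher_fields(dataset, publisher, fields):
    """Return a copy of the dataset with selected publisher fields added.

    Hypothetical helper: inputs are plain dicts and the field names are made up.
    """
    result = dict(dataset)
    for field in fields:
        if field in publisher:
            # Inherited fields are exposed under a publisher_ prefix.
            result["publisher_" + field] = publisher[field]
    return result


dataset = {"name": "example-af", "title": "Example activities"}
publisher = {"country": "GB", "organization_type": "10"}
shown = add_publisher_fields(dataset, publisher, ["country", "organization_type"])
```

The stored dataset is left untouched; only the dict returned for display carries the inherited values.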

The before_index hook is also used to index the human readable form for the facets.

There is a slightly modified auth function for package_create that checks that the org the user belongs to is active.
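
A minimal sketch of that kind of check is shown below. It returns the usual CKAN auth result (a dict with 'success' and an optional 'msg'), but it is a stand-in: the function name and the list-of-dicts input are assumptions, whereas a real CKAN auth function receives context and data_dict.

```python
def package_create_allowed(user_organizations):
    """Allow dataset creation only for members of an active publisher.

    Hypothetical stand-in that takes a list of organization dicts.
    """
    if any(org.get("state") == "active" for org in user_organizations):
        return {"success": True}
    return {
        "success": False,
        "msg": "User does not belong to an active publisher",
    }


pending_result = package_create_allowed([{"name": "example-org", "state": "pending"}])
active_result = package_create_allowed([{"name": "example-org", "state": "active"}])
```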

Email notifications

Email notifications are sent:

  • To sysadmins when a new publisher is registered, so they can approve it or not.
  • To users when their publisher has been activated.

The code that actually sends the emails is in ckanext/iati/emailer.py

CSV Importer / Exporter

Users can download all metadata for the datasets they have permissions on (ie the ones of their publisher) in a CSV file.

Once updated, the file can be reuploaded and new datasets will be created or existing ones updated.

The code that handles this is in ckanext-iati/ckanext/iati/controllers/spreadsheet.py
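
The create-or-update decision on re-upload can be sketched as follows. This is only an illustration under assumptions: the column names (name, title) and the plan_import helper are invented here, and the real column layout is defined in spreadsheet.py.

```python
import csv
import io


def plan_import(csv_text, existing_names):
    """Split CSV rows into datasets to create and datasets to update.

    Hypothetical helper: assumes each row has a 'name' column identifying the dataset.
    """
    to_create, to_update = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = to_update if row["name"] in existing_names else to_create
        target.append(row)
    return to_create, to_update


csv_text = "name,title\nexample-af,Afghanistan\nexample-ke,Kenya\n"
to_create, to_update = plan_import(csv_text, existing_names={"example-af"})
```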

Daily archiver and issue checker

A script runs every night in order to download all files, check if they have changed and extract some metadata from the actual contents. It also checks for issues like missing files, wrong formats, etc.

If the contents of the file have changed, the new fields are stored as extras (right now these are the number of activities, activity_count, and the last modified date for the data, data_updated). The file size is also updated.
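
The change check and metadata extraction could look roughly like the sketch below. It is not the archiver's code: the inspect_file helper, the use of SHA-1 and the returned dict are assumptions, and data_updated is left out because the text does not say how it is derived.

```python
import hashlib
import xml.etree.ElementTree as ET


def inspect_file(content, previous_hash):
    """Return updated extras if the file content changed, else None.

    Hypothetical helper; SHA-1 is an assumed choice of hash.
    """
    new_hash = hashlib.sha1(content).hexdigest()
    if new_hash == previous_hash:
        return None  # nothing changed, keep the stored extras
    root = ET.fromstring(content)
    return {
        "hash": new_hash,
        "activity_count": len(root.findall("iati-activity")),
        "size": len(content),
    }


sample = b"<iati-activities><iati-activity/><iati-activity/></iati-activities>"
extras = inspect_file(sample, previous_hash=None)
unchanged = inspect_file(sample, previous_hash=extras["hash"])
```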

Issues are stored as extras as well with three different fields: issue_type, issue_description and issue_date. These are later used to display the issue on the frontend, as well as a filter to find out which datasets have issues on the search page.

There is also an Issue Report for sysadmins that downloads a CSV listing all issues for all datasets (accessible at /report/issues).

To run the archiver manually for all datasets, run the following command (it will take a long time):

cd ckanext-iati
(pyenv) $ paster iati-archiver update -c ../ckan/development.ini

To run it just on a particular dataset:

(pyenv) $ paster iati-archiver update {dataset-name} -c ../ckan/development.ini

To run it on all datasets for a particular publisher:

(pyenv) $ paster iati-archiver update -p {publisher-name} -c ../ckan/development.ini

On a production or staging server you would want to set it up as a cron job that runs the command once a day (eg 5 minutes after midnight). Add the following to the relevant user crontab (generally okfn):

05 00  *   *   *  /usr/lib/ckan/iati/bin/paster --plugin=ckanext-iati iati-archiver update -c /etc/ckan/iati/production.ini >> /tmp/iati_archiver_2_out.log 2>&1

Nightly cronjobs runtimes

Documenting cronjob running times for future reference:

  • Purging of deleted datasets runs at 1 AM
  • Reindexing after purging runs at 2 AM
  • Archiver operation runs at 3 AM

*Times are UTC.

GitHub Repository - Production Code

Repo: https://github.com/IATI/ckanext-iati Branch: master

Copying and License

This material is copyright (c) 2010-2013 Open Knowledge Foundation.

It is open and licensed under the GNU Affero General Public License (AGPL) v3.0 whose full text may be found at:

http://www.fsf.org/licensing/licenses/agpl-3.0.html

This extension uses the TableSorter jQuery plugin by Christian Bach, released under the MIT license.

ckanext-iati's People

Contributors

amercader, amy-silcock, andreaszenasidi, andylolz, bjwebb, brew, cormachallinanderilinx, dalepotter, dekomote, derwas, dumyan, ericsoroos, goranmaxim, hayfield, jgulic, jodiegardiner, johnmartin, klikstermkd, kmbn, mbocevski, newkdukem, nigelbabu, polarp, pudo, sebbacon, ss-bhat, stevieflow, tino097, visar, zoranpandovski

ckanext-iati's Issues

Some more frontend tweaks

Just a new issue to compile found issues

  • Remove "There is no description for this organization" from the org module on the publisher page and on the dataset page. Also the members count on the publisher page
  • Breadcrumb on dataset page says Organization
  • Hover menus on top bar (Don't worry about auth for the time being)
  • Add data fields to publisher About page (The same fields that are on the right sidebar + View in larger window on this page)
  • Missing members tab on publisher page

Publisher redirects

publisher/amrefuk > publisher/amrefha
amrefuk-* > amrefha-*

publisher/ausaid > publisher/ausgov
ausaid-ag > ausgov-ag
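
One way to express these redirects is a small lookup on the publisher prefix, sketched below. The redirect_target function and the /publisher/ and /dataset/ path layout are assumptions, and the sketch generalises the ausaid rule to every ausaid-* dataset even though only ausaid-ag is listed above.

```python
# Old publisher id -> new publisher id, taken from the list above.
PUBLISHER_RENAMES = {"amrefuk": "amrefha", "ausaid": "ausgov"}


def redirect_target(path):
    """Return the new path for a renamed publisher or dataset, or None."""
    kind, _, name = path.strip("/").partition("/")
    if kind == "publisher" and name in PUBLISHER_RENAMES:
        return "/publisher/" + PUBLISHER_RENAMES[name]
    if kind == "dataset":
        # Dataset names start with the publisher id followed by a hyphen.
        prefix, sep, rest = name.partition("-")
        if sep and prefix in PUBLISHER_RENAMES:
            return "/dataset/" + PUBLISHER_RENAMES[prefix] + "-" + rest
    return None
```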

Declare / display IATI version found in files

Via the overnight script pull out the version="" attribute of the iati-activities or iati-organisation element for each file

Publish this in the record metadata - eg: http://iatiregistry.org/dataset/dfid-af


Only display the version, if it meets one of these values: http://iatistandard.org/codelists/Version/
If a version is not found, or not on this code list, then display n/a


NB: we could also look to a search term on http://iatiregistry.org/dataset - but the issue remains that not all values will be on the Version codelist - one to discuss
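
A minimal sketch of the extraction described in this issue is given below. The declared_version helper is hypothetical, and the allowed values are the default list quoted for iati.standard_versions in the Configuration section rather than a live copy of the Version codelist.

```python
import xml.etree.ElementTree as ET

# Default list quoted in the Configuration section (iati.standard_versions).
ALLOWED_VERSIONS = "1.01 1.02 1.03 1.04 1.05 2.01 2.02 2.03".split()


def declared_version(content):
    """Return the version declared on the root element, or 'n/a'."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return "n/a"  # not well-formed XML
    version = root.get("version", "").strip()
    return version if version in ALLOWED_VERSIONS else "n/a"


known = declared_version(b'<iati-activities version="2.03"/>')
unknown = declared_version(b'<iati-activities version="9.99"/>')
missing = declared_version(b"<iati-activities/>")
```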

CSV uploading animation

As data is being uploaded by the CSV file, include a progress bar / spinning wheel (!) animation to indicate there is progress.

Messages formatting

I added a couple of notices in the org and dataset pages. They need to have 15px margin-top and be orange/red but the bootstrap classes don't work.
Files are
ckanext/iati/theme/templates/package/read_base.html
ckanext/iati/theme/templates/organization/read_base.html

Data package file size limit

On the form for a new package edit this text for the URL field

from

This is the Internet link directly to the data - by selecting this link in a web browser, the user will immediately download the full dataset. All files should be in XML following the IATI standard. Note that datasets are not hosted on this site, but by the publisher of the data.

to

This is the Internet link directly to the data - by selecting this link in a web browser, the user will immediately download the full dataset. All files should be in XML following the IATI standard. Note that datasets are not hosted on this site, but by the publisher of the data. Please ensure files are less than 40MB.

Template issues on publisher read page

Fixes

  • .active state on publisher_read tabs not working
  • No icon for datasets tab on publisher_read (should be 'sitemap')

Notes

I think that both are related to the fact that the named routes for these are now 'publisher', therefore we need to give the new routes icons in the routing.

Publisher > About - data portal/user interface link

In the publisher profile, there is a question for a user interface (last one):

http://iatiregistry.org/publisher/about/unhabitat

This is often a URL, which does not become an active link when published. Either:

  • 1 - make such links become active in this question
  • 2 - we build another field for this, which can then be inserted into a more prominent position on the Registry page for the publisher (eg, underneath the logo).

Quite a few publishers have user interfaces now - but they are not evident

Option 2 would be preferable - but perhaps 1 can be addressed quickly...

Update CSV help text

In /help_csv-import - need to remove guidance on the fields that have been removed in #51 and #50

@amercader I could do this via a PR from the IATI fork (or it might be easier for you to just do)?

Revision ID should be updated if package is changed?

It would be much easier to check whether packages had changed if revision_id stored the most recent revision (including those revisions generated by the iati-archiver author) rather than only storing in the extras['hash'] variable.

For example, you could then do this:
http://iatiregistry.org/api/2/search/dataset?fl=id,name,revision_id&offset=0&limit=10

Rather than having to iterate over every package like this:
http://iatiregistry.org/api/2/rest/package
http://iatiregistry.org/api/2/rest/package/00045acb-a29d-4145-b235-ffe78a637fd4

Maybe there's a reason why this isn't done, and the data is only stored in hash rather than revision_id, but it would make things a lot easier - and reduce the load on the IATI Registry for nightly update operations.
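
If revision_id were kept up to date as proposed, a client could find changed packages by comparing two snapshots of the search results, roughly as in the sketch below. The changed_packages function and the id-to-revision_id mapping are assumptions for illustration; fetching the search results is left out.

```python
def changed_packages(previous, current):
    """Return the ids whose revision_id is new or different.

    Both arguments map package id -> revision_id (hypothetical shape).
    """
    return sorted(
        pkg_id for pkg_id, rev in current.items() if previous.get(pkg_id) != rev
    )


previous = {"pkg-a": "rev-1", "pkg-b": "rev-1"}
current = {"pkg-a": "rev-1", "pkg-b": "rev-2", "pkg-c": "rev-1"}
to_refresh = changed_packages(previous, current)
```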

Template tweaks

  • Move sidebar to right column
  • /dataset should have alternating backgrounds (think zebra)
  • Hide 'show more' in faceting when there is no more...
  • Style 'show more' link differently from the facets
  • Make sure the download size is within the listing
  • Make the 'external link' icon like the original
  • 500 on publisher about page http://iati2.staging.ckanhosted.com/publisher/about/aa

CKAN 2.x theme tweaks

Todo

  • Organization on dataset page doesn't need .context-info border
  • Publisher listing doesn't fit across full width
  • No style for pagination links
  • max-width: 100% on .context-info img
  • .active state on organization read tabs not working
  • Organizations needs to be renamed to publishers throughout templates
  • Change favicon
  • Change site title
  • .input-prepend needs to be imported from bootstrap 2.3
  • No icon for datasets on publisher_read

Country list out of synch with IATI codelist

I noticed that "AN - Netherlands Antillies" was missing from the Registry Country list at

https://raw.githubusercontent.com/okfn/ckanext-iati/master/ckanext/iati/countries.py

I could edit this, but wanted to check with @Bjwebb / @amercader if there could be a better way to synch with https://github.com/IATI/IATI-Codelists-NonEmbedded/blob/master/xml/Country.xml

@amercader this list is also available in various formats at http://iatistandard.org/201/codelists/Country/ -
