data.gov.uk Publish

This repository contains the beta-stage publishing component of data.gov.uk.

Deployment

Continuous Integration has been set up using GitHub Actions.

  • Tests are run on pull requests.
  • Deployments to Integration happen automatically when branches are merged into the main branch.
  • To release to production, a developer in the govuk team needs to create a release tag with a leading v and approve the deployment in GitHub Actions.

Integration

To deploy to integration, merge a PR into main.

Staging & Production

To deploy to staging or production, tag the release. The tag must follow the format v9.9.9, where each 9 is a number and the leading v is required. For example, v0.1.11 is valid but 0.1.11 is not.

This will create a PR on govuk-dgu-charts which you should be able to approve and merge into main for testing.

Test that your changes are working in staging by looking at the publish pod logs for evidence of jobs being processed.

Then merge in the production release PR.
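
The tag format described above can be checked locally before a tag is pushed. The following is a minimal sketch, not part of this repository: the function name is_valid_release_tag is hypothetical, and the pattern is only our reading of the format rule stated above.

```shell
# Hypothetical helper (not part of this repo): succeeds only when the
# argument looks like v<number>.<number>.<number>, for example v0.1.11.
is_valid_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

tag="v0.1.11"
if is_valid_release_tag "$tag"; then
  echo "$tag is valid"
else
  echo "$tag is not valid"
fi
```

Once the check passes, the tag would typically be created with `git tag "$tag"` and pushed with `git push origin "$tag"`, assuming the remote is named origin.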

Prerequisites

You will need to install the following for development.

Most of these can be installed with Homebrew on a Mac.

How to run this repo locally

There are currently two ways to run this repo locally:

  1. Via govuk-dgu-charts - An end-to-end setup from CKAN to Opensearch to Find. This is currently the best-supported way of running Find and is recommended for local development. Instructions for how to set up and run Find this way are available on the linked repo.
  2. Manual installation, which needs CKAN running. Instructions are below.

Install requirements for this app using Homebrew

## PostgreSQL
brew install postgresql

## Redis
brew install redis

## Opensearch
brew tap caskroom/versions
brew cask install java8
brew install opensearch

Start the services on your machine

brew services start postgresql
brew services start opensearch
brew services start redis

Update config settings

Configure the base URL of your local CKAN in ./config/environments/development.rb:

config.ckan_v26_base_url = "http://localhost:4000"

Install dependencies, initialise the database and search index:

bin/setup

Start the web server

rails s

Then navigate to http://localhost:3000.

Run Sidekiq jobs

These need to be run to sync data from CKAN.

Set up the workers. These sync organisation data and their datasets:

bin/rails runner CKAN::V26::CKANOrgSyncWorker.new.perform
bin/rails runner CKAN::V26::PackageSyncWorker.new.perform

Then run Sidekiq to process the queue:

bundle exec sidekiq

When you create new organisations and datasets in Publish, you will have to run these commands again to trigger the sync. These should then appear in Find.

Clear the database

To completely clear the database:

bin/rails db:drop db:setup

Re-index Opensearch

To re-index Opensearch based on the current database contents, run:

bin/rails search:reindex

Troubleshooting

Flush Redis

This may be necessary if you're having issues trying to completely reset your CKAN stack and start over with no data. See the next section below as an example.

$ redis-cli flushall
OK

Check the database size is 0:

$ redis-cli
127.0.0.1:6379> dbsize
(integer) 0

Delete the index on opensearch

If bin/rails search:reindex fails with the error "An alias can not be assigned to index of the same name. Please delete index 'datasets-development' before continuing", run the following command on the publish pod and then run the reindex again:

$ curl -XDELETE $ES_HOST/datasets-development

Running the PackageSyncWorker Sidekiq job attempts to sync non-existent data

When this Sidekiq job runs, it returns errors in the terminal such as:

404 Not Found excluded from capture: DSN not set
{"@timestamp":"2019-06-06T10:03:58Z","@fields":{"pid":43034,"tid":"TID-oxw3pfczg","context":" CKAN::V26::PackageImportWorker JID-3b2dff4c5d230d1d27cc5bea","program_name":null,"worker":"CKAN::V26::PackageImportWorker"},"@type":"sidekiq","@status":"fail","@severity":"INFO","@run_time":0.545,"@message":"fail: 0.545 sec"}

To resolve this:

  1. Ensure you have the correct config settings - see Update config settings
  2. Try to flush Redis
  3. You will also need to purge SOLR via CKAN
  4. Clear the Publish database
  5. Then re-run the Sidekiq jobs - see Run Sidekiq jobs
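
The scriptable parts of the steps above can be collected into one helper. This is a sketch under assumptions rather than a script shipped with this repository: the names run and reset_publish_data and the DRY_RUN variable are hypothetical, and the commands are the ones documented earlier in this README. Purging SOLR via CKAN stays a manual step, so the helper only prints a reminder for it.

```shell
# Hypothetical helpers, not part of this repo.
# With DRY_RUN=1 the commands are printed instead of being executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_publish_data() {
  # Step 2: flush Redis
  run redis-cli flushall || return 1
  # Step 3 is manual, so only remind the developer
  echo "Reminder: purge SOLR via CKAN (manual step)."
  # Step 4: clear the Publish database
  run bin/rails db:drop db:setup || return 1
  # Step 5: queue the sync jobs again
  run bin/rails runner CKAN::V26::CKANOrgSyncWorker.new.perform || return 1
  run bin/rails runner CKAN::V26::PackageSyncWorker.new.perform || return 1
}
```

Running `DRY_RUN=1 reset_publish_data` from the repository root prints the commands without touching any data. After a real run, start `bundle exec sidekiq` to process the queue.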

Documentation

See here for all of our Architecture Decision Records.

datagovuk_publish's Issues

Overdue tasks

  • Create a task when a dataset has an overdue datafile

Generate a data.json file

Publish Data, which holds all of the metadata, should generate a data.json file on a schedule. The file can be hosted somewhere but should be made available for Find-Data at https://..../data.json

This will allow other catalogues to harvest ours without hammering an API as they do currently.

Refactor task generation code

The code for task generation repeats itself quite a lot, and could be cleaned up in such a way as to be re-usable within the app at runtime (if necessary).

Fix manage links

The links in the manage pages for Add Data and Edit go nowhere. We need to make them go to the right place.

Heroku banner

My suggestion:

  1. Add a new rails env called nightly or staging (that just inherits from production).
  2. Detect in layout template if this env is active.
  3. Show banner if it is.
  4. Set RAILS_ENV=(nightly|staging) on Heroku

Dataset import

Import legacy datasets into Beta. Need to check models for the extended metadata where the dataset also has INSPIRE metadata attached.

Manage My/Organisation datasets

Manage datasets for users where they are assigned as owner to the dataset, and also show datasets for that user's organisation.

Broken datalink task

  • Create a task when a dataset has a broken datalink
  • Implement @brokendatasets in the tasks controller

Validation rules + content

Document validation rules from Alpha so we have a starting point. Validation is different when creating a dataset (for draft) than for publishing, which is much stricter. There are rules around when we can publish without a datafile, or with broken datafiles and we should make sure these are also documented.

Soft Deletion + Redirects for duplicate datasets

Sometimes we end up in the situation (because of harvesting) where an organisation publishes duplicates and the old ones need removing.

Rather than deleting the old datasets, where possible we should allow them to be forwarded to the replacement dataset, even if this is a manual or API-driven task. We don't want to break people's bookmarks.

Location API

/api/locations?q=bl

returns a list of corresponding location names, e.g.:
["Blaby (local authority)", "Ribble Valley (local authority)", "Blackburn with Darwen (local authority)", "Blackpool (local authority)", "Hambleton (local authority)", "South Ribble (local authority)", "Blaenau Gwent (local authority)", "Blackburn with Darwen (NHS Clinical Commissioning Group area)", "Blackpool (NHS Clinical Commissioning Group area)", "Chorley and South Ribble (NHS Clinical Commissioning Group area)", "Hambleton, Richmondshire and Whitby (NHS Clinical Commissioning Group area)"]

List of locations is at https://github.com/datagovuk/publish_data_alpha/blob/master/src/datasets/fixtures/locations.json

  • Load data
  • Make API
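
In the example response above, every name contains "bl" somewhere, in any letter case, which suggests a case-insensitive substring match. The following sketch illustrates that matching rule only. It is an assumption drawn from the example, not the API's actual implementation, and the function name search_locations and the "Leeds" entry are made up for illustration.

```shell
# Hypothetical sketch: print every location name read from stdin that
# contains the query anywhere in the name, ignoring case.
search_locations() {
  # -i: ignore case; -F: treat the query as a literal string, not a regex
  grep -i -F -- "$1"
}

printf '%s\n' \
  "Blaby (local authority)" \
  "Ribble Valley (local authority)" \
  "Leeds (local authority)" |
  search_locations "bl"
```

This prints the Blaby and Ribble Valley entries and drops the Leeds one. A real endpoint would load the names from locations.json and return them as a JSON array.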

Define abilities for permissions check

After the initial /dataset/new, or when editing, we should check the user is either the dataset creator, or is in the same organisation as the dataset.

Might want to wait until after edit is done to do this.
