Frictionless CKAN Mapper

A library for mapping CKAN metadata <=> Frictionless metadata.

The library has zero dependencies (not even on Data Package libs). You can use it directly or use it for inspiration. A detailed outline of the algorithm is in the docs, or you can read the code.

Installation

  • Python: install Python. The library is compatible with both Python 2.7+ and Python 3.3+.
pip install frictionless-ckan-mapper

Note: The package is installed as frictionless-ckan-mapper and then imported as frictionless_ckan_mapper.

Getting started

CKAN => Frictionless

# get a CKAN metadata item
ckan_dataset = {
  "name": "my-dataset",
  "title": "My awesome dataset",
  "url": "http://www.example.com/data.csv"
}

# or load from an API, e.g. (Python 3; the action API wraps the dataset in "result")
# import json, urllib.request
# ckan_dataset = json.load(urllib.request.urlopen(
#     "https://demo.ckan.org/api/3/action/package_show?id=my_dataset"
# ))["result"]

from frictionless_ckan_mapper import ckan_to_frictionless as converter

# convert to frictionless
frictionless_package = converter.dataset(ckan_dataset)

print(frictionless_package)
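
The API loading step above can be sketched for Python 3 as follows. This is only an illustration: package_show_url, unwrap_result and fetch_ckan_dataset are hypothetical helper names, not part of this library, and the sketch assumes the CKAN action API returns an envelope with success and result keys.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_show_url(base_url, dataset_id):
    """Build the package_show action URL for a CKAN instance."""
    query = urlencode({"id": dataset_id})
    return "{}/api/3/action/package_show?{}".format(base_url.rstrip("/"), query)


def unwrap_result(payload):
    """Return the dataset dict from a CKAN action API response envelope."""
    if not payload.get("success"):
        raise ValueError("CKAN API call failed: {!r}".format(payload.get("error")))
    return payload["result"]


def fetch_ckan_dataset(base_url, dataset_id):
    """Fetch a dataset dict over HTTP (this performs a network request)."""
    with urlopen(package_show_url(base_url, dataset_id)) as response:
        return unwrap_result(json.load(response))
```

The dict returned by fetch_ckan_dataset is what would then be passed to converter.dataset().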

Frictionless => CKAN

frictionless = {
  'name': "f11s-dataset",
  'path': "https://datahub.io/data.csv"
}

from frictionless_ckan_mapper import frictionless_to_ckan as f2c

ckanout = f2c.package(frictionless)

print(ckanout)

Reference

This package contains two modules:

  • frictionless_to_ckan
  • ckan_to_frictionless

You can import them directly like so:

from frictionless_ckan_mapper import ckan_to_frictionless
from frictionless_ckan_mapper import frictionless_to_ckan

ckan_to_frictionless

resource(ckandict)

from frictionless_ckan_mapper import ckan_to_frictionless as converter

# ... Some code with a CKAN dictionary ...

output_frictionless_dict = converter.resource(ckan_dictionary)

dataset(ckandict)

from frictionless_ckan_mapper import ckan_to_frictionless as converter

# ... Some code with a CKAN dictionary ...

output_frictionless_dict = converter.dataset(ckan_dictionary)

frictionless_to_ckan

resource(fddict)

from frictionless_ckan_mapper import frictionless_to_ckan as converter

# ... Some code with a Frictionless dictionary ...

output_ckan_dict = converter.resource(frictionless_dictionary)

package(fddict)

from frictionless_ckan_mapper import frictionless_to_ckan as converter

# ... Some code with a Frictionless dictionary ...

output_ckan_dict = converter.package(frictionless_dictionary)

Design

Frictionless   <=>        CKAN
--------------------------------------
Data Package   <=>   Package (Dataset)
Data Resource  <=>   Resource
Table Schema   <=>   Data Dictionary?? (datastore resources can have schemas)

CKAN reference

Summary:

classDiagram

class Package
class Resource
class DataDictionary

Package *-- Resource
Resource o-- DataDictionary

Source for CKAN metadata structure:

Algorithm: CKAN => Frictionless

See the code in frictionless_ckan_mapper/ckan_to_frictionless.py

Algorithm: Frictionless => CKAN

See the code in frictionless_ckan_mapper/frictionless_to_ckan.py

Developers

Install the source

  • Clone the repo:

    git clone https://github.com/frictionlessdata/frictionless-ckan-mapper.git
  • And install it with pip:

    pip install -e .

Run the tests

Use the excellent pytest suite as follows:

pytest tests

To test under both Python 2 and Python 3 environments, we use tox. You can run the following command:

make test

Note: Make sure that the necessary Python versions are in your environment PATH (Python 2.7 and Python 3.6).

Building and publishing the package

To see a list of available commands from the Makefile, execute:

make list

Build the distribution package locally for testing purposes

If a previous build exists, make sure to also remove it before building again:

make distclean

Then:

make dist

Alternatively, the following command does the same and builds packages for both Python 2 and Python 3:

python setup.py sdist bdist_wheel --universal

Test the package at test.pypi.org

python -m twine upload --repository testpypi dist/*

The package will be publicly available at https://test.pypi.org/project/frictionless-ckan-mapper/ and you will be able to pip install it as usual.

Tag a new Git release and publish to the official PyPI

Make sure to update the version of the package in the file frictionless_ckan_mapper/VERSION. Then:

make release

You can quickly review the version to release with make version, which will print the current version stored in VERSION.

frictionless-ckan-mapper's People

Contributors

aivuk, amercader, brew, geraldgrootroessink, luketully, pdelboca, roll, rufuspollock, wardi

frictionless-ckan-mapper's Issues

validation error with extras

When I add this to my working datapackage.json:
"extras": {
"language": "nl"
},
I think this construction is correct. Alas. Any thoughts, please.
By the way: where can I find exceptions.errors?

[epic] Improve CKAN <=> Frictionless conversion (June 2020)

Improve existing converter and especially the docs.

Existing converter: https://github.com/frictionlessdata/ckan-datapackage-tools

Job Stories

Job story: When getting data in and out of CKAN I'm frequently using Frictionless formats and tools (it's my default format for extraction from other systems) and I want to be able to convert to and from the CKAN metadata structure so that I can do my work quickly and without having to dig through CKAN documentation.

Bigger context: we are frequently pulling data from other systems into CKAN, and from CKAN into other systems. We want to use Frictionless as the intermediate format so we can turn an M×N conversion problem into an M+N problem.

Acceptance

Tasks

  • Do the analysis
  • README driven development #13
    • Install / usage
    • Quick start
    • Reference
  • Implement CKAN => Frictionless #19
  • Implement Frictionless => CKAN #21
  • Update README -- Done: PR #35 merged.
  • Release to PyPI -- Done: Everything is ready after merging #36. More discussion in #18.

Analysis of CKAN mapping

Analysis

Frictionless   <=>     CKAN
---------------------------
Data Package  <=>   Package (Dataset)
Data Resource <=>   Resource
Table Schema  <=>   Data Dictionary?? (datastore resources can have schemas)

Example standalone script https://gist.github.com/rufuspollock/bd8ae3575950d180cce33da59c021299

[f2c] Expected behavior when mapping `contributors` and `licenses`

When converting from Frictionless to CKAN, we expect contributors and licenses to disappear from extras if there is already an author/maintainer or license at the "root" (outside extras). However, this leads to corner cases. The following is what I understand the correct behavior to be — please point out if I'm mistaken about something.


If contributors has...

ONE maintainer and/or ONE author (+ emails)

  • Keep everything at the root, remove from extras ⇒ remove duplicate information.

>= 2 authors and/or >= 2 maintainers (+ emails)

  • Keep the first one of each type and remove everything from extras ⇒ information lost.
  • @pdelboca's use case: there's always only one author and/or maintainer coming from Frictionless.

In both preceding cases...

  • Is it possible that CKAN will be storing one author/maintainer already that's different from the one(s) listed in extras? If so, do we just discard the keys in extras?

If licenses has...

ONE license

  • Discard from extras if already at the root ⇒ remove duplicate information.
  • If not at the root but in extras only, keep at the root and remove from extras.

>= 2 licenses

  • Discard in extras only the one already listed in CKAN or everything?
  • @pdelboca's use case: there's always only one license in the Frictionless licenses list.
  • If we don't have a license in CKAN but we have multiple licenses in licenses ⇒ keep the first one?

In both preceding cases...

  • As with authors/maintainers, can we have different licenses in extras that do not match what's at the root?

Acceptance / Task

  • Be clear on how to deal with extra information coming from licenses and contributors.

Analysis

Proposed algorithm for contributors:

  • if contributors has length 1 or 2 and number of authors and maintainer types <= 1 THEN delete

⇒ What if there are more contributors? Delete?
⇒ What if we don't have author/maintainer info at the root? (Will that ever happen or do we necessarily start with a valid CKAN package that's then converted to a Frictionless package?)
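
One possible reading of the "keep the first one of each type" rule discussed above can be sketched as follows. This is an illustration, not the library's actual behavior. It assumes Frictionless contributor objects with title, email and role keys, and first_contributors is a hypothetical helper name. Whether the leftovers are kept in extras or dropped is exactly the open question in this issue, so the sketch just returns them.

```python
def first_contributors(contributors):
    """Pick the first author and first maintainer from a contributors list.

    Returns (ckan_fields, leftovers): the flat CKAN keys for the picked
    contributors, and every contributor that did not fit into them.
    """
    ckan_fields = {}
    leftovers = []
    for contributor in contributors:
        role = contributor.get("role")
        if role in ("author", "maintainer") and role not in ckan_fields:
            ckan_fields[role] = contributor.get("title")
            if contributor.get("email"):
                ckan_fields[role + "_email"] = contributor["email"]
        else:
            # Second author/maintainer, or another role: no CKAN root key for it.
            leftovers.append(contributor)
    return ckan_fields, leftovers
```

With exactly one author and one maintainer, leftovers is empty and nothing is lost; with more, the caller has to decide what to do with the remainder.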

/cc @rufuspollock @pdelboca

Export diacritical tokens differs from import.

When I import this text as a value in a datapackage.json:

“Dit besturenbestand is uniek per ONDERWIJSSECTOR, JAAR en BEVOEGD_GEZAGNUMMER. Een bevoegd gezag (cq. bestuur) kan namelijk uit één of meerdere brins bestaan, waaraan een unieke onderwijssector (BO, S(B)O, VO en MBO) is verbonden. Met andere woorden een schoolbestuur kan in meerdere (sub)sectoren onderwijs verzorgen. In principe is dit besturenbestand redundant aan het instellingenbestand (d.w.z. het besturenbestand zou uit het instellingenbestand kunnen worden berekend).”

..and export it again, it comes out like this:

"Dit besturenbestand is uniek per ONDERWIJSSECTOR, JAAR en BEVOEGD_GEZAGNUMMER. \nEen bevoegd gezag (cq. bestuur) kan namelijk uit \u00e9\u00e9n of meerdere brins bestaan, waaraan een unieke onderwijssector (BO, S(B)O, VO en MBO) is verbonden. \nMet andere woorden een schoolbestuur kan in meerdere (sub)sectoren onderwijs verzorgen. In principe is dit besturenbestand redundant aan het instellingenbestand (d.w.z. het besturenbestand zou uit het instellingenbestand kunnen worden berekend)."

When validating this at http://data.okfn.org/tools/validate I receive an error message:

[image: validation error message]

Seems to me that export should equal import.
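
The \u00e9 escapes in the exported text are what a JSON serializer produces by default for non-ASCII characters. The sketch below uses Python's standard json module to show this, assuming the export goes through something equivalent to json.dumps with default settings. It does not explain the inserted \n characters or the validation error.

```python
import json

text = "kan namelijk uit één of meerdere brins bestaan"

escaped = json.dumps(text)                       # default: ensure_ascii=True
readable = json.dumps(text, ensure_ascii=False)  # keeps "é" as-is

# Both serializations decode back to the same string.
assert json.loads(escaped) == json.loads(readable) == text
```

Under that assumption the two forms are equivalent JSON, so the visible difference between import and export would be cosmetic.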

Resource names not converted to correct format

According to the Data Resource specifications https://specs.frictionlessdata.io/data-resource/#name a resource name:

MUST consist only of lowercase alphanumeric characters plus “.”, “-” and “_”.

Currently, if you pass ckan_to_frictionless.py a CKAN dataset with a resource whose name does not follow the above rule, the name is written out unchanged: it may contain any character and is not lowercased.

There are two options to solve the issue:

  1. Raise an error informing the user about the problem.
  2. Convert the name to a valid one by removing invalid characters and lowercasing it.
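
Both options can be sketched in one helper using only the standard library. This is an illustration, not the library's implementation: normalize_resource_name and its strict flag are hypothetical, and replacing disallowed characters with "-" (rather than dropping them) is an assumption.

```python
import re
import unicodedata

# Allowed by the Data Resource spec: lowercase alphanumerics plus ".", "-", "_".
VALID_NAME = re.compile(r"[a-z0-9._-]+")


def normalize_resource_name(name, strict=False):
    """Return a spec-compliant name, or raise if strict and the name is invalid."""
    if VALID_NAME.fullmatch(name):
        return name
    if strict:
        # Option 1: report the problem to the user.
        raise ValueError("Invalid resource name: {!r}".format(name))
    # Option 2: decompose accented characters and drop the non-ASCII remainder.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase, then replace each run of disallowed characters with "-".
    slug = re.sub(r"[^a-z0-9._-]+", "-", ascii_name.lower()).strip("-")
    if not slug:
        raise ValueError("Cannot derive a valid name from {!r}".format(name))
    return slug
```

Note that option 2 changes the name, so a round trip back to CKAN would not restore the original value.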

CKAN to frictionless: `path` is overridden if both `path` and `url` set in CKAN resource

Overview

If a CKAN data package resource has both path and url set, path will be overwritten. It is not clear what the expected behavior is here, but it is possible that for path, as well as for other keys, we should not touch them if they exist and just unset the conflicting key, or maybe even throw an exception if both keys are set so that calling code can handle it gracefully.

We should either:

  1. Change this so that existing keys are never overwritten, or
  2. Decide that existing keys are always overwritten, or
  3. Throw an exception if a target key already exists before conversion

In any case we should document the correct behavior.
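
The three options can be compared side by side with a small sketch. The function url_to_path and its on_conflict parameter are hypothetical and only illustrate the choices; they are not part of the library.

```python
def url_to_path(resource, on_conflict="keep"):
    """Map a CKAN resource's `url` to `path` under an explicit conflict policy.

    on_conflict: "keep" (option 1), "overwrite" (option 2) or "raise" (option 3).
    """
    if on_conflict not in ("keep", "overwrite", "raise"):
        raise ValueError("Unknown policy: {!r}".format(on_conflict))
    out = dict(resource)  # do not mutate the caller's dict
    if "url" not in out:
        return out
    url = out.pop("url")
    if "path" in out and out["path"] != url:
        if on_conflict == "raise":
            raise KeyError("resource has both 'path' and 'url'")
        if on_conflict == "keep":
            # The existing path wins; the conflicting url is dropped.
            return out
    out["path"] = url
    return out
```

Whichever policy is chosen, one of the two values is lost unless an exception is raised, which is why documenting the behavior matters.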


Please preserve this line to notify @amercader (lead of this repository)

Refactor and improve CKAN => Frictionless Code and tests

We want to refactor and improve CKAN => Frictionless code e.g.

  • Extras handling including parsing of JSON values e.g. schema
  • Full set of mappings

Approach: no changes wherever possible so we support roundtripping. Add an option to "enforce" more compatibility, e.g. name getting slugified on a resource.

Also an option for a "generous" mode, e.g. name => title on a resource.

Tasks

Design algorithm => incrementally implement based on tests

  • Design algorithm and document in docstring -- See ckan_to_frictionless.py
  • Migrate existing tests and have them pass
    • list existing tests
  • Refactor Resource stuff
  • Refactor Dataset/Package mapping
    • keys_are_removed_that_should_be -- DONE: See ddc9309
    • extras_expanded
    • unjsonify_values -- DONE: See #25
    • license_is_converted
    • author_is_converted
    • sources_is_converted -- sources does not exist in CKAN
    • keys_are_passed_through -- DONE: See #27
    • resources_are_converted: length check and one check on one resource -- DONE: See #23 and #24 for more details (both are now merged).
    • null_values_are_removed -- In Progress: Ready to merge → #26
  • Add instructions for adding tests in future -- Obvious from code style / structure
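
Two of the items above, extras_expanded and null_values_are_removed, can be sketched as plain dict transformations. This is an illustration under assumptions, not the code in ckan_to_frictionless.py: it assumes CKAN's extras are a list of {"key": ..., "value": ...} items, that an extra never overwrites an existing top-level key, and the function names are hypothetical.

```python
def expand_extras(ckan_dict):
    """Lift CKAN `extras` items to top-level keys."""
    out = {k: v for k, v in ckan_dict.items() if k != "extras"}
    for extra in ckan_dict.get("extras") or []:
        # Assumption: an extra never overwrites a key already set at the top level.
        out.setdefault(extra["key"], extra["value"])
    return out


def remove_nulls(d):
    """Drop keys whose value is None; empty strings and zeros are kept."""
    return {k: v for k, v in d.items() if v is not None}
```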

Analysis

Existing tests

Dataset

test_basic_dataset_in_setup_is_valid       # can drop probably
test_dataset_author_and_source
test_dataset_ckan_url
test_dataset_extras
test_dataset_license
test_dataset_maintainer
test_dataset_name_title_and_version
test_dataset_notes
test_dataset_only_requires_a_name_to_be_valid
test_dataset_tags

Resource

test_resource_description
test_resource_format
test_resource_hash
test_resource_name_converts_unicode_characters
test_resource_name_lowercases_the_name
test_resource_name_slugifies_the_name
test_resource_path_is_set_even_for_uploaded_resources
test_resource_schema
test_resource_schema_string
test_resource_schema_url
test_resource_url

Full CKAN `pkg_dict` example

Do we want to operate on full pkg_dict and, if so, what is the mapping?

Vanilla CKAN example of pkg_dict

{u'author': u'',
 u'author_email': u'',
 u'creator_user_id': u'34b0aaa0-aaef-4f12-8324-6d2e4ab912f4',
 u'extras': [],
 u'groups': [],
 u'id': u'aa8684d7-180d-4e04-955a-58317fe54d76',
 u'isopen': True,
 u'license_id': u'cc-by',
 u'license_title': u'Creative Commons Attribution',
 u'license_url': u'http://www.opendefinition.org/licenses/cc-by',
 u'maintainer': u'',
 u'maintainer_email': u'',
 u'metadata_created': u'2020-06-11T12:14:48.100492',
 u'metadata_modified': u'2020-06-11T12:15:20.486248',
 u'name': u'markdown-link',
 u'notes': u'',
 u'num_resources': 1,
 u'num_tags': 0,
 u'organization': {u'approval_status': u'approved',
                   u'created': u'2020-06-10T22:57:28.858074',
                   u'description': u'',
                   u'id': u'45f3dfcc-9748-4daf-a694-83a745e9fc8d',
                   u'image_url': u'',
                   u'is_organization': True,
                   u'name': u'odc',
                   u'revision_id': u'0118663b-69f7-462b-898f-1edc0574f953',
                   u'state': u'active',
                   u'title': u'ODC',
                   u'type': u'organization'},
 u'owner_org': u'45f3dfcc-9748-4daf-a694-83a745e9fc8d',
 u'private': True,
 u'relationships_as_object': [],
 u'relationships_as_subject': [],
 u'resources': [{u'cache_last_updated': None,
                 u'cache_url': None,
                 u'created': u'2020-06-11T12:15:15.126163',
                 u'datastore_active': False,
                 u'description': u'',
                 u'format': u'CSV',
                 u'hash': u'',
                 u'id': u'c10f801b-cb36-4aa4-b145-480170bf8923',
                 u'last_modified': u'2020-06-11T12:15:15.094050',
                 u'mimetype': u'text/csv',
                 u'mimetype_inner': None,
                 u'name': u'mini-csv.csv',
                 u'package_id': u'aa8684d7-180d-4e04-955a-58317fe54d76',
                 u'position': 0,
                 u'resource_type': None,
                 u'revision_id': u'602cc8db-75b9-4ffb-b60b-6d26c9c55e62',
                 u'size': 40,
                 u'state': u'active',
                 'tracking_summary': {'recent': 0, 'total': 0},
                 u'url': u'http://ckan:5000/dataset/aa8684d7-180d-4e04-955a-58317fe54d76/resource/c10f801b-cb36-4aa4-b145-480170bf8923/download/mini-csv.csv',
                 u'url_type': u'upload',
                 u'versions_upload_timestamp': u'2020-06-11T12:15:15.093303'}],
 u'revision_id': u'602cc8db-75b9-4ffb-b60b-6d26c9c55e62',
 u'state': u'active',
 u'tags': [],
 u'title': u'Markdown Link',
 'tracking_summary': {'recent': 0, 'total': 0},
 u'type': u'dataset',
 u'url': u'',
 u'version': u''}

Frictionless => CKAN: Refactor and improve code and tests

We want to refactor and improve Frictionless => CKAN code, e.g.

Acceptance

  • Extras handling including parsing of JSON values, e.g. schema.
  • Full set of mappings.

Approach: no changes wherever possible so we support roundtripping.

Tasks

DONE: Everything ticked as done with no explanation next to it is being addressed in the PR #30.

Design algorithm => incrementally implement based on tests

Analysis

Existing tests

Data package

test_basic_datapackage_in_setup_is_valid        # probably not needed
test_datapackage_only_requires_some_fields_to_be_valid
test_datapackage_name_title_and_version
test_name_is_lowercased
test_datapackage_description
test_datapackage_license_as_string
test_datapackage_license_as_unicode    # does not seem too useful, skipping for now
test_datapackage_license_as_dict
test_datapackage_sources
test_datapackage_author_as_string
test_datapackage_author_as_unicode
test_datapackage_author_as_string_without_email
test_datapackage_author_as_dict
test_datapackage_keywords
test_datapackage_extras

Resource

test_resource_name_is_used_if_theres_no_title
test_resource_title_is_used_as_name
test_resource_url
test_resource_url_is_set_to_its_remote_data_path
test_resource_description
test_resource_format
test_resource_hash
test_resource_schema
test_resource_path_is_set_to_its_local_data_path

Additional tests to consider

  • path_is_mapped_to_url
    • even if relative POSIX path!
test_author_is_converted
test_extras_is_converted
test_license_is_converted

Values get JSONified by CKAN

I just tried that for an existing package I had and it got JSONified.

I sent this:

curl -X POST https://demo.ckan.org/api/3/action/resource_create -H "Authorization: CORRECT-KEY-HERE" -d '{
  "package_id": "ckan-to-frictionless-conversion",
  "url":  "https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/100kb.csv",
  "description": "This is the best resource ever!" ,
  "name": "brand-new-resource",
  "more": {"key": "not jsonified"}
}'

And received the response:

{
   "help":"https://demo.ckan.org/api/3/action/help_show?name=resource_create",
   "success":true,
   "result":{
      "cache_last_updated":null,
      "cache_url":null,
      "mimetype_inner":null,
      "hash":"",
      "description":"This is the best resource ever!",
      "format":"CSV",
      "url":"https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/100kb.csv",
      "created":"2020-06-25T14:10:29.264869",
      "state":"active",
      "name":"brand-new-resource",
      "package_id":"99575b35-8a88-4fd9-b0dc-b9d0479c9b2c",
      "last_modified":null,
      "mimetype":null,
      "url_type":null,
      "position":2,
      "revision_id":"19f22475-ff52-46aa-8918-627e3359dc31",
      "size":null,
      "datastore_active":false,
      "id":"331a9e6e-2230-4bd8-84a4-3e2b66682f98",
      "resource_type":null,
      "more":"{'key': 'not jsonified'}"
   }
}

If you search for the keyword more you can see it here: https://demo.ckan.org/api/3/action/package_show?id=ckan-to-frictionless-conversion
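
Note that the stringified value in the response above uses single quotes, so it is a Python-style repr rather than valid JSON. A tolerant parser for such values could look like the sketch below. parse_stringified is a hypothetical helper, and limiting the attempt to values that start with "{" or "[" is an assumption made to avoid turning plain strings such as "123" into numbers.

```python
import ast
import json


def parse_stringified(value):
    """Try to recover a dict/list that CKAN returned as a string."""
    if not isinstance(value, str):
        return value
    if not value.lstrip().startswith(("{", "[")):
        return value  # plain strings are left untouched
    try:
        return json.loads(value)
    except ValueError:
        pass
    try:
        # Handles Python-repr style strings such as "{'key': 'value'}".
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value
```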

[f2c] Licenses not converted correctly

See frictionlessdata/ckanext-datapackager#62

When I upload this datapackage.zip to this ckan site the licenses aren't unpacked correctly

@amercader suggests the following algorithm when ingesting a Data Package:

  • If licenses is present, check if the first item exists in the licenses registry.
  • If so, set license_id.
  • If it doesn't exist in the registry, or there is more than one license, store the licenses object as an extra, so instances with the datapackager extension can handle it as they see fit. They will need to take care of keeping license_id (the default field) and the licenses extra in sync.

This is good. However, it is only something one can do "inside" CKAN, so it is out of scope here. I would suggest we just convert the first license.name in licenses to license_id and license.title to license_title.
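
The simpler suggestion (first license only, no registry lookup) can be sketched as follows. licenses_to_ckan is a hypothetical name, and leaving the original licenses key in place is an assumption, since the suggestion does not say what should happen to it.

```python
def licenses_to_ckan(package):
    """Copy the first Frictionless license into CKAN's flat license fields."""
    out = dict(package)  # do not mutate the caller's dict
    licenses = out.get("licenses") or []
    if licenses:
        first = licenses[0]
        if first.get("name"):
            out["license_id"] = first["name"]
        if first.get("title"):
            out["license_title"] = first["title"]
    return out
```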

[inbox] Inbox of bonus features

  • Special modes for ckan => f11s conversion b/c default mode is very unmagical. e.g.
    • Expansive mode: add title from name on resource, ...
    • Strict mode: do work to make sure it is a valid f11s object e.g. name is lower-cased on resource and package, add resources empty array if no array, add name
  • Map a CKAN data dictionary to table schema

Strict mode

Name and title properties

We have existing tests in test_converter.py that ensure "name" and "title" are processed strictly:

    def test_resource_name_is_used_if_theres_no_title(self):
        resource = {
            'name': 'gdp',
            'title': None,
        }
        self.datapackage.resources[0].descriptor.update(resource)
        result = converter.datapackage_to_dataset(self.datapackage)
        resource = result.get('resources')[0]
        self.assertEquals(result.get('resources')[0].get('name'),
                          resource['name'])

    def test_resource_title_is_used_as_name(self):
        resource = {
            'name': 'gdp',
            'title': 'Gross domestic product',
        }

        self.datapackage.resources[0].descriptor.update(resource)
        result = converter.datapackage_to_dataset(self.datapackage)
        self.assertEquals(result.get('resources')[0].get('name'),
                          resource['title'])

Those have been refactored like this for now:

    def test_name_is_used_if_theres_no_title(self):
        indict = {'name': 'gdp'}
        out = converter.resource(indict)
        assert out.get('name') == indict['name']

    def test_resource_title_is_used_as_name(self):
        indict = {
            'name': 'gdp',
            'title': 'Gross domestic product',
        }

        out = converter.resource(indict)
        assert out.get('name') == indict['title']

Slugifying tags

In strict mode, we may have to slugify keywords when converting them from Frictionless to tags in CKAN. Something like:

outdict['tags'] = [
    {'name': slugify.slugify(keyword).lower()}
    for keyword in outdict['keywords']
]

Map a CKAN data dictionary to table schema

CKAN => FD issues re bytes, id, empty tags

When I convert a CKAN dataset to a Frictionless package and then a Frictionless package to a CKAN dataset again, I expect the mapping to be coherent and remap from Frictionless to CKAN what is necessary so I can use this converted CKAN dataset as if it were the original dataset or as close to it as possible. Possible issues noted:

  • Frictionless resource has bytes key and it is not being converted when inside a dataset. FIXED in cf9e424
  • If a CKAN package has the id key, this key is stored in extras in a Frictionless package (this is OK) but stays in extras when converting again to a CKAN package. Doesn't CKAN need to identify a package with an id? It could be retrieved from extras in strict mode. FIXED in b6feb94
  • Having no "tags" in CKAN creates the new key/value pair "keywords": [] when doing a round trip (the new converted CKAN dataset has a keywords key it didn't have before and it's empty). Maybe we could check against empty lists and dictionaries to make sure we do not copy those over (as they are basically equivalent to null), especially when it's not a key recognized by CKAN (keywords is specific to Frictionless)? FIXED in b2de64c

Acceptance

  • Mappings from CKAN => FD => CKAN are consistent and produce the expected results.

Tasks

  • Make sure that everything that is being remapped from CKAN to Frictionless is also remapped correctly from Frictionless to CKAN.
    • Make sure that all the properties in a CKAN resources are named correctly (e.g. "size" and "bytes").

Roundtripping tags from package_show leads to unexpected behaviour

When round-tripping, if I add a tag to the package in CKAN, package_show returns the following:

 'tags': [{'display_name': u'FancyTag',
           'id': u'b273a236-e4f8-498b-a64c-fc8f10384afd',
           'name': u'FancyTag',
           'state': u'active',
           'vocabulary_id': None},
          {'display_name': u'NewFancyTag',
           'id': u'8d9e96e0-cf58-44e1-90eb-1b34f778f442',
           'name': u'NewFancyTag',
           'state': u'active',
           'vocabulary_id': None}],

However, the library is only mapping the name of the tag:

 'keywords': [u'FancyTag', u'NewFancyTag'],

So if I do a reconversion using ftc.package() it will return:

'tags': [{'name': u'FancyTag'}, {'name': u'NewFancyTag'}],

Which is missing id, display_name, state and vocabulary_id.
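
The loss can be reproduced with a name-only mapping in both directions. The two functions below are a sketch of the behavior described in this issue, not the library's code.

```python
def tags_to_keywords(tags):
    """CKAN tags -> Frictionless keywords: only the name survives."""
    return [tag["name"] for tag in tags]


def keywords_to_tags(keywords):
    """Frictionless keywords -> CKAN tags: only the name can be restored."""
    return [{"name": keyword} for keyword in keywords]


original = [{
    "display_name": "FancyTag",
    "id": "b273a236-e4f8-498b-a64c-fc8f10384afd",
    "name": "FancyTag",
    "state": "active",
    "vocabulary_id": None,
}]
round_tripped = keywords_to_tags(tags_to_keywords(original))
lost_keys = sorted(set(original[0]) - set(round_tripped[0]))
```

Here lost_keys holds the four keys named above, since a plain list of keyword strings has nowhere to carry them.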

Some items kept in `extras` after frictionless => CKAN conversion cause CKAN to fail with ValidationError

Overview

When a package is converted from CKAN to frictionless and back again🦶, some items are saved in extras such as id and contributors. Some of the extras may be read and used to set values when converting from frictionless to CKAN, but they are not popped out of the extras list and are kept there.

This can then cause issues when trying to pass the resulting CKAN package dict through CKAN's own validation (e.g. when calling package_update() on the same package), because CKAN will throw an error:

ValidationError: {u'extras': [{}, {}, {'key': [u'There is a schema field with the same name']}]}

This happens because extras now contains keys that are already defined in the CKAN dataset schema such as id.

For ckanext-versioning, we worked around this by filtering out values from extras that already exist as properties of the package dict, but others may be hitting this as well. See our workaround in https://github.com/datopian/ckanext-versioning/pull/19/files#diff-3a58033cd97812191417c36622c5aa7bR123
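
The idea behind that workaround, as described here, can be sketched as follows. This is not the code from the linked pull request: drop_shadowing_extras is a hypothetical name, and the sketch assumes extras is a list of {"key": ..., "value": ...} items.

```python
def drop_shadowing_extras(package):
    """Remove extras whose key already exists as a top-level package property."""
    out = dict(package)  # do not mutate the caller's dict
    if "extras" in out:
        top_level = set(out) - {"extras"}
        out["extras"] = [
            extra for extra in out["extras"] if extra.get("key") not in top_level
        ]
    return out
```

Extras with no top-level counterpart are kept, so custom metadata still survives the conversion.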



Improve frictionless-ckan-mapper README.md documentation

Overview

Hello guys!!! First of all, congratulations on the amazing work you are doing here. We've been using the Frictionless specifications a lot, and now this frictionless-ckan-mapper package, because we're going to integrate some Frictionless datasets into our own CKAN instance.

Following the frictionless-ckan-mapper README.md documentation, I found a small typo and a reference to an invalid attribute in the frictionless_to_ckan module. I would also like to suggest a small change to the naming pattern used in one frictionless_to_ckan import example.

  • Small typo: in CKAN => Frictionless, the "ckan_dataset" dict that is created isn't passed as the argument to converter.dataset(); ckan_dict is passed instead, which raises an error. Changing "ckan_dict" to "ckan_dataset" solves the problem.

  • Reference to an invalid attribute in the frictionless_to_ckan module: in Frictionless => CKAN, the call fails because frictionless_to_ckan has no attribute 'dataset'. I think the correct attribute is "package", as shown in package(fddict), so changing "dataset" to "package" solves the problem.

I would also like to suggest changing "f2c" to "converter" in this last example to follow the pattern used in the rest of the documentation.

Note: all suggestions are implemented in PR 47.



datapackage dependency

I'm planning to include this library as a dependency for ckanapi to fix the ckanapi datapackage creation feature, but the setup.py here includes datapackage>=1.0,<2.0 in order to run the tests, and that results in lots of extra packages installed even when I don't want to run the ckan-datapackage-tools tests

Successfully installed attrs-19.1.0 cchardet-2.1.4 ckan-datapackage-tools-0.0.3 datapackage-1.6.0 et-xmlfile-1.0.1 functools32-3.2.3.post2 ijson-2.3 isodate-0.6.0 jdcal-1.4.1 jsonlines-1.2.0 jsonpointer-2.0 jsonschema-3.0.1 linear-tsv-1.1.0 openpyxl-2.4.11 pyrsistent-0.15.1 rfc3986-1.3.1 tableschema-1.4.1 tabulator-1.20.0

Any objection to installing datapackage only when running tests instead of listing it in install_requires?

Should other dataset properties be made available somehow when downloading datapackage?

Hi @amercader
I noted this issue when considering dataset-to-datapackage conversion. From the comments, I'm guessing that by adjusting the schema somehow (using ckanext-schema?) a CKAN instance can allow import of properties other than simple string key/value pairs.

But looking at the dataset 'package_show' call, there are many dataset properties that are left out of the datapackage download. I realise this question probably came up (and was resolved) early on, but should these other dataset properties be made available somehow? Sure, we could get them from a REST call, or create another CKAN extension to merge both, but I wanted to know your thoughts on this. Having the other dataset properties (those not directly in the vanilla datapackage schema) is a need when working offline with datapackages downloaded from CKAN, even if only to have them available on subsequent CKAN imports after editing resource data.
