
dim's Introduction

dim


Data Installation Manager: Manage the open data in your project like a package manager.


Join community

We are looking for members to develop together as an open source community.

Slack

Features

Document

For more information about how to use it, please refer to this document.

Quick Start

Install the dim

Either install dim from the binary files or run dim using Deno.

Install the dim from binary files

Download the dim binary for your platform.

aarch64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/aarch64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-apple-darwin

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-apple-darwin-dim -o /usr/local/bin/dim

x86_64-pc-windows-msvc

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-pc-windows-msvc-dim.exe -o C:\Users\user-name\dim.exe

x86_64-unknown-linux-gnu

curl -L https://github.com/c-3lab/dim/releases/latest/download/x86_64-unknown-linux-gnu-dim -o /usr/local/bin/dim

Grant user execution permission

chmod u+x /usr/local/bin/dim

New Project

  1. Initialize the project

Generate dim.json, dim-lock.json and data_files/ with the init command.

$ dim init
  2. Install data

This command stores information about the installed data in dim.json and dim-lock.json.

$ dim install https://example.com -n "example"
  3. Installed data is saved in data_files/.
$ ls ./data_files

Install all data written to dim.json shared by members

Install all data written to dim.json shared by members.


  1. Make sure dim.json exists in the current directory
$ ls ./

dim.json  ....
  2. Install all data written in dim.json
$ dim install
  3. Installed data is saved in data_files/.
$ ls ./data_files

Command Usage

Generate dim.json, dim-lock.json and data_files/.

$ dim init

Install all data.

Install all data written to dim.json.

$ dim install

Install from a specified local dim.json.

$ dim install -f ./path/dim.json

Install from a specified remote dim.json on the internet.

$ dim install -f https://raw.githubusercontent.com/xxxx/xxxx/main/dim.json

Install the specified data.

$ dim install https://example.com -n "example"

Specify headers.

$ dim install https://example.com -n "example" -H "Authorization: 1234567890abc" -H "Fiware-Service: example"

Specify the installation post-process

Postprocess unzip

$ dim install https://example.com -n "example" -p unzip

Postprocess encoding

$ dim install https://example.com -n "example" -p "encode utf-8"

Postprocess xlsx-to-csv

$ dim install https://example.com -n "example" -p xlsx-to-csv

Postprocess csv-to-json

$ dim install https://example.com -n "example" -p csv-to-json

Postprocess custom command

You can specify a custom command after "cmd".

$ dim install https://example.com -n "example" -p "cmd ******"

The file path will be passed as an argument at the end of the specified command.

$ dim install https://example.com -n "example" -p "cmd python ./tests/test_custom_command.py"

Command to be executed during postprocessing.

$ python ./tests/test_custom_command.py ./data_files/***/***.xx
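The way the file path is appended to a custom command can be sketched as follows. This is a minimal illustration rather than dim's actual implementation: buildCustomCommand is a hypothetical helper, and it splits on whitespace only, so quoted arguments are not handled.

```typescript
// Hypothetical helper (not part of dim): turns a "cmd ..." post-process string
// into the argument list to execute, with the downloaded file path appended last.
function buildCustomCommand(postProcess: string, filePath: string): string[] {
  const parts = postProcess.trim().split(/\s+/);
  if (parts[0] !== "cmd" || parts.length < 2) {
    throw new Error(`Not a valid custom command: "${postProcess}"`);
  }
  // Drop the leading "cmd" keyword; the file path becomes the final argument.
  return [...parts.slice(1), filePath];
}

// The path below is a made-up example of a file under data_files/.
const customArgv = buildCustomCommand(
  "cmd python ./tests/test_custom_command.py",
  "./data_files/example/data.csv",
);
console.log(customArgv.join(" "));
```

A bare "cmd" with no command after it is rejected in this sketch.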

Install by specifying the HTML page

Install data by specifying regular expressions for links within a specified page.

$ dim install -P https://example.com -e ".pdf" -n "example"
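As a rough sketch of what selecting links by regular expression involves, the function below collects href values from a page and keeps those matching the expression. findMatchingLinks is a hypothetical name, and the href extraction is deliberately naive (a real HTML parser would be more robust), so this is not a description of how dim does it internally.

```typescript
// Hypothetical helper: returns absolute URLs of links on a page whose URL
// matches the given regular expression (the value passed with -e).
function findMatchingLinks(html: string, pageUrl: string, expression: string): string[] {
  const pattern = new RegExp(expression);
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    // Resolve relative links against the page URL before testing them.
    const absolute = new URL(match[1], pageUrl).href;
    if (pattern.test(absolute)) {
      links.push(absolute);
    }
  }
  return links;
}

const samplePage = '<a href="/files/a.pdf">A</a> <a href="b.html">B</a>';
console.log(findMatchingLinks(samplePage, "https://example.com/page/", "\\.pdf$"));
```

Note that an unescaped expression such as ".pdf" matches any character followed by "pdf"; escaping the dot and anchoring the end, as in the sample call, is stricter.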

Forced execution

Force the installation, overwriting any existing data file.

$ dim install https://example.com -n "example" -F

Delete data with the specified name from dim.json, dim-lock.json and data_files/.

$ dim uninstall [name]

Display the information described in dim-lock.json.

$ dim list

Simple List

$ dim list -s

Verify the data

$ dim verify

Update all data.

$ dim update

Update the specified data.

$ dim update [name]

Clean

Clean the project: delete data_files/ and initialize the project again.

$ dim clean

Search data from package_search CKAN API.

By default, the search uses データカタログ横断検索システム (Data Catalog Cross-Search System).

$ dim search 避難所

Specify the number of data to get by option -n (default 10).

$ dim search 避難所 -n 3

Interactive installation

Write the data information from CKAN to dim.json.

Store the data in data_files/.

$ dim search -i "東京 避難所"

131105_東京都_目黒区_大地震時における地域避難所
  - Catalog URL        : https://www.geospatial.jp/ckan/dataset/131105-002
  - Catalog Description: ####大地震時における地域避難所のデータです。
####東京都目黒区のオープンデータです。【リソース】大地震時における地域避難所 / ####大地震時における地域避難所のXLSXです。
【キーワード】東京都 / 目黒区 / 避難所
  - Catalog License    : クリエイティブ・コモンズ 表示
    1. 大地震時における地域避難所
      * Resource URL        : https://www.geospatial.jp/ckan/dataset/1e07b569-80a5-4c31-8a7b-be88d1e8f327/resource/8d8de117-2342-4c61-a98d-8f7a9c5b71a2/download/131105evacuationspace.xlsx
      * Resource Description: ####大地震時における地域避難所のXLSXです。
      * Created             : 2018-10-30T02:55:40.179726
      * Format              : XLSX

131059_東京都_文京区_緊急避難場所・避難所
  - Catalog URL        : https://www.geospatial.jp/ckan/dataset/131059-025
  - Catalog Description: ####緊急避難場所・避難所のデータです。
####東京都文京区のオープンデータです。【リソース】緊急避難場所・避難所 / ####文京区の避難所・緊急避難場所の一覧データのCSVです。####更新日:2018年10月23日 / ####文京区の避難所・緊急避難場所の一覧データのXLSXです。
####更新日:2018年10月23日【キーワード】文京区 / 東京都 / 避難場所 / 避難所
  - Catalog License    : CC-BY2.1
    2. 緊急避難場所・避難所
      * Resource URL        : https://www.geospatial.jp/ckan/dataset/b17c1f51-ce1c-4e6a-8ff9-5ff0203b1e43/resource/008d34ad-61a5-4dbd-8996-fa6d647c2986/download/kinkyuhinanbasyo-hinanjo.csv
      * Resource Description: ####文京区の避難所・緊急避難場所の一覧データのCSVです。
####更新日:2018年10月23日
      * Created             : 2018-10-30T05:44:44.623645
      * Format              : CSV
    3. 緊急避難場所・避難所
      * Resource URL        : https://www.geospatial.jp/ckan/dataset/b17c1f51-ce1c-4e6a-8ff9-5ff0203b1e43/resource/0c4942d4-a149-4091-a52f-69b7da8fa143/download/kinkyuhinanbasyo-hinanjo.xlsx
      * Resource Description: ####文京区の避難所・緊急避難場所の一覧データのXLSXです。
####更新日:2018年10月23日
      * Created             : 2018-10-30T05:44:46.127915
      * Format              : XLSX
...
? Enter the number of data to install > 1
? Enter the name. Enter blank if want to use CKAN resource name. > 
? Enter the post-processing you wish to add. Enter blank if not required. > xlsx-to-csv
? Is there a post-processing you would like to add next? (Y/n) > No
Convert xlsx to csv.
Installed to ./data_files/131105_東京都_目黒区_大地震時における地域避難所_大地震時における地域避難所/131105evacuationspace.xlsx

Auto-generate code for the target data using GPT-3, for example conversion or visualization code.

Export your OpenAI API key as OPENAI_API_KEY.

$ export OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxx

You can get an API key at: https://platform.openai.com/account/api-keys

Combine the specified target data and prompt, send it to GPT-3 API, output the code, and save it.

$ dim generate -t "./data.csv" "Python code that converts this csv data to geojson"

Specify the name of data managed by dim using -t

$ dim generate -t "shelter" "Python code that converts this csv data to geojson"

Example prompt List

Python code that converts this csv data to geojson
Python code that removes the id column from this csv data
Python code that visualizes this csv data as a map
Python code that visualizes this csv data as an HTML page
Python code that saves this csv data to PostgreSQL
Python code that converts full-width numbers in this csv file to half-width numbers
$ dim help

Run the dim using Deno

  1. Install Deno
  • Deno == 1.28.2
$ curl -fsSL https://deno.land/install.sh | sh
$ echo 'export DENO_INSTALL=~/.deno' >> ~/.bashrc
$ echo 'export PATH="$DENO_INSTALL/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
  2. Clone the repository
$ git clone https://github.com/c-3lab/dim.git
$ cd dim
  3. Run the dim commands
$ deno run -A dim.ts init
$ deno run -A dim.ts install https://xxxxxx/data.json -n 'data_name'
  4. Install dim
$ deno install --unstable --allow-read --allow-write --allow-run --allow-net --allow-env dim.ts

Run test and display coverage

  1. Run test
$ deno test -A --coverage=tests/coverage
  2. Display coverage
$ deno coverage ./tests/coverage

Upgrade the dim version

You need to be able to run Deno in your local environment.
If you don't have a Deno environment, please re-install dim as described in "Install the dim from binary files".

$ dim upgrade

Contributors

Made with contributors-img.

LICENSE

MIT LICENSE

dim's People

Contributors

champierre, jqinglong, k-oizumi-abel, minheibis, mkyutani, osoken, ryo-ma, sheile, syuparn, t-kurasawa, ta-hirose, takahashim, takayasukoura, tanimuranaomichi, to-ki-o


dim's Issues

Cannot target files after conversion when multiple postProcesses are specified.

dim install http://example.com/example.xlsx -p "xlsx-to-csv" -p "encode SJIS"

Currently, it is not possible to convert an xlsx file to a csv file and then to SJIS by executing the above command.

This is because if multiple postProcesses are specified, the path to the converted file is not passed to the next process.
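One way to address this, sketched under assumptions, is to have every post-process return the path of the file it produced and to feed that path into the next step. The PostProcess type and the two stand-in steps below are hypothetical; they only compute paths and do not touch the file system or reflect dim's real converters.

```typescript
// Hypothetical contract: a post-process receives a file path and returns
// the path of the file it produced.
type PostProcess = (inputPath: string) => string;

// Stand-in for "xlsx-to-csv": the output file gets a .csv extension.
const xlsxToCsvStep: PostProcess = (path) => path.replace(/\.xlsx?$/i, ".csv");
// Stand-in for "encode SJIS": re-encodes in place, so the path is unchanged.
const encodeSjisStep: PostProcess = (path) => path;

// Run the steps in order, passing each step the output path of the previous one.
function runPostProcesses(downloadedPath: string, steps: PostProcess[]): string {
  return steps.reduce((currentPath, step) => step(currentPath), downloadedPath);
}

const finalPath = runPostProcesses("./data_files/example/example.xlsx", [
  xlsxToCsvStep,
  encodeSjisStep,
]);
console.log(finalPath);
```

With this shape, the encoding step operates on the converted csv file instead of the original xlsx download.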

Deno update is preventing "Check type" from passing.

The version of Deno was updated to 1.26.0 on 2022.09.28.
Subsequently, the following error occurred in the Check type of CI.

error: TS1477 [ERROR]: An instantiation expression cannot be followed by a property access.
          () => Promise<number>.resolve(4),
                       ~~~~~~~~
    at file:///home/runner/work/dim/dim/tests/libs/actions.search.test.ts:451:24

Add support for deno 1.30.0

Tests fail on the current code base when run on Deno 1.30.x, with the following error message:

InstallAction ...
  with URL ...
    download and check that data_files, dim.json and dim-lock.json are saved. ... ok (20ms)
    exit with error when name is not specified ... ok (5ms)
    exit with error when run with "name" not recorded in dim.json ... ok (7ms)
    overwrite existing files when specified name is duplicated and force is true ... ok (7ms)
    download using request headers and check that they are recorded in dim.json and dim-lock.json when specify headers option ... ok (6ms)
    encode downloaded file to Shift-JIS and record in dim.json, dim-lock.json when specify "encode sjis" as postProcesses ... ok (8ms)
    exit with error when specify "encode utf-8 sjis" as postProcesses, and download ... ok (7ms)
    exit with error when specify "encode" as postProcesses, and download. ... ok (6ms)
    check that the command for darwin to extract the downloaded file is entered and recorded in dim.json and dim-lock.json. ... ok (6ms)
    check that the decompress method is called with two arguments when the os is not darwin. ... ok (6ms)
    exit with error when specify "unzip a" as postProcess and download ... ok (4ms)
    convert downloaded file from xlsx to csv and record in dim.json and dim-lock.json when specify "xlsx-to-csv" as postProcesses ... ok (26ms)
    convert downloaded file from xls to csv and record in dim.json and dim-lock.json when specify "xlsx-to-csv" as postProcesses ... ok (16ms)
    exit with error when specify "xlsx-to-csv a" as postProcesses and download ... ok (14ms)
    download file and execute echo command with downloaded file path as standard output when specify "cmd echo" as postProcesses ... ok (7ms)
    download file and execute echo command with "a" and downloaded file path as standard output when specify "cmd echo a" as postProcesses ... ok (6ms)
    exit with error when specify "cmd" as postProcesses and download ... ok (6ms)
    output log and ignore error when specify error command such as "cmd aaa" as postProcesses ... ok (10ms)
    exit with error when specify "aaa" as postProcess and download ... ok (5ms)
    exit with error when if the URL is incorrectly described. ... FAILED (6ms)
      error: AssertionError: spy not called with expected args:
      
      
          [Diff] Actual / Expected
      
      
          [
            "\x1b[31mFailed to install.\x1b[39m",
      -     "\x1b[31mInvalid URL: 'aaa'\x1b[39m",
      +     "\x1b[31mInvalid URL\x1b[39m",
          ]
      
              throw new AssertionError(
                    ^
          at assertSpyCall (https://deno.land/[email protected]/testing/mock.ts:542:15)
          at Object.<anonymous> (file:///home/osoken/Documents/works/projects/cfj/dim/repo/dim/tests/libs/actions.install.test.ts:798:9)
          at async Function.runTest (https://deno.land/[email protected]/testing/_test_suite.ts:358:7)
          at async Function.runTest (https://deno.land/[email protected]/testing/_test_suite.ts:346:9)
          at async Function.runTest (https://deno.land/[email protected]/testing/_test_suite.ts:346:9)
          at async fn (https://deno.land/[email protected]/testing/_test_suite.ts:316:13)
    exit with error when failed to download ... ok (7ms)
    exit with error when execute with URL and file path ... ok (6ms)
  with URL ... FAILED (204ms)

Post-processing CMD of the install command to accommodate redirection of results

">" is not recognized as a redirect sign.
Deno.run probably treats ">" as a plain string.

Proposals:

First draft

  1. Extract "> xxxx" from the command with a regular expression to obtain the file name of the redirect destination
  2. Specify "piped" for stdout and stderr in Deno.run so that the program can handle standard output
  3. If a redirection was specified, save the captured stdout to that file with Deno.writeFileSync

Pros

A single code path can be used across environments.

Cons

Have to handle complex file names and redirects yourself. (>>, 2>>, 1>&2, etc.)

Second draft

If -p "cmd wc -c > /tmp/test.txt" is specified, start using /bin/sh as follows.

Deno.run({ cmd: ["/bin/sh", "-c", "wc -c data_files/xxx/xxx.zip > /tmp/test.txt"]})

Pros

/bin/sh handles redirects, so no need to implement your own processing.
If the function to send downloaded files as standard input is implemented, the string received with the -p option can be used as is.

Deno.run({ cmd: ["/bin/sh", "-c", "wc -c > /tmp/test.txt"], stdin: xxxx })

Cons

Need to change commands for each environment. (/bin/sh for Linux and Mac, cmd for Windows)
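The per-environment switch in the second draft could look roughly like the sketch below. wrapInShell and the OsName type are hypothetical names, and the "/c" flag for cmd is an assumption added here; the proposal itself only names /bin/sh and cmd.

```typescript
// Hypothetical set of OS names for this sketch.
type OsName = "linux" | "darwin" | "windows";

// Wrap a command line in the platform's shell so that the shell,
// not dim, interprets redirects such as ">" or ">>".
function wrapInShell(commandLine: string, os: OsName): string[] {
  return os === "windows"
    ? ["cmd", "/c", commandLine] // "/c" is assumed here, not stated in the proposal
    : ["/bin/sh", "-c", commandLine];
}

console.log(wrapInShell("wc -c data_files/xxx/xxx.zip > /tmp/test.txt", "linux"));
```

The returned array has the same shape as the cmd value shown in the Deno.run calls above.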

Can't download file when specify URL without filename

Currently, the downloaded file is created at data_files/{name}/{filename}.
If the URL matches the following pattern, the current logic can't determine the filename and an error occurs.

$ dim install https://www.example.com -n example1
Failed to install. Is a directory (os error 21), open './data_files/example1/'

Proposals:

Use the Content-Disposition response header to determine the filename.
Fall back to the --name option as the filename if the server does not provide the header.
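A minimal sketch of this fallback order follows, assuming a hypothetical resolveFileName helper. It handles only the plain filename= form of Content-Disposition, not the extended filename*= encoding, and it keeps the existing URL-based filename as the middle step.

```typescript
// Hypothetical helper: decide which filename to save a download under.
function resolveFileName(
  url: string,
  contentDisposition: string | null,
  name: string,
): string {
  // 1. Prefer the filename announced by the server.
  const match = contentDisposition
    ? /filename\s*=\s*"?([^";]+)"?/i.exec(contentDisposition)
    : null;
  if (match) {
    return match[1].trim();
  }
  // 2. Otherwise use the last path segment of the URL, if there is one.
  const segment = new URL(url).pathname.split("/").pop();
  if (segment) {
    return segment;
  }
  // 3. Finally fall back to the value given with --name.
  return name;
}

console.log(resolveFileName("https://www.example.com", null, "example1"));
```

For the failing command above, this yields a file name derived from the --name value instead of an empty string, so the save path no longer points at a directory.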

The error from an OpenAI API request is not shown in the error message

} catch (error) {
console.error(
Colors.red(`\n${error.message}`),
);

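Regarding this part:
when an error occurs, a message such as "Request failed with status code 404 Not Found" is displayed.
However, this is ky's error message, and the error message from the OpenAI response (see the example below) does not appear to be shown.

Example command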

curl -i https://api.openai.com/v1/completions -H \
    "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
         "model": "gpt-3.5-turbo",
         "prompt": "generate the code that prints the following messange in python: this is a test",
         "max_tokens": 7,
         "temperature": 0
    }'

Output

HTTP/2 404 
date: Thu, 15 Jun 2023 12:13:02 GMT
content-type: application/json
content-length: 227
access-control-allow-origin: *
openai-organization: albert-inc-1
openai-processing-ms: 268
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 3500
x-ratelimit-limit-tokens: 90000
x-ratelimit-remaining-requests: 3499
x-ratelimit-remaining-tokens: 89992
x-ratelimit-reset-requests: 17ms
x-ratelimit-reset-tokens: 4ms
x-request-id: 9bdd4ddbcedcce031c323eb331076f87
cf-cache-status: DYNAMIC
server: cloudflare
cf-ray: 7d7ab9887db0afdb-NRT
alt-svc: h3=":443"; ma=86400

{
  "error": {
    "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

I would like the "message" value from this response to be displayed: "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?"
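A possible shape for this, as a sketch only: parse the response body and append error.message when it is present. describeApiError is a hypothetical helper, and how the body is read from the HTTP client's error object is deliberately left out.

```typescript
// Hypothetical helper: build the text to print when an API request fails.
// `body` is the raw response body, already read as a string.
function describeApiError(status: number, statusText: string, body: string): string {
  const fallback = `Request failed with status code ${status} ${statusText}`;
  try {
    const parsed = JSON.parse(body);
    const message = parsed?.error?.message;
    // Only append the API's message when the body has the expected shape.
    return typeof message === "string" ? `${fallback}: ${message}` : fallback;
  } catch {
    // The body was not JSON; keep the generic message.
    return fallback;
  }
}

const sampleBody = JSON.stringify({
  error: { message: "This is a chat model and not supported in the v1/completions endpoint.", type: "invalid_request_error" },
});
console.log(describeApiError(404, "Not Found", sampleBody));
```

If the body is not JSON or has no error.message, the generic status line is printed unchanged.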

[WIP] Change the structure of dim.json

dim.json

  • name
  • url
  • title
  • post_process (pre_process)
  • revision
  • source
  • source_url
  • source_resource_id

dim-lock.json

  • name
  • url
  • title
  • post_process (pre_process)
  • revision
  • source
  • source_url
  • source_resource_id
  • integrity

Install failed with error TS1192

TS1192 now occurs while installing dim. (Reproducible.)

  • Error details
error: TS1192 [ERROR]: Module '"https://jspm.dev/xlsx.js"' has no default export.
import xlsxlib from 'https://jspm.dev/xlsx'
       ~~~~~~~
    at https://deno.land/x/[email protected]/src/xlsx.ts:1:8
  • When the error occurs: deno install
  • Versions
# deno --version
deno 1.19.1 (release, x86_64-unknown-linux-gnu)
v8 9.9.115.7
typescript 4.5.2
# git for-each-ref
5383d922b715002dc8706fb6af8e4a53b125b8bd commit refs/heads/main
5383d922b715002dc8706fb6af8e4a53b125b8bd commit refs/remotes/origin/HEAD
5383d922b715002dc8706fb6af8e4a53b125b8bd commit refs/remotes/origin/main
8c54c493a4588103a9f715812e6f7dad467a9853 commit refs/tags/v0.1.3
5f4309e2f10008ad01582ccbc92ec7327a858df8 commit refs/tags/v0.1.4
b3092d7d4a9208437c60cae05de43a747503fbea commit refs/tags/v0.1.5

Operation log

Reproduced with the following steps in a fresh ubuntu container.

$ sudo docker run -it --rm ubuntu /bin/bash

The following steps are run inside the container.

  • Install the required packages
# apt update; apt upgrade
# apt install git curl unzip
  • git config & SSH setup
  • Install deno
# curl -fsSL https://deno.land/install.sh | sh
# echo 'export DENO_INSTALL="/root/.deno"' >> ~/.bashrc
# echo 'export PATH="$DENO_INSTALL/bin:$PATH"' >> ~/.bashrc
# source ~/.bashrc
  • Install dim
# git clone [email protected]:ryo-ma/dim.git
# cd dim
# deno install --unstable --allow-read --allow-write --allow-run --allow-net dim.ts
Download https://cdn.skypack.dev/encoding-japanese
Download https://deno.land/[email protected]/fmt/colors.ts
Download https://deno.land/[email protected]/fs/mod.ts
...
Download https://deno.land/x/[email protected]/src/xlsx-types.ts
Download https://jspm.dev/xlsx
...
Check file:///root/dim/dim.ts
error: TS1192 [ERROR]: Module '"https://jspm.dev/xlsx.js"' has no default export.
import xlsxlib from 'https://jspm.dev/xlsx'
       ~~~~~~~
    at https://deno.land/x/[email protected]/src/xlsx.ts:1:8

Add dim verify

Check for corruption under data_files using integrity in dim-lock.json.

SHA-512 is 128 characters in hexadecimal notation, so it is a little difficult to see.

If you are using it for checking corruption rather than for security, consider using a shorter notation such as SHA-1.

Since this is not a file that many people will see, using SHA-512 may not be too much of a problem.
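The size trade-off can be checked with a little arithmetic. The two helper functions below are purely illustrative; Base64 is included only as a point of comparison with hexadecimal notation and is not something the issue proposes.

```typescript
// Characters needed to write a digest in hexadecimal (4 bits per character).
function hexLength(bits: number): number {
  return bits / 4;
}

// Characters needed to write a digest in padded Base64 (4 characters per 3 bytes).
function base64Length(bits: number): number {
  return 4 * Math.ceil(bits / 8 / 3);
}

console.log("SHA-512:", hexLength(512), "hex chars,", base64Length(512), "Base64 chars");
console.log("SHA-1:", hexLength(160), "hex chars,", base64Length(160), "Base64 chars");
```

SHA-1 in hexadecimal needs 40 characters, against 128 for SHA-512.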

Support for interactive installation

  • Add interaction option to search command
$ dim search -i xxxx

package_title1
- package_url
- package_description
- package_license
   1.resource_name1
    * resource_url1
    * resource_description1
    * created1
    * format
   2.resource_name2
    * resource_url2
    * resource_description2
    * created2
    * format
package_title2
- package_url
- package_description
- package_license
   3.resource_name3
    * resource_url3
    * resource_description3
    * created3
    * format
   4.resource_name4
    * resource_url4
    * resource_description4
    * created4
    * format
...
Enter the number of the data to install
> 1

Enter the name. Enter blank if not required.
> 

Enter the post-processing you want to add separated by spaces.
Enter blank if not required.
(ex.: > unzip xlsx-to-csv)
> unzip

installing...
unzip
Installed to /xxx/xxx
  1. Write data information to dim.json from ckan
  2. Store the data in data_files

Change the structure of dim.json and dim-lock.json.

dim.json

{
    "fileVersion": "1.1",
    "contents": [{
      "name": "xxxxxxx", // name specified at install; the URL if none was specified
      "url": "https://xxxx.xxx.xx", // url specified at install
      "catalogUrl": "https://ckan.xxx.xx", // catalog URL of the package when obtained via search -i; otherwise null
      "catalogResourceId": "123456abcd", // id of the resource when obtained via search -i; otherwise null
      "postProcesses": [
        { "type": "unzip", "arguments": { "password": "dummy", ... } },
        "csv_to_json"
      ], // post_process specified at install; a string or an Object
      "headers": { "Fiware-Service": "service1" }, // headers specified at install, in key:value form
    }]
}

dim-lock.json

{
  "lockfileVersion": "1.1",
  "contents": [{
    "name": "xxxxxxx", // name specified at install; the URL if none was specified
    "url": "https://xxxx.xxx.xx", // url specified at install
    "path": "xxx/xxx/xx.json", // location where the data was saved at install
    "catalogUrl": "https://ckan.xxx.xx", // catalog URL of the package when obtained via search -i; otherwise null
    "catalogResourceId": "123456abc", // id of the resource when obtained via search -i; otherwise null
    "lastModified": "2022-07-06T02:28:06.556Z", // taken from last_modified in the response header of the fetched data; ISO 8601 format; null if unavailable
    "eTag": "xxx-xxxxx", // taken from e-tag in the response header of the fetched data; null if unavailable; used for example to check whether the provided data has changed
    "lastDownloaded": "2022-07-06T02:28:06.556Z", // time the download was performed; renamed from the former last_updated; ISO 8601 format
    "integrity": "sha1-xxxxxxxx", // modeled on npm's integrity; hash (sha1) of the file at the time of download; used for example to detect changes to the file after download
    "postProcesses": [
      { "type": "unzip", "arguments": { "password": "dummy", ... } },
      "csv_to_json"
    ], // post_process specified at install; a string or an Object
    "headers": { "Fiware-Service": "service1" }, // headers specified at install, in key:value form
  }]
}

Python library to simplify the interaction with dim

Thank you for creating this project :)
Data installation manager is absolutely required for open source community. I faced some difficulties when developing a dataset and a data analysis tool with Python regarding COVID-19.

Is it possible to add Python (+R?) library to simplify the interaction with dim?
(I'm not sure we can call Deno from Python...)
Users may use the new library as follows.

  1. Install the library, like poetry add dim-python
  2. Write settings on "pyproject.toml" with commands. This TOML format file is the standard Python library management file currently.
[tool.dim]
directory = './data_files'
datasets = [
    {
        name = 'example',
        url = 'https://example.com',
        unzip = true,
        forced = true,
        encoding = 'utf-8',
        postprocess = ["poetry run python ./tests/test_custom_command.py",],
    },
]
  3. Update datasets with poetry run dim update, or

  4. Update/load the dataset with Python scripts.

import dim
dim.config(settings='./data_files/dim-lock.json')
data = dim.load(name='example')

I'm just a new user, but very interested in this project.

Add "XLS to CSV" converter to preprocess

It seems that "XLS to CSV" converter is somehow half working.

I tried

dim install https://www.city.chofu.tokyo.jp/www/contents/1489047638868/simple/1.xls -n "東京都調布市市立小・中学校一覧" -p xlsx-to-csv

As a result, only a file named 1.xls was saved, although its contents had been converted to CSV.
I would like to see that both 1.xls and 1.csv are placed in the data_files folder.

catalogResourceId/Url is removed by `dim update`

The catalogUrl and catalogResourceId fields are replaced with null by dim update when the data was fetched by dim search -i.

before

// dim.json
{
  "fileVersion": "1.1",
  "contents": [
    {
      "url": "https://www.geospatial.jp/ckan/dataset/30b5f8dc-8957-4b4b-880f-f348e272f591/resource/f2d3ad73-83db-45e4-a11d-48bdd15fe60b/download/14nagayotownhinan.csv",
      "name": "42_長崎県_長与町避難所_長与町避難所",
      "catalogUrl": "https://www.geospatial.jp/ckan/dataset/42000-013",
      "catalogResourceId": "f2d3ad73-83db-45e4-a11d-48bdd15fe60b",
      "postProcesses": [],
      "headers": {}
    }
  ]
}
// dim-lock.json
{
  "lockFileVersion": "1.1",
  "contents": [
    {
      "name": "42_長崎県_長与町避難所_長与町避難所",
      "url": "https://www.geospatial.jp/ckan/dataset/30b5f8dc-8957-4b4b-880f-f348e272f591/resource/f2d3ad73-83db-45e4-a11d-48bdd15fe60b/download/14nagayotownhinan.csv",
      "path": "./data_files/42_長崎県_長与町避難所_長与町避難所/14nagayotownhinan.csv",
      "catalogUrl": "https://www.geospatial.jp/ckan/dataset/42000-013",
      "catalogResourceId": "f2d3ad73-83db-45e4-a11d-48bdd15fe60b",
      "lastModified": "2018-08-27T14:32:21.000Z",
      "eTag": "ff6b437fe66ac28b776a16a249f62b36",
      "lastDownloaded": "2023-05-27T04:45:49.037Z",
      "integrity": "d3db097cb5c1213821bb79730d5c895160302f6b",
      "postProcesses": [],
      "headers": {}
    }
  ]
}

after

// dim.json
{
  "fileVersion": "1.1",
  "contents": [
    {
      "name": "42_長崎県_長与町避難所_長与町避難所",
      "url": "https://www.geospatial.jp/ckan/dataset/30b5f8dc-8957-4b4b-880f-f348e272f591/resource/f2d3ad73-83db-45e4-a11d-48bdd15fe60b/download/14nagayotownhinan.csv",
      "catalogUrl": null,
      "catalogResourceId": null,
      "postProcesses": [],
      "headers": {}
    }
  ]
}
// dim-lock.json
{
  "lockFileVersion": "1.1",
  "contents": [
    {
      "name": "42_長崎県_長与町避難所_長与町避難所",
      "url": "https://www.geospatial.jp/ckan/dataset/30b5f8dc-8957-4b4b-880f-f348e272f591/resource/f2d3ad73-83db-45e4-a11d-48bdd15fe60b/download/14nagayotownhinan.csv",
      "path": "./data_files/42_長崎県_長与町避難所_長与町避難所/14nagayotownhinan.csv",
      "catalogUrl": null,
      "catalogResourceId": null,
      "lastModified": "2018-08-27T14:32:21.000Z",
      "eTag": "ff6b437fe66ac28b776a16a249f62b36",
      "lastDownloaded": "2023-05-27T04:46:51.049Z",
      "integrity": "d3db097cb5c1213821bb79730d5c895160302f6b",
      "postProcesses": [],
      "headers": {}
    }
  ]
}
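One way to keep the catalog fields across an update, sketched with hypothetical names (CatalogEntry, mergeOnUpdate), is to carry over the existing values whenever the refreshed entry has none. The interface lists only the fields relevant here and is not dim's real type.

```typescript
// Hypothetical subset of an entry in dim.json / dim-lock.json.
interface CatalogEntry {
  name: string;
  url: string;
  catalogUrl: string | null;
  catalogResourceId: string | null;
}

// Merge a refreshed entry into the existing one, keeping the existing
// catalog fields when the refreshed entry carries null or nothing for them.
function mergeOnUpdate(existing: CatalogEntry, refreshed: Partial<CatalogEntry>): CatalogEntry {
  return {
    ...existing,
    ...refreshed,
    catalogUrl: refreshed.catalogUrl ?? existing.catalogUrl,
    catalogResourceId: refreshed.catalogResourceId ?? existing.catalogResourceId,
  };
}

const existingEntry: CatalogEntry = {
  name: "42_長崎県_長与町避難所_長与町避難所",
  url: "https://www.geospatial.jp/ckan/dataset/30b5f8dc-8957-4b4b-880f-f348e272f591/resource/f2d3ad73-83db-45e4-a11d-48bdd15fe60b/download/14nagayotownhinan.csv",
  catalogUrl: "https://www.geospatial.jp/ckan/dataset/42000-013",
  catalogResourceId: "f2d3ad73-83db-45e4-a11d-48bdd15fe60b",
};
console.log(mergeOnUpdate(existingEntry, { catalogUrl: null, catalogResourceId: null }));
```

A side effect of this choice is that an update can never reset these fields to null; whether that is acceptable is a design decision for the project.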

Necessity of -A option for update when name is specified

When a name is specified and update is performed on a single data entry, the presence or absence of the -A option has no effect on the operation.

The operation of the following two commands is identical.

dim update name
dim update name -A

Proposals:

Disallow the name and -A option to be specified at the same time.

Support to search function

  • Create a search command

  • Create a search function

    • Search data from package_search CKAN API
    • Specify the number of data to get by option -n (default 10)

Search Results

$ dim search xxxxxx
package_title1
- package_url
- package_description
- package_license
   1.resource_name1
    * resource_url1
    * resource_description1
    * created1
    * format
   2.resource_name2
    * resource_url2
    * resource_description2
    * created2
    * format
package_title2
- package_url
- package_description
- package_license
   3.resource_name3
    * resource_url3
    * resource_description3
    * created3
    * format
   4.resource_name4
    * resource_url4
    * resource_description4
    * created4
    * format

Add dim clean

Delete only data_files.
To delete everything, use rm and then init.

Destination of the unzip decompression seems to be wrong

If the install command is executed with unzip in the -p option, the unzipped files are generated in the current directory rather than under data_files, unlike with xlsx-to-csv.

Proposals:

Change it so that the unzipped files are generated in the same directory as the original file, as with xlsx-to-csv.
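The destination described here is simply the directory that contains the downloaded archive. A tiny sketch with a hypothetical unzipDestination helper, assuming forward-slash paths:

```typescript
// Hypothetical helper: directory in which to place unzipped files,
// i.e. the directory that contains the downloaded archive.
function unzipDestination(archivePath: string): string {
  const index = archivePath.lastIndexOf("/");
  // A bare file name has no directory part, so use the current directory.
  return index === -1 ? "." : archivePath.slice(0, index);
}

console.log(unzipDestination("./data_files/example/archive.zip"));
```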

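The version has not been updated

The version is defined as a constant below, but it is still v1.0.3 even though v1.0.4 is the latest release. As a result, even after installing the latest binary from the releases, "New version available: v1.0.4" is displayed.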

export const VERSION = "v1.0.3";
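For illustration, a check like the one behind the "New version available" notice can be sketched as a numeric comparison of version tags. isNewerVersion is a hypothetical function, not dim's actual upgrade logic, and it assumes tags of the form vMAJOR.MINOR.PATCH.

```typescript
// Hypothetical helper: true when `latest` is a higher version than `current`.
function isNewerVersion(latest: string, current: string): boolean {
  const parse = (version: string): number[] =>
    version.replace(/^v/, "").split(".").map(Number);
  const a = parse(latest);
  const b = parse(current);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    // Missing components count as 0, so "v1.0" equals "v1.0.0".
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) {
      return diff > 0;
    }
  }
  return false;
}

console.log(isNewerVersion("v1.0.4", "v1.0.3"));
```

With the stale constant, a v1.0.4 binary still reports v1.0.3, so a check of this kind keeps finding a newer release.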
