twinkle-parser's Introduction

twinkle-parser

Parse CSV from https://kdb.tsukuba.ac.jp to structured JSON.

Usage

From CLI

Quick use

npx twinkle-parser data_from_kdb.csv

Global Install

# Install with NPM
npm install -g twinkle-parser

# Install with yarn
yarn global add twinkle-parser

# Then
twinkle-parser data_from_kdb.csv

Use inside repo

# * Inside repo directory
# Install dependencies
yarn

# Parse
yarn run parse data_from_kdb.csv

As API

# Install with NPM
npm install twinkle-parser

# Install with yarn
yarn add twinkle-parser

const parse = require('twinkle-parser')
const data = parse('CSV string here') // -> KDBData

CLI options

Option                    Description
-o PATH / --output PATH   Export the result to a file at PATH instead of stdout.
-p / --pretty             Prettify JSON output.
--fields                  Fields to include (comma-separated; all fields if not set).
-h / --help               Print help and usage.
-v / --version            Print version info.
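
As a rough illustration of what restricting the output with --fields amounts to, the sketch below reduces every course to a chosen set of keys. It assumes the option keeps only the named fields of each course; `pickFields` and the sample data are hypothetical and not part of twinkle-parser.

```javascript
// Hypothetical helper, not part of twinkle-parser: keep only the listed
// fields of every course in a KDBData-like dictionary.
function pickFields(data, fields) {
  const result = {}
  for (const [id, course] of Object.entries(data)) {
    result[id] = {}
    for (const field of fields) {
      // Skip field names the course does not have
      if (field in course) result[id][field] = course[field]
    }
  }
  return result
}

// Sample data (made up for this sketch)
const sample = {
  AB00000: { title: 'Twinkle', unit: 1, terms: [0, 1], rooms: ['7A106'] },
}

// A comma-separated list, in the form it would be passed to --fields
const picked = pickFields(sample, 'title,terms'.split(','))
console.log(JSON.stringify(picked))
// -> {"AB00000":{"title":"Twinkle","terms":[0,1]}}
```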

Output Format

{
  "COURSE_ID": {

    "title": "Twinkle",

    // Class type (defined value by original data)
    "type": 1,

    // Course unit
    "unit": 1,

    // Course target grades
    "targets": [1, 2],

    // Terms & Modules
    // 0 = Spring A, 1 = Spring B, ...
    "termStr": "春AB",
    "terms": [ 0, 1 ],

    // Day & Period sets
    "periodStr": "月1-3\n水4-6",
    "periods": [
      // [ Days( 0 = Sun. 1 = Mon. ... ), Periods ]
      [ [ 1 ], [ 0, 1, 2 ] ],
      [ [ 3 ], [ 4, 5, 6 ] ]
    ],

    // Rooms
    "rooms": [ "7A106", "7C202" ],

    // Instructors
    "instructors": [ "筑波 太郎" ],

    // Overview & Remarks
    "overview": "",
    "remarks": "",

    // Last update in unix time
    "updatedAt": 1583390263000
  }
}
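
To show how this structure might be consumed, here is a minimal sketch that looks up courses by term and day of the week. The sample data and `findCourses` are illustrative assumptions; only the field layout (`terms`, and `periods` as pairs of days and periods) follows the format above.

```javascript
// Sample data shaped like the output format above (values are made up)
const data = {
  AB00000: {
    title: 'Twinkle',
    terms: [0, 1],
    periods: [
      [[1], [0, 1, 2]],
      [[3], [4, 5, 6]],
    ],
  },
  AB00001: {
    title: 'Sparkle',
    terms: [2],
    periods: [[[1], [3]]],
  },
}

// Return the IDs of courses offered in `term` that meet on `day` (0 = Sun.)
function findCourses(data, term, day) {
  return Object.entries(data)
    .filter(
      ([, course]) =>
        course.terms.includes(term) &&
        // Each periods entry is a pair: [days, periods]
        course.periods.some(([days]) => days.includes(day))
    )
    .map(([id]) => id)
}

console.log(findCourses(data, 0, 1)) // -> [ 'AB00000' ]
```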

TypeScript Support

TypeScript supported! 🎉

// This will be imported with types
import parse from 'twinkle-parser'

// And types for output data are also available
import { KDBData, KDBCourse } from 'twinkle-parser'

Contribution

Issue or PR submissions are welcome.

twinkle-parser's People

Contributors

dependabot-preview[bot], dependabot[bot], eggplants, github-actions[bot], mergery[bot], mimori256, nandenjin, renovate[bot]

Forkers

mimori256

twinkle-parser's Issues

Encoding conversion failure

EB10043: a character in the teachers field is not converted correctly (it appears as ? in the output).

This seems to be a problem with the encoding conversion from Shift_JIS to UTF-8.
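
For reference, a minimal sketch of a Shift_JIS decoding step is shown below. It uses the `TextDecoder` built into Node.js rather than the iconv-lite dependency listed in package.json, so it is a stand-in for illustration and not the project's actual conversion code; `decodeShiftJis` is a hypothetical name. With `fatal: true`, bytes that cannot be decoded raise an error instead of being silently replaced.

```javascript
// Decode a Shift_JIS byte sequence into a JavaScript string.
// Assumes a Node.js build with full ICU (the default in recent releases).
function decodeShiftJis(bytes) {
  // fatal: true throws a TypeError on invalid byte sequences instead of
  // substituting a replacement character
  const decoder = new TextDecoder('shift_jis', { fatal: true })
  return decoder.decode(bytes)
}

// 0x82 0xA0 is "あ" and 0x82 0xA2 is "い" in Shift_JIS
const text = decodeShiftJis(Uint8Array.from([0x82, 0xa0, 0x82, 0xa2]))
console.log(text) // -> あい
```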

CsvError: Invalid Closing Quote: got "T" at line 6955 instead of delimiter, record delimiter, trimable character

Problem

twinkle-parser csv_of_full_courses_of_2021.csv

nandenjin@dhcp8-124 Downloads % twinkle-parser kdb_20210401123849.csv
/Users/nandenjin/workspace/twinkle-parser/node_modules/csv-parse/lib/sync.js:21
  if(err1 !== undefined) throw err1
                         ^

CsvError: Invalid Closing Quote: got "T" at line 6955 instead of delimiter, record delimiter, trimable character (if activated) or comment
    at Parser.__parse (/Users/nandenjin/workspace/twinkle-parser/node_modules/csv-parse/lib/index.js:599:17)
    at Object.module.exports [as default] (/Users/nandenjin/workspace/twinkle-parser/node_modules/csv-parse/lib/sync.js:20:23)
    at parse (/Users/nandenjin/workspace/twinkle-parser/dist/index.js:41:32)
    at Object.<anonymous> (/Users/nandenjin/workspace/twinkle-parser/bin/twinkle-parser.js:58:16)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47 {
  code: 'CSV_INVALID_CLOSING_QUOTE',
  column: 13,
  empty_lines: 0,
  error: undefined,
  header: false,
  index: 13,
  invalid_field_length: 0,
  quoting: false,
  lines: 6955,
  records: 6954
}

Cause

The row for 0A00506 contains an incorrectly formatted string: the " characters inside a field are not escaped.

"0A00506","「考える」動物としての人間―東西哲学からの考察","1"," 1.0","1 - 5","春季休業中","集中","","吉水 千鶴子,井川 義次,津崎 良典,志田 泰盛,土井 裕人","「考える」のは人間の特性である。人間は言葉を使って知性によって「考え」る。だが「考える」とはどのような営為なのか、東西の哲学がどのように「考え」てきたのかを参照しながら「考える」ことについて「考え」る。","対面","","","The Human Being as a "Thinking" Animal: Viewed from the Perspectives of Philosophy East and West","01ZZ622","「考える」動物としての人間-東西哲学からの考察","2020-12-23 11:45:03"

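One possible workaround is to repair such rows before handing them to the CSV parser. The sketch below doubles every " that neither opens a field (preceded by a comma or the start of the line) nor closes one (followed by a comma or the end of the line). It is a heuristic under stated assumptions: `escapeStrayQuotes` is a hypothetical helper, it works on a single line, and it assumes the input contains no quotes that are already escaped and no stray quote directly next to a comma.

```javascript
// Hypothetical pre-processing step: escape quotes that appear in the
// middle of a quoted CSV field by doubling them.
function escapeStrayQuotes(line) {
  let out = ''
  for (let i = 0; i < line.length; i++) {
    const ch = line[i]
    if (ch !== '"') {
      out += ch
      continue
    }
    // Treat the line boundaries like field delimiters
    const prev = i === 0 ? ',' : line[i - 1]
    const next = i === line.length - 1 ? ',' : line[i + 1]
    const opensField = prev === ','
    const closesField = next === ','
    out += opensField || closesField ? '"' : '""'
  }
  return out
}

const broken = '"0A00506","The Human Being as a "Thinking" Animal","01ZZ622"'
console.log(escapeStrayQuotes(broken))
// -> "0A00506","The Human Being as a ""Thinking"" Animal","01ZZ622"
```
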
bug: CsvError: Invalid Closing Quote: got " " at line 1 instead of delimiter, row delimiter, trimable character (if activated) or comment

KdB was renewed and now inserts a table header row into the CSV. As a result, twinkle-parser throws an error when loading the CSV:

CsvError: Invalid Closing Quote: got " " at line 1 instead of delimiter, row delimiter, trimable character (if activated) or comment
    at Parser.__parse (/Users/nandenjin/workspace/twinkle-parser/node_modules/csv-parse/lib/index.js:533:17)
    at Object.module.exports [as default] (/Users/nandenjin/workspace/twinkle-parser/node_modules/csv-parse/lib/sync.js:20:23)
    at parse (/Users/nandenjin/workspace/twinkle-parser/dist/index.js:29:32)
    at Object.<anonymous> (/Users/nandenjin/workspace/twinkle-parser/bin/twinkle-parser.js:64:16)
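
One way to tolerate the extra header is to drop everything before the first data row prior to parsing. The sketch below is an assumption-laden illustration: it supposes that every data row starts with a quoted 7-character alphanumeric course ID (as in "0A00506"), and both `stripHeaderRows` and the header text in the example are hypothetical.

```javascript
// Assumption: a data row starts with a quoted 7-character course ID,
// e.g. "0A00506". Anything before the first such row is treated as header.
const DATA_ROW = /^"[0-9A-Z]{7}",/

// Hypothetical helper: remove leading lines that are not data rows
function stripHeaderRows(csv) {
  const lines = csv.split(/\r?\n/)
  const first = lines.findIndex((line) => DATA_ROW.test(line))
  return first === -1 ? '' : lines.slice(first).join('\n')
}

// The header line here is invented for the example
const csv = ['"Course ID" ,"Title"', '"0A00506","Sample course","1"'].join('\n')
console.log(stripHeaderRows(csv)) // -> "0A00506","Sample course","1"
```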

Feature Request: Support array-format output

The current output is a key-value dictionary keyed by course ID. This is a request to implement output as an array of KDBCourse.

New format

[
  {
    "id": "AB00000",
    "title": "Twinkle",
    
    // Terms & Modules
    // 0 = Spring A, 1 = Spring B, ...
    "termStr": "春AB",
    "terms": [ 0, 1 ],
    
    // Day & Period sets
    "periodStr": "月1-3\n水4-6",
    "periods": [
      // [ Days( 0 = Sun. 1 = Mon. ... ), Periods ]
      [ [ 1 ], [ 0, 1, 2 ] ],
      [ [ 3 ], [ 4, 5, 6 ] ]
    ],
    
    // Rooms
    "rooms": [ "7A106", "7C202" ],
    
    // Instructors
    "instructors": [ "筑波 太郎" ],
    
    // Overview & Remarks
    "overview": "",
    "remarks": ""
    
  }
]

Current

{
  "COURSE_ID": {
  
    "title": "Twinkle",
    
    // Terms & Modules
    // 0 = Spring A, 1 = Spring B, ...
    "termStr": "春AB",
    "terms": [ 0, 1 ],
    
    // Day & Period sets
    "periodStr": "月1-3\n水4-6",
    "periods": [
      // [ Days( 0 = Sun. 1 = Mon. ... ), Periods ]
      [ [ 1 ], [ 0, 1, 2 ] ],
      [ [ 3 ], [ 4, 5, 6 ] ]
    ],
    
    // Rooms
    "rooms": [ "7A106", "7C202" ],
    
    // Instructors
    "instructors": [ "筑波 太郎" ],
    
    // Overview & Remarks
    "overview": "",
    "remarks": ""
    
  }
}
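
Until such an option exists, the conversion can be done on the caller's side. A minimal sketch, assuming the dictionary shape shown under "Current"; `toCourseArray` is a hypothetical helper name:

```javascript
// Convert the ID-keyed dictionary into an array whose items carry their
// course ID in an `id` field.
function toCourseArray(data) {
  return Object.entries(data).map(([id, course]) => ({ id, ...course }))
}

const current = {
  AB00000: { title: 'Twinkle', terms: [0, 1] },
}

console.log(JSON.stringify(toCourseArray(current)))
// -> [{"id":"AB00000","title":"Twinkle","terms":[0,1]}]
```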

File export

Currently, I export formatted JSON like this:

$ yarn run parse kdb.csv|
sed -r '1,2d;s/(})[^}]+$/\1/'|
python -c 'import sys,json
print(
    json.dumps(
        json.loads(sys.stdin.read()),
        indent=4,
        ensure_ascii=False
    )
)' >export_kdb.json

I would like exporting to a local file to be easier. Could I ask you to add that?
Thanks.
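
On the command line, the -o and -p options listed under "CLI options" address this request. For API users, a comparable export might look like the sketch below. `exportJson` is a hypothetical helper and the data is a stand-in for parser output; `JSON.stringify` keeps non-ASCII characters as-is, which matches the `ensure_ascii=False` setting in the pipeline above.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: write data as indented UTF-8 JSON
function exportJson(data, filePath, indent = 4) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, indent) + '\n', 'utf8')
}

// Stand-in for the parser's output
const parsed = { AB00000: { title: 'Twinkle', termStr: '春AB' } }

// Written to the temp directory so the example has no side effects elsewhere
const outPath = path.join(os.tmpdir(), 'export_kdb.json')
exportJson(parsed, outPath)
console.log(fs.readFileSync(outPath, 'utf8'))
```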

Different periods per terms cannot be exported

Courses whose periods differ from term to term cannot be exported.

Test cases:

  • 01EH128
  • 01EJ636
  • EB10053
  • FCB1311
  • FCB1321
  • FCB1331
  • FCB1301

The current output schema cannot express this kind of schedule. It should probably be fixed in the next major update.
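
To make the limitation concrete, here is one hypothetical shape that could express such a schedule: each entry ties a set of terms to its own periods. This is only a sketch for discussion, not a schema the project has adopted; `schedules` and `periodsInTerm` are invented names, and the course data is made up.

```javascript
// Hypothetical shape: periods are grouped per set of terms
const course = {
  title: 'Example',
  schedules: [
    { terms: [0], periods: [[[1], [0, 1]]] }, // term 0: day 1, periods 0 and 1
    { terms: [1], periods: [[[3], [2, 3]]] }, // term 1: day 3, periods 2 and 3
  ],
}

// Collect the [days, periods] pairs that apply in a given term
function periodsInTerm(course, term) {
  return course.schedules
    .filter((schedule) => schedule.terms.includes(term))
    .flatMap((schedule) => schedule.periods)
}

console.log(JSON.stringify(periodsInTerm(course, 1))) // -> [[[3],[2,3]]]
```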

CLI not working

From CLI:

~/twinkle-parser $ yarn run parse ../Downloads/kdb_20191114172106.csv 
yarn run v1.19.1
$ node dist/index.js ../Downloads/kdb_20191114172106.csv
module.js:549
    throw err;
    ^

Error: Cannot find module '/home/eggplants/twinkle-parser/dist/index.js'
    at Function.Module._resolveFilename (module.js:547:15)
    at Function.Module._load (module.js:474:25)
    at Function.Module.runMain (module.js:693:10)
    at startup (bootstrap_node.js:188:16)
    at bootstrap_node.js:609:3
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

The /dist/ directory is missing. Please correct this problem, or tell me how to fix it.
Thank you.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): lock file maintenance

Detected dependencies

github-actions
.github/workflows/codeql-analysis.yml
  • actions/checkout v4
  • github/codeql-action v3
  • github/codeql-action v3
  • github/codeql-action v3
.github/workflows/push.yml
  • actions/checkout v4
  • actions/setup-node v4
  • actions/checkout v4
  • actions/setup-node v4
  • codecov/codecov-action v4
.github/workflows/release-please.yml
  • google-github-actions/release-please-action v4
  • actions/checkout v4
  • actions/setup-node v4
npm
package.json
  • consola ^3.0.0
  • csv-parse ^5.3.0
  • iconv-lite ^0.6.2
  • minimist ^1.2.0
  • @eslint/eslintrc 3.0.2
  • @eslint/js 9.1.1
  • @types/iconv 3.0.4
  • @types/jest 29.5.12
  • @types/minimist 1.2.5
  • @types/node 20.12.7
  • eslint 9.1.1
  • eslint-config-prettier 9.1.0
  • eslint-plugin-jest 28.2.0
  • globals 15.0.0
  • jest 29.7.0
  • prettier 3.2.5
  • ts-jest 29.1.2
  • ts-node 10.9.2
  • typescript 5.4.5
  • typescript-eslint 7.7.1
  • node 20
nvm
.nvmrc
  • node 20

  • Check this box to trigger a request for Renovate to run again on this repository

Duplication of terms, periods, etc.

Sample case: EB10043 (2019) (thanks to @asagatto777)

The values of terms, periods, and rooms are duplicated, and they are not normalized in the output either.
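
A normalization pass along the following lines could remove the duplicates. It is a sketch that assumes the output format described in the README; `unique` and `normalizeCourse` are hypothetical helpers, and the sample course is made up.

```javascript
// Remove duplicates while keeping first-seen order. Values are compared
// by their JSON form so that nested arrays (periods) are handled too.
function unique(values) {
  const seen = new Set()
  return values.filter((value) => {
    const key = JSON.stringify(value)
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}

// Hypothetical normalization of the fields named in this issue
function normalizeCourse(course) {
  return {
    ...course,
    terms: unique(course.terms),
    periods: unique(course.periods),
    rooms: unique(course.rooms),
  }
}

const duplicated = {
  title: 'Twinkle',
  terms: [0, 1, 0, 1],
  periods: [
    [[1], [0, 1, 2]],
    [[1], [0, 1, 2]],
  ],
  rooms: ['7A106', '7A106'],
}

console.log(JSON.stringify(normalizeCourse(duplicated)))
// -> {"title":"Twinkle","terms":[0,1],"periods":[[[1],[0,1,2]]],"rooms":["7A106"]}
```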

