Giter Club home page Giter Club logo

orcli's People

Contributors

felixlohmeier avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

orcli's Issues

generate markdown help

# generate markdown files in subdirectory "help" from cli help screens
name="orcli"
files=(src/*_command.sh)

for f in "${files[@]}"; do
command="${f#src/}"
command="${command%_command.sh}"
commands+=( "${command}" )
done

mkdir -p help
for command in "${commands[@]}"; do
echo "# $name ${command//_/ }" > help/"${command}".md
echo '```sh' >> help/"${command}".md
$name ${command//_/ } --help >> help/"${command}".md
echo '```' >> help/"${command}".md
echo "code: [src/${command}_command.sh](../src/${command}_command.sh)" >> help/"${command}".md
done

echo "# $name" > help/README.md
echo '## command help screens' >> help/README.md
for command in "${commands[@]}"; do
echo "- [${command}](${command}.md)" >> help/README.md
done
echo '## main help screen' >> help/README.md
echo '```sh' >> help/README.md
$name --help >> help/README.md
echo '```' >> help/README.md

error

improve src/lib/error.sh:

echo >&2 "[$(date +'%Y-%m-%dT%H:%M:%S')]: ERROR $1"
shift
for err in "$@"; do echo >&2 "$err"; done
exit 1
}

remove exit statements in commands

run --dry

explore ways to support a dry-run

  • option 1: set -x (resolves variables)
  • option 2: trap 'echo "# $BASH_COMMAND"' DEBUG (commands remain unchanged)

if it runs well, could you possibly generate a standalone script with it by excluding output?

  • trap 'echo "# $BASH_COMMAND" >&2' DEBUG
  • ./test.sh > /dev/null

export template --split

flag: --split

  • help: one file per record; filename = first column (needs to be unique)
  • --splitPrefix (optional)
  • --splitSuffix (optional)
  • add ":::::ORCLI:::::" + row.columnNames[0] + "\n" to row template
cat test | gawk -v prefix="$splitPrefix" -v suffix="$splitSuffix" '/^:::::ORCLI:::::/{if(file){close(file)};file=prefix$2suffix;next}{print > file}'
  • dependency: GNU awk (gawk command)
  • performance?

extract facet config from permalink?

permalink="http://127.0.0.1:3333/project?project=2499881818253&ui=%7B%22facets%22%3A%5B%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22Starred%20Rows%22%2C%22columnName%22%3A%22%22%2C%22expression%22%3A%22row.starred%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22scroll%22%3Afalse%2C%22sort%22%3A%22name%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3Afalse%2C%22l%22%3A%22false%22%7D%7D%5D%7D%2C%7B%22c%22%3A%7B%22type%22%3A%22text%22%2C%22name%22%3A%22File%22%2C%22columnName%22%3A%22File%22%2C%22mode%22%3A%22text%22%2C%22caseSensitive%22%3Afalse%2C%22invert%22%3Afalse%2C%22query%22%3A%22LE%22%7D%7D%5D%7D"
function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
decoded="$(urldecode "$permalink")"
echo ${decoded##*ui=} | jq -c '[.facets[] | (. * .c) | .["selection"] = .s | del(.c, .o, .s)]'

batch: use heredoc (or linebreaks in files) to separate command groups

new syntax:

# multiple projects
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates"
import csv "https://git.io/fj5hF" --projectName "copy"
info "duplicates"
info "copy"
export tsv "duplicates"
EOF

# multiple projects running in parallel #25
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates" &
import csv "https://git.io/fj5hF" --projectName "copy" &
wait
export csv "duplicates" --output duplicates.csv &
export tsv "duplicates" --output duplicates.tsv &
wait
EOF

# shell commands
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates" &
import csv "https://git.io/fj5hF" --projectName "copy" &
wait
echo "finished import"
export tsv "duplicates" --output duplicates.csv &
export tsv "duplicates" --output duplicates.tsv &
wait
wc duplicates*
echo "finished in \$SECONDS seconds"
EOF

implementation:

  • use bash -e to execute heredoc / file
  • use export for variables OPENREFINE_URL, tmpdir and openrefine_pid
  • enhance heredoc/file with sed before exec to support #22, #39 and #43
  • explicit flag —parallel #25 is obsolete?

import csv

args: file(s) or url(s)
out: project id

docker

provide a docker container with openrefine and orcli run entrypoint

  • build with github actions?
  • based on Debian?
  • weekly rebuild?
  • multi-arch?

tutorial

provide a small getting started tutorial in README.md chapter "Usage" (instead of help screen)

start and stop

start

  • flags: port, workspace directory, memory, autosave
  • set env OPENREFINE_URL to http://localhost:${port}
  • -x refine.autosave

stop

  • flags: port, kill (to use signal -9)
  • dependency: lsof (or fuser?)
PID=$(lsof -t -i:${PORT})
kill $PID
while ps -p $PID > /dev/null; do sleep 1; done

import and export format guessing

features:

  • import (without sub-command) should derive format from first arg
  • export (without sub-command) should derive format from file ext of output parameter (and default to tsv?)

approach:

  • collect all args via bashly catch-all setting
  • search ${other_args[@]} for file ext (.*) and save unique items (sort -u) in array ext
  • error if ${ext[0]} is blank (e.g. stdin)
  • error if ${ext[1]} is non-blank
  • call orcli import ${ext[0]} ${other_args[@]}

caveat:

  • bashly does not support args for command groups (root.commands[0] cannot have both commands and args) so we have to drop sub-commands and put all commands in top-level (which may be overwhelming)
    • import
    • import-csv
    • import-tsv
    • ...

csrf

  • remove OPENREFINE_CSRF env in bashly.yml
  • improve src/lib/get_csrf.sh
    if ! [[ "${response}" == '{"token":"'* ]]; then
      local version
      version=$(curl -fs http://localhost:3333/command/core/get-version | jq -r '.version' | cut -c 1-3)
    
    • error "no OpenRefine reachable/running at ${OPENREFINE_URL}" if $version is blank
    • pass if $version is in 3.2 3.1 3.0 2.8 2.7
    • else error "getting CSRF token failed!"

flag --project instead of ARG

option 1: args

orcli batch
orcli list
orcli import FILE…
orcli transform PROJECT FILE…
orcli export PROJECT COLUMN…
orcli show PROJECT
orcli delete PROJECT

option 2: flags

orcli batch
orcli list
orcli import FILE…
orcli transform FILE… --project
orcli export COLUMN… --project
orcli show --project
orcli delete --project

batch short syntax

support short syntax if ${other_args[@} does not contain any option --projectName and only one import command

example: orcli batch import-csv "https://git.io/fj5hF" export-tsv

approach:

  • get projectid from import (stdout?) or call list command after import?
  • add projectid to (the end of) command group arrays that start with info transform export (...)

search

this is essentially a shortcut for

orcli export tsv PROJECT --facets '[ { "type": "list", "expression": "grel:filter(row.columnNames,cn,cells[cn].value.find(/TERM/).length()>0).length()>0", "columnName": "", "selection": [ { "v": { "v": true } } ] } ]'

orcli search PROJECT TERM - apply regex to each column individually and show only matching rows

  • flag: match COLUMN
    • optional, repeatable
    • replaces row.columnNames in filter expression
    • no column(s) = all columns
  • flag: select COLUMN
    • optional, repeatable
    • filter result set to one or more columns
    • no column(s) = all columns
  • flag: mode (rows/records, default rows)
  • output: write tsv to stdout

macOS support?

  • recommend install via homebrew?
brew install bash jq
  • different behavior of macOS (BSD) utilities and GNU core utilities?
  • curl version?
  • ~/.local/bin/ in $PATH?

run --interactive

bash --rcfile <(
history -a
set +o history
cat ~/.bashrc
cat << 'EOF'
PS1="(orcli) [\u@\h \W]\$ "
eval "$(orcli completions)"
echo '================================================================'
echo 'Interactive Bash shell with OpenRefine running in the background'
echo 'Use the "orcli" command and tab completion to control OpenRefine'
echo 'Type "history -a FILE" to write out your session history'
echo 'Type "exit" or CTRL-D to destroy temporary OpenRefine workspace'
echo '================================================================'
EOF
set -o history
) -i

batch --parallel

support running commands in parallel

approach:

  • all command groups will be run as background jobs
  • wait function (that checks exit codes) between import-transform and transform-export
  • user may add additional waits

example:

orcli batch --parallel \
import a.csv --projectName p1 \
import b.csv --projectName p2 \
transform a.json p1 \
transform a.json p2 \
wait \
transform b.json p1 \
export 1

export csv/tsv options

implement remaining export options

bashly.yml snippet:

        flags:
          - long: --select
            help: filter result set to one or more columns
            repeatable: true
          - long: --facets
            help: filter result set by providing an OpenRefine facets config in json
          - long: --mode
            help: set operation mode
            arg: mode
            default: "row-based"
          - long: --noHeader
            help: do not output column header
          - long: --blankRows
            help: output blank rows
          - long: --quoteAll
            help: quote all cells
          - long: --preview
            help: limit export to 10 rows/records
          - long: --separator
            help: character(s) that separates columns
            arg: separator
            default: "\t"
          - long: --lineSeparator
            help: character(s) that separates rows/records
            arg: lineSeparator
            default: "\n"

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.