opencultureconsulting / orcli Goto Github PK
View Code? Open in Web Editor NEWOpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export).
License: MIT License
OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export).
License: MIT License
provide binder config and badge
https://gitpod.io/#https://github.com/opencultureconsulting/orcli/issues/5
# generate markdown files in subdirectory "help" from cli help screens
name="orcli"
files=(src/*_command.sh)
for f in "${files[@]}"; do
command="${f#src/}"
command="${command%_command.sh}"
commands+=( "${command}" )
done
mkdir -p help
for command in "${commands[@]}"; do
echo "# $name ${command//_/ }" > help/"${command}".md
echo '```sh' >> help/"${command}".md
$name ${command//_/ } --help >> help/"${command}".md
echo '```' >> help/"${command}".md
echo "code: [src/${command}_command.sh](../src/${command}_command.sh)" >> help/"${command}".md
done
echo "# $name" > help/README.md
echo '## command help screens' >> help/README.md
for command in "${commands[@]}"; do
echo "- [${command}](${command}.md)" >> help/README.md
done
echo '## main help screen' >> help/README.md
echo '```sh' >> help/README.md
$name --help >> help/README.md
echo '```' >> help/README.md
prevent OpenRefine from opening the browser
include some lines of OpenRefine's server log in error messages
improve src/lib/error.sh:
echo >&2 "[$(date +'%Y-%m-%dT%H:%M:%S')]: ERROR $1"
shift
for err in "$@"; do echo >&2 "$err"; done
exit 1
}
remove exit statements in commands
explore ways to support a dry-run
set -x
(resolves variables)trap 'echo "# $BASH_COMMAND"' DEBUG
(commands remain unchanged)if it runs well, could you possibly generate a standalone script with it by excluding output?
trap 'echo "# $BASH_COMMAND" >&2' DEBUG
./test.sh > /dev/null
flag: --split
":::::ORCLI:::::" + row.columnNames[0] + "\n"
to row templatecat test | gawk -v prefix="$splitPrefix" -v suffix="$splitSuffix" '/^:::::ORCLI:::::/{if(file){close(file)};file=prefix$2suffix;next}{print > file}'
permalink="http://127.0.0.1:3333/project?project=2499881818253&ui=%7B%22facets%22%3A%5B%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22Starred%20Rows%22%2C%22columnName%22%3A%22%22%2C%22expression%22%3A%22row.starred%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22scroll%22%3Afalse%2C%22sort%22%3A%22name%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3Afalse%2C%22l%22%3A%22false%22%7D%7D%5D%7D%2C%7B%22c%22%3A%7B%22type%22%3A%22text%22%2C%22name%22%3A%22File%22%2C%22columnName%22%3A%22File%22%2C%22mode%22%3A%22text%22%2C%22caseSensitive%22%3Afalse%2C%22invert%22%3Afalse%2C%22query%22%3A%22LE%22%7D%7D%5D%7D"
function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }
decoded="$(urldecode "$permalink")"
echo ${decoded##*ui=} | jq -c '[.facets[] | (. * .c) | .["selection"] = .s | del(.c, .o, .s)]'
new syntax:
# multiple projects
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates"
import csv "https://git.io/fj5hF" --projectName "copy"
info "duplicates"
info "copy"
export tsv "duplicates"
EOF
# multiple projects running in parallel #25
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates" &
import csv "https://git.io/fj5hF" --projectName "copy" &
wait
export csv "duplicates" --output duplicates.csv &
export tsv "duplicates" --output duplicates.tsv &
wait
EOF
# shell commands
orcli batch << EOF
import csv "https://git.io/fj5hF" --projectName "duplicates" &
import csv "https://git.io/fj5hF" --projectName "copy" &
wait
echo "finished import"
export tsv "duplicates" --output duplicates.csv &
export tsv "duplicates" --output duplicates.tsv &
wait
wc duplicates*
echo "finished in \$SECONDS seconds"
EOF
implementation:
args: file(s) or url(s)
out: project id
replace working directory with script directory
provide a docker container with openrefine and orcli run entrypoint
calculate starting time and run time of each step
print RAM and CPU usage for each step
see bash-refine templates and openrefine-batch.sh
provide a small getting started tutorial in README.md chapter "Usage" (instead of help screen)
start
-x refine.autosave
stop
PID=$(lsof -t -i:${PORT})
kill $PID
while ps -p $PID > /dev/null; do sleep 1; done
print only "real" errors to STDERR
provide row template as FILE, URL or STDIN
create https://asciinema.org recording of basic features and convert it to svg with https://github.com/marionebl/svg-term-cli to enrich README.md
create documentation with examples of how to upgrade to orcli from...
maybe directly in README.md of the old repos (before read-only switching)
features:
approach:
${other_args[@]}
for file ext (.*) and save unique items (sort -u) in array ext${ext[0]}
is blank (e.g. stdin)${ext[1]}
is non-blankorcli import ${ext[0]} ${other_args[@]}
caveat:
root.commands[0] cannot have both commands and args
) so we have to drop sub-commands and put all commands in top-level (which may be overwhelming)
if ! [[ "${response}" == '{"token":"'* ]]; then
local version
version=$(curl -fs http://localhost:3333/command/core/get-version | jq -r '.version' | cut -c 1-3)
error "no OpenRefine reachable/running at ${OPENREFINE_URL}"
if $version is blankerror "getting CSRF token failed!"
get metadata for a project (see Utilities in https://code.opencultureconsulting.com/opencultureconsulting/bash-refine/src/branch/master/templates.sh)
https://gitpod.io/#https://github.com/opencultureconsulting/orcli/issues/9
option 1: args
orcli batch
orcli list
orcli import FILE…
orcli transform PROJECT FILE…
orcli export PROJECT COLUMN…
orcli show PROJECT
orcli delete PROJECT
option 2: flags
orcli batch
orcli list
orcli import FILE…
orcli transform FILE… --project
orcli export COLUMN… --project
orcli show --project
orcli delete --project
support short syntax if ${other_args[@} does not contain any option --projectName
and only one import
command
example: orcli batch import-csv "https://git.io/fj5hF" export-tsv
approach:
info
transform
export
(...)bash -e
)MYTMPDIR="$(mktemp -d)"
trap 'rm -rf -- "$MYTMPDIR"' EXIT
https://gitpod.io/#https://github.com/opencultureconsulting/orcli/issues/6
this is essentially a shortcut for
orcli export tsv PROJECT --facets '[ { "type": "list", "expression": "grel:filter(row.columnNames,cn,cells[cn].value.find(/TERM/).length()>0).length()>0", "columnName": "", "selection": [ { "v": { "v": true } } ] } ]'
orcli search PROJECT TERM - apply regex to each column individually and show only matching rows
brew install bash jq
bash --rcfile <(
history -a
set +o history
cat ~/.bashrc
cat << 'EOF'
PS1="(orcli) [\u@\h \W]\$ "
eval "$(orcli completions)"
echo '================================================================'
echo 'Interactive Bash shell with OpenRefine running in the background'
echo 'Use the "orcli" command and tab completion to control OpenRefine'
echo 'Type "history -a FILE" to write out your session history'
echo 'Type "exit" or CTRL-D to destroy temporary OpenRefine workspace'
echo '================================================================'
EOF
set -o history
) -i
support running commands in parallel
approach:
wait
sexample:
orcli batch --parallel \
import a.csv --projectName p1 \
import b.csv --projectName p2 \
transform a.json p1 \
transform a.json p2 \
wait \
transform b.json p1 \
export 1
implement remaining export options
bashly.yml snippet:
flags:
- long: --select
help: filter result set to one or more columns
repeatable: true
- long: --facets
help: filter result set by providing an OpenRefine facets config in json
- long: --mode
help: set operation mode
arg: mode
default: "row-based"
- long: --noHeader
help: do not output column header
- long: --blankRows
help: output blank rows
- long: --quoteAll
help: quote all cells
- long: --preview
help: limit export to 10 rows/records
- long: --separator
help: character(s) that separates columns
arg: separator
default: "\t"
- long: --lineSeparator
help: character(s) that separates rows/records
arg: lineSeparator
default: "\n"
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.