dbohdan / sqawk Goto Github PK

View Code? Open in Web Editor NEW

308.0 19.0 14.0 583 KB

Like awk but with SQL and table joins

License: MIT License

Makefile 0.59% Tcl 99.13% Batchfile 0.28%

awk sql data-wrangling cli csv tsv delimited-files data-transformation converter json

sqawk's Issues

Tcl and JSON parser: do not read the entire file in kv=1 lines=1 mode

Read it line-by-line.

sqawk/lib/parsers/tcl.tcl

Line 32 in 6ffc59e

set lines [split [string trim [read $channel]] \n]

is there a limit of number of lines for input files?

Hi,
I was testing the "summing up" example and noticed that sqawk gives I/O error for more than 10K lines. I was wondering if this is related to hardware limitations or software limitations? Is it possible to override this limit?
Also, I was surprised that it gives I/O error since I'm not reading any data from a file or writing results to a file.

Here are the codes to reproduce:

10K lines

$ awk -v seed=$RANDOM 'BEGIN{srand(seed); split("ABCDEFGHIJKLMNOPQRSTUVWXYZ",chars,""); for (n=1;n<10000;n++){loc=int(rand()*26); num=int(rand()*10000);  print chars[loc+1]" "num }}'  | sqawk 'select a1, sum(a2) from a group by a1 order by a1'
A 1908044
B 2134362
C 1979623
D 1913669
E 1989040
F 1947681
G 1853303
...

100K lines

$ awk -v seed=$RANDOM 'BEGIN{srand(seed); split("ABCDEFGHIJKLMNOPQRSTUVWXYZ",chars,""); for (n=1;n<100000;n++){loc=int(rand()*26); num=int(rand()*10000);  print chars[loc+1]" "num }}'  | sqawk 'select a1, sum(a2) from a group by a1 order by a1'

error: disk I/O error

Escaping special characters in SELECT

I have a table of data where one of the column has a special character #

  #  Done Priority Get      Size  Name
  0: 100% Normal   Yes  709.3 MB  FILE_NAME

When I pipe it to sqawk it threw unrecognized token: "#" at me

sqawk -output json,indent=1 \
'select #,Done,Priority,Get,Size,Name from a' \
trim=left \
fields=1,2,3,4,5-end \
header=1

If I replace # to ID, then it is working perfectly.
I have tried to escape # with the following but none works

'select \#,Done,Priority,Get,Size,Name from a'
"select \#,Done,Priority,Get,Size,Name from a"

table print assistance

Hello,

What do you expect with this ls -l | sqawk -output table 'select a1 from a' trim=left?

I would expect this to only select the first column. Instead, all rows are printed:

┌──────────┬──┬────┬────┬────┬───┬──┬─────┬─────────────────────┐
│   total  │23│    │    │    │   │  │     │                     │
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│drwxr-xr-x│ 7│sean│sean│  7 │Aug│31│01:59│         bin         │
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│-rw-r--r--│ 1│sean│sean│ 419│Aug│31│03:45│exp_1441043133748.csv│
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│drwxr-xr-x│ 5│root│sean│  5 │Aug│21│03:04│        perl5        │
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│-rw-r--r--│ 1│sean│sean│3892│Aug│28│03:14│      script.sh      │
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│-rw-r--r--│ 1│sean│sean│ 60 │Aug│20│07:54│ tmux-client-4279.log│
├──────────┼──┼────┼────┼────┼───┼──┼─────┼─────────────────────┤
│-rw-r--r--│ 1│sean│sean│9570│Aug│20│07:54│ tmux-server-4281.log│
└──────────┴──┴────┴────┴────┴───┴──┴─────┴─────────────────────┘

From you example, 'ps | sqawk -output table 'select a1 from a' trim=left' this also prints full data.

Rethink 'columns', 'F0', 'merge' and 'skip'

Compiling assistance on freebsd

Hello,

FreeBSD calls tclsh tclsh8.6, so when attempting to install your program, it can 't find tclsh. What file do I need to amend to have it find tclsh8.6?

Thanks!

tabulate can not correctly deal with Chinese character

One Chinese character occupy two English character space, example：

**
1234

so the tabulate has problem with Chinese character

truncating zeros

Hello,

% cat test.csv
numbers,letters
001,a
002,b
003,c
004,d
5,e
6,f
7,g
8,i
9,j
10,j

When doing cat test.csv| sqawk -output table 'select a1,a2 from a' trim=left format=csv2, 001-004 turn into just 1,2,3,4. Actually all output options drop the 0 for 001-004.

If a trim option and format are not specified, then 001-004 are printed but its on one table.

Is there some work around to support numbers with 0 in front of numbers? If I add a decimal point in front of 001 and 002, sqawk will support this:
% cat test.csv| sqawk -output json 'select a1,a2 from a' format=csvalt trim=none

[{"a1":"numbers","a2":"letters"},{"a1":"0.001","a2":"a"},{"a1":"0.002","a2":"b"},{"a1":"3","a2":"c"},{"a1":"4","a2":"d"},{"a1":"5","a2":"e"},{"a1":"6","a2":"f"},{"a1":"7","a2":"g"},{"a1":"8","a2":"i"},{"a1":"9","a2":"j"},{"a1":"10","a2":"j"}]

I did json above because it's far easier to format that in the comment.

patch for new option -numcolumns

The ps | jimsh ./tabulate.tcl example on the tcler's wiki is kind of a red herring--in real life the 4th column is multiple words and the result is gibberish. I'm not sure how to make a patch so I've included the updated ::tabulate::main procedure to handle that new switch. This makes ps | tclsh ./tabulate.tcl -numcolumns 4 look good.

# Read the input, process the command line options and output the result.
proc ::tabulate::main {argv0 argv} {
    options::process $argv \
        store -FS in FS default {} \
        store -style in style default default \
        store -alignments or -align in alignments default {} \
        store -margins in margins default 0 \
        store -numcolumns in numcolumns default 0
    set data [split [string trim [read stdin]] \n]

    # Input field separator. If none is given treat each line of input as a Tcl
    # list.
    if {$FS ne {}} {
        set updateData {}
        foreach line $data {
            lappend updateData [split $line $FS]
        }
        set data $updateData
    }
    if {$numcolumns > 0} {
        if {$FS eq {}} { set FS " "}
        set updateData {}
        foreach line $data {
            set last [join [lrange $line $numcolumns-1 end] $FS]
            lappend updateData [lreplace $line $numcolumns-1 end $last]
        }
        set data $updateData
    }

    # Accept style names rather than style *values* that ::tabulate::tabulate
    # normally takes.
    set styleName [::tabulate::dict-get-default $argv default -style]

    puts [tabulate -data $data \
            -style [::tabulate::style::by-name $style] \
            -alignments $alignments \
            -margins $margins]
}

Revise the command line interface before declaring it stable

Note that Sqawk is meant to be used as a command line program. The ::sqawk Tcl API will probably never be stable.

Tests failing

Hello,

Updated sqawk this morning, but the tests are failing--all of them:
https://gist.github.com/jungle-boogie/ac7b7d410375466120b8

Make the parsers and the serializers lazy

-output json 'select * from a' causes too many json fields to be included

I have a command:

❯ ./table.sh
foo	bar
1	a
2	b

I pipe it to swawk and get the output that I want if I name the fields explicitly, like so:

❯ ./table.sh | sqawk -output json 'select foo, bar from a where bar = "b"' header=1
[{"foo":"2","bar":"b"}]

But if I select * I get many other fields too:

❯ ./table.sh | sqawk -output json 'select * from a where bar = "b"' header=1
 [{"anr":"2","anf":"2","a0":"2\tb","foo":"2","bar":"b","a3":"","a4":"","a5":"","a6":"","a7":"","a8":"","a9":"","a10":""}]

As a workaround I can delete those fields explicitly:

❯ ./table.sh | sqawk -output json 'select * from a where bar = "b"' header=1 | jq 'map(del(.a0, .a1, .a2, .a3, .a4, .a5, .a6, .a7, .a8, .a9, .a10, .anr, .anf))'
[
  {
    "foo": "2",
    "bar": "b"
  }
]

But I think it would be cleaner if I could tell sqawk to not include them in the first place. Is there such a way? If not, can there be?

sqawk cannot work, what's the reason?

while executing

"package require cmdline"
(file "/usr/local/bin/sqawk" line 6)
-bash-4.2# sqawk
can't find package cmdline
while executing
"package require cmdline"
(file "/usr/local/bin/sqawk" line 6)

Read and parse JSON and Tcl inputs gradually when kv is false

This would reduce memory usage.

dbohdan / sqawk Goto Github PK

sqawk's Issues

Tcl and JSON parser: do not read the entire file in kv=1 lines=1 mode

is there a limit of number of lines for input files?

Escaping special characters in SELECT

table print assistance

Rethink 'columns', 'F0', 'merge' and 'skip'

Compiling assistance on freebsd

tabulate can not correctly deal with Chinese character

truncating zeros

patch for new option -numcolumns

Revise the command line interface before declaring it stable

Tests failing

Make the parsers and the serializers lazy

-output json 'select * from a' causes too many json fields to be included

sqawk cannot work, what's the reason?

Read and parse JSON and Tcl inputs gradually when kv is false

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent