tamada / wildcat
Another implementation of wc (word count) accepting archives, directories, and file lists.
Home Page: https://tamada.github.io/wildcat
License: Apache License 2.0
From #30 (comment).
$ ls -l testdata/bigdata/
total 12703144
-rw-r--r-- 1 tamada staff 2600000000 4 29 21:22 file1
-rw-r--r-- 1 tamada staff 3060000000 4 29 21:22 file2
-rw-r--r-- 1 tamada staff 844000000 4 29 21:21 file3
-rw-r--r-- 1 tamada staff 563 4 29 21:21 file4
$ time wc testdata/bigdata/*
10156309 78706542 2600000000 testdata/bigdata/file1
11957339 92647296 3060000000 testdata/bigdata/file2
3298378 25545849 844000000 testdata/bigdata/file3
1 23 563 testdata/bigdata/file4
25412027 196899710 6504000563 total
real 0m42.853s
user 0m40.827s
sys 0m1.692s
$ time wildcat -HNP testdata/bigdata/
5/ 5 targets [==============================================================] 100 %
lines words characters bytes
10,156,309 49,781,703 2,502,888,353 2.6 GB testdata/bigdata/file1
11,957,339 58,608,673 2,945,701,993 3.1 GB testdata/bigdata/file2
3,298,378 16,164,888 812,470,804 844 MB testdata/bigdata/file3
1 8 545 563 B testdata/bigdata/file4
25,412,027 124,555,272 6,261,061,695 6.5 GB total (4 entries)
real 0m32.483s
user 1m5.332s
sys 0m4.964s
wildcat is slower than wc in terms of the user time above (1m5.332s versus 0m40.827s), although its real time is shorter (0m32.483s versus 0m42.853s).
I will try to make wildcat count faster.
wildcat [OPTIONS] <FILEs...|DIRs...|URLs...>
...
Run the REST API server of wildcat on Heroku.
Like below; see the content following total.
lines words characters bytes
...
210 549 5,067 5,067 wildcat.go
34 133 1,166 1,166 wildcat_test.go
285,952 1,446,263 72,350,987 75,059,179 total (1,206 entries, 1,361 files)
Use the following functions.
Add an option to store the content of URL targets.
As the title says, this occurs only with PowerShell (cmd.exe works fine).
Let's say we have a directory with this name:
"Testing Wildcat"
If you run this command in PowerShell:
wildcat '.\Testing Wildcat'
this is what you get:
.\Testing Wildcat": file or directory not found
In the current implementation, file types are determined from the extension of the file name.
This issue proposes determining the file type from the magic number of the file content, like the file command.
For this, we can use h2non/filetype and osaki-lab/iowrapper.
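As a rough illustration of the idea, the sketch below matches the leading bytes of a file against a small table of well-known magic numbers. It uses only the standard library and is not the h2non/filetype API; the names signature, signatures, and detectType are hypothetical, and the table is deliberately incomplete (tar, for instance, keeps its "ustar" marker at offset 257 and is not covered here).

```go
package main

import (
	"bytes"
	"fmt"
)

// signature pairs a hypothetical type label with the leading bytes that identify it.
type signature struct {
	label string
	magic []byte
}

// A few well-known magic numbers; a real implementation would need many more.
var signatures = []signature{
	{"zip", []byte{'P', 'K', 0x03, 0x04}}, // jar and war files share this prefix
	{"gzip", []byte{0x1f, 0x8b}},
	{"bzip2", []byte{'B', 'Z', 'h'}},
}

// detectType returns the label of the first signature that prefixes header,
// or "unknown" when nothing matches.
func detectType(header []byte) string {
	for _, s := range signatures {
		if bytes.HasPrefix(header, s.magic) {
			return s.label
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(detectType([]byte{0x1f, 0x8b, 0x08, 0x00}))
	fmt.Println(detectType([]byte("plain text")))
}
```

With this approach, an archive with a misleading or missing extension would still be recognized, at the cost of reading the first few bytes of every target.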
Implement server mode, like below.
$ wildcat --server --port 9090
Post the file content in the request body; wildcat then counts the lines, words, characters, and bytes and returns the result in JSON format.
Introduce goroutines for counting the targets in order to reduce the time, while keeping the order of the targets.
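One common way to do this, sketched below under simplifying assumptions, is to start one goroutine per target and let each write into its own slot of a pre-sized result slice, so the output order matches the input order regardless of which goroutine finishes first. The countWords and countInOrder functions are hypothetical stand-ins, and the targets are plain strings rather than files.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords is a stand-in for the real per-target counting work.
func countWords(content string) int {
	return len(strings.Fields(content))
}

// countInOrder counts every target concurrently and returns the results
// in the same order as the input slice.
func countInOrder(targets []string) []int {
	results := make([]int, len(targets))
	var wg sync.WaitGroup
	for i, target := range targets {
		wg.Add(1)
		go func(i int, target string) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no lock is needed.
			results[i] = countWords(target)
		}(i, target)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(countInOrder([]string{"a b c", "d", "e f"}))
}
```

For a large number of targets, bounding the number of concurrent goroutines (for example with a buffered channel used as a semaphore) would probably be preferable to starting one per target.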
Hello, I would like to use it as a package, as I have a program where I need to find the word count. Is that possible?
Hi there. As the title says, what about adding an additional flag (-h maybe?) for human-readable output in the total byte size section? It's a lot more comfortable to read file sizes using KB, MB, GB, TB, etc.
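A minimal sketch of such a conversion is shown below. It assumes decimal (1000-based) units and one decimal place for values under 10; the humanize function and the units table are hypothetical, not wildcat's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// units lists decimal (1000-based) size suffixes.
var units = []string{"B", "kB", "MB", "GB", "TB", "PB", "EB"}

// humanize formats a byte count with a unit suffix, keeping one decimal
// place for values below 10 and none otherwise.
func humanize(size uint64) string {
	if size < 1000 {
		return fmt.Sprintf("%d B", size)
	}
	value := float64(size)
	unit := 0
	for value >= 1000 && unit < len(units)-1 {
		value /= 1000
		unit++
	}
	// Round to one decimal place before choosing the format.
	value = math.Round(value*10) / 10
	if value < 10 {
		return fmt.Sprintf("%.1f %s", value, units[unit])
	}
	return fmt.Sprintf("%.0f %s", value, units[unit])
}

func main() {
	for _, size := range []uint64{563, 844000000, 2600000000, 3060000000} {
		fmt.Println(size, "->", humanize(size))
	}
}
```

Whether to use decimal units (kB, MB) or binary units (KiB, MiB) is a design choice; the sketch picks decimal only for simplicity.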
Hi there. As the title says, if you try to run a command to list bytes inside a dir, the app hangs without results if the target dir is bigger than 3MB in size. Previous version 1.0.3 doesn't seem to be affected. Another thing I've noticed is that if the files in the target dir are big in size (like ISO), the app hangs too. This happens in both 1.0.3 and 1.1.0.
OS: Windows 10
In the current implementation, wildcat reads all files in the directories.
However, hidden files (names starting with .) should be excluded from the result by default.
Also, introduce the -a, --all option to accept all files, including hidden files.
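The filtering rule described here could be sketched as follows. The isHidden and filterTargets functions are hypothetical names, and the all parameter stands in for the proposed -a, --all option; the sketch treats only the dot-prefix convention and ignores platform-specific hidden attributes such as those on Windows.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isHidden reports whether the last element of path starts with a dot.
// The special names "." and ".." are not treated as hidden.
func isHidden(path string) bool {
	base := filepath.Base(path)
	if base == "." || base == ".." {
		return false
	}
	return strings.HasPrefix(base, ".")
}

// filterTargets drops hidden paths unless all is true.
func filterTargets(paths []string, all bool) []string {
	var kept []string
	for _, p := range paths {
		if all || !isHidden(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{"README.md", ".gitignore", "docs/.hidden", "docs/index.md"}
	fmt.Println(filterTargets(paths, false))
	fmt.Println(filterTargets(paths, true))
}
```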
The behavior for archive files is inconsistent between specifying them as arguments and finding them in a given directory.
For example, compare executing wildcat testdata/archive
and wildcat testdata/archive/*
$ wildcat testdata/archive
lines words characters bytes
5 62 1054 1080 testdata/archives/wc.jar
78 364 5452 5632 testdata/archives/wc.tar
5 36 810 839 testdata/archives/wc.tar.bz2
2 13 748 771 testdata/archives/wc.tar.gz
5 62 1054 1080 testdata/archives/wc.war
4 95 1206 1232 testdata/archives/wc.zip
99 632 10324 10634 total
$ wildcat testdata/archive/*
lines words characters bytes
4 26 142 142 testdata/archives/wc.jar!humpty_dumpty.txt
0 0 0 0 testdata/archives/wc.jar!ja/
15 26 118 298 testdata/archives/wc.jar!ja/sakura_sakura.txt
59 260 1341 1341 testdata/archives/wc.jar!london_bridge_is_broken_down.txt
4 26 142 142 testdata/archives/wc.tar!humpty_dumpty.txt
0 0 0 0 testdata/archives/wc.tar!ja/
15 26 118 298 testdata/archives/wc.tar!ja/sakura_sakura.txt
59 260 1341 1341 testdata/archives/wc.tar!london_bridge_is_broken_down.txt
4 26 142 142 testdata/archives/wc.tar.bz2!humpty_dumpty.txt
0 0 0 0 testdata/archives/wc.tar.bz2!ja/
15 26 118 298 testdata/archives/wc.tar.bz2!ja/sakura_sakura.txt
59 260 1341 1341 testdata/archives/wc.tar.bz2!london_bridge_is_broken_down.txt
4 26 142 142 testdata/archives/wc.tar.gz!humpty_dumpty.txt
0 0 0 0 testdata/archives/wc.tar.gz!ja/
15 26 118 298 testdata/archives/wc.tar.gz!ja/sakura_sakura.txt
59 260 1341 1341 testdata/archives/wc.tar.gz!london_bridge_is_broken_down.txt
5 62 1054 1080 testdata/archives/wc.war
4 26 142 142 testdata/archives/wc.zip!humpty_dumpty.txt
0 0 0 0 testdata/archives/wc.zip!ja/
59 260 1341 1341 testdata/archives/wc.zip!london_bridge_is_broken_down.txt
15 26 118 298 testdata/archives/wc.zip!ja/sakura_sakura.txt
395 1622 9059 9985 total
wildcat should behave consistently in both cases.