jeroenjanssens / data-science-at-the-command-line

Data Science at the Command Line

Home Page: https://datascienceatthecommandline.com

License: Other

Python 0.62% R 0.12% Shell 0.26% HTML 92.12% TeX 1.99% CSS 1.88% Makefile 0.19% JavaScript 1.32% NewLisp 0.02% Jupyter Notebook 1.16% Nunjucks 0.32%
bash book bookdown cli cowsay curl data-science ggplot2 gnuplot jq linux oreilly oreilly-books python r shell terminal unix zsh

data-science-at-the-command-line's People

Contributors

aborruso, adamchainz, alexanderyastrebov, andrewsanchez, anfedorov, azulgarza, cherouvim, ebedthan, fphilipe, gkampolis, gnukev, jeroenjanssens, leshaker, lissahyacinth, lukereding, mpettis, npztest, petersaalbrink, reustle, revg, richpauloo, royseto, spo-tn, tonyfischetti, xakon, zach14c

data-science-at-the-command-line's Issues

Docker pull error 'incorrect username or password'

I'm trying to pull the latest version but getting an error message as shown below:
docker pull datascienceworkshops/data-science-at-the-command-line
Using default tag: latest
Error response from daemon: Get https://registry-1.docker.io/v2/datascienceworkshops/data-science-at-the-command-line/manifests/latest: unauthorized: incorrect username or password

My docker info output in case it may help:
docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.934GiB
Name: linuxkit-00155d01344c
ID: SUNZ:CTDE:UKFI:MPMJ:KJRG:ODSS:7LGL:MNUF:NKKI:GHRH:CSP3:DYLO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 19
Goroutines: 35
System Time: 2018-04-29T21:39:50.4780016Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Scrape in python 3 env

scrape does not seem to work under Python 3:

args.expression = [e.decode('utf-8') for e in args.expression]
AttributeError: 'str' object has no attribute 'decode'

issue with "header"

Last line of "header" fails if the header contains spaces.

Change from

print_header $OLDHEADER

to

print_header "${OLDHEADER}"
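
Why the quotes matter: without them, bash splits the variable's value on whitespace before print_header ever sees it. A minimal sketch of that effect, using a hypothetical count_args function as a stand-in for print_header and a made-up header value:

```shell
# count_args is a hypothetical stand-in for print_header:
# it only reports how many arguments it received.
count_args() { echo "$#"; }

# Made-up header with spaces inside the column names.
OLDHEADER="first name,last name"

unquoted=$(count_args $OLDHEADER)      # word splitting applies here
quoted=$(count_args "${OLDHEADER}")    # the value is passed as one argument

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

In bash the unquoted call receives three arguments and the quoted call receives one, which is why the quoted form keeps a header with spaces intact.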

"Executing a Command-line Tool" paragraph

Hi,
first of all this is a great book.

In "Executing a Command-line Tool" paragraph of chapter 2, page 21, there is "cd book/ch02/" command, but this directory does not exist in my vagrant installation.
Is it normal?

Thank you

movies.txt does not exist - but it does?

First of all, I would like to thank Crevax for solving the previous issue I had - I was able to pull the data and access the files in the book! But now, I am running into another issue. This may not be a problem with the data or files, but I believe it is a problem for those who are new to the command line because I am not able to resolve this issue myself.

Following the examples in Chapter 2 to get to the directory for chapter 2 and examine the 'movies.txt' file, I can see that the file is there, but when I try to run the command:
head -n 3 data/movies.txt
I get the error "head: cannot open 'data/movies.txt' for reading: No such file or directory" even though I can see the file with the ls command!
I have included a screenshot of my command line prompt for someone better versed in docker to see if I have made a mistake.
image
I think it is worth noting that I cannot exactly follow the docker run command from the book as the ` symbol throws errors in the windows cmd, so maybe my arbitrary directory naming of "dsacm" is the problem. Also there is the fact that I have to take a roundabout path to get to the actual directory containing the data - not sure if this has something to do with it either.

If this is an issue with my docker experience I would greatly appreciate some sources of how to learn docker (preferably other than the official docker website, its descriptions are a bit abstract for someone with 0 formal CS education). But really if anyone could shed some light on the issue I am having I would be very grateful. Thanks!

Additional info: I am running Windows 10 x64 with Docker version 17.12.0-ce, build c97c6d6
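
One possible cause, offered only as a guess because the screenshot is not available here: a relative path such as data/movies.txt is resolved against the current directory, so the command fails when it is run from inside data/ itself, even though ls shows the file there. A small sketch with a throwaway directory (all names and file contents are made up):

```shell
# Build a throwaway layout: <tmp>/ch02/data/movies.txt (made-up content).
demo=$(mktemp -d)
mkdir -p "$demo/ch02/data"
printf 'Movie A\nMovie B\nMovie C\nMovie D\n' > "$demo/ch02/data/movies.txt"

# From ch02/, the relative path data/movies.txt resolves correctly.
from_parent=$(cd "$demo/ch02" && head -n 3 data/movies.txt)

# From inside data/, ls would show movies.txt, but data/movies.txt
# would have to be data/data/movies.txt, which does not exist.
status_inside=0
from_inside=$(cd "$demo/ch02/data" && head -n 3 data/movies.txt 2>&1) || status_inside=$?

echo "$from_parent"
echo "exit status from inside data/: $status_inside"
```

Running pwd before head shows which directory the relative path will be resolved against.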

Adding the data science toolbox to homebrew repository

The toolbox is great, as is the book. Since the book and the tools are so easy to use, perhaps it's worth adding the tools to the Homebrew repository as a single package? Or maybe add a README describing how to install the tools without installing the whole environment described in the book?

Question for vagrant up

I run Linux in a virtual machine on Windows 7.
After I installed VirtualBox and Vagrant and followed the environment setup steps, the terminal showed this warning:
Progress: 90%There was an error while executing VBoxManage, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["import", "/home/datatest1/.vagrant.d/boxes/data-science-toolbox-VAGRANTSLASH-data-science-at-the-command-line/1.0.0/virtualbox/box.ovf", "--vsys", "0", "--vmname", "packer-virtualbox-iso_1523945099207_28351", "--vsys", "0", "--unit", "7", "--disk", "/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk"]

Stderr: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Interpreting /home/datatest1/.vagrant.d/boxes/data-science-toolbox-VAGRANTSLASH-data-science-at-the-command-line/1.0.0/virtualbox/box.ovf...
OK.
0%...10%...20%...30%...40%...50%...60%...70%...
Progress state: VBOX_E_FILE_ERROR
VBoxManage: error: Appliance import failed
VBoxManage: error: Could not create the imported medium '/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk'.
VBoxManage: error: VMDK: cannot write allocated data block in '/home/datatest1/VirtualBox VMs/packer-virtualbox-iso_1523945099207_28351/packer-virtualbox-iso-disk1.vmdk' (VERR_DISK_FULL)
VBoxManage: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component ApplianceWrap, interface IAppliance
VBoxManage: error: Context: "RTEXITCODE handleImportAppliance(HandlerArg*)" at line 886 of file VBoxManageAppliance.cpp

How can I fix it?
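
The VERR_DISK_FULL line in the output suggests that the disk holding the "VirtualBox VMs" directory ran out of space during the import. A minimal sketch for checking free space before retrying; free_kb is a hypothetical helper name, and the assumption is that the VM disks live under the home directory, as in the log above:

```shell
# free_kb is a hypothetical helper: print the free space, in KiB,
# of the filesystem that contains the given directory.
free_kb() {
  # -P forces the portable one-line-per-filesystem format; -k reports KiB.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumption: the VM disks live under the home directory.
target=${HOME:-/}
echo "Free space under $target: $(free_kb "$target") KiB"
```

If the reported number is smaller than the size of the box being imported, freeing space or moving the VirtualBox machine folder to a larger disk would be the next thing to try.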

Rio question

Hello:
Very nice tools! I'm running Rio on Mac OS X, but I always get this message: “ARGUMENT … ignored”. How can I correct or remove it? Any help is appreciated.

seq 100 | Rio -nf summary

ARGUMENT '',~+file='/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf')}else+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/_0/yvrx0nkx76nfvz85q1dxg27183vg3m/T/Rio-X9BIzDnt.pdf');print(last);}' ignored

Wrong output to cols example in Ch. 3.2.1

I'm on Ubuntu 16.04.6 LTS

Trying to run the cols example from ch. 3.2.1, I got strange output:

|        day |  bill |  tip | sex    | smoker | time   | size |
| ---------- | ----- | ---- | ------ | ------ | ------ | ---- |
| 0001-01-07 | 16,99 | 1,01 | Female |  False | Dinner |    2 |
| 0001-01-07 | 10,34 | 1,66 | Male   |  False | Dinner |    3 |
| 0001-01-07 | 21,01 | 3,50 | Male   |  False | Dinner |    3 |
| 0001-01-07 | 23,68 | 3,31 | Male   |  False | Dinner |    2 |

When switching 'day' with 'sex', the upper-case operation works, but the day column is still messed up:

| sex    |  bill |  tip | smoker |        day | time   | size |
| ------ | ----- | ---- | ------ | ---------- | ------ | ---- |
| FEMALE | 16,99 | 1,01 |  False | 0001-01-07 | Dinner |    2 |
| MALE   | 10,34 | 1,66 |  False | 0001-01-07 | Dinner |    3 |
| MALE   | 21,01 | 3,50 |  False | 0001-01-07 | Dinner |    3 |
| MALE   | 23,68 | 3,31 |  False | 0001-01-07 | Dinner |    2 |

I printed the head of my tips.csv (downloaded latest version) and it's all in order.

Any idea what went wrong?

Installing the docker image on Windows with WSL

Hi all,

I believe I managed to get the environment up and running on Windows, for those of us who are interested. This requires the following steps, which are not too hard but can be time-consuming. I'll assume you're using Windows 10.

  1. Follow the steps described by Windows here.

This involves 1.a) ensuring you are a Windows Insider user, explained here

and that you have 1.b) WSL up and running, explained here.

I had the following issue but the fix is in the thread.

Next, for point 2), you'll need a valid docker setup with WSL 2, which is explained

in 2.a) the docker documentation

The following 2.b) article was also helpful, be sure to run Windows containers.

Finally, with docker up and running as seen in the pictures and with a WSL prompt open, you can simply use the steps described in point 2 of the book.

The only thing I didn't get right is mapping the book to a directory with a docker volume. If anyone has any ideas, I'm all ears.

Hope it helps!

SSL error while downloading the toolbox

Hi Jeroen, running vagrant up led to the following SSL error. Could you please update the certificates or guide me to a solution? Thanks.

[centos@localhost MyDataScienceToolbox]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'data-science-toolbox/data-science-at-the-command-line' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'data-science-toolbox/data-science-at-the-command-line'
default: URL: https://atlas.hashicorp.com/data-science-toolbox/data-science-at-the-command-line
==> default: Adding box 'data-science-toolbox/data-science-at-the-command-line' (v1.0.0) for provider: virtualbox
default: Downloading: https://atlas.hashicorp.com/data-science-toolbox/boxes/data-science-at-the-command-line/versions/1.0.0/providers/virtualbox.box
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

SSL certificate problem: unable to get local issuer certificate
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.

jsontsv might replace json2csv

I just released jsontsv yesterday. It is similar to json2csv which you mention on your original blog post "Seven command line tools for data science." But I believe jsontsv is strictly speaking more powerful and not strictly speaking easier to use. Feedback is appreciated if you have time but in any case thanks for writing about this whole topic.

https://github.com/danchoi/jsontsv

Second Virtual Machine

Hi
I already have a VM from Mining the Social Web by Matthew Russell.
I would like to know what precautions to take when installing the Command Line VM.
Do I also have to run vagrant?
I am using Windows 8.1 and I don't have any issues with Matthew's VM.

Any suggestions or recommendations are appreciated.

Thanks for your help

Franco

Refactor Rio

Rio started out as a proof-of-concept (see: http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html). However, over time, this quick-and-dirty Bash script has been proven to be useful to me (and who knows, perhaps even to others). Unfortunately, the code is quite messy making it difficult to maintain and extend.

Because of this, and also because it's playing a role in my upcoming book, I believe that Rio deserves to be cleaned up. (I've attempted to add some whitespace and newlines to that horrible SCRIPT string, but either Rscript or Bash didn't like that.) Please let me know if you have any suggestions.

Not able to use cols with sed

Hi,
I have this CSV (input_out.csv):

nome,id,url,start,creato,venue_id,logo
Palermo (Sicilia) - 25 Maggio 2016 25/5/2016 Karaoke di beneficenza di Chi ama la Sicilia,25180566753,http://www.eventbrite.it/e/biglietti-palermo-sicilia-25-maggio-2016-2552016-karaoke-di-beneficenza-di-chi-ama-la-sicilia-25180566753?aff=ebapi,2016-05-25T20:00:00,2016-05-03T20:40:12Z,15149091,https://img.evbuc.com/https%3A%2F%2Fimg.evbuc.com%2Fhttp%253A%252F%252Fcdn.evbuc.com%252Fimages%252F20814649%252F175077385541%252F1%252Foriginal.jpg%3Frect%3D0%252C172%252C860%252C430%26s%3Dd9a4bfa29cc27f85d8428de320cc9b3c?h=200&w=450&s=7ed859da13004f17403a9a3b0e1b2f7b
Tech-Marketplace & StartupItalia! Open Summit Tour 2016,25161107550,http://www.eventbrite.it/e/biglietti-tech-marketplace-startupitalia-open-summit-tour-2016-25161107550?aff=ebapi,2016-05-17T15:00:00,2016-05-03T10:49:31Z,15134493,https://img.evbuc.com/https%3A%2F%2Fimg.evbuc.com%2Fhttp%253A%252F%252Fcdn.evbuc.com%252Fimages%252F20794591%252F68137449621%252F1%252Foriginal.jpg%3Frect%3D473%252C0%252C3674%252C1837%26s%3De156d1a692274f2b3e48fa61b9e3964d?h=200&w=450&s=cfa6f9b49d4dee5180e53c71931a167d

I would like to replace the & character that I have in the first column with &.

If I write:

cat input_out.csv | sed -e 's/[\/&]/test/g' > out.txt

I have what I want. But I would like to work only in the first column. But if I write:

< input_out.csv cols -c nome body sed -e 's/[\/&]/test/g' > out.txt

I have

sed: -e expression #1, char 4: unterminated `s' command

What's wrong in my command?

Thank you

scrape syntax in chapter05 does not work

data/wiki.html scrape -b -e 'table.wikitable > tr:not(:first-child)'

This command does not produce the result shown in the book.
If I just type scrape -b -e 'table.wikitable', the content of the table is printed.

So how do I remove the first 'tr', as described in the book? Please give me some help.

New tool to format data into tables

I'm releasing a new tool that might be useful to the data science toolkit. (I don't know how else to inform you about it @jeroenjanssens other than a GitHub issue, since I don't use Twitter.)

https://github.com/danchoi/table

table formats lines of TSV, CSV, or DSV (delimiter-separated values) into a pretty plain text table, wrappings cells with long content to try to fit the table in the screen.

Rio -r flag

The -r flag loads the dplyr library, but I got an error:

cat iris.csv | Rio -e -r -v "df %>% group_by(species) %>% mean(sepal_length)"
Error: object 'r' not found
Execution halted
cat: /var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-OLVgmOU2.err: No such file or directory
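
A likely explanation, assuming that -e takes the R expression as its argument (as in Rio -e 'head(df)'): the option parser consumes -r as the expression, and R then evaluates -r, hence "object 'r' not found". The sketch below is not Rio's actual code; parse_flags is a hypothetical parser that only mimics this behaviour with getopts:

```shell
# parse_flags is a hypothetical parser, not taken from Rio.
# The body runs in a subshell ( ... ) so its variables do not leak.
parse_flags() (
  OPTIND=1
  expr=""
  load_dplyr=no
  # "e:" means -e requires an argument; -r and -v are plain flags.
  while getopts "e:rv" opt; do
    case "$opt" in
      e) expr=$OPTARG ;;
      r) load_dplyr=yes ;;
      v) ;;
      *) exit 2 ;;
    esac
  done
  echo "expr=[$expr] dplyr=$load_dplyr"
)

# -e swallows the next word, which happens to be -r.
parse_flags -e -r -v 'df'

# With -e placed directly before the expression, -r is seen as a flag.
parse_flags -r -v -e 'df'
```

In this sketch the first call reports -r as the expression and leaves dplyr unloaded, while the second call parses as intended. If Rio behaves the same way, moving -e so that it sits directly before the quoted expression would be worth trying.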

scrape does not accept right XPath query?

Hi,
I need to do this query

curl -L -s -A "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0" "http://www.unionesulserio.it"  | /usr/local/bin/nadir/scrape -be '(//a[(contains(.,"pretorio")) or (contains(.,"Pretorio"))])[1]'

I have tested (//a[(contains(.,"pretorio")) or (contains(.,"Pretorio"))])[1] XPath query with other tools and it seems to work, but scrape gives me "Invalid CSS selector".

Is my XPath really wrong?

Thank you

scrape: why empty results?

Dear @jeroenjanssens,
first of all thank you: I learned a lot from your book; reading it, there is a bit more light in my brain.

I am trying to use scrape with an XML file that seems properly formatted. I use a correct XPath query, but I get an empty result.

This is my command:

curl 'http://referendum2016.comune.palermo.it/AFFLSEZ_1_82053_R1.xml' | scrape -be '//SV'

What's wrong in it?

Thank you

Vagrant up issues

Hi,

I am having trouble starting the VM using vagrant up. I have tried on both windows and ubuntu.
I get to the password/private key authentication part and can progress no further. Working with the VirtualBox GUI doesn't help either.

Is there any other information I can provide to help diagnose the problem? My Vagrant config file is quite basic (mostly defaults, except that I tried to set a password instead of relying on the private key because of the error, which didn't help), but I can share it if needed.

Thanks a lot.
Rohail

Unable to use Rio at terminal for ggplotting

Hi, I am following your book Data Science at the Command Line and it's awesome. Most things have worked so far as I install individual components on my CentOS machine, but there are issues off and on. Is this a good place to ask? For example, the plotting code fashion.csv Rio -ge... throws the error display: no decode delegate for this image format `/tmp/magick-02f7rn9B' @ error/constitute.c/ReadImage/544. I have tried reinstalling different versions of ImageMagick, but without success. Your suggestions would be greatly appreciated. Thanks.

Sequence of dates

Allow me to introduce my dateutils, along with its dateseq command to produce sequences of dates (or date/times) not only faster but more portable and flexible.

Your semantics (you specify dates as day difference relative to today) would have to go through dateadd first though, as all tools in the toolkit take absolute dates. Your example from the book would hence become:

$ dateseq `dateadd today -2` today
2016-03-19
2016-03-20
2016-03-21

So I suppose one could write a wrapper so your dseq tool wouldn't break its API.

Rio ggplot2 not giving expected results

Working from the book and trying to reproduce Fig 7-4, I run this from the ch07 directory:

$ <data/tips.csv Rio -ge 'g+geom_bar(aes(factor(size)))'
ARGUMENT '',~+~file='/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png')}else~+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-zQ6olMgv.png');print(last);}' __ignored__

Loading required package: ggplot2
Loading required package: methods

I would expect binary output to stream to stdout so that I could pipe it to display or a .png file, but I get only this. Also, if I run the following, I get the results below:

$ <data/tips.csv Rio -e 'head(df)'
ARGUMENT '',~+~file='/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png')}else~+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/4s/gd5gw2bd2n16njbfqsg0j5yc0000gn/T/Rio-xGYRL7dR.png');print(last);}' __ignored__

   bill  tip    sex smoker day   time size
1 16.99 1.01 Female     No Sun Dinner    2
2 10.34 1.66   Male     No Sun Dinner    3
3 21.01 3.50   Male     No Sun Dinner    3
4 23.68 3.31   Male     No Sun Dinner    2
5 24.59 3.61 Female     No Sun Dinner    4
6 25.29 4.71   Male     No Sun Dinner    4

What I'm not expecting is the line starting with ARGUMENT .... I could not figure out what is wrong. Any ideas?

Ch2 Paste & Bc Example Not Working

The example provided causes an error with paste when using Bash or Zsh on macOS. I believe this will cause an issue on all variants, but I haven't checked on Linux/Docker yet.

$ fac() { (echo 1; seq $1) | paste -s -d\* | bc; }
$ fac 5
> usage: paste [-s] [-d delimiters] file ...

The suggestion is to add a - argument, which makes it run correctly; this makes me think it is just an unchecked typo.

$ fac() { (echo 1; seq $1) | paste -s -d\* - | bc; }
$ fac 5
> 120

Rio -d flag

Hi there, I have a tab-delimited file, and when I use Rio -d "\t" -e "summary(df)", I get some errors.
cat iris.tsv | head | Rio -d"\t" -e "summary(df)"
ARGUMENT '',stringsAsFactors=F);summary(df);last<-.Last.value;if(is.matrix(last)){last<-as.data.frame(last)};if(is.data.frame(last)){write.table(last,'/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png',sep=',',quote=T,qmethod='double',row.names=F,col.names=T);}else~+if(is.vector(last)){cat(last,sep='\n',+file='/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png')}else+~if(exists('is.ggplot')&&is.ggplot(last)){ggsave('/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png',last,dpi=72,units='cm',width=20,height=15);}else{sink('/var/folders/h5/3xs5c90n0njgp7n9_qdwwj2m0000gn/T//Rio-fK1bGblg.png');print(last);}' ignored

Do you have any idea?
Thanks,
Ming Tang

Rio-scatter to tty should display plot on X11

This will dump a PNG file to the tty:

< iris.csv Rio-scatter sepal_length sepal_width species

You never want that. If the output is not redirected to a file, display it on the screen with R's native displayer.

To see if stdout is a terminal you can use:

if [ -t 1 ] ; then
...
fi
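
The suggestion could look roughly like this. emit_plot is a hypothetical helper, not part of Rio-scatter, and the viewer step is only a placeholder message because the right viewer depends on the system:

```shell
# emit_plot is a hypothetical helper. $1 is the path to an already
# rendered image file.
emit_plot() {
  if [ -t 1 ]; then
    # stdout is a terminal: do not dump binary data onto it.
    # Placeholder only; a real tool would open a viewer here.
    echo "stdout is a terminal; would display $1 on screen" >&2
  else
    # stdout is a pipe or a file: pass the image bytes through.
    cat "$1"
  fi
}
```

With this sketch, emit_plot plot.png > copy.png writes the bytes to the file, while emit_plot plot.png typed at an interactive prompt only prints the placeholder message.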

close failed in file object destructor

Running your example curl -s 'http://en.wikipedia.org/wiki/List_of_countries_and_territories_by_border/area_ratio' | scrape -b -e 'table.wikitable > tr:not(:first-child)' | head I get:

close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

Book could use some reviews

The book Data Science at the Command Line has been getting some really good reviews on the O'Reilly product page. I have to say: it's great to get feedback from readers. (It's also a nice confirmation that I'm not the only person crazy enough to think that the command line can be used for doing data science.) ;)

On Amazon and other book websites, there are unfortunately currently very few or no reviews. This makes it difficult for someone to decide whether the book would be useful to them or not.

If you have read the book and you have an opinion about it, whether it's positive or negative, then it would be greatly appreciated if you would spend a few minutes writing it down and submitting it as a review to Amazon (or any other book website). More reviews means that potential readers can better inform themselves, which could potentially lead to more command-line users.

Thanks for helping me out!

Cheers,

Jeroen

PS. I realize that not everybody who's using this repository has actually read the book, so forgive me for posting this question here. If you're still wondering whether you should buy the book or not, the first chapter is available for free at O'Reilly.

PPS. Of course, if you have any feedback not suited for a review, then you can always open a GitHub issue or contact me on Twitter.

sql2csv --db "mysql: ..." says I don't have the necessary backend installed

vagrant@data-science-toolbox:~$ sql2csv --db "mysql://user:[email protected]:3306/database" --query "select count(*) from session"
You don't appear to have the necessary database backend installed for connection string you're trying to use.. Available backends include:

Postgresql: pip install psycopg2
MySQL: pip install MySQL-python

For details on connection strings and other backends, please see the SQLAlchemy documentation on dialects at:

http://www.sqlalchemy.org/docs/dialects/

To fix I did:

sudo apt-get update
sudo apt-get install libmysqlclient-dev
sudo pip install MySQL-python

Add Dockerfile

It would be awesome to have a Dockerfile, to autoinstall the project and run with. I'll open a Pull Request for this tomorrow.

Online book is missing appendix

The online version of the book is missing the appendix, which helpfully lists all of the command-line tools mentioned in the book with a brief description of each. It is mentioned several times in the text but seems to be missing from this repository and the hosted version of the book.

This is a super useful reference that I refer to often when trying to figure out how to do something on the command line, or to remind myself of tools that I don't use often—it should be included in the online version!

Scrubbing Data (Page 56)

@jeroenjanssens
On page 56 the following bash snippet does not work "out of the box" in the vagrant / virtualbox environment running Ubuntu 14.04 LTS as suggested in the book:

$ echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
1 foo\nbar\nfoo
$ echo 'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count 
value,count
foo\nbar\nfoo,1

The following snippet works correctly with -e parameter given to echo:

$ echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr
2 foo
1 bar
$ echo -e 'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count 
value,count
foo,2
bar,1

An alternative would be to use printf instead of echo. There might also be differences between distributions and versions of Unix in how echo handles the -e parameter.

The following snippet might be more universally acceptable:

$ echo $'foo\nbar\nfoo' | sort | uniq -c | sort -nr
2 foo
1 bar
$ echo $'foo\nbar\nfoo' | sort | uniq -c | sort -nr | awk '{print $2","$1}' | header -a value,count 
value,count
foo,2
bar,1
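
The printf alternative mentioned above expands \n in its format string in any POSIX shell, so it avoids the differences between echo implementations. A sketch; the header tool from the book is replaced by a plain echo here so that the example only needs standard utilities:

```shell
# printf expands \n in its format string, with no -e flag needed.
counts=$(printf 'foo\nbar\nfoo\n' | sort | uniq -c | sort -nr | awk '{ print $2 "," $1 }')

# Prepend the header line by hand instead of using the book's header tool.
{ echo "value,count"; echo "$counts"; }
```

This prints value,count followed by foo,2 and bar,1, matching the output of the echo -e variant.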

Stuck after running Docker image

Hi, Jeroen!

First of all, thanks for the awesome work and clear explanations about Data Science and Command Line Tools.

I found the site when reading HN today. Then I followed along through the first chapters until I got stuck at 2.2 Installing the Docker Image.

I'm aware it is currently a WIP to update the online version of the book. But I was wondering if there is something I could do to keep going once I ran the Docker container. Now, it seems the files you use are not there yet (e.g. book/ folder), which makes it difficult to follow your first examples.

Am I missing something here? I would appreciate some help 😃

Thanks in advance!

Twitter API v1 is outdated

Hi Jeroen,

on page 39 you call an outdated Twitter API:

$ curlicue -f credentials \
> 'https://api.twitter.com/1/statuses/home_timeline.xml'

It returns

<?xml version="1.0" encoding="UTF-8"?>
<errors>
  <error code="64">
    The Twitter REST API v1 is no longer active. Please migrate to API v1.1.
    https://dev.twitter.com/docs/api/1.1/overview.
  </error>
</errors>

The actual API endpoint is

https://api.twitter.com/1.1/statuses/home_timeline.json

So the proper call would be

$ curlicue -f credentials \
> 'https://api.twitter.com/1.1/statuses/home_timeline.json'

However, it returns JSON and not XML as expected from the v1.0 call. Since it is the end of the pipeline and the returned value is not processed, this is not a big issue. I guess the API was deactivated at the end of last year.

Scrape in Golang

Hi, do you mind writing a version of the scrape utility in Golang for better cross platform support? Thanks in advance!

sample is already a base tool

man sample

DESCRIPTION
     sample is a command-line tool for gathering data about the running behav-
     ior of a process.  It suspends the process at specified intervals (by
     default, every 1 millisecond), records the call stacks of all threads in
     the process at that time, then resumes the process.  The analysis done by

cols not working

It seems like cols is not working.
No matter how I run it, I always get this response:

usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
mktemp [-d] [-q] [-u] -t prefix
mkfifo: /other_columns: Permission denied
/Applications/command-line-tools/cols: line 24: ${ARG~~}: bad substitution
tee: /other_columns: Permission denied

Off-topic: cron and csvsql

Hi,
I'm writing here because you have great experience with csvkit.

Have you ever tested cron with a csvsql process? If so, were you able to produce an output file?
I always get a 0 KB output file.

I have just opened an issue, wireservice/csvkit#342. I think it could be something related to the creation of the temp sqlite file, but I can't find anything in the log either (it's a zero KB file too).

Thank you very much

`mktemp` failure

$ seq 12 | Rio -e 'df**2'
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
mktemp: failed to create file via template `/tmp/user/1001Rio-XXXXXXXX': Permission denied
/opt/Rio: line 115: $IN: ambiguous redirect
Rscript execution error: No such file or directory
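
The template /tmp/user/1001Rio-XXXXXXXX looks as if a temporary directory (/tmp/user/1001) and the file name were joined without a / in between. That is a guess from the error message, not a confirmed diagnosis of Rio. A sketch of building the template defensively; make_tmp is a hypothetical helper:

```shell
# make_tmp is a hypothetical helper: create a temp file inside $TMPDIR
# (or /tmp if unset), whether or not $TMPDIR ends with a slash.
make_tmp() {
  dir=${TMPDIR:-/tmp}
  # ${dir%/} strips one trailing slash, so exactly one separator is added.
  mktemp "${dir%/}/Rio-XXXXXXXX"
}

tmpfile=$(make_tmp)
echo "created $tmpfile"
rm -f "$tmpfile"
```

The same template works whether TMPDIR is set to /tmp/user/1001 or to a path that already ends in a slash, as the macOS paths in other reports do.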
