Data Analysis using the shell

  • Read a csv file
  • head(10)
  • tail(10)
  • show number of rows (.shape)
  • show all columns (.columns)
  • display specific rows by index (df.iloc[[1, 2, 3], :])
  • display specific columns by index (df.iloc[:, [1, 2, 3]])
  • display specific column by name
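The operations listed above map onto standard shell tools roughly as follows. This is a minimal sketch, assuming a small comma-separated file with a header row and no quoted commas inside fields; the file name data.csv and its contents are made up for illustration.

```shell
# Hypothetical sample data: a header row plus four records
cat > data.csv <<'EOF'
name,fruit,qty
Fred,apples,20
Susy,oranges,5
Mark,grapes,39
Lisa,peaches,7
EOF

cat data.csv                 # read the csv file
head -n 10 data.csv          # first 10 lines, like head(10)
tail -n 10 data.csv          # last 10 lines, like tail(10)

# number of data rows, excluding the header (like .shape[0])
rows=$(( $(wc -l < data.csv) - 1 ))
echo "rows: $rows"

# column names, one per line (like .columns)
head -n 1 data.csv | tr ',' '\n'

# file lines 2 to 4, i.e. the first three data rows (line 1 is the header)
sed -n '2,4p' data.csv

# columns 1 and 3 by position
cut -d ',' -f 1,3 data.csv

# a column by name: find its position in the header, then print that field
fruit_column=$(awk -F ',' -v col=fruit '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
  { print $c }
' data.csv)
echo "$fruit_column"
```

For CSV files with quoted fields or embedded commas, splitting on "," like this breaks down, and a CSV-aware tool is the safer choice.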

If you want to do more advanced analysis from the shell, you can also check out csvkit, a suite of command-line tools for CSV files written in Python

There's also a great book, Data Science at the Command Line

cat

cat <filename> displays the contents of the given file. You can concatenate multiple files and read them at once, like

cat file1.txt file2.txt

and the shell will display the contents of the files in the order specified. You can also use cat to copy the contents of one file into another, like

cat file1.txt > newfile.txt

Here the > overwrites the contents of newfile.txt, creating the file if it does not exist. If you want to append the contents of file1.txt after the content already present in newfile.txt, use >> like

cat file1.txt >> newfile.txt

Similarly, you can read a file in reverse, last line first, by using tac (cat spelled backwards)

tac file1.txt
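The commands above can be tried end to end on two throwaway files. This is a small sketch; the file contents are made up for illustration, and tac is assumed to be available (it ships with GNU coreutils and may be missing on macOS or BSD).

```shell
# Hypothetical files used only for this demonstration
printf 'one\ntwo\n' > file1.txt
printf 'three\n' > file2.txt

cat file1.txt file2.txt          # prints one, two, three in that order

cat file1.txt > newfile.txt      # newfile.txt now holds: one, two
cat file2.txt >> newfile.txt     # newfile.txt now holds: one, two, three

tac newfile.txt                  # prints three, two, one
```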

using cat to create a file

You can use the cat command to create a file, just like touch or the vim editor

cat > file2.txt

This moves the cursor to a new line and lets you type the contents of the file in the shell. Once you finish entering the text, press ctrl + d on a new line to exit, and the file is saved automatically.

The only drawback of entering content this way is that you cannot go back and edit a line once you have pressed Enter.

You can also append content directly to an existing file by using >>

cat >> file2.txt
This is a new line appended at the end of the file.

and then press the ctrl + d keys to save and exit.
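When the text is known in advance, a here-document feeds cat the same input without any interactive typing. This is a minimal sketch; the file name notes.txt and its contents are made up for illustration.

```shell
# Everything up to the line containing only EOF is written to notes.txt
cat > notes.txt <<'EOF'
first line
second line
EOF

# Append one more line without touching the existing content
cat >> notes.txt <<'EOF'
third line
EOF

cat notes.txt
```

The closing EOF plays the role of ctrl + d here: it marks the end of the input.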

echo

echo simply displays the text and can be used like

echo hello
hello

Unlike cat, echo does not read and output the contents of a file. It works more like a print statement.

write and append to files

You can also use echo to write or append lines to a file using > and >>

# append new line to existing file
echo last line >> file1.txt

# overwrite content into a file
echo overwritten content > file1.txt

You can use the -e option to enable interpretation of backslash escapes such as

  • newline \n
  • tab \t
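A short sketch of both escapes in use, assuming bash's builtin echo (other shells may treat -e differently, which is why printf is shown as a more portable alternative). The file name escaped.txt is made up for illustration.

```shell
# bash's builtin echo interprets backslash escapes only when -e is given
echo -e "name\tfruit\nFred\tapples"

# printf always interprets escapes in its format string
printf 'name\tfruit\nFred\tapples\n' > escaped.txt
cat escaped.txt
```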

global variables

You can echo the values of global (environment) variables in the shell

echo $USER

To view the list of all environment variables in bash, use the printenv command
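A brief sketch of how a variable becomes visible to printenv. GREETING is a hypothetical variable created only for this example.

```shell
# GREETING is a made-up variable used only for this demonstration
GREETING=hello
export GREETING                 # export makes it visible to child processes

echo "$GREETING"                # expand the variable in the current shell
printenv GREETING               # print a single variable by name
printenv | sort | head -n 5     # the first few variables, alphabetically
```

Without the export line, echo would still print the value, but printenv would not see the variable.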

sed

sed, also known as the Stream Editor, is primarily used for modifying the contents of a file, such as filtering lines and substituting values. The options below let you run commands given directly on the shell command line (-e) or read from an external script file (-f), and edit the file in place (-i).

# run script inline on a textfile
sed -e command <filename>

# run script from a scriptfile on a textfile
sed -f scriptfile <filename>

# run script inline and overwrite the textfile in place
# (GNU sed; BSD/macOS sed needs a backup suffix argument, e.g. -i '')
sed -i command <filename>

The commands you run on a file usually rely on regular expressions (covered later in this file) to match patterns in its lines.

The syntax of the sed command looks like

# replace the first occurrence of the pattern on each line
sed -e 's/pattern/replace_string/' <filename>

# replace all occurrences of the pattern
sed 's/pattern/replace_string/g' <filename>

# replace all occurrences in lines 1 to 3
sed '1,3s/pattern/replace_string/g' <filename>

# replace all occurrences and overwrite the file
sed -i 's/pattern/replace_string/g' <filename>

You can even replace the text and write into a new file

sed -e s/pattern/replace_string/g file1 > file2
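The difference between these forms is easiest to see on a tiny input. This is a minimal sketch; fruits.txt, pears.txt and their contents are made up for illustration, and -i is left out because its syntax differs between GNU and BSD sed.

```shell
# Hypothetical input used only for this demonstration
printf 'apple apple\napple pie\napple tart\n' > fruits.txt

# first match on each line only
sed 's/apple/pear/' fruits.txt        # pear apple / pear pie / pear tart

# every match on each line
sed 's/apple/pear/g' fruits.txt       # pear pear / pear pie / pear tart

# every match, but only on lines 1 to 2
sed '1,2s/apple/pear/g' fruits.txt    # pear pear / pear pie / apple tart

# write the result to a new file and leave the original untouched
sed 's/apple/pear/g' fruits.txt > pears.txt
```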

multiple patterns

Replace multiple patterns at the same time like

sed -e 's/01/JAN/' -e 's/02/FEB/' file1 > file2

This replaces the first 01 on each line with JAN and the first 02 with FEB, reading from file1 and writing the result to file2.
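A runnable sketch of the same command, assuming file1 holds month numbers at the start of each line. The contents are made up for illustration.

```shell
# Hypothetical input: a month number followed by a label
printf '01,rent\n02,food\n01,travel\n' > file1

# both substitutions are applied to every line, in the order given
sed -e 's/01/JAN/' -e 's/02/FEB/' file1 > file2

cat file2    # JAN,rent / FEB,food / JAN,travel
```

Because neither pattern is anchored, 01 would also match inside a longer number such as 2001. Anchoring with ^ (for example s/^01/JAN/) restricts the match to the start of the line.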

awk

grep

You mainly use grep to search for contents within a file.

Say you have a sample.txt that looks something like this:

Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14

Now, let's use grep to search for some specific lines in this file.

Search for all the lines that contain apple

grep apple sample.txt

Search for all the lines that contain apple and also display the line numbers

grep -n apple sample.txt

Search for all the lines that contain apple and end with an even number

grep apple sample.txt | grep '[24680]$'

Search for all the lines that contain oranges but not Susy

grep oranges sample.txt | grep -v Susy

If you want the match on susy to be case-insensitive, add the -i option

grep oranges sample.txt | grep -vi susy

Search for all the lines that start with a letter between C and M and contain apples

grep -n '^[C-Mc-m]' sample.txt | grep apples

How many lines contain apples

grep -c apples sample.txt
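The searches above can be checked against the sample data directly. This sketch recreates sample.txt exactly as listed and stores a few results in variables; the variable names are made up for illustration.

```shell
# Recreate the sample file shown earlier
cat > sample.txt <<'EOF'
Fred apples 20
Susy oranges 5
Mark watermellons 12
Robert pears 4
Terry oranges 9
Lisa peaches 7
Susy oranges 12
Mark grapes 39
Anne mangoes 7
Greg pineapples 3
Oliver rockmellons 2
Betty limes 14
EOF

grep -n apple sample.txt                        # 1:Fred apples 20 and 10:Greg pineapples 3
even=$(grep apple sample.txt | grep '[24680]$')
echo "$even"                                    # Fred apples 20
not_susy=$(grep oranges sample.txt | grep -v Susy)
echo "$not_susy"                                # Terry oranges 9
grep -n '^[C-Mc-m]' sample.txt | grep apples    # 1:Fred apples 20 and 10:Greg pineapples 3
count=$(grep -c apples sample.txt)
echo "$count"                                   # 2
```

Note that apples also matches inside pineapples, which is why the count is 2. With GNU or BSD grep, adding -w (grep -cw apples sample.txt) restricts the match to the whole word.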
EXTRA

How many lines are in a file

wc -l sample.txt

wc stands for word count; the -l option makes it count lines.

By now you will have understood that regular expressions lie at the heart of grep. In fact, the name grep comes from the early days of Unix, when to search for the word junk in a file using the ed editor, you wrote

g/junk/p

Here g applies the command globally to every line, /junk/ is the pattern to match, and p prints the matching lines. Searching within a file this way was so widely used that it was split out into a standalone command. For an arbitrary regular expression, the editor command has the form

g/re/p

where re stands for regular expression, and thus the name grep.

So, what are regular expressions?

re

Ref Link

Regular expression syntax consists of 3 main parts, namely

  1. An Anchor - specifies the position of a pattern
  2. Character Sets - the characters that can match at that specific position
  3. Modifiers - the number of times the previous character set is repeated

For example, ^#* is the simplest example showcasing all 3 parts.

  1. ^ - the anchor; the pattern must match at the beginning of the line.
  2. # - the character set; here it is the single character '#'.
  3. * - the modifier; the preceding character may repeat zero or more times.
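The effect of the modifier is easier to see by counting matches with grep. This is a small sketch; the file name hashes.txt and its contents are made up for illustration.

```shell
# Hypothetical input: two lines start with '#', one does not
printf '# comment\n## heading\ncode # trailing\n' > hashes.txt

grep -c '^#' hashes.txt      # 2: lines that start with at least one '#'
grep -c '^#*' hashes.txt     # 3: '*' also allows zero '#', so every line matches
grep -c '^##*' hashes.txt    # 2: one '#' followed by zero or more '#'
```

Because * permits zero repetitions, ^#* on its own matches every line; requiring at least one '#' takes a form like ^##*.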
