In the 1920s and 1930s there were two theories of computation.
- Turing machine (Alan Turing, 1932)
- step-by-step model of computation based on a physical (albeit theoretical) machine.
- gave rise to imperative programming languages:
- Fortran
- C/C++
- Java
- Python
- ...
- Lambda Calculus (Alonzo Church, 1930s) and Combinatorial Logic (Haskell Curry, 1927)
- function based models based on pure abstract mathematics.
- gave rise to functional programming languages:
- Lisp
- Haskell
- SML/OCaml
- Clojure
- Scala
- ...
- A category of programming languages (see above)
- A paradigm of programming which also applies to imperative languages.
We will focus on the functional programming paradigm (i.e. patterns/principles) with examples in Python 3. (However, I am secretly teaching you how to program in a purely functional language like OCaml.)
- Use functions as values (which can be passed to other functions)
- Write pure functions (with no side effects)
- Program declaratively, not imperatively
- Composition not steps
- Avoid
for
andwhile
loops - Inline
if then else
(alsocases
) - Use pattern matching (deconstruction)
- Use immutable types
- Consider type safety
- Embrace type theory
Warning Python is not a purely functional language and following this advice exactly may cause you to fail your code reviews. Use common sense.
- Mathematically elegant
- You probably don't care!
- Modular/Readable/Simple programming style
- This subjective at best, but there is some truth to it
- Less prone to (certain) bugs
- This is really important for some applications
- Naturally suited for parallel and vectorized code
- This is a big deal, especially in (big) data science
# function
def plus_one(x):
return x + 1
# higher order function (takes a function f as an argument)
def repeat_twice(f):
# use input function f to construct a new function g = f o f
def g(x):
return f(f(x))
# return this function
return g
plus_two = repeat_twice(plus_one) # plus_two is a function
plus_two(0) # == plus_one(plus_one(0)) == 2
For simple functions, can use lambda
.
# lambda functions (not how they are typically used)
plus_one = lambda x: x + 1
plus_two = lambda x: plus_one(plus_one(x))
plus_two(0) # == plus_one(plus_one(0)) == 2
s # pd.Series object
df # pd.DataFrame object
# apply function to every series element
s.map(lambda x: x * x) # a copy of s with every item squared
# apply function to every dataframe column
df.apply(lambda x: x.sum()) # a series containing the sum of each column
Languages like Python, Java, and C++ are object oriented.
In Python everything is an object. Objects have attached attributes and methods which are attached to the object via the .
notation.
my_array = np.array([[1, 2, 3], [4, 5, 6])
my_list = [1, 2, 3]
my_float = 7.5
my_string = "HI THERE"
my_function = len
# methods
my_list.append(7)
my_number.hex() # == 0x1.e000000000000p+2
my_string.lower() # == "hi there"
# attributes
my_array.shape # == (2, 3)
my_function.__doc__ # == 'Return the number of items in a container.'
Many purely functional programming languages are also object oriented, including
- OCaml
- Scala
# NOT PURE!!!!
def third_largest_item(list_of_nums):
list_of_nums.sort() # <-- Side effect: list_of_nums has been is modified
global_variable = 5 # <-- Side effect: changing a global variable
print("Hello") # <-- Side effect: input/output operations
return list_of_nums[-3]
# Pure
def third_largest_item(list_of_nums):
new_list = sorted(list_of_nums) # list_of_nums is not changed
return new_list[-3]
Pure functions perform the same way every time.
- Imperative:
- Focused on steps (the "how").
- Closer to the actual machine steps.
- Declarative:
- Focused on results (the "what").
- Closer to the desired results.
"""
find the minimum word (alphabetically) in a string
"""
# imperative (focused on steps)
def min_word(s):
# make s lower case
lower_s = ""
for c in s:
lower_s += c.lower()
# split new_s into words
words = []
w = ""
for c in new_s:
if c == " ":
words.append(w)
w == ""
else:
w += c
w += c
# find lowest word alphabetically
min_word = ""
for w in words:
if w < min_word:
min_word = w
return min_word
# declaritive (focused on results)
def min_words(s):
return min(s.lower().split())
SQL code is purely declaritive (no steps, only results).
SELECT gender, SUM(income)
FROM income_list
WHERE age > 20
GROUP BY gender
ORDER BY SUM(income);
Excel formulas are another example of purely declaritive code.
= SUM(INDEX(C5:F11,0,MATCH(I6,C4:F4,0)))
Functional programming prefers declaritive code over imperative code.
# imperative (steps)
y = f(x) # step 1
z = g(y) # step 2
w = h(z) # step 3
# declaritive (function composition)
w = h(g(f(x)))
With object oriented programming, it is possible to write code declaritively, but still preserve the "natural" order of the operations:
# imperative (steps)
x = "This is a string"
y = x.lower() # make lower case
z = y.split() # split into a list of words
w = w.index("is") # find the index of the first occurrence of the word "is"
# declarative (function composition)
w = ("This is a string"
.lower() # make lower case
.split() # split into a list of words
.index("is")) # find the index of the first occurrence of the word "is"
Python Pro Tip: Use parentheses on both sides of an expression to break it into multiple lines. You can line things up and insert comments as desired.
Data science example of imperative/declarative style. Here we are plotting the number of rows per week.
# imperative
def plot_number_of_rows_per_week(df):
df1 = df.copy() # we don't want to change the original df
weeks = df1['date_time'].dt.week() # get week of the year
df1['week'] = weeks # add column for the week of the year
df2 = df1.groupby('week').size() # count number of rows for each week
df2.plot() # plot
# declarative
def plot_number_of_rows_per_week(df):
(df.groupby(df.date_time.dt.week()) # group by week of the year
.size() # count rows for each week
.plot()) # plot
Many loop constructs can be replaced with recursion. This is especially true of algorithms for trees.
class Tree:
def __init__(self, value):
self.val = value
self.left = None
self.right = None
"""
Check if a certain value is in a tree
"""
# imperative (while loop + a stack)
def contains(tree, value):
to_visit = [tree] # list of nodes to visit
while to_visit:
node = to_visit.pop() # visit node
if node is None:
continue
elif node.val == value:
return True
else:
# explore child nodes
to_visit.append(node.left)
to_visit.append(node.right)
return False
# declaritive (recursion)
def contains(tree, value):
# base case
if tree is None:
return False
# recursion
return (tree.val == value
or contains(tree.left, val)
or contains(tree.right, val))
Simple loops (especially those which construct lists or strings) can be accomplished via list comprehension
"""
Copy list and add one to every item.
"""
# imperative (for loop)
def add_one_to_each_item(old_list):
new_list = old_list
for n in old_list:
new_list.append(n+1)
return old_list
# declaritive (list comprehension)
def add_one_to_each_item(old_list):
return [n + 1 for n in old_list] # list comprehension
"""
Divide all even numbers by 2.
"""
# imperative (for loop)
def divide_by_two_if_even(old_list):
new_list = old_list
for n in old_list:
if n % 2 == 0: # even
new_list.append(n // 2)
return old_list
# declaritive (list comprehension)
def add_one_to_each_item(old_list):
return [n // 2 for n in old_list if n % 2 == 0]
"""
Reverse each word in a string.
e.g. "This is a sentence." -> "sihT si a ecnetnes"
"""
# imperative (for loop)
def reverse_each_word(sentence):
words = sentence.split() # split sentences
s = ""
for w in words:
s += " " + w[::-1] # add space and reversed word
return s[1:] # remove leading space
# declaritive (list comprehension and join)
def reverse_each_word(sentence):
return " ".join([w[::-1] # reverse word
for w in sentence.split()]) # for all words in the sentence
There are other types of comprehension.
# list comprehension
[n for n in range(4) if n != 2] # == [0, 1, 3]
# set comprehension
{n for n in range(4) if n != 2} # == {0, 1, 3}
# dictionary comprehension
{n:n+1 for n in range(4) if n != 2} # == {0:1, 1:2, 3:4}
# tuple comprehension
tuple(n for n in range(4) if n != 2) # == (0, 1, 4)
# iterator comprehension
(n for n in range(4) if n != 2) # an iterator (advanced concept)
sum(list_of_nums) # sum numbers
min(list_of_nums) # find smallest number
max(list_of_strings, key=len) # find longest word
"".join(list_of_strings) # concatenate strings
reversed(my_list) # reverse the list
all(list_of_bools) # True if all conditions are True
any(list_of_bools) # True if at least one condition is True
These are all example of reduce
operations.
# imperative
def sign(n):
if n < 0:
return -1
elif n == 0:
return 0
else:
return 1
# declaritive
def sign(n):
return (-1 if n < 0
else (0 if n == 0 else 1))
The inline ... if ... else ...
of Python is a bit awkward to use. Most other programming languages have a more natural inline if ... then ... else ...
structure. SQL has CASES
which has a similar functionality.
SELECT
CASES
WHEN age < 16 THEN 'child age'
WHEN age < 65 THEN 'working age'
ELSE 'retirement age'
END AS age_category
FROM income_list;
Nonetheless, Python's if else
logic is useful for some one-liners.
def replace_negative_numbers_with_0(number_list):
return [0 if x < 0 else x for x in number_list]
# pattern matching
x, _, y = [1, 2, 3] # set x = 1 and y = 3 (ignore 2)
a, (b, c), d = (1, (2, 3), 4) # set a = 1, b = 2, c = 3, d = 4
# swapping variables
a, b = b, a # swaps a and b
n_old, n = n, n + 1 # simaltaneously set n_old = n and n = n + 1
# cool iterators which loop over pairs
for index, item in enumerate(my_list): # loop over index/items pairs
pass
for a, b in zip(list_a, list_b): # loop over list_a and list_b in parallel
pass
for key, value in my_dict.items(): # loop over dictionary key/value pairs
pass
for i, (a, b) in enumerate(zip(list_a, list_b)):
pass
Purely functional languages (e.g. Haskell) have really powerful pattern matching abilities.
The following Python packages are really useful
collections
: useful types such asdefaultdict
anddeque
itertools
: useful tools for working with iteratorsfunctools
: useful tools for working with functions
You are are writing the same code often:
- Check if that functionality is already in a standard Python package.
- Turn your code into a function (or class or decorator).
- Write tests for your function.
- Replace your old code with that function.
Python is not a functional language and does not have the tools to write in a purely declaritive style all the time.
- Functional programming must be written declaritively.
- Imperative programming may be written declaritively (when appropriate).
The goal is to write working, readable, maintainable code. Use common sense.
Numpy and Pandas (as well as R, Matlab, Tensorflow, and probabilistic programming languages) are based on vector calculations. Vectorized operations are faster since they are done in C, Fortran, or the GPU. Vectorized programming is also more declarative.
# fast vectorized operations (declarative)
y = np.matmul(w, x) + b
Computer science is moving to massively parallel programming. Writing declarative code with pure functions which have no side effects goes a long way to making code which is naturally parallelizable.
Google's MapReduce is an example of how functional/declarative programming has influenced data science. Suppose one wants to count the instances of every word on the Internet.
- Map (In parallel for each website) Count the words on a website, creating a dictionary of word counts.
- Group (In parallel for each website) For each word in the dictionary, send that word count to a process based on that word.
- Reduce (In parallel for each word) Sum all the counts for that word.
Google noticed many of their data gathering tasks followed this format, and therefore they could reduce the task to a simple function:
map_reduce(map_function, reduce_function)
Hadoop is based on MapReduce applied to databases.
- Immutable Types can't be changed after they are created. These include
int
,float
,string
,tuple
,bool
,frozenset
,bytes
,complex
- Mutable Types can be changed. These include
list
,dict
,set
,np.array
,pd.DataFrame
, and almost all other types
def try_to_change_input(x):
x += x # will this change x?
return x
# immutable
a = 1
try_to_change_input(a) # == 2
a # == 1
b = "hello"
try_to_change_input(b) # == "hellohello"
b # == "hello"
c = (1, 2, 3)
try_to_change_input(c) # == (1, 2, 3, 1, 2, 3)
c # == (1, 2, 3)
# mutable
d = [1, 2, 3]
try_to_change_input(d) # == [1, 2, 3, 1, 2, 3]
c # == [1, 2, 3, 1, 2, 3] changed!
e = np.array([1, 2, 3])
try_to_change_input(e) # == np.array([2, 4, 6])
e # == np.array([2, 4, 6]) changed!
Python pro tip:
# tuples of various lengths
(1, 2, 3)
(1, 2)
(1,) # note the trailing comma; in Python: (1) == 1
() # could also use tuple()
Functional programmers are really good at building immutable equivalents of mutable objects.
# An immutable linked list (built with 2-tuples)
my_list = (1, (2, (3, (4, (5, (6, None))
# even though it is immutable I can "change" the first element
my_list2 = (-1, my_list[1]) # == (-1, (2, (3, (4, (5, (6, None))
# but this doesn't change the original list
my_list # == (1, (2, (3, (4, (5, (6, None))
# an immutable tree (built with 3-tuples)
# a
# / \
# b e
# / \ \
# c d f
t = ("a", # first element is the value
("b", # second element is the left child
("c", None, None),
("d", None, None)),
("e", # third element is the right child
None,
("f", None, None))
Python pro tip:
Use collections.NamedTuple
to build immutable types with attributes.
Python hack:
np.arrays
are not immutable, but they can hacked to be immutable by setting a flag.
a = np.array([1, 2, 3])
a.setflags(write=False) # makes immutable
a += 1 # this will raise an error
Immutable types help ensure that functions stay pure (no side effects). This is really important for parallel programming where the most common bugs are when two different processes try to change the same (non-thread-safe) object.
Immutable types can be used as dictionary keys. The keys of a dict
ionary (and the items in a set
) must be immutable.
More formally, keys need to be hashable, but in Python the hashable object are exactly the immutable ones:
int
,float
,complex
,bool
,string
,bytes
,tuple
(if the items of the tuple are hashable!)frozenset
Python pro tip:
d = {}
# tuples make good dictionary keys
d[1, 2] = 3 # key = (1,2) , value = 3
(1, 2) in d # == True
# if you really need to use an immutable object as a key:
# convert lists to tuples
d[tuple(my_list)] = 3
# convert sets and dictionaries to frozensets
d[frozenset(my_set)] = 3
d[frozenset(my_dict.items())] = 3
# convert numpy arrays to bytes
d[my_array.tobytes())] = 3
Pure functions can be memoized/cached. This is a significant part of dynamic programming.
cache = {} # dict storing the results
def fib(n):
"""
Computes nth Fibonacci number
0, 1, 1, 2, 3, 5, 8, ...
"""
# check cache:
if n in cache: return cache[n]
# otherwise calculate result (and cache it)
elif n == 0: result = cache[n] = 0
elif n == 1: result = cache[n] = 1
else: result = cache[n] = fib(n - 1) + fib(n - 2)
return result
Python pro tip:
Use @functools.lru_cache
to memoize automatically.
from functools import lru_cache
@lru_cache # automatically memoizes this function
def fib(n):
"""
Computes nth Fibonacci number
0, 1, 1, 2, 3, 5, 8, ...
"""
if n == 0: return 0
elif n == 1: return = 1
else: return fib(n - 1) + fib(n - 2)
-
Strongly typed language. Every object has a fixed type. Python is strongly typed, as are C++, Java, and most functional languages.
3 # type: int "abc" # type: string np.array([1,2,3]) # type: np.array
-
Dynamically typed (a.k.a. duck typed) language. Variables and function arguments don't have a specified type. It is sometimes even possible to add attributes to objects. Python is dynamically typed. This makes it great for a scripting language, but can cause errors.
def f(x): return x + x # no type specification of x f(1) # == 2 f("abc") # == abcabc f([1,2]) # == [1,2,1,2] f(np.array([1,2])) # == np.array([2,4]) f(None) # TypeError! (+ doesn't work on None)
-
Statically typed language. Every variable and every function has a specified type. C++, Java, and most functional languages are statically typed.
Python pro tip: Python has type hints, so it can pretend to be statically typed.
from typing import List def my_function(l: List[int], i:int) -> int: # input types ─────┴──────────┘ | # return type ────────────────────────┘ return l[i] # This is only a suggestion. The compiler ignores type hints! my_function("abc", 2) # == "c"
Many newer functional languages have type inference were the compiler automatically infers the types for you so you don't have to always put them in the code.
-
Type safe language. This means you can't do inappropriate things with types, and you always know what type a variable will be. In this sense Python is not type safe. A common issue with type safely is Null values:
def divide(i: int, j:int) -> int: if j == 0: return None # to signify multiplying by zero else: return i // j divide(1,0) # == None (which is not an int) # Is this what I want? Does the use of divide # in my code take this into account?
Some functional languages have advanced ways to:
- avoid null types (which we have to remember to handle)
- avoid raising errors (which we have to remember to handle)
- make sure one handles all possible cases (even rare ones)
Holy Trinity of functional programming theory
- Type theory
- Symbolic logic
- Category theory
Don't believe anyone who says you have to learn category theory to program in a functional language. It is enough to understand basic type theory.
If A
and B
are types, then so is the product A x B
. An element of this type is a pair (a, b)
. This is basically the same thing as a tuple in Python.
a # type: A
b # type: B
(a, b) # type: A x B
If A
and B
are types then A -> B
is the type of functions with inputs from A
and outputs from B
.
def add(a: int, b:int) -> int: # type: (int x int) -> int
return a + b
def add_curried(a: int): # type: int -> (int -> int)
return lambda b: a + b
add(2, 3) # 5 type: int
add_curried(2) # function type: int -> int
add_curried(2)(3) # 5 type: int
This is an example of currying. In most purely functional languages, functions default to curried form. This makes it easy to partially apply the function.
plus_one = add_curried(1) # function (type: int -> int)
In a functional language like OCaml, one doesn't have to write parentheses:
add;; (* Type: int -> int -> int *)
add 1;; (* Type: int -> int *)
add 1 3;; (* Type: int *)
add mult 1 2 add 3 4;; (* Type: int *)
If A
and B
are types then A | B
is the type of objects x
, where x
is of type either A
or B
. There is not (a natural) way to do this in Python. Sum types are great because they allow more than one type as input but still have strict control over the types of input.
OCaml, for example, has a generalization of sum types:
type PrimaryColor = (* A type with only three values *)
| cyan
| yellow
| magenta;;
type Option =
| None (* None value represents an error *)
| Some of int;; (* All other values are of the form Some x *)
cyan;; (* Type: PrimaryColor *)
None;; (* Type: Option *)
Some 5;; (* Type: Option *)
This allows pattern matching.
let other_colors c =
match c with
| cyan -> (yellow, magenta)
| yellow -> (magenta, cyan)
| magenta -> (cyan, yellow);;
let option_add x y =
match x with
| None -> None
| Some x_ -> Some (x_ + y);;
other_colors cyan;; (* (yellow, magenta) *)
option_add (Some 5) 4;; (* Some 9 *)
option_add None 4;; (* None *)
Recursive types make it easy to construct new types and have them be self referential. Recursive functions on these objects are very natural to implement.
type linked_list =
| Nil (* The end of the list *)
| Cons of linked_list;; (* A node in the linked list *)
(* "Cons" is short for constructor *)
type tree =
| Leaf (* Dead end *)
| Node of int * tree * tree;; (* Node containing:
- node value
- left subtree
- right subtree *)
(* recursive function which sums all the values in a tree *)
let sum_values t =
match t with
| Leaf -> 0
| Node v t1 t2 -> v + (sum_values t1) + (sum_values t2);;