Fama French 3-Factor Model

By Robert Yip
Oct 2018
Built with Python

In this project, I build a Fama French 3-factor model using two opposite portfolios from Morningstar. The first portfolio is based on an Aggressive strategy and the other a Conservative strategy. The results show the model efficacy based on the strength of the fit.

The project includes these steps:

Retrieval - Sourcing the raw data from CSV files (monthly snapshots of the portfolio). The CSVs are retrieved and exported from SQL. Multiple functions are defining to aid in the retrieval and table transformation.
Transformation - Setup the tables properly to calculate returns and categorize portfolio constituents to factors
Regression model

The Fama French 3-factor model has these attributes. The description explains the proxy that I used.
Market Premium - Calculated from S&P/TSX Composite Index and 90-day Treasury Bills.
SMB - Categorized each security as small or large market cap by using 30-70 percentiles of aggregate market cap in portfolio.
HML - Used inverse of P/B as proxy to catergorize and calculate book to market value.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm
#from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
pd.options.mode.chained_assignment = None  # default='warn'

C:\Users\Ry\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools

#Set years of data to look at: 2007-2018
start = 2007
end = 2018

1) Retrieving the data

def dfAppend(strat, s, e):
    """
    Reads data and appends to a central df.
    df    = data frame to append to
    strat = Strategies label {A = Aggressive, C = Conservative}
    start = year to start
    end   = year to end, not inclusive of end date
    """
    df = pd.DataFrame()
    fileRange = np.arange(s, e + 1)     
    
    for i in fileRange:
        if i == s:
            dfAdd = pd.read_csv(strat + str(i) + ".csv", skiprows=range(1,4), sep=',', encoding='iso-8859-1')
        else:
            dfAdd = pd.read_csv(strat + str(i) + ".csv", skiprows=range(1,4), sep=',', encoding='iso-8859-1')
        dfAdd = dfAdd.iloc[:, : -1] #drops last extra column
        dfAdd['Year'] = i
        df = df.append(dfAdd, ignore_index = True)

    return df

##2) Transformation

def dfResetIndex (df):
    """
    Resets index each time a df is made
    """
    
    return df.reset_index(drop = True)

def calcReturn(df):
    """
    Gives return of individual security from portfolio
    """
   
    return round(df.Return.mean(), 4)

def calcRf (df):
    """
    Returns Rf for FF
    """
    df = dfResetIndex (df)
    return round(df.TB90[1], 4)

def fillMktPrem (df, s, e):
    """
    Fills in MktPrem to DF
    """
    dfNew = df
    dfNew['MKtReturn'] = ""
    dfNew['MktPrem'] = ""
    fileRange = np.arange(s, e)    
    for i in fileRange:
        dfNew['MKtReturn'].loc[dfNew['Year'] == i+1] = (dfNew['TRI'].loc[dfNew['Year'] == i+1].iloc[0] / dfNew['TRI'].loc[dfNew['Year'] == i].iloc[0] - 1)*100
        dfNew['MktPrem'].loc[dfNew['Year'] == i+1] = dfNew['MKtReturn'].loc[dfNew['Year'] == i+1].iloc[0] - dfNew['TB90'].loc[dfNew['Year'] == i+1].iloc[0]
    return dfNew

def calcMktPrem (df):
    """
    Returns Mkt Premium for FF
    """
    df = dfResetIndex(df)
    return round(df.MktPrem[1], 4)

def calcSMB(df):
    """
    Returns SMB for FF
    """
    #Define Quantile
    SQuantile = 0.3
    LQuantile = 0.7
    df["SMB"] = ""
    
    #Assigns stock size based on market cap
    df.SMB[df.MKTCAP <= df.MKTCAP.quantile(SQuantile)] = "SCap"
    df.SMB[(df.MKTCAP > df.MKTCAP.quantile(SQuantile)) & (df.MKTCAP < df.MKTCAP.quantile(LQuantile))] = "MCap"
    df.SMB[df.MKTCAP >= df.MKTCAP.quantile(LQuantile)] = "LCap"
    
    #Calculates average return of stocks in portfolio subset based on size
    SmallCapReturn = df.Return.loc[df["SMB"] == "SCap"].mean()
    LargeCapReturn = df.Return.loc[df["SMB"] == "LCap"].mean()
    
    #Returns SMB based on definition
    SMB = SmallCapReturn - LargeCapReturn
    return round(SMB, 4)

def calcHML (df):
    """
    Returns HML for FF
    Uses inverse of P/B as proxy for Book/Mkt
    """
    #Define Quantile
    SQuantile = 0.3
    LQuantile = 0.7
    df["HML"] = ""
    df["BP"] = df.PB**(-1) #Create Book/MktValue Proxy
    
    #Assigns stock size based on market cap
    df.HML[df.BP <= df.BP.quantile(SQuantile)] = "SValue"
    df.HML[(df.BP > df.BP.quantile(SQuantile)) & (df.BP < df.BP.quantile(LQuantile))] = "MValue"
    df.HML[df.BP >= df.BP.quantile(LQuantile)] = "LValue"
    
    #Calculates average return of stocks in portfolio subset based on size
    SmallValueReturn = df.Return.loc[df["HML"] == "SValue"].mean()
    LargeValueReturn = df.Return.loc[df["HML"] == "LValue"].mean()
    
    #Returns SMB based on definition
    HML = SmallValueReturn - LargeValueReturn
    return round(HML, 4)

def cleanColumns(df):
    """
    Cleans up unnecessary characters
    Cleans up columns, removing the extras
    """
    dfNew = df
   
    try:
        dfNew.columns = dfNew.columns.str.replace(' ','')
    except:
        pass
    
    try:
        dfNew.columns = dfNew.columns.str.replace('/','')
    except:
        pass
    
    dfNew = dfNew.rename(columns={"PCHG12M": "Return"})
    
    dfNew = dfNew[['Symbol',
                'Year',
                'Return',
                'TRI',
                'TB90',
                'MKTCAP',
                'PB'
                ]]
    
    return dfNew

###Set up Data Frame

#Create empty data frame for the strategies
dfA = pd.DataFrame()
dfC = pd.DataFrame()

#Append the list
dfA = dfAppend("A", start, end) #year 2007-2018
dfC = dfAppend("C", start, end) #year 2008-2018

###Clean up Data Frame and preparing for FF model
#Remove space in columns
dfA = cleanColumns(dfA)
dfC = cleanColumns(dfC)

dfA

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Symbol	Year	Return	TRI	TB90	MKTCAP	PB
0	TCM	2007	394.1667	34542.6016	3.90	2009.5763	4.8272
1	BB	2007	196.5454	34542.6016	3.90	50392.2070	17.3708
2	QUX	2007	52.4324	34542.6016	3.90	914.2384	2.0376
3	TRE	2007	300.0000	34542.6016	3.90	2908.6021	2.6323
4	VT	2007	46.1039	34542.6016	3.90	2296.7551	2.1184
5	LNR	2007	89.3727	34542.6016	3.90	1791.8121	2.0731
6	MDI	2007	102.1609	34542.6016	3.90	1035.5814	4.6532
7	ET	2007	112.4800	34542.6016	3.90	1918.1101	17.0584
8	SW	2007	89.2187	34542.6016	3.90	662.2232	3.3381
9	NTR	2007	158.3199	34542.6016	3.90	29561.9219	6.0249
10	OIL1	2007	90.4130	34542.6016	3.90	2810.2747	5.2046
11	VRS1	2007	325.7028	34542.6016	3.90	622.7500	3.6446
12	AXP1	2007	31.8211	34542.6016	3.90	1124.9443	1.5986
13	SVC	2007	n/a	34542.6016	3.90	771.8656	5.4759
14	ELR	2007	80.3150	34542.6016	3.90	1528.5818	1.8666
15	AGU	2007	87.9626	34542.6016	3.90	6445.6694	4.0877
16	MG	2007	19.5411	34542.6016	3.90	10483.0146	1.2566
17	AL	2007	109.7164	34542.6016	3.90	38855.3320	3.0089
18	NCX	2007	9.1915	34542.6016	3.90	3194.3237	3.4441
19	EQN	2007	120.0000	34542.6016	3.90	1972.0448	3.9819
20	IOL	2007	111.3636	34542.6016	3.90	1113.1356	10.1440
21	SCL	2007	85.8289	34542.6016	3.90	2491.2971	4.3214
22	SCC	2007	27.5294	34542.6016	3.90	2916.5293	3.4336
23	OTEX	2007	45.7270	34542.6016	3.90	1318.4620	2.3879
24	EMP.A	2007	12.1356	34542.6016	3.90	1547.3501	1.5353
25	SAP	2007	34.4371	34542.6016	3.90	5194.7700	3.6509
26	ACO.X	2007	35.3991	34542.6016	3.90	3264.8979	2.1489
27	AGF.B	2007	60.0913	34542.6016	3.90	3164.1650	3.0294
28	BLS1	2007	4.2755	34542.6016	3.90	6353.5815	3.3295
29	RCI.B	2007	68.3509	34542.6016	3.90	30668.1426	7.3387
...	...	...	...	...	...	...	...
690	PVG	2018	4.7297	55229.8008	1.52	1985.1681	1.7643
691	VET	2018	1.8182	55229.8008	1.52	6315.7832	2.4468
692	DSG	2018	29.9487	55229.8008	1.52	3501.9514	5.2637
693	ECA	2018	48.3262	55229.8008	1.52	16524.0566	1.9500
694	BHC	2018	79.5116	55229.8008	1.52	10510.0762	2.3301
695	KL	2018	49.9078	55229.8008	1.52	5151.8237	3.2756
696	MEG	2018	64.471	55229.8008	1.52	2445.5652	0.6197
697	FSV	2018	28.3481	55229.8008	1.52	3866.5066	14.0518
698	ERF	2018	45.5365	55229.8008	1.52	3959.0337	2.3204
699	BAD	2018	0.3112	55229.8008	1.52	1076.3000	3.2235
700	CIGI	2018	63.8897	55229.8008	1.52	4030.4082	9.7911
701	OSB	2018	15.4884	55229.8008	1.52	4303.6250	3.6446
702	ECI	2018	38.1862	55229.8008	1.52	3107.1428	5.3444
703	BB	2018	20.0519	55229.8008	1.52	7463.3804	2.3713
704	MTY	2018	28.535	55229.8008	1.52	1520.9523	2.5935
705	PSI	2018	18.9944	55229.8008	1.52	1819.0796	5.0627
706	BBD.B	2018	72.4	55229.8008	1.52	10419.2314	331.5385
707	GC	2018	35.7122	55229.8008	1.52	2820.1333	5.1500
708	BAM.A	2018	12.8973	55229.8008	1.52	55326.3750	1.6888
709	WCP	2018	-10.1336	55229.8008	1.52	3369.1038	1.0425
710	SHOP	2018	37.5797	55229.8008	1.52	17794.7813	9.3487
711	PD	2018	60.8833	55229.8008	1.52	1497.9512	0.8482
712	RNW	2018	-14.8592	55229.8008	1.52	3173.7849	1.3884
713	VII	2018	-19.2004	55229.8008	1.52	5555.5166	1.2358
714	BCB	2018	7.2219	55229.8008	1.52	2836.1016	1.7670
715	TCL.A	2018	30.8231	55229.8008	1.52	2792.2144	1.9074
716	CP	2018	41.264	55229.8008	1.52	39135.0820	5.9499
717	CNR	2018	14.6611	55229.8008	1.52	85207.3281	4.9033
718	TRQ	2018	-28.0285	55229.8008	1.52	6097.3125	0.5099
719	CCO	2018	8.3067	55229.8008	1.52	5366.9531	1.1095

720 rows × 7 columns

###Fill in MktPrem
#This part should only be done once
dfA = fillMktPrem (dfA, start, end)
dfC = fillMktPrem (dfC, start, end)

###Continue Cleanup
#Drop First Year
dfA = dfA.loc[dfA['Year'] != start]
dfC = dfC.loc[dfC['Year'] != start]

dfA = dfA.reset_index(drop = True)
dfC = dfC.reset_index(drop = True)
#Convert all inputs used to numeric

dfA.iloc[:, 2:] = dfA.iloc[:, 2:].convert_objects(convert_numeric=True)
dfC.iloc[:, 2:] = dfC.iloc[:, 2:].convert_objects(convert_numeric=True)

C:\Users\Ry\Anaconda3\lib\site-packages\ipykernel_launcher.py:10: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  # Remove the CWD from sys.path while we load stuff.
C:\Users\Ry\Anaconda3\lib\site-packages\ipykernel_launcher.py:11: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  # This is added back by InteractiveShellApp.init_path()

dfA.head(30)

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Symbol	Year	Return	TRI	TB90	MKTCAP	PB	MKtReturn	MktPrem
0	ATA	2008	39.0400	34992.8008	2.4	671.5371	1.4624	1.303316	-1.096684
1	BIR	2008	219.2500	34992.8008	2.4	1435.1438	2.9372	1.303316	-1.096684
2	IOL	2008	-11.5591	34992.8008	2.4	1136.4976	5.7316	1.303316	-1.096684
3	CR	2008	110.6742	34992.8008	2.4	1067.9100	1.6085	1.303316	-1.096684
4	HPX	2008	28.5444	34992.8008	2.4	922.6240	1.3032	1.303316	-1.096684
5	TOG1	2008	183.8667	34992.8008	2.4	2386.6516	2.1461	1.303316	-1.096684
6	AGU	2008	86.1347	34992.8008	2.4	14185.3096	3.4758	1.303316	-1.096684
7	PMCS	2008	17.8746	34992.8008	2.4	1998.7028	2.5971	1.303316	-1.096684
8	PXE	2008	50.8172	34992.8008	2.4	1178.0902	2.9557	1.303316	-1.096684
9	NVA	2008	13.6364	34992.8008	2.4	1186.5299	1.6271	1.303316	-1.096684
10	NTR	2008	97.4353	34992.8008	2.4	56345.7031	8.1750	1.303316	-1.096684
11	RUS	2008	3.0191	34992.8008	2.4	1942.6224	2.1252	1.303316	-1.096684
12	BBD.B	2008	33.0645	34992.8008	2.4	14468.5039	4.8851	1.303316	-1.096684
13	TLM	2008	3.5872	34992.8008	2.4	19121.1309	2.1333	1.303316	-1.096684
14	CFP	2008	-12.1359	34992.8008	2.4	1548.5165	0.8747	1.303316	-1.096684
15	AGI	2008	18.8929	34992.8008	2.4	1182.5964	1.8471	1.303316	-1.096684
16	TESO	2008	24.8582	34992.8008	2.4	1334.7764	3.7403	1.303316	-1.096684
17	TXP	2008	50.1761	34992.8008	2.4	3877.1294	5.4579	1.303316	-1.096684
18	GEA	2008	105.2811	34992.8008	2.4	1284.9156	3.4405	1.303316	-1.096684
19	GNA	2008	21.9512	34992.8008	2.4	6491.3252	1.4825	1.303316	-1.096684
20	BNK	2008	304.1667	34992.8008	2.4	885.1299	7.5475	1.303316	-1.096684
21	STE	2008	92.6241	34992.8008	2.4	1180.5229	2.3871	1.303316	-1.096684
22	CLS	2008	42.9253	34992.8008	2.4	1793.5140	0.8929	1.303316	-1.096684
23	ARE	2008	30.2488	34992.8008	2.4	851.7207	2.5575	1.303316	-1.096684
24	CMT	2008	-6.4182	34992.8008	2.4	1177.5142	1.3477	1.303316	-1.096684
25	CLL	2008	12.3288	34992.8008	2.4	865.2106	1.8701	1.303316	-1.096684
26	CNQ	2008	25.5576	34992.8008	2.4	49019.0195	3.6022	1.303316	-1.096684
27	CSU	2008	13.4179	34992.8008	2.4	498.6680	6.6142	1.303316	-1.096684
28	HSE	2008	21.1355	34992.8008	2.4	39861.0703	3.0938	1.303316	-1.096684
29	NXY	2008	12.6989	34992.8008	2.4	17655.6055	2.6526	1.303316	-1.096684

#Create Fama French 3 factor model for Aggressive Strategy
FFA = pd.DataFrame(columns =
                  ["Year",
                   "Return",
                   "Rf",
                   "MktPrem",
                   "SMB",
                   "HML"                    
                  ])
FFAIndex = 0 
for i in range(start+1, end+1):
    FFA.loc[FFAIndex] = [i, 
                    calcReturn(dfA.loc[dfA['Year'] == i]), 
                    calcRf(dfA.loc[dfA['Year'] == i]), 
                    calcMktPrem(dfA.loc[dfA['Year'] == i]), 
                    calcSMB(dfA.loc[dfA['Year'] == i]), 
                    calcHML(dfA.loc[dfA['Year'] == i])
                   ]
    FFAIndex += 1
FFA['Year'] = FFA['Year'].astype(int)

#Create Fama French 3 factor model for Conservative Strategy
FFC = pd.DataFrame(columns =
                  ["Year",
                   "Return",
                   "Rf",
                   "MktPrem",
                   "SMB",
                   "HML"                    
                  ])
FFCIndex = 0 
for i in range(start+1, end+1):
    FFC.loc[FFCIndex] = [i, 
                    calcReturn(dfC.loc[dfC['Year'] == i]), 
                    calcRf(dfC.loc[dfC['Year'] == i]), 
                    calcMktPrem(dfC.loc[dfC['Year'] == i]), 
                    calcSMB(dfC.loc[dfC['Year'] == i]), 
                    calcHML(dfC.loc[dfC['Year'] == i])
                   ]
    FFCIndex += 1
FFC['Year'] = FFC['Year'].astype(int)

FFA

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Year	Return	Rf	MktPrem	SMB	HML
0	2008	36.8584	2.40	-1.0967	18.7416	43.6863
1	2009	12.8186	0.20	-17.5444	12.4716	36.1373
2	2010	68.1621	0.67	10.8205	14.7011	60.3519
3	2011	52.0113	0.89	8.2170	24.8434	61.1273
4	2012	29.9158	1.03	-5.2575	5.4930	46.9944
5	2013	38.0250	0.99	9.0373	13.0665	10.3304
6	2014	54.0386	0.94	26.1894	25.7608	66.1083
7	2015	20.5506	0.37	-9.0489	-17.1839	61.2438
8	2016	74.2083	0.50	8.1886	26.1487	18.0131
9	2017	31.4535	0.71	6.5242	-0.7788	21.5734
10	2018	34.1188	1.52	8.5722	6.5659	25.0311

FFC

.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}

</style>

	Year	Return	Rf	MktPrem	SMB	HML
0	2008	-9.2666	2.40	-1.0967	4.2731	21.5956
1	2009	-5.4656	0.20	-17.5444	-1.4298	-2.2787
2	2010	15.8135	0.67	10.8205	11.8667	-8.5265
3	2011	8.0557	0.89	8.2170	13.6852	13.4448
4	2012	5.7244	1.03	-5.2575	4.1007	5.9682
5	2013	20.0063	0.99	9.0373	7.6418	-6.0691
6	2014	17.3592	0.94	26.1894	-3.0535	3.7997
7	2015	-1.1869	0.37	-9.0489	-1.8108	23.3356
8	2016	4.3359	0.50	8.1886	-6.1322	4.7550
9	2017	6.4929	0.71	6.5242	-0.8448	-4.3763
10	2018	5.8371	1.52	8.5722	-13.3971	19.1939

3) Regression

#Set up regression
Y = FFA.Return.values 
X = FFA[["MktPrem","SMB","HML"]]


model = sm.OLS( Y.astype(float), X.astype(float) )

result = model.fit()
print (result.params)
print(result.summary())

MktPrem    0.662782
SMB        0.994160
HML        0.545028
dtype: float64
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.877
Model:                            OLS   Adj. R-squared:                  0.831
Method:                 Least Squares   F-statistic:                     19.08
Date:                Tue, 25 Sep 2018   Prob (F-statistic):           0.000528
Time:                        11:42:09   Log-Likelihood:                -45.925
No. Observations:                  11   AIC:                             97.85
Df Residuals:                       8   BIC:                             99.04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
MktPrem        0.6628      0.577      1.148      0.284      -0.669       1.994
SMB            0.9942      0.484      2.054      0.074      -0.122       2.110
HML            0.5450      0.158      3.443      0.009       0.180       0.910
==============================================================================
Omnibus:                        0.240   Durbin-Watson:                   1.821
Prob(Omnibus):                  0.887   Jarque-Bera (JB):                0.081
Skew:                          -0.128   Prob(JB):                        0.960
Kurtosis:                       2.666   Cond. No.                         5.55
==============================================================================

#Set up regression
Y = FFC.Return.values 
X = FFC[["MktPrem","SMB","HML"]]
X = sm.add_constant(X)
X.rename(columns = {"const":"Intercept"}, inplace = True)


model = sm.OLS( Y.astype(float), X.astype(float) )

result = model.fit()
print (result.params)
print(result.summary())

Intercept    5.810956
MktPrem      0.528004
SMB          0.151975
HML         -0.310956
dtype: float64
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.761
Model:                            OLS   Adj. R-squared:                  0.659
Method:                 Least Squares   F-statistic:                     7.436
Date:                Tue, 25 Sep 2018   Prob (F-statistic):             0.0140
Time:                        11:42:09   Log-Likelihood:                -31.609
No. Observations:                  11   AIC:                             71.22
Df Residuals:                       7   BIC:                             72.81
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      5.8110      2.143      2.711      0.030       0.743      10.879
MktPrem        0.5280      0.148      3.562      0.009       0.177       0.879
SMB            0.1520      0.224      0.678      0.519      -0.378       0.682
HML           -0.3110      0.159     -1.953      0.092      -0.687       0.065
==============================================================================
Omnibus:                        0.221   Durbin-Watson:                   1.382
Prob(Omnibus):                  0.895   Jarque-Bera (JB):                0.364
Skew:                          -0.238   Prob(JB):                        0.834
Kurtosis:                       2.247   Cond. No.                         17.0
==============================================================================

belindajae129 / python-fama-french-quant-model Goto Github PK

python-fama-french-quant-model's Introduction

Fama French 3-Factor Model

1) Retrieving the data

3) Regression

python-fama-french-quant-model's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent