lemairejean-baptiste / eventstudy

Event Study package is an open-source Python project created to make financial event study analysis easier to compute.

License: GNU General Public License v3.0

Python 42.63% Makefile 0.41% Batchfile 0.40% CSS 1.14% Jupyter Notebook 55.42%

eventstudy's Introduction

Event Study package


Install

$ pip install eventstudy

Documentation

You can read the full documentation at https://lemairejean-baptiste.github.io/eventstudy/.

Work through the Get started section: its simple examples show how to use the eventstudy package to run an event study for a single event or for a sample of events.

Read the API guide for more details on functions and their parameters.


eventstudy's Issues

Import Returns

Hi, this may be a silly question, but it is not entirely clear to me.

When we import returns, do we import:

the price of stock or
percentage change from previous day or
some other calculated return

I'm slightly confused because the returns CSV files in the examples folder seem to contain percentage changes, not prices.

Thanks!
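The sample returns files appear to hold day-to-day percentage changes. (Another issue on this page calls import_returns with is_price=True, which suggests the package can also convert prices itself.) A minimal pandas sketch of doing the conversion yourself, assuming a price table with a `date` column and one column per ticker; the ticker and prices below are made-up placeholders:

```python
import pandas as pd

# Placeholder prices: a 'date' column plus one column per ticker.
prices = pd.DataFrame(
    {
        "date": ["2020-01-02", "2020-01-03", "2020-01-06"],
        "AAPL": [100.0, 102.0, 99.96],
    }
)
prices["date"] = pd.to_datetime(prices["date"])
prices = prices.set_index("date").sort_index()

# Simple day-to-day return R_t = S_t / S_{t-1} - 1; the first day has no
# previous price, so its NaN row is dropped.
returns = prices.pct_change().dropna()

# Back to a flat table with 'date' as an ordinary column, ready for to_csv().
returns = returns.reset_index()
returns["date"] = returns["date"].dt.strftime("%Y-%m-%d")
print(returns)
```

Writing the result with `returns.to_csv("returns.csv", index=False)` keeps the same date-plus-tickers layout.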

Query on the computation of standard deviation of abnormal return AR in constant mean model

Hello,

I noticed that in the constant mean model, std(AR) is computed over the whole timeline (estimation and event windows) rather than over the estimation window alone, which is what I understand from A. Craig MacKinlay's paper [1] and the EventStudyTools website [2]. Using the estimation window alone would also match your implementation of the OLS model.

eventstudy/eventstudy/models.py, line 122 in 2c6b200:

var = [np.var(residuals)] * event_window_size

If my observation is correct, is it okay if I contribute to the repo?

PS: Thank you for the great repo. You have taught me how to implement an event study myself.

References:
[1] A. Craig MacKinlay, "Event Studies in Economics and Finance", Journal of Economic Literature, 35(1), 1997.
[2] EventStudyTools, "Significance Tests", Some Preliminaries section: https://www.eventstudytools.com/significance-tests
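A minimal numpy sketch of what the poster proposes, assuming the return series is laid out as the estimation window followed directly by the event window (no buffer). `constant_mean_ar` is a hypothetical name, not the package's code, and the `ddof=1` correction (one estimated parameter, the mean) is an assumption; the quoted line uses `np.var` with its default `ddof=0`.

```python
import numpy as np


def constant_mean_ar(returns, estimation_size, event_window_size):
    """Hypothetical sketch: constant-mean abnormal returns whose variance is
    estimated on the estimation window only."""
    returns = np.asarray(returns, dtype=float)
    estimation = returns[:estimation_size]
    event = returns[estimation_size:estimation_size + event_window_size]

    mean = estimation.mean()          # the model's only parameter
    residuals = estimation - mean     # estimation-window residuals only
    abnormal = event - mean           # abnormal returns in the event window

    # Assumption: one degree of freedom is lost for the estimated mean.
    var = np.var(residuals, ddof=1)
    return abnormal, [var] * event_window_size


ar, var = constant_mean_ar([0.01, 0.03, 0.02, 0.05, 0.00], 3, 2)
print(ar, var)
```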

Interface for data import

Thanks for the package!

Why does the import have to come from a file or an API? A user may already have the data in a DataFrame (e.g. downloaded from the web), yet it seems the data must be saved to a file before it can be imported.

Maybe there should be a method such as import_returns_from_dataframe()? I can help implement it if it is a good idea for the project.

Also, is the decision to propagate the imported data through the whole class really worth it? Why does the class need to be aware of the data?

https://lemairejean-baptiste.github.io/eventstudy/api/eventstudy.Single.import_returns.html#eventstudy-single-import-returns
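Until something like that exists, one workaround is to write the DataFrame to a temporary CSV and pass its path to the path-based importer. A minimal sketch; `import_returns_from_dataframe` is a hypothetical helper, and the only assumption about eventstudy is that its importer takes a file path first and reads the file immediately (the traceback in another issue shows `import_returns` calling `read_csv` on the path).

```python
import os
import tempfile

import pandas as pd


def import_returns_from_dataframe(df, import_fn, **kwargs):
    """Hypothetical workaround: dump `df` to a temporary CSV file and hand its
    path to a path-based importer such as eventstudy's Single.import_returns."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "returns.csv")
        df.to_csv(path, index=False)  # keep 'date' as an ordinary column
        # The importer must read the file before the directory is deleted.
        return import_fn(path, **kwargs)


# Stand-in importer for demonstration: pandas simply reads the CSV back.
df = pd.DataFrame({"date": ["2020-01-02", "2020-01-03"], "AAPL": [0.01, -0.02]})
roundtrip = import_returns_from_dataframe(df, pd.read_csv)
print(roundtrip)
```

With the package installed, the call would look like `import_returns_from_dataframe(df, es.Single.import_returns, date_format='%Y-%m-%d')`; that combination is untested here.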

TypeError: object of type 'map' has no len()

Hi,
when I try to get the results table with event.results(), I get: "TypeError: object of type 'map' has no len()"

How can I solve this issue? Any help is much appreciated.
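The full traceback in a later issue points at `to_table` in eventstudy/utils.py, which hands its columns to `pd.DataFrame.from_dict`. The pandas version in that traceback calls `len()` on every column value, and a Python 3 `map` object has no length. A minimal sketch of the defensive pattern, assuming the column values are built with `map`: turn each one into a list before building the DataFrame. `to_table_safe` is a hypothetical name, not the package's function.

```python
import pandas as pd


def to_table_safe(columns, index_start=0):
    """Hypothetical variant of a to_table-style helper that tolerates
    lazy iterators (such as `map` objects) as column values."""
    # The pandas version in the traceback calls len() on every column value,
    # which fails for map objects, so materialize each value as a list first.
    materialized = {name: list(values) for name, values in columns.items()}
    df = pd.DataFrame.from_dict(materialized)
    df.index += index_start  # label rows by event-window day
    return df


table = to_table_safe(
    {"AR": map(lambda x: round(x, 3), [0.01234, -0.00567])},
    index_start=-2,
)
print(table)
```

Applying the same `list(...)` conversion where the package builds its columns is one possible local patch; whether the installed pandas version is the trigger is worth checking first.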

statistics variance

Dear,
I was wondering which formula you use for the t-test and its variance.
I am using the Fama-French three-factor model with the linear regression function: residuals, df, var_res, model = Model(estimation_size, event_window_size, keep_model).OLS(X, Y). I saw something in the statistics code:
and

but I do not completely understand this formulation.
Is the variance formula the same as the following, just written in a more 'Pythonic' way?

with 4 being the number of parameters needed to estimate the abnormal return (FF: one constant and three factors?)
Kind regards and thank you for the amazing package.
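For reference, the textbook OLS estimate of the residual variance divides the sum of squared residuals by the degrees of freedom n - k, where k counts every estimated coefficient. With a constant and three Fama-French factors, k = 4, as the poster suggests. The numpy sketch below implements that formula and the resulting AR and cumulative-AR t-statistics. It ignores the extra estimation-error term that MacKinlay's variance formula adds for finite estimation windows, and it is not confirmed to match the package's statistics code. Both function names are hypothetical.

```python
import numpy as np


def ols_residual_variance(factors, returns):
    """Fit returns = a + factors @ b by OLS; return (s2, dof, residuals)."""
    returns = np.asarray(returns, dtype=float)
    X = np.column_stack([np.ones(len(returns)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    residuals = returns - X @ coef
    n, k = X.shape                      # k = constant + number of factors
    dof = n - k
    s2 = residuals @ residuals / dof    # unbiased residual variance
    return s2, dof, residuals


def t_stats(abnormal, s2):
    """t-statistics for each AR and for the cumulative AR up to each day,
    treating daily ARs as independent with variance s2."""
    abnormal = np.asarray(abnormal, dtype=float)
    t_ar = abnormal / np.sqrt(s2)
    car = np.cumsum(abnormal)
    t_car = car / np.sqrt(s2 * np.arange(1, len(abnormal) + 1))
    return t_ar, t_car
```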

.to_excel

.results

Hi, thank you for the package!

I am having some trouble showing the results when calling event.results(decimals=[3,5,3,5,2,2]).

The plot works properly, as does get_CAR_dist().

The error I am receiving is:
if not indexes and not raw_lengths:
TypeError: object of type 'map' has no len()

Do you have any idea where I might have gone wrong?

Thanks again //agarp

Segmented regressions with optional variables for identifying drivers of abnormal returns

Hey @LemaireJean-Baptiste ,

wouldn't it be helpful to extend the existing functionality with a method that adds other, similarly formatted variables to the dataset (oil prices or firm size, for instance) and runs a segmented regression with the abnormal returns as the dependent variable? Analyzing the resulting coefficients could then support conclusions about which of the included explanatory variables are key drivers of the abnormal returns.

In M&A research, the effect of firm acquisitions (or of certain events in general) on shareholder value (or on the stock price) is often analyzed by 1) determining whether statistically significant abnormal returns associated with a particular event can be established and 2) adopting a medium- to long-term view by investigating the drivers of the varying abnormal returns with a segmented regression.

What do you think? When I used your module for step 1), I often wished that step 2) could be applied just as easily. This can be done with any Python package that supports segmented regressions, but for this use case it would be considerably more convenient to have everything in a single package.

Kind regards,
Andreas
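Step 2 as described is a cross-sectional OLS of each event's CAR on event-level variables. A minimal numpy sketch, assuming the CARs from step 1 have already been collected into an array (extracting them from the package is not shown) and using classical OLS standard errors. If "segmented" means separate regressions per subsample, the same function can be applied to each segment. `cross_sectional_regression` is hypothetical and not part of eventstudy.

```python
import numpy as np


def cross_sectional_regression(cars, characteristics):
    """OLS of each event's CAR on event-level variables, with an intercept.

    Returns the coefficients (intercept first) and their t-statistics based
    on classical (homoskedastic) standard errors."""
    y = np.asarray(cars, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(characteristics, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    dof = len(y) - X.shape[1]
    s2 = residuals @ residuals / dof
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se
```

Each row of `characteristics` holds one event's variables (e.g. firm size and an oil-price change), aligned with the corresponding CAR.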

Contribution to the computation of daily stock return, i.e. a preprocessing module

Hello,

I wonder whether it would be a good idea to extend this package to cover data preprocessing, i.e. transforming raw data into a ready-to-use form before feeding it into the package.

To elaborate, while researching event studies I realized there are two different ways to compute daily stock returns:

$R_t = S_t/S_{t-1} - 1$, where $S_t$ is the stock value on day $t$ and $R_t$ is the corresponding day-to-day (simple) return.
$R_t = \log(S_t/S_{t-1})$, the log return, which has good distributional properties.

Also, when I work on a use case where the data can take both negative and positive values, I have to develop a variant of the first approach to handle the corner cases around 0. I'm new to this and unsure whether an event study can be applied to such a case.

Please let me know if this idea is good and if I can contribute to this feature.
Thank you :))
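A minimal numpy sketch of both definitions, assuming a 1-D series of levels. The two are linked by $\log(1 + R^{simple}_t) = R^{log}_t$. Days where a definition breaks down (a zero previous value for simple returns; any non-positive value for log returns) are set to NaN here, which is a choice of this sketch, not a recommendation on whether an event study suits such data. `simple_and_log_returns` is a hypothetical name.

```python
import numpy as np


def simple_and_log_returns(values):
    """Return (simple, log) day-to-day returns for a 1-D series of levels.

    Simple returns are undefined when the previous value is 0; log returns
    also need both values to be strictly positive. Such days become NaN."""
    values = np.asarray(values, dtype=float)
    prev, curr = values[:-1], values[1:]

    simple = np.full(len(curr), np.nan)
    ok = prev != 0
    simple[ok] = curr[ok] / prev[ok] - 1

    log = np.full(len(curr), np.nan)
    pos = (prev > 0) & (curr > 0)
    log[pos] = np.log(curr[pos] / prev[pos])
    return simple, log


s, l = simple_and_log_returns([100.0, 110.0, 0.0, 5.0, -5.0])
print(s, l)
```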

Positive and negative

Hi there,

in the multiple event study setting, is there any way to obtain the number or percentage of positive and negative cases behind the CAAR?

I have included an example in the screenshot:

Cheers!!
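A minimal numpy sketch, assuming each event's CAR has already been collected into a list or array (extracting them from a Multiple object is not shown). It also computes the simple sign-test statistic described by MacKinlay, $[N^+/N - 0.5]\sqrt{N}/0.5$. Excluding zero CARs from N and using a 50% null share are assumptions of this sketch; a generalized sign test would estimate the null share from the estimation window instead. `sign_summary` is hypothetical.

```python
import numpy as np


def sign_summary(cars):
    """Count positive/negative CARs and compute the simple sign-test statistic.

    Zero CARs are excluded from N, and the null assumes a 50% chance of a
    positive CAR."""
    cars = np.asarray(cars, dtype=float)
    n_pos = int((cars > 0).sum())
    n_neg = int((cars < 0).sum())
    n = n_pos + n_neg
    share_pos = n_pos / n if n else float("nan")
    z = (share_pos - 0.5) * np.sqrt(n) / 0.5 if n else float("nan")
    return {"positive": n_pos, "negative": n_neg, "share_positive": share_pos, "z": z}


print(sign_summary([0.01, -0.02, 0.03, 0.0, 0.05]))
```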

Exporting Multiple Sets of Data

Hi,

I am trying to export my data to Excel but am having issues exporting multiple sets of data at once.

I ran multiple single events and want to export them at once, rather than manually having to export them individually. Intuitively, I have tried the following code but it does not work (I only get the data for the last event):

for event in results_full:
    event.to_excel('test1.xlsx')

I am looking to retrieve data for an aggregate set of events (Multiple class) and for each individual event that makes up the aggregate set (Single class). Is there an efficient way to do this?

Thank you!
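Every iteration of that loop writes to the same 'test1.xlsx', so each call overwrites the previous file and only the last event survives. A minimal sketch that gives each event its own file; the file-name pattern is arbitrary, and the only assumption about the event objects is the `to_excel(path)` method used in the loop above. The second function instead collects DataFrames (for example from `event.results()`, which the traceback in another issue shows building a DataFrame) into one workbook with one sheet per event. It needs openpyxl or xlsxwriter installed. Both function names are hypothetical.

```python
import pandas as pd


def export_each(events, pattern="event_{}.xlsx"):
    """Write each event to its own file instead of overwriting one file."""
    paths = []
    for i, event in enumerate(events):
        path = pattern.format(i)
        event.to_excel(path)
        paths.append(path)
    return paths


def export_tables_to_workbook(tables, path):
    """Write {name: DataFrame} into one workbook, one sheet per entry."""
    with pd.ExcelWriter(path) as writer:
        for name, df in tables.items():
            # Excel limits sheet names to 31 characters.
            df.to_excel(writer, sheet_name=str(name)[:31])
```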

NaN output

Hey!

I'm wondering why I might be getting NaN output (see attached). For context, the package works with a smaller estimation window, i.e. roughly 30 to 40 days, but for some reason everything turns to NaN beyond a certain threshold. I checked whether there was enough data to estimate the model and whether there were any outliers, but that doesn't seem to be the case.

Other securities work fine!

Cheers
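One way to check the "enough data" question mechanically is to locate the estimation window and count the missing values in it. A minimal pandas sketch, assuming (not confirmed from the package) that the estimation window is the `estimation_size` rows ending `buffer_size` rows before the first day of the event window, counted in rows of the date-indexed series. `estimation_window_gaps` is a hypothetical helper; the same check can be run on the market or factor series.

```python
import pandas as pd


def estimation_window_gaps(returns, event_date, event_window_start, estimation_size, buffer_size):
    """Hypothetical diagnostic: how many rows and NaNs fall in the estimation window."""
    # Position of the event date, or of the first row after it.
    pos = returns.index.searchsorted(pd.Timestamp(event_date))
    end = int(pos + event_window_start - buffer_size)  # exclusive end of the window
    start = end - estimation_size
    if start < 0:
        # Not enough history before the event for the requested window.
        return {"rows_available": max(end, 0), "rows_needed": estimation_size, "nan_count": None}
    window = returns.iloc[start:end]
    return {
        "rows_available": len(window),
        "rows_needed": estimation_size,
        "nan_count": int(window.isna().sum()),
    }
```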

What are the requirements for the returns file?

Hey, first of all thank you for creating this amazing package, I like it a lot!
I am just struggling with the returns file. What are the requirements?

I created my CSV file of stock prices with yahoofinancials, which produced output like this:

But when I run the event study, I get an error instead of the expected output:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'date'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/yanah/Documents/BEDRIJFSECONOMIE/Thesis/Results/Event1.py", line 5, in <module>
es.Single.import_returns('prices.csv', is_price=True, log_return=False, date_format= '%Y-%m-%d')
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/eventstudy/single.py", line 327, in import_returns
data = read_csv(path, format_date=True, date_format=date_format)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/eventstudy/utils.py", line 112, in read_csv
df[date_column] = pd.to_datetime(df[date_column], format=date_format)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'date'

What am I doing wrong with my file?

Kind regards
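The traceback shows `read_csv` in eventstudy/utils.py looking up a column named `date`, and the `KeyError: 'date'` means the file has no column with exactly that header. A minimal pandas sketch that renames whichever column holds the dates to `date` and writes the dates in the `'%Y-%m-%d'` format passed as `date_format` above. `to_eventstudy_layout` is a hypothetical helper, and the input column name `formatted_date` and the prices are illustrative placeholders, not actual yahoofinancials output.

```python
import pandas as pd


def to_eventstudy_layout(prices, date_col):
    """Rename `date_col` to 'date', normalize the dates, and put 'date' first."""
    df = prices.rename(columns={date_col: "date"})
    df["date"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
    other = [c for c in df.columns if c != "date"]
    return df[["date"] + other]


# Illustrative input whose date column has a different header.
raw = pd.DataFrame({"AAPL": [100.0, 101.5], "formatted_date": ["2022-01-03", "2022-01-04"]})
clean = to_eventstudy_layout(raw, "formatted_date")
print(clean)
# clean.to_csv("prices.csv", index=False)  # then import with is_price=True
```

If the dates sit in the DataFrame index instead of a column, calling `reset_index()` first turns them into a column that can then be renamed.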

Validating returns_GAFAM.csv file

Hi. I tried to validate the AAPL returns in the returns_GAFAM.csv file and found differences on the dividend dates: the returns appear to be underestimated, as if the dividends had been ignored.

date        Yahoo Finance*   returns_GAFAM.csv (AAPL)
2012-08-09   0.005703         0.001404
2012-11-07  -0.038263        -0.042635
2013-02-07   0.029734         0.023767
2013-05-09  -0.008724        -0.015242
2013-08-08  -0.001991        -0.008538

* pct_change of the adjusted closing price

Thank you.
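The hypothesis can be checked with a little arithmetic. A price return ignores the dividend, while a total return adds it back, $R^{total}_t = (P_t + D_t)/P_{t-1} - 1$, so the two differ by exactly $D_t/P_{t-1}$ on the ex-dividend date (Yahoo's adjusted close approximates the total return). The sketch computes the gap between the two columns quoted above, using the values from the table, and states the identity in code. Comparing each gap with that quarter's dividend divided by the previous close would confirm or refute the hypothesis; no dividend figures are assumed here. `price_and_total_return` is a hypothetical helper.

```python
import pandas as pd

dates = pd.to_datetime(["2012-08-09", "2012-11-07", "2013-02-07", "2013-05-09", "2013-08-08"])
yahoo_adj = pd.Series([0.005703, -0.038263, 0.029734, -0.008724, -0.001991], index=dates)
gafam_csv = pd.Series([0.001404, -0.042635, 0.023767, -0.015242, -0.008538], index=dates)

# Positive on every date if returns_GAFAM.csv leaves the dividend out.
gap = (yahoo_adj - gafam_csv).round(6)
print(gap)


def price_and_total_return(p_prev, p_t, dividend):
    """Price return ignores the dividend; total return adds it back."""
    price_ret = p_t / p_prev - 1
    total_ret = (p_t + dividend) / p_prev - 1
    return price_ret, total_ret
```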

Possible gain from accepting data as a pandas TimeSeries?

eventstudy/eventstudy/utils.py, lines 75 to 87 in 2c6b200:

import numpy as np

def get_index_of_date(data, date: np.datetime64, n: int = 4):
    # assume the date exists and there is only one of it in the dataset
    # assume dates are in the index
    for i in range(n + 1):
        index = np.where(data == date)[0]
        if len(index) > 0:
            return index[0]
        else:
            date = date + np.timedelta64(1, "D")

    # return None if there is no row corresponding to this date or n days after.
    return None
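A minimal sketch of what a pandas-based version might look like, assuming the dates are sorted and unique and can be held in a `pd.DatetimeIndex`. On sorted dates, one binary search via `searchsorted` replaces the day-by-day loop and returns the same position: the first available date on, or at most `n` calendar days after, `date`. `get_index_of_date_sorted` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def get_index_of_date_sorted(dates, date, n=4):
    """Position of `date`, or of the first date at most `n` days later, else None.

    Assumes `dates` is sorted in ascending order without duplicates."""
    dates = pd.DatetimeIndex(dates)
    target = pd.Timestamp(date)
    pos = dates.searchsorted(target)  # first position with dates[pos] >= target
    if pos < len(dates) and dates[pos] - target <= pd.Timedelta(days=n):
        return int(pos)
    return None


trading_days = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
print(get_index_of_date_sorted(trading_days, np.datetime64("2020-01-04")))
```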

Carhart Four-Factor Model

This is very helpful and easy to use – thank you!

I was wondering whether you could add the option of using the Carhart Four-Factor Model to model the returns. This is essentially the Fama-French Three-Factor Model with an additional factor, Momentum. Many research papers use this model, so it would be extremely useful to have it.

Link to Carhart (1997): https://onlinelibrary.wiley.com/doi/10.1111/j.1540-6261.1997.tb03808.x

I am new to Python, so I am not sure how to add this myself. Since it is similar to the Fama-French Three-Factor Model, I assume that model could be extended by adding the Momentum data to it. One source of this data is https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, but it is probably best to calculate it in code.
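The Carhart model is the Fama-French three-factor regression with one more regressor, momentum, so estimation is ordinary OLS on four factors. A minimal numpy sketch, assuming the excess security returns and the four factor series (Mkt-RF, SMB, HML, momentum) are already aligned by date, for instance after merging a momentum column into the factor file. `carhart_abnormal_returns` is hypothetical and not part of the package.

```python
import numpy as np


def carhart_abnormal_returns(excess_returns, factors, estimation_size):
    """Fit R_i - RF = a + b*MktRF + s*SMB + h*HML + m*MOM on the first
    `estimation_size` rows, then return the coefficients and the abnormal
    returns (actual minus fitted) for the remaining rows."""
    y = np.asarray(excess_returns, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X[:estimation_size], y[:estimation_size], rcond=None)
    abnormal = y[estimation_size:] - X[estimation_size:] @ coef
    return coef, abnormal
```

`factors` is an (n, 4) array whose columns follow the order in the docstring.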

Help! Don't understand what csv files should include

I tried to follow the guide (https://lemairejean-baptiste.github.io/eventstudy/get_started.html), but I couldn't find any examples of the CSV files.

What columns should the files contain? What kind of values should be inserted in the columns? Can someone please explain and show some examples?
Thanks in advance!
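The layouts below are inferred from this page, not from the documentation. returns_GAFAM.csv has a `date` column followed by one column per ticker, the Fama-French file has `Mkt-RF`, `SMB`, `HML` and `RF` columns, and the `KeyError: 'date'` traceback in another issue shows the importer looking for a column called `date`. A pandas sketch that produces small illustrative files in those layouts; all numbers are placeholders, and the sample files in the repository remain the reference for the exact format.

```python
import pandas as pd

dates = ["2020-01-02", "2020-01-03", "2020-01-06"]

# Security returns: 'date' + one column per ticker (placeholder values).
returns = pd.DataFrame(
    {"date": dates, "AAPL": [0.001, -0.002, 0.003], "MSFT": [0.002, 0.001, -0.001]}
)

# Fama-French factors: 'date' + factor columns (placeholder values).
factors = pd.DataFrame(
    {
        "date": dates,
        "Mkt-RF": [0.001, -0.001, 0.002],
        "SMB": [0.000, 0.001, -0.001],
        "HML": [-0.001, 0.000, 0.001],
        "RF": [0.0001, 0.0001, 0.0001],
    }
)

# to_csv() without a path returns the CSV text; pass a file path in practice.
csv_text = {name: df.to_csv(index=False) for name, df in
            {"returns": returns, "famafrench": factors}.items()}
for name, text in csv_text.items():
    print(f"--- {name}.csv ---")
    print(text)
```

Such files would then be loaded with `Single.import_returns(...)` and `Single.import_FamaFrench(...)`, as in the DataMissingError issue.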

t-test for AAR

Dear Jean-Baptiste,

maybe a nice improvement would be to also include the t-test and p-values for H0: AAR = 0 alongside the t-test and p-values for H0: CAAR = 0.

Best,
Constantin
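A minimal sketch of one common way to test each day's AAR: a cross-sectional t-test using the standard deviation of the abnormal returns across events. The package's CAAR test may estimate the variance differently (e.g. from estimation-window residuals). To avoid a SciPy dependency, the p-values use the standard normal approximation; with few events, a Student-t distribution with N - 1 degrees of freedom would be more appropriate. `aar_t_test` is hypothetical.

```python
import math

import numpy as np


def aar_t_test(abnormal_returns):
    """Cross-sectional t-test of H0: AAR_t = 0 for each event-window day.

    `abnormal_returns` has one row per event and one column per day."""
    ar = np.asarray(abnormal_returns, dtype=float)
    n_events = ar.shape[0]
    aar = ar.mean(axis=0)
    se = ar.std(axis=0, ddof=1) / np.sqrt(n_events)
    t = aar / se
    # Two-sided p-values from the standard normal approximation.
    p = np.array([math.erfc(abs(x) / math.sqrt(2)) for x in t])
    return aar, t, p


aar, t, p = aar_t_test([[0.01, 0.02], [0.03, 0.00], [0.02, 0.01]])
print(aar, t, p)
```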

Results of event study can't be shown in a table

Hi, I am a beginner in Python programming and I have a problem with the results table.
I would truly appreciate your help.
The following is my code:

event= es.Single.constant_mean( security_ticker= 'aapl', event_date= np.datetime64('2020-01-16'), event_window= (-2,+10), estimation_size= 30, buffer_size= 0)

event.results(decimals=[3,5,3,5,2,2])

TypeError Traceback (most recent call last)

in <module>
5 estimation_size= 30,
6 buffer_size= 0)
----> 7 event.results(decimals=[3,5,3,5,2,2])

D:\Anaconda3\lib\site-packages\eventstudy\single.py in results(self, asterisks, decimals)
194 asterisks_dict=asterisks_dict,
195 decimals=decimals,
--> 196 index_start=self.event_window[0],
197 )
198

D:\Anaconda3\lib\site-packages\eventstudy\utils.py in to_table(columns, asterisks_dict, decimals, index_start)
28 )
29
---> 30 df = pd.DataFrame.from_dict(columns)
31 df.index += index_start
32 return df

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in from_dict(cls, data, orient, dtype, columns)
1136 raise ValueError('only recognize index or columns for orient')
1137
-> 1138 return cls(data, index=index, columns=columns, dtype=dtype)
1139
1140 def to_numpy(self, dtype=None, copy=False):

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
390 dtype=dtype, copy=copy)
391 elif isinstance(data, dict):
--> 392 mgr = init_dict(data, index, columns, dtype=dtype)
393 elif isinstance(data, ma.MaskedArray):
394 import numpy.ma.mrecords as mrecords

D:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
210 arrays = [data[k] for k in keys]
211
--> 212 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
213
214

D:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
49 # figure out the index, if necessary
50 if index is None:
---> 51 index = extract_index(arrays)
52 else:
53 index = ensure_index(index)

D:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
303 elif is_list_like(val) and getattr(val, 'ndim', 1) == 1:
304 have_raw_arrays = True
--> 305 raw_lengths.append(len(val))
306
307 if not indexes and not raw_lengths:

TypeError: object of type 'map' has no len()

The beginner examples cannot be used?

event = es.EventStudy.FamaFrench_3factor(....
AttributeError: module 'eventstudy' has no attribute 'EventStudy'
Even when I delete the es. prefix, it still says there is no FamaFrench_3factor.

DataMissingError

I imported the files following the sample CSVs in the repo. The debugger reports {'error_msg': "Some data are missing for (Mkt-RF) in 'FamaFrench''.", 'error_type': 'DataMissingError', 'event_date': numpy.datetime64('2009-02-27T00:00:00.000000'), 'market_ticker': 'SPY', 'security_ticker': 'APPLE'}. But when I open the actual CSV, that row is not missing any data. The error only appears for some rows, and if I ignore it and let the run finish, the output AR table is full of NaN.

I checked the dtypes after importing the files as DataFrames. All dtypes match the repo's sample CSVs, and all column names match. The only difference is the number of decimal places.

I also ran the code using the original sample csv from the repo. The code works fine. So it must be something with my file. Any ideas?

import numpy as np
import matplotlib.pyplot as plt
from eventstudy.single import Single
from eventstudy.multiple import Multiple

Single.import_returns('security_returns.csv')
Single.import_FamaFrench('factor_returns.csv')
release_10K = Multiple.from_csv(
    path = 'earnings_surprises.csv', # the path to the csv file created  
    event_study_model = Single.FamaFrench_3factor,
    event_window = (-5,+10),
    estimation_size = 200,
    buffer_size = 30,
    date_format = '%d/%m/%Y',
    ignore_errors = True
)

print(release_10K.results(decimals=[3,5,3,5,2,2]))
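Since the sample files work and the error names a specific column and date, one plausible cause (not confirmed) is that some dates in the returns file have no matching row in the factor file, or that a factor cell does not parse as a number (a decimal comma from a spreadsheet export, for example). A minimal pandas sketch that checks both, assuming both files have been loaded with a `date` column in the same format; `check_alignment` is a hypothetical helper.

```python
import pandas as pd


def check_alignment(returns, factors, factor_cols=("Mkt-RF", "SMB", "HML", "RF")):
    """Report return dates missing from the factor file and unparseable factor cells."""
    missing_dates = sorted(set(returns["date"]) - set(factors["date"]))
    bad_cells = {}
    for col in factor_cols:
        numeric = pd.to_numeric(factors[col], errors="coerce")  # bad text -> NaN
        bad = factors.loc[numeric.isna(), "date"].tolist()
        if bad:
            bad_cells[col] = bad
    return missing_dates, bad_cells
```

If the two files store dates in different string formats, parsing both `date` columns with `pd.to_datetime` before calling the function avoids false mismatches.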

Single event study loop: one event on one company for multiple companies

Hey!
In the get started pdf, there is a tutorial on how to set up a multi event study for different events and companies using the eventstudy.Multiple.from_list. However, is there a possibility to make a loop for a single event study on different companies?
I want to study the CAR effect of one event A on company A, as well as on company B, C, D, E, ...
And how can I export all the results for every company to Excel?
Sorry for the stupid question; I am not a Python expert, but I have been looking for a solution for a while now.
Kind regards,
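A minimal sketch of such a loop. `run_for_tickers` is a hypothetical helper, and its only assumptions about eventstudy are that a model such as `es.Single.constant_mean` accepts `security_ticker=` (as in the call in another issue on this page) and returns an object with a `.results()` method. Failures are collected per ticker instead of stopping the whole run.

```python
def run_for_tickers(model, tickers, **params):
    """Run the same single-event study for several securities.

    `model` is any callable accepting `security_ticker=` plus `params` and
    returning an object with a `.results()` method."""
    tables, errors = {}, {}
    for ticker in tickers:
        try:
            event = model(security_ticker=ticker, **params)
            tables[ticker] = event.results()
        except Exception as exc:  # keep going; inspect failures afterwards
            errors[ticker] = exc
    return tables, errors
```

With the package installed and the returns imported, a call might look like `tables, errors = run_for_tickers(es.Single.constant_mean, ["AAPL", "MSFT"], event_date=np.datetime64("2020-01-16"), event_window=(-2, +10), estimation_size=30, buffer_size=0)` (untested here). Each entry of `tables` is then a results DataFrame, so one sheet per company can be written by looping over `tables.items()` inside a `pd.ExcelWriter`.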

The equations of market model\famafrench-3\famafrench-5\constant mean

Hello, I really appreciate your work and this package is excellent!
But could you please tell me the detailed equations used to calculate the AR and CAR with the different models?
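For orientation, these are the textbook forms of the four models (constant mean and market model as in MacKinlay's paper, three- and five-factor models as in Fama and French). Hats mark parameters estimated by OLS on the estimation window, and $R_{f,t}$ is the risk-free rate. Whether the package implements exactly these forms, in particular whether it subtracts the risk-free rate from the security return, is an assumption to check against its source.

```latex
\begin{aligned}
\text{Constant mean:}\quad & AR_{i,t} = R_{i,t} - \hat\mu_i \\
\text{Market model:}\quad & AR_{i,t} = R_{i,t} - (\hat\alpha_i + \hat\beta_i R_{m,t}) \\
\text{Fama-French 3:}\quad & AR_{i,t} = (R_{i,t} - R_{f,t}) - \big(\hat\alpha_i + \hat\beta_i (R_{m,t} - R_{f,t}) + \hat s_i\, SMB_t + \hat h_i\, HML_t\big) \\
\text{Fama-French 5:}\quad & AR_{i,t} = (R_{i,t} - R_{f,t}) - \big(\hat\alpha_i + \hat\beta_i (R_{m,t} - R_{f,t}) + \hat s_i\, SMB_t + \hat h_i\, HML_t + \hat r_i\, RMW_t + \hat c_i\, CMA_t\big) \\
\text{Cumulative:}\quad & CAR_i(\tau_1, \tau_2) = \sum_{t=\tau_1}^{\tau_2} AR_{i,t}
\end{aligned}
```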

famafrench.csv

In famafrench.csv, is the Mkt-RF column the same as (MktRet_t - RF_t)? If so, why does the CSV also contain an RF column?

In the Fama-French model, $Ret_{i,t} = \alpha + \beta (MktRet_t - RF_t) + \gamma SMB_t + \theta HML_t$, where $Ret_{i,t}$ is the return of company $i$ on day $t$, $MktRet_t$ is the market return on day $t$, $RF_t$ is the risk-free rate on day $t$, $SMB_t$ is the small-minus-big factor on day $t$, and $HML_t$ is the high-minus-low factor on day $t$.
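In the standard Fama-French specification, the left-hand side is the security's excess return, $Ret_{i,t} - RF_t$, and Mkt-RF in the Ken French data library is already the market's excess return. The RF column is therefore what turns the security's raw return into an excess return; whether eventstudy subtracts it this way internally is not confirmed here. A pandas sketch of preparing the regression inputs, assuming both tables have a `date` column and use the same units (the Ken French files report percentages, so decimal security returns would need the factors divided by 100). `ff3_inputs` is a hypothetical helper.

```python
import pandas as pd


def ff3_inputs(returns, factors, ticker):
    """Align a security's returns with the factors and build (y, X) for OLS."""
    merged = returns[["date", ticker]].merge(factors, on="date", how="inner")
    y = merged[ticker] - merged["RF"]        # excess security return
    X = merged[["Mkt-RF", "SMB", "HML"]]    # Mkt-RF is already an excess return
    return y, X
```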
