Data challenge 1 at Propulsion Academy
- Web scraping of financial reports in Python and statistical analysis of companies' current ratios to see whether they can be used to predict the probability of surviving a financial crisis (the 2008 crisis).
- The scraped financial reports can later be used for sentiment/theme analysis using NLP.
- Google Slides presentation of this project (preliminary results).
- Google Doc notes/brainstorming for this project.
- Matthias Galipaud
- Peerawan Wiwattananon
- Mevluet Polat
- Anselme Borgeaud
- Python data scraping and basic analysis scripts are in the notebooks folder
- R data analysis scripts (incl. survival analysis) are in the R analysis folder
- Companies from 2005 with their most recent SEC filing dates (6409 companies) [CSV] (companies_filling_minimal.csv)
- Stock closing price data [Parquet] (stock_closes.pq)
- stock_closes.pq restricted to the companies in mevluet_data_merged.csv [CSV] (prices_mevluet_data.csv)
- Bulk financial report data (from 2005, but data mostly from 2008) [CSV] (mevluet_data_merged.csv)
- Current assets and current liabilities data for companies since 1993, scraped from the SEC website [CSV] (scraped_companies.csv)
- scraped_companies.csv merged with companies_filling_minimal.csv [CSV] (scraped_companies_merged_survival.csv)
- scraped_companies_merged_survival.csv filtered for continuous records that include fiscal year 2005 [CSV] (scraped_companies_merged_survival_continuous_from2005.csv)
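
As a rough illustration of how the last two processing steps might look in pandas, the sketch below computes the current ratio (current assets divided by current liabilities) and keeps only companies whose records are continuous and include fiscal year 2005. The column names (`cik`, `fiscal_year`, `current_assets`, `current_liabilities`) and both helper functions are hypothetical, since the actual CSV schema is not documented here, and "continuous" is read as "no missing fiscal years between a company's first and last record", which is only one possible interpretation. The notebooks in this repository may do this differently.

```python
import pandas as pd


def add_current_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a current_ratio column.

    Column names are assumptions for illustration, not the real schema.
    Rows with zero current liabilities get NaN instead of infinity.
    """
    out = df.copy()
    liabilities = out["current_liabilities"].where(out["current_liabilities"] != 0)
    out["current_ratio"] = out["current_assets"] / liabilities
    return out


def keep_continuous_including(df: pd.DataFrame, year: int = 2005) -> pd.DataFrame:
    """Keep companies whose fiscal years have no gaps and include `year`."""

    def is_continuous(group: pd.DataFrame) -> bool:
        years = sorted(set(group["fiscal_year"]))
        no_gaps = years[-1] - years[0] + 1 == len(years)
        return bool(year in years and no_gaps)

    return df.groupby("cik").filter(is_continuous)


# Small made-up example (not real SEC data).
records = pd.DataFrame(
    {
        "cik": [1, 1, 1, 2, 2, 3, 3],
        "fiscal_year": [2004, 2005, 2006, 2005, 2007, 2006, 2007],
        "current_assets": [200, 300, 150, 100, 120, 80, 90],
        "current_liabilities": [100, 150, 0, 50, 60, 40, 45],
    }
)

continuous = keep_continuous_including(add_current_ratio(records), year=2005)
print(continuous)
```

In this toy input only company 1 survives the filter: company 2 is missing fiscal year 2006 and company 3 has no record for 2005.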