Christian Grech
This repository contains the Python script and datasets used for a coding exercise evaluating my data processing and ETL development skills.
Key Features
- Reusable Functions: The script is structured with distinct functions for reading, cleaning, validating, and saving data, promoting modularity and potential reuse.
- Data Cleaning: Implements data type conversions and missing-value handling (forward-filling for stock prices) for two sample datasets:
- airline_flights.csv
- big_tech_stock_prices.txt
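As a rough illustration of the read, clean, validate, and save structure described above, the following is a minimal sketch and not the actual contents of etl.py. The function names (read_data, clean_stock_prices, validate, save_data), the column names (ticker, date, close), and the sample values are all assumptions made for the example. The real file layout may differ.

```python
import io

import pandas as pd


def read_data(source) -> pd.DataFrame:
    """Read a comma-delimited file (or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def clean_stock_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Convert column types and forward-fill missing prices per ticker."""
    out = df.copy()
    # Hypothetical column names; values that cannot be parsed become NaT/NaN.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["close"] = pd.to_numeric(out["close"], errors="coerce")
    out = out.sort_values(["ticker", "date"]).reset_index(drop=True)
    # Forward-fill within each ticker so one company's price
    # is never carried over into another company's rows.
    out["close"] = out.groupby("ticker")["close"].ffill()
    return out


def validate(df: pd.DataFrame) -> None:
    """Raise if any required value is still missing after cleaning."""
    if df[["date", "close"]].isna().any().any():
        raise ValueError("missing values remain after cleaning")


def save_data(df: pd.DataFrame, target) -> None:
    """Write the cleaned DataFrame without the index column."""
    df.to_csv(target, index=False)


# Made-up in-memory sample standing in for big_tech_stock_prices.txt.
raw = io.StringIO(
    "ticker,date,close\n"
    "AAA,2020-01-02,100.0\n"
    "AAA,2020-01-03,\n"
    "BBB,2020-01-02,200.0\n"
    "BBB,2020-01-03,198.0\n"
)
cleaned = clean_stock_prices(read_data(raw))
validate(cleaned)
buffer = io.StringIO()
save_data(cleaned, buffer)
```

Grouping by ticker before forward-filling is a design choice of this sketch. A plain forward-fill over the whole column could copy the last price of one company into the first row of the next. Note that a missing value in the first row of a group has nothing to fill from, so validate would still flag it.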
Datasets
- airline_flights.csv: A sample dataset containing information about airline flights.
- big_tech_stock_prices.txt: A sample dataset containing historical stock prices of major tech companies.
How to Run
- Prerequisites:
- Python 3.x
- pandas library (install with pip install pandas)
- Download: Download or clone this repository.
- Place Datasets: Place the airline_flights.csv and big_tech_stock_prices.txt files in the same directory as the Python script.
- Execute: Run the script from the command line:
python etl.py
The result is two files called cleaned_airline_flights.csv and cleaned_big_tech_stock_prices.txt.