The task is to write a Python program to read an input file orders.jsonl and output three CSV files:
- File 1: customers.csv, with fields customer_id, city (renamed from customer_city) and country (renamed from customer_country)
- File 2: products.csv, with fields product_id and product_name
- File 3: order_items.csv, with fields order_id, customer_id, product_id, quantity and price_gbp
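To illustrate the field layout and renaming above, here is a minimal sketch that splits one already-flattened order line into the rows of the three files. The flat source field names and the sample values are assumptions (the structure of orders.jsonl is not documented here), and split_row is a hypothetical helper, not a function from main.py.

```python
# Column layout of the three required CSV files (output names).
OUTPUT_COLUMNS = {
    "customers.csv": ["customer_id", "city", "country"],
    "products.csv": ["product_id", "product_name"],
    "order_items.csv": ["order_id", "customer_id", "product_id", "quantity", "price_gbp"],
}

# Source field name -> output column name, as required by the task.
RENAMES = {"customer_city": "city", "customer_country": "country"}


def split_row(flat_row: dict) -> dict:
    """Split one flattened order line into one row per output file (hypothetical helper)."""
    renamed = {RENAMES.get(key, key): value for key, value in flat_row.items()}
    return {
        filename: {column: renamed[column] for column in columns}
        for filename, columns in OUTPUT_COLUMNS.items()
    }


# Hypothetical flattened input line.
row = {
    "order_id": "o1", "customer_id": "c1", "customer_city": "Leeds",
    "customer_country": "UK", "product_id": "p1", "product_name": "Mug",
    "quantity": 2, "price_gbp": 4.5,
}
print(split_row(row)["customers.csv"])
# {'customer_id': 'c1', 'city': 'Leeds', 'country': 'UK'}
```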
Content | Description |
---|---|
input/ | Folder containing the input dataset orders.jsonl and 5 additional test files used to test the validation and deduplication functions |
output/ | Folder where the output CSV files are written |
main.py | Script that takes the input dataset and generates the required output |
requirements.txt | Python requirements to run main.py |
Dockerfile | Dockerfile for building a Docker image |
docker-compose.yml | Docker Compose file to run a container which executes main.py |
The solution was implemented as a Python script that contains three main functions executing different steps:
- Loading the input data (extractData), validating it and discarding invalid objects
- Performing the transformations (flattening and renaming) required to generate the desired output (transformOrdersList)
- Deleting existing files in the output directory, creating deduplicated dataframes and generating the CSV output files (loadData)
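The extraction and validation step can be sketched with the standard library alone. This is a hypothetical stand-in, not the actual extractData: REQUIRED_KEYS is an assumed placeholder for ORDER_SCHEMA, and it only checks that required keys are present, whereas a full schema would also check types and nested fields.

```python
import json
from pathlib import Path
from typing import Iterable

# Hypothetical required top-level keys; the real ORDER_SCHEMA is not shown here.
REQUIRED_KEYS = {"order_id", "customer_id", "items"}


def validate_lines(lines: Iterable[str]) -> list:
    """Parse JSON Lines, keeping only objects that contain every required key."""
    valid = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            print(f"line {line_number}: invalid JSON, discarded")
            continue
        if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
            print(f"line {line_number}: missing required keys, discarded")
            continue
        valid.append(obj)
    return valid


def extract_orders(path: Path) -> list:
    """Read a .jsonl file from disk and return its valid objects."""
    with path.open(encoding="utf-8") as handle:
        return validate_lines(handle)
```

Keeping the parsing in validate_lines, separate from file access, also makes it easy to unit test with in-memory strings.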
The solution is scoped to the requirements of this specific task.
The project can be executed either locally or in a Docker container.
Once executed, the main.py script will create the following three files in the output directory: customers.csv, products.csv and order_items.csv.
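The deduplication and CSV-writing step could look roughly like this. The script itself builds dataframes for this step; the sketch below swaps in the standard-library csv module so it has no dependencies, and dedupe and write_csv are hypothetical helpers. Opening each file in "w" mode truncates it, so these three files are replaced on every run (the script instead clears the whole output directory first).

```python
import csv
from pathlib import Path


def dedupe(rows: list) -> list:
    """Drop exact duplicate rows (dicts of hashable values), keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def write_csv(path: Path, columns: list, rows: list) -> None:
    """Overwrite path with a header row followed by the deduplicated rows."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(dedupe(rows))
```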
To run locally, you will need Python 3.11 or newer installed in your local environment.
Then, install the dependencies:
pip install -r requirements.txt
And run the script:
python3 ./main.py
Make sure the following two folders exist in the same directory as the main.py file: input (containing orders.jsonl) and output (initially empty).
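A quick pre-flight check for this folder layout can be sketched with pathlib. check_layout is a hypothetical helper, not part of main.py; it only reports what is missing and does not create anything.

```python
from pathlib import Path


def check_layout(base_dir: Path) -> list:
    """Return a list of problems with the expected input/output folder layout."""
    problems = []
    if not (base_dir / "input" / "orders.jsonl").is_file():
        problems.append("input/orders.jsonl not found")
    if not (base_dir / "output").is_dir():
        problems.append("output/ folder not found")
    return problems
```

For example, calling check_layout(Path(".")) from the project root returns an empty list when both folders are in place.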
To run using Docker, install Docker and run the following command from the project's root directory:
docker-compose up
This builds a Docker image according to the instructions in the Dockerfile and creates a container from it, so the script does not depend on the local Python installation. The container executes main.py once and then exits.
Possible future improvements:
- Separate the validation schema (ORDER_SCHEMA) and the validation function (validateObjects) from the main script
- Add unit tests
- Automate unit test execution
- Add a total_price column with the total price of each order
- Create a proper order_item_id for order_items.csv (rows are currently identified only by the automatic index generated when the CSV files are written)
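The last two items could be approached along these lines. This is a hypothetical sketch: it assumes price_gbp is a unit price (so a line costs quantity * price_gbp) and uses an order_id-n format for order_item_id; neither choice comes from the current script.

```python
from collections import defaultdict
from decimal import Decimal


def add_item_ids(items: list) -> list:
    """Prefix each order item with an id built from its order_id and position."""
    counters = defaultdict(int)
    result = []
    for item in items:
        counters[item["order_id"]] += 1
        item_id = f"{item['order_id']}-{counters[item['order_id']]}"
        result.append({"order_item_id": item_id, **item})
    return result


def order_totals(items: list) -> dict:
    """Sum quantity * price_gbp per order, using Decimal to avoid float rounding."""
    totals = defaultdict(Decimal)  # Decimal() starts each order at 0
    for item in items:
        totals[item["order_id"]] += Decimal(str(item["price_gbp"])) * item["quantity"]
    return dict(totals)
```

Converting price_gbp through str before building the Decimal keeps values such as 0.1 exact instead of inheriting the binary float error.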