This project template is designed for data science and analytics workflows using Jupyter Notebooks. It provides a structured and standardized way to organize code, data, and outputs for efficient and reproducible research.
The template is organized into the following directories:
- `data/`: Contains raw and processed data.
  - `raw/`: Stores the original, unaltered data.
  - `processed/`: Holds data that has been cleaned, transformed, or otherwise processed.
- `notebooks/`: Contains Jupyter notebooks (.ipynb files) used for analysis and data processing.
- `scripts/`: For standalone Python scripts, often used for more complex or reusable code.
- `utils/`: Includes utility functions and helper scripts.
- `outputs/`: Stores the results and products of analyses.
  - `figures/`: For plots, charts, and other visualizations.
  - `data/`: Final or exported data sets, ready for sharing or publishing.
  - `logs/`: Log files for tracking and debugging.
  - `models/`: Trained machine learning model files.
  - `summaries/`: Textual output such as reports and summaries.
- `env/`: Virtual environment directory (not tracked by version control).
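
The layout above can be scaffolded in one step. The following is a minimal sketch, not part of the template itself: `create_template` and `TEMPLATE_DIRS` are hypothetical names, and it assumes `scripts/`, `utils/`, and `outputs/` sit at the project root, as the paths used later in this README suggest.

```python
from pathlib import Path

# Directory layout taken from the list above. env/ is left out because
# `python -m venv env` creates it during environment setup.
TEMPLATE_DIRS = [
    "data/raw",
    "data/processed",
    "notebooks",
    "scripts",
    "utils",
    "outputs/figures",
    "outputs/data",
    "outputs/logs",
    "outputs/models",
    "outputs/summaries",
]


def create_template(root="."):
    """Create every template directory under `root` (hypothetical helper)."""
    root = Path(root)
    created = []
    for relative in TEMPLATE_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(path)
    return created
```

Calling `create_template("my-project")` would create the tree under a folder named `my-project` (an illustrative name); existing directories are left untouched.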
Set Up Environment:
- Create a virtual environment: `python -m venv env`
- Activate the environment:
  - Windows: `.\env\Scripts\activate`
  - Unix/macOS: `source env/bin/activate`
- Install required packages: `pip install -r requirements.txt`
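
A common slip is installing packages before the environment is activated. As a rough check, a notebook or script could confirm that it runs inside a virtual environment. This is a sketch only; `in_virtual_environment` is a hypothetical helper name, and the check relies on the standard `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if not in_virtual_environment():
    print("Warning: no virtual environment is active; run the activate step first.")
```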
Working with Notebooks:
- Jupyter notebooks are located in the `notebooks/` directory.
- Start JupyterLab with `jupyter lab` and open notebooks from the interface.
Using the Data Directory:
- Place your raw data in `data/raw/`.
- Save processed data in `data/processed/`.
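
The raw-to-processed flow might look like the sketch below. It uses only the standard library, and both the `clean_csv` helper and the cleaning rules (strip whitespace, drop empty rows) are assumptions chosen for illustration; a real project would substitute its own steps.

```python
import csv
from pathlib import Path


def clean_csv(raw_path, processed_dir="data/processed"):
    """Write a cleaned copy of a raw CSV into processed_dir and return its path."""
    raw_path = Path(raw_path)
    processed_dir = Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    out_path = processed_dir / raw_path.name

    # The raw file is only read, never modified.
    with raw_path.open(newline="", encoding="utf-8") as src, \
            out_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            cleaned = [cell.strip() for cell in row]
            if any(cleaned):  # skip rows that are empty after stripping
                writer.writerow(cleaned)
    return out_path
```

For example, `clean_csv("data/raw/survey.csv")` would write `data/processed/survey.csv`, where `survey.csv` is a made-up file name.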
Scripts and Utilities:
- Store reusable scripts in `scripts/`.
- Place utility functions in `utils/`.
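
Notebooks started from `notebooks/` usually cannot import from `scripts/` or `utils/` until the project root is on `sys.path`. One way to handle this is sketched below; `add_project_root_to_path` is a hypothetical helper, and it assumes `requirements.txt` lives at the project root.

```python
import sys
from pathlib import Path


def add_project_root_to_path(start=None, marker="requirements.txt"):
    """Find the nearest parent directory containing `marker` and add it to sys.path."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).exists():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"No {marker} found in or above {current}")
```

After calling it in the first cell of a notebook, an import such as `from utils.cleaning import normalize` should resolve, where `utils/cleaning.py` and `normalize` are illustrative names rather than files shipped with the template.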
Saving Outputs:
- Save figures and plots in `outputs/figures/`.
- Export final data sets to `outputs/data/`.
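
To keep output locations consistent across notebooks, paths can be built in one place. The sketch below is an assumption about how that might be done; `output_path` and `OUTPUT_KINDS` are hypothetical names, and the set of kinds mirrors the `outputs/` subdirectories listed in this README.

```python
from pathlib import Path

# Subdirectories of outputs/ named in the template.
OUTPUT_KINDS = {"figures", "data", "logs", "models", "summaries"}


def output_path(kind, filename, root="outputs"):
    """Return root/kind/filename, creating the directory if it does not exist."""
    if kind not in OUTPUT_KINDS:
        raise ValueError(f"Unknown output kind: {kind!r}")
    directory = Path(root) / kind
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename
```

With Matplotlib, for instance, a figure could then be saved with `fig.savefig(output_path("figures", "histogram.png"))`; the file name is illustrative.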
Logging:
- Generate and store log files in `outputs/logs/`.
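
One possible way to route log messages into `outputs/logs/` uses Python's standard `logging` module. This is a sketch under assumptions: `get_file_logger` is a hypothetical helper, and the one-file-per-logger naming and the message format are arbitrary choices.

```python
import logging
from pathlib import Path


def get_file_logger(name, log_dir="outputs/logs"):
    """Return a logger that appends to log_dir/<name>.log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers when a notebook cell is re-run
        handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A notebook could then call `get_file_logger("cleaning").info("Loaded raw data")`, which appends a timestamped line to `outputs/logs/cleaning.log`.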
- Keep raw data immutable to maintain data integrity.
- Document each step in your Jupyter notebooks for clarity and reproducibility.
- Write modular and reusable code in scripts and utility functions.
- Regularly commit changes to version control.
This template provides a foundational structure for notebook-based projects, keeping code, data, and outputs organized and the analysis reproducible.