Comments (4)
I agree this approach has worked well. I think the .env workflow may need some explanation in the docs, so I'll open a separate issue for that.
from cookiecutter-data-science.
Here's how we've done this so far, just as one data point:
- A script (or scripts) in src/data/ will do all of the ETL. Right now in the template there is an example stub called make_dataset.py, but it could be anything: you could have multiple files for interfacing with various systems, or even multiple directories if things start to get too busy in src/. As you mentioned, adding a folder for saved SQL queries might be a good way to keep things tidy.
- That script dumps its output somewhere in the data/ directory.
- Optional: rule in the Makefile specifying the target (output) and dependencies (input files, such as the .py and .sql/.hql files).
- The project secrets (including database credentials) live in a file only on your machine called .env in the top level directory, which is .gitignore'd by default to keep it out of version control. If you look at the stub make_dataset.py, it uses a package called python-dotenv to load up all the entries in this file as environment variables, so they are accessible with os.environ.get or whatever is language-appropriate.
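As a rough illustration of the first two bullets, here is a minimal sketch of an ETL helper that reads a saved SQL query and dumps the result under data/. The function name run_query_to_csv and the paths in the usage note are hypothetical, and sqlite3 stands in for whatever database the project actually talks to.

```python
import csv
import sqlite3
from pathlib import Path


def run_query_to_csv(conn, sql_path, out_path):
    """Run the query saved in sql_path and write the rows to out_path as CSV.

    Hypothetical helper: `conn` is any sqlite3 connection here, used only
    as a stand-in for the project's real database.
    """
    query = Path(sql_path).read_text()
    cursor = conn.execute(query)

    # Column names come from the cursor metadata of the executed query.
    header = [col[0] for col in cursor.description]

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
    return out_path
```

A script in src/data/ could then call something like run_query_to_csv(conn, "src/data/queries/customers.sql", "data/raw/customers.csv"), where both paths are made up for this example. Keeping code in src/ and output in data/ matches the layout described above.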
Here's an example .env:
DATABASE_URL=postgres://username:password@localhost:5432/dbname
AWS_ACCESS_KEY=myaccesskey
AWS_SECRET_ACCESS_KEY=mysecretkey
...
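To show what "loading the entries as environment variables" amounts to, here is a simplified, stdlib-only sketch. It is not python-dotenv itself: load_env_file is a hypothetical stand-in that only handles plain KEY=VALUE lines, whereas the real package also deals with quoting and other cases. With python-dotenv installed, the equivalent step is typically a call to load_dotenv(find_dotenv()).

```python
import os
from pathlib import Path


def load_env_file(path):
    """Load simple KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical, simplified stand-in for python-dotenv. Variables that are
    already set in the environment are left untouched.
    """
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


# Typical use inside a script, assuming a .env file in the working directory:
#   load_env_file(".env")
#   database_url = os.environ.get("DATABASE_URL")
```

Reading values through os.environ.get means the script works the same way whether the variables come from the .env file or from the real environment, so the secrets file never has to be committed.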
Meta discussion:
- One of the core "opinions" of the project is that all code lives in src/ and everything in the data/ directory is just data.
- Another opinion is that end users should be liberal in adding folders to suit their needs, but the template should be conservative in making those choices permanent.
Thoughts?
from cookiecutter-data-science.
Completely agree with that approach. I think that works quite well.
from cookiecutter-data-science.
See #18 for the .env issue. Closing this issue for now, but still very open to receiving comments.
from cookiecutter-data-science.