cafelytics's People

Contributors

djdebonis, mathematicalmichael, mpilosov

Forkers

djdebonis

cafelytics's Issues

Reconcile Config class

This class encodes species information; it can be used to generate impacts for things such as "harvest", "pruning", or "fertilizer", and to inform another multiplier.

I think something like... a config can spawn an event, one we know will be relevant, so perhaps we don't need a second lookup (for configs and events). Have to think through where the inefficiencies are.

Maybe config isn't even required? How is it used right now?

We have a step where we find the relevant configs.
For example... Config stores info about a species, and can then be used to generate harvest functions like the ones in simulate.py. Should events be able to take configs as arguments? So, rather than spawn four guate-functions, encode the info in a config and spawn them on the fly? Seems like it'd be an inefficiency for runtime but an efficiency for storage. It's easier to store a config and a template callable than a bunch of functions generated from that template.
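
A minimal sketch of the "config plus template callable" idea. The field names (species, peak_yield, maturity_years) and the harvest curve are assumptions for illustration; none of them come from simulate.py itself:

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Config:
        """Species-level parameters (hypothetical fields)."""
        species: str
        peak_yield: float      # e.g. pounds per plant at maturity
        maturity_years: int    # years until full production


    def make_harvest_fn(config: Config) -> Callable[[int], float]:
        """Spawn a harvest function on the fly from a stored config."""
        def harvest(age: int) -> float:
            # placeholder curve: simple ramp to maturity, then flat
            if age <= 0:
                return 0.0
            return config.peak_yield * min(age / config.maturity_years, 1.0)
        return harvest


    # usage: store one Config per species instead of four pre-generated functions
    guate = Config(species="guate", peak_yield=8.0, maturity_years=4)
    harvest_guate = make_harvest_fn(guate)
    print(harvest_guate(2))  # 4.0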

new README

once we get the packaging initialized, we need to write a really basic README that updates the status of the project. Branch name: feature/readme; merge into develop for this.

membership start date

determine whether harvest is active based on when a plot "joined" the cooperative.

in other words, someone could have farms dating back to the '90s but have joined in 2002; we don't want their yields showing up as part of the total unless we explicitly toggle them in.
is_active considers time as the age of the plot, but we can encode 'join_date' as a feature and, if present, make sure the current simulation time has passed it.
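
A minimal sketch of the check described above; the argument names (planted, join_date, include_pre_membership) are assumptions, not the current is_active signature:

    import datetime
    from typing import Optional


    def is_active(
        current_time: datetime.datetime,
        planted: datetime.datetime,
        join_date: Optional[datetime.datetime] = None,
        include_pre_membership: bool = False,
    ) -> bool:
        """A plot counts toward totals only if it exists and, when a join_date
        is recorded, the simulation clock has passed it (unless explicitly
        toggled to include pre-membership yields)."""
        if current_time < planted:
            return False
        if join_date is not None and not include_pre_membership:
            return current_time >= join_date
        return True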

understanding early pruning

Trying to understand the numbers being used for the early-pruning simulation. Is this implying that early pruning gives the trees an extra 20 years of life? I also don't see production being affected, only the years of production. This could very well be my lack of experience with MATLAB.

If you had ten minutes, something that would be extremely helpful is a brief summary of how production numbers (e.g. proportion of full harvest; years of production) change as a result of (1) pruning and (2) intercropping.

refactor simulate.py, define console entry-point.

if we want this to be used as a CLI, then we should properly define it as a console script that can be run from anywhere (see the sketch after this list)

  • in the GitHub Actions test, navigate to /tmp/, create fake data, and simulate.
  • can we make console scripts optional?
  • should it be excluded from testing? or do we refactor in a way that allows testing?
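
A minimal sketch of the entry-point side, assuming a hypothetical module path cafe.simulate and a hypothetical main() signature; the setup.cfg registration is only indicated in the comment, not prescribed:

    # src/cafe/simulate.py (hypothetical module path)
    import sys
    from typing import Optional, Sequence


    def main(argv: Optional[Sequence[str]] = None) -> int:
        """Console entry point; argument parsing would live here."""
        argv = list(sys.argv[1:] if argv is None else argv)
        # ... parse args, load data, run the simulation ...
        return 0


    # registered in setup.cfg (pyscaffold layout) roughly as:
    #   [options.entry_points]
    #   console_scripts =
    #       cafelytics = cafe.simulate:main
    if __name__ == "__main__":
        sys.exit(main())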

class object initialization strategy

discover:

in Python, is it acceptable to call member functions in the initializer to help abstract the bulk of the code away from the init function? In C++ this behavior is discouraged; however, C++ is much different with regard to scope and user accessibility, so it may not apply here.

example:

with member function calls:

import pandas as pd

class Farmarelli:
    def __init__(self, eventRows: pd.DataFrame, initialYear: int):
        self.assignParameters()

where self.assignParameters() assigns member variables?

or should these simply be assigned directly in the __init__ function?
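
For comparison, a minimal sketch of the direct-assignment alternative being asked about (the attribute names are just illustrative):

    import pandas as pd

    class Farmarelli:
        def __init__(self, eventRows: pd.DataFrame, initialYear: int):
            # assign directly in __init__ instead of delegating to a member call
            self.eventRows = eventRows
            self.initialYear = initialYear

Either pattern is acceptable in Python; the trade-off is a shorter __init__ versus having all attribute assignment visible in one place.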

streamlit app

  • make some sort of interactive front-end

  • continuous deployment (heroku?)
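
A minimal sketch of a starting point for the front-end, assuming streamlit (named in the issue title) is added as an optional dependency; the inputs and placeholder output are purely illustrative:

    # app.py -- run with: streamlit run app.py
    import streamlit as st

    st.title("cafelytics")

    years = st.slider("Years to simulate", min_value=1, max_value=60, value=30)
    uploaded = st.file_uploader("Farm data (csv/xlsx)")

    if uploaded is not None:
        # placeholder: hand the file and horizon to the simulation and plot the result
        st.write(f"Would simulate {years} years for {uploaded.name}")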

create a decorator for `age`-derived attributes

even if we only use it once, I think that methods like this should be defined by a decorator in order to shorten the class definition:

    def years(self, current_time=datetime.datetime.today()) -> int:
        return round(self.age(current_time).days / 365.25)

    def days(self, current_time=datetime.datetime.today()) -> int:
        return self.age(current_time).days

    def mins(self, current_time=datetime.datetime.today()) -> int:
        return round(self.age(current_time).seconds / 60)

how it would work:

  • add @add_age_attrs as a decorator
  • attaches these methods if cls.age is defined. does nothing otherwise (or raises a usage error of some sort).
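
A minimal sketch of such a class decorator, assuming cls.age(current_time) returns a datetime.timedelta as the quoted methods imply. The defaults are None here so that "now" is evaluated per call rather than once at definition time, and the Plot class at the bottom is only a hypothetical usage example:

    import datetime


    def add_age_attrs(cls):
        """Attach years/days/mins methods derived from cls.age, if it exists."""
        if not hasattr(cls, "age"):
            raise TypeError(f"{cls.__name__} must define an `age` method")

        def years(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return round(self.age(current_time).days / 365.25)

        def days(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return self.age(current_time).days

        def mins(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return round(self.age(current_time).seconds / 60)

        cls.years, cls.days, cls.mins = years, days, mins
        return cls


    @add_age_attrs
    class Plot:
        def __init__(self, planted: datetime.datetime):
            self.planted = planted

        def age(self, current_time: datetime.datetime) -> datetime.timedelta:
            return current_time - self.planted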

logging data

after the simulation is in good working order, set up a function that will log data from each year's iteration of the simulation (this will be helpful to come back to later to compare and contrast the effectiveness of strategies).
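
A minimal sketch of what such a logger could look like; the class name and the metric fields are assumptions:

    import csv


    class SimulationLog:
        """Collects one record per simulated year; dump to CSV for later comparison."""

        def __init__(self):
            self.records = []

        def log_year(self, year, metrics):
            # metrics: dict of whatever we decide to track, e.g. total yield, active plots
            self.records.append({"year": year, **metrics})

        def to_csv(self, path):
            if not self.records:
                return
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(self.records[0]))
                writer.writeheader()
                writer.writerows(self.records)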

provide history of older "releases" of the codebase.

maybe we keep the original MATLAB code anyway? I don't know; perhaps put it on a separate branch with some mention of the archival process in the README. We can actually just refer people to the last commit number before the merge of #15 and provide some backstory in a blog post.

That said, I eventually do want auto-generated documentation, and it would be nice if some markdown file from this repo just got incorporated into the docs, however that happens.

Part of this process should involve re-writing the README, but I'll raise another issue for that.

update binder links

these are really outdated. Now that we don't depend on Octave, we can use the default Binder settings and just build the dependency set from requirements.txt.

test cases for events

  • test we can simulate pruning (and what comes with it, such as the following; a test sketch follows this list):

    • extension of the lifespan of plants (?)
    • decreased yield in the years immediately following a pruning event
    • increased yield in subsequent years
  • test we can simulate intercropping of the same crop

    • increase in plot yield as new plants reach maturity
    • decrease in lifespan of plants (David has some data and a semi-functional regression of this relationship for coffee trees)
  • test we can simulate intercropping of a different crop (this totally changes the dynamic, though)

  • test we can simulate a wildfire event which wipes out a proportion of crops and requires replanting the following year

    • this raises the question of whether the software will carry informational messages that direct farmers: if their farm burns down, there is likely some work needed to restore the soil before replanting, and/or the soil quality could impact future yields. Will simulating these events warrant these types of messages/adjustments?
  • test we can simulate the impacts of pests on plot health (i.e. proportion of plants that stay alive, lifespan of plants) and crop yield

  • test we can simulate drought and how it impacts plot health and crop yield

  • test we can simulate a large storm and how it impacts plot health and crop yield

  • test we can simulate adding/removing an irrigation system and how it impacts plot health and crop yield
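
A minimal, self-contained sketch of the pruning test from the first bullet, using a toy yield model as a stand-in (toy_yields is invented here; a real test would call the simulation code instead):

    from typing import List, Optional


    def toy_yields(years: int, prune_year: Optional[int] = None) -> List[float]:
        """Placeholder yield model: flat baseline, dip the year of pruning, boost after."""
        yields = [1.0] * years
        if prune_year is not None:
            yields[prune_year] = 0.5                  # immediate drop
            for y in range(prune_year + 1, years):
                yields[y] = 1.2                       # later boost
        return yields


    def test_pruning_dips_then_boosts_yield():
        baseline = toy_yields(10)
        pruned = toy_yields(10, prune_year=3)
        assert pruned[3] < baseline[3]   # decreased yield right after the event
        assert pruned[5] > baseline[5]   # increased yield in subsequent years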

bug adding event

        start_year = datetime.datetime(2020, 1, 1)
        end_year = datetime.datetime(2021, 1, 1)
        events.append(Event("catastrophic overfertilization", impact=0.001,
                            scope={"type": "species", "def": "e14"},
                            start=start_year, end=end_year))

update packaging

update pyscaffold version, update workflows for new packaging procedures.

try fully migrating to pyproject.toml (but ... I do like development installations)

representative dataset

now that the CLI is successfully returning plots (somewhat) resembling the original:

  • tweak fakeData.py (probabilities, selections, etc.)
  • run a script that lets you sift through data options

until we find a demo dataset that somewhat resembles the data from the actual co-op.

create "main" method for CLI, implement argument parsing

cafelytics/src/Worksheet.m is like a precursor to a notebook.

I want the user to interact with this as follows:

python simulate_growth.py --farm ./data/farm.xlsx --growth ./data/growth.yml --strategy ./interventions/strategy1.yml --years 60
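
A minimal argparse sketch matching the invocation above; the option names come from the command line shown, while everything else (defaults, the hand-off function) is an assumption:

    import argparse


    def main(argv=None):
        parser = argparse.ArgumentParser(description="Simulate coffee farm growth.")
        parser.add_argument("--farm", required=True, help="path to farm data (e.g. xlsx)")
        parser.add_argument("--growth", required=True, help="path to growth assumptions (yaml)")
        parser.add_argument("--strategy", required=True, help="path to intervention strategy (yaml)")
        parser.add_argument("--years", type=int, default=60, help="number of years to simulate")
        args = parser.parse_args(argv)

        # hand off to the simulation (hypothetical function name)
        # run_simulation(args.farm, args.growth, args.strategy, args.years)
        return 0


    if __name__ == "__main__":
        raise SystemExit(main())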

Develop dataset

Based on data (and industry trends), develop an example/demo dataset to model and test the simulation. Then construct a co-op/plantation class and use the dataset to build and fill it.

Also: what are the most common units of measurement? What units can be constructed ambiguously (i.e. units/time; units/space) to extrapolate beyond coffee?

automatic dictionary var

the farm class is now updated to have a dictionary passed to it with tree attribute information (as opposed to it being defined in the class itself). Either: (1) something needs to be built so that the class can fall back on a default dictionary for testing, or (2) the class should instead be passed a filepath (however, this would break the current workflow pattern). A sketch of option (1) is below.
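
A minimal sketch of option (1); the default attribute names and the constructor signature here are assumptions, not the current Farm class:

    # hypothetical default tree attributes, used only when no dictionary is supplied
    DEFAULT_TREE_ATTRS = {
        "species": "borbon",
        "first_harvest_year": 3,
        "death_year": 30,
    }


    class Farm:
        def __init__(self, tree_attrs: dict = None):
            # fall back to a copy of the default so tests can instantiate Farm()
            self.tree_attrs = dict(tree_attrs) if tree_attrs else dict(DEFAULT_TREE_ATTRS)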

Test speeds, prepare for inverse problems

Test how long the example simulation takes, benchmark it.

See if caching with lru_cache has any impact whatsoever.

Get a sense of what time forward simulations will require.

Figure out how to collect input parameters in a way that is collapsible to a matrix of samples.
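
A minimal sketch of the kind of benchmarking meant here, using functools.lru_cache and time.perf_counter on a stand-in function (the real target would be the forward simulation):

    import time
    from functools import lru_cache


    @lru_cache(maxsize=None)
    def forward_simulation(years: int) -> float:
        # stand-in for the real simulation: deliberately a bit of work
        total = 0.0
        for _ in range(years):
            total += sum(i ** 0.5 for i in range(10_000))
        return total


    def benchmark(fn, *args, repeats: int = 3) -> float:
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        return (time.perf_counter() - start) / repeats


    print("cold:", benchmark(forward_simulation, 60, repeats=1))
    print("cached:", benchmark(forward_simulation, 60))  # lru_cache should make this nearly free

Note that lru_cache only helps when the same hashable arguments recur, which is exactly the question being asked.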

ingestion of user data

be able to take in a spreadsheet (to start), csv/tsv, parquet: something you can imagine a user dragging and dropping into streamlit eventually.

right now, some settings are just grabbed from hardcoded locations in an xlsx file.

that's not required; we can make settings available in a human-readable format such as yaml (see the sketch below).

assumptions about the growth of trees should be decoupled from farmer information.
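
A minimal sketch of yaml-based settings, assuming PyYAML and a hypothetical growth.yml layout; the keys shown are illustrative, not the actual format:

    import yaml  # PyYAML

    # hypothetical growth.yml:
    #   borbon:
    #     first_harvest_year: 3
    #     peak_yield_lbs: 8
    #   guate:
    #     first_harvest_year: 4
    #     peak_yield_lbs: 10


    def load_growth_settings(path: str) -> dict:
        """Read growth assumptions from yaml, decoupled from farmer data."""
        with open(path) as f:
            return yaml.safe_load(f)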

need bugfix on develop: 'Farm' object has no attribute 'firstHarvest'

    def test_farm_instantiation_defaults():
        test_farm = cf.Farm()

I added some default arguments to be able to instantiate this without arguments, and came across an immediate error during testing:
https://github.com/mathematicalmichael/cafelytics/pull/20/checks?check_run_id=962191696


>               elif ((age >= self.firstHarvest['year']) and (age <= self.death['year'])):
E               AttributeError: 'Farm' object has no attribute 'firstHarvest'

for now I'm going to take this test out and simply "pass" everything, just to exercise the basics of the testing workflow. See PR #20 for the last instance of an "X" in the test status next to each commit message before the checkmarks started.

string stripping

create some sort of global functions (or select them from re, the string module, etc.). String matching is a large component of the flow of control in these simulations, and there should be as many safety nets built in as possible (also see #41). A sketch of one such helper is below.
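
A minimal sketch of the kind of safety-net helper meant here (the name and exact behavior are assumptions):

    import re


    def normalize_label(raw: str) -> str:
        """Normalize user/spreadsheet strings before matching:
        trim, lowercase, collapse internal whitespace."""
        return re.sub(r"\s+", " ", raw.strip().lower())


    assert normalize_label("  Borbon \t Coffee ") == "borbon coffee"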

variable naming/labeling

for best practice: while translating, all significant numbers should be assigned to variables with descriptive names and/or the lines in which they are used should be commented. Many already have this, but, for example,

if n ==18:
    data(1,13)=20;

should be something like:

pruneThreshold = 18  # description of what 'pruneThreshold' means (fake var name)
if n == pruneThreshold:  # if the year is pruneThreshold, coffee plants explode
    data[1][13] = 20

units

use pint (Python units) to make sure we have proper unit agreement ("pounds/year"), etc.
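
A minimal sketch of how pint could enforce unit agreement (the quantities here are made up):

    import pint

    ureg = pint.UnitRegistry()

    harvest = 150 * ureg.pound / ureg.year        # per-plot yield
    metric = harvest.to(ureg.kilogram / ureg.year)

    print(metric)  # ~68.04 kilogram / year
    # mixing mismatched units raises pint.DimensionalityError instead of silently combining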

simulate cooperative

It won't necessarily be named main; something like simulate_cooperate(dataFrame, time, etc.).

It should not be doing data loading.
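
A minimal sketch of that separation: the function receives already-loaded data and does no I/O. The parameter names and the per-year placeholder are assumptions based on the note above:

    import pandas as pd


    def simulate_cooperate(data: pd.DataFrame, years: int, strategy: dict = None) -> pd.DataFrame:
        """Run the cooperative simulation on data that has already been loaded.

        Data loading (xlsx/csv/yaml parsing) happens in the CLI layer, not here.
        """
        results = []
        for year in range(years):
            # placeholder: per-year yield logic would go here
            results.append({"year": year, "total_yield": float(len(data))})
        return pd.DataFrame(results)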

set up automated linting

never done this before, but the idea is that when you push or submit a PR, your linting errors are either highlighted or automatically fixed.

set up dockerized application

  • dockerfile to build project as lightly as possible

  • test that docker build works in github actions (do the symlinking thing so that we run additional tests in docker python)

  • push to mindthegrow/cafelytics
