cafelytics's People

Contributors

djdebonis, mathematicalmichael, mpilosov

Forkers

djdebonis

cafelytics's Issues

Reconcile Config class

This class encodes species information; it can be used to generate impacts for things such as "harvest", "pruning", or "fertilizer", and to inform another multiplier.

I think something like... a config can spawn an event, one we know will be relevant, so perhaps we don't need a second lookup (for configs and events). Have to think through where the inefficiencies are.

Maybe config isn't even required? How is it used right now?

We have a step where we find the relevant configs.
For example... Config stores info about a species, and can then be used to generate harvest functions like the ones in simulate.py. Should events be able to take configs as arguments? So, rather than spawn four guate-functions, encode the info in a config and spawn them on the fly? Seems like it'd be an inefficiency for runtime but an efficiency for storage. It's easier to store a config and a template callable than a bunch of functions generated from that template.
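
A minimal sketch of the "config plus template callable" idea. The field names (species, peak_yield, maturity_years) and the harvest curve are assumptions for illustration; none of them come from simulate.py itself:

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Config:
        """Species-level parameters (hypothetical fields)."""
        species: str
        peak_yield: float      # e.g. pounds per plant at maturity
        maturity_years: int    # years until full production


    def make_harvest_fn(config: Config) -> Callable[[int], float]:
        """Spawn a harvest function on the fly from a stored config."""
        def harvest(age: int) -> float:
            # placeholder curve: simple ramp to maturity, then flat
            if age <= 0:
                return 0.0
            return config.peak_yield * min(age / config.maturity_years, 1.0)
        return harvest


    # usage: store one Config per species instead of four pre-generated functions
    guate = Config(species="guate", peak_yield=8.0, maturity_years=4)
    harvest_guate = make_harvest_fn(guate)
    print(harvest_guate(2))  # 4.0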

new README

once we get the packaging initialized, we need to write a really basic README that updates the status of the project. Branch name: feature/readme; merge into develop for this.

membership start date

determine whether harvest is active based on when a plot "joined" the cooperative.

in other words, someone could have farms dating back to the '90s but have joined in 2002; we don't want their yields showing up as part of the total unless we explicitly toggle them in.
is_active considers time as the age of the plot, but we can encode 'join_date' as a feature and, if present, make sure the current simulation time has passed it.
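
A minimal sketch of the check described above; the argument names (planted, join_date, include_pre_membership) are assumptions, not the current is_active signature:

    import datetime
    from typing import Optional


    def is_active(
        current_time: datetime.datetime,
        planted: datetime.datetime,
        join_date: Optional[datetime.datetime] = None,
        include_pre_membership: bool = False,
    ) -> bool:
        """A plot counts toward totals only if it exists and, when a join_date
        is recorded, the simulation clock has passed it (unless explicitly
        toggled to include pre-membership yields)."""
        if current_time < planted:
            return False
        if join_date is not None and not include_pre_membership:
            return current_time >= join_date
        return True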

understanding early pruning

Trying to understand the numbers being used for the early-pruning simulation. Is this implying that early pruning gives the trees an extra 20 years of life? I also don't see production being affected, only the years of production. This could very well be my lack of experience with MATLAB.

If you had ten minutes, something that would be extremely helpful is a brief summary of how production numbers (e.g. proportion of full harvest; years of production) change as a result of (1) pruning and (2) intercropping.

refactor simulate.py, define console entry-point.

if we want this to be used as a CLI, then we should properly define it as a console script that can be run from anywhere (see the sketch after this list)

  • in the GitHub Actions test, navigate to /tmp/, create fake data, and simulate.
  • can we make console scripts optional?
  • should it be excluded from testing? or do we refactor in a way that allows testing?
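
A minimal sketch of the entry-point side, assuming a hypothetical module path cafe.simulate and a hypothetical main() signature; the setup.cfg registration is only indicated in the comment, not prescribed:

    # src/cafe/simulate.py (hypothetical module path)
    import sys
    from typing import Optional, Sequence


    def main(argv: Optional[Sequence[str]] = None) -> int:
        """Console entry point; argument parsing would live here."""
        argv = list(sys.argv[1:] if argv is None else argv)
        # ... parse args, load data, run the simulation ...
        return 0


    # registered in setup.cfg (pyscaffold layout) roughly as:
    #   [options.entry_points]
    #   console_scripts =
    #       cafelytics = cafe.simulate:main
    if __name__ == "__main__":
        sys.exit(main())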

class object initialization strategy

discover:

in Python, is it acceptable to call member functions in the initializer to help abstract the bulk of the code away from the init function? In C++ this behavior is discouraged; however, C++ is much different with regard to scope and user accessibility, so it may not apply here.

example:

with member function calls:

import pandas as pd

class Farmarelli:
    def __init__(self, eventRows: pd.DataFrame, initialYear: int):
        self.assignParameters()

where self.assignParameters() assigns member variables?

or should these simply be assigned directly in the __init__ function?
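
For comparison, a minimal sketch of the direct-assignment alternative being asked about (the attribute names are just illustrative):

    import pandas as pd

    class Farmarelli:
        def __init__(self, eventRows: pd.DataFrame, initialYear: int):
            # assign directly in __init__ instead of delegating to a member call
            self.eventRows = eventRows
            self.initialYear = initialYear

Either pattern is acceptable in Python; the trade-off is a shorter __init__ versus having all attribute assignment visible in one place.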

streamlit app

  • make some sort of interactive front-end

  • continuous deployment (heroku?)
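
A minimal sketch of a starting point for the front-end, assuming streamlit (named in the issue title) is added as an optional dependency; the inputs and placeholder output are purely illustrative:

    # app.py -- run with: streamlit run app.py
    import streamlit as st

    st.title("cafelytics")

    years = st.slider("Years to simulate", min_value=1, max_value=60, value=30)
    uploaded = st.file_uploader("Farm data (csv/xlsx)")

    if uploaded is not None:
        # placeholder: hand the file and horizon to the simulation and plot the result
        st.write(f"Would simulate {years} years for {uploaded.name}")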

create a decorator for `age`-derived attributes

even if we only use it once, I think that methods like this should be defined by a decorator in order to shorten the class definition:

    def years(self, current_time=datetime.datetime.today()) -> int:
        return round(self.age(current_time).days / 365.25)

    def days(self, current_time=datetime.datetime.today()) -> int:
        return self.age(current_time).days

    def mins(self, current_time=datetime.datetime.today()) -> int:
        return round(self.age(current_time).seconds / 60)

how it would work:

  • add @add_age_attrs as a decorator
  • attaches these methods if cls.age is defined. does nothing otherwise (or raises a usage error of some sort).
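
A minimal sketch of such a class decorator, assuming cls.age(current_time) returns a datetime.timedelta as the quoted methods imply. The defaults are None here so that "now" is evaluated per call rather than once at definition time, and the Plot class at the bottom is only a hypothetical usage example:

    import datetime


    def add_age_attrs(cls):
        """Attach years/days/mins methods derived from cls.age, if it exists."""
        if not hasattr(cls, "age"):
            raise TypeError(f"{cls.__name__} must define an `age` method")

        def years(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return round(self.age(current_time).days / 365.25)

        def days(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return self.age(current_time).days

        def mins(self, current_time=None) -> int:
            current_time = current_time or datetime.datetime.today()
            return round(self.age(current_time).seconds / 60)

        cls.years, cls.days, cls.mins = years, days, mins
        return cls


    @add_age_attrs
    class Plot:
        def __init__(self, planted: datetime.datetime):
            self.planted = planted

        def age(self, current_time: datetime.datetime) -> datetime.timedelta:
            return current_time - self.planted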

logging data

after the simulation is in good working order, set up a function that will log data from each year's iteration of the simulation (this will be helpful to come back to later to compare and contrast the effectiveness of strategies).
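
A minimal sketch of what such a logger could look like; the class name and the metric fields are assumptions:

    import csv


    class SimulationLog:
        """Collects one record per simulated year; dump to CSV for later comparison."""

        def __init__(self):
            self.records = []

        def log_year(self, year, metrics):
            # metrics: dict of whatever we decide to track, e.g. total yield, active plots
            self.records.append({"year": year, **metrics})

        def to_csv(self, path):
            if not self.records:
                return
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(self.records[0]))
                writer.writeheader()
                writer.writerows(self.records)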

provide history of older "releases" of the codebase.

maybe we keep the original MATLAB code anyway? I don't know; perhaps put it on a separate branch with some mention of the archival process in the README. We can actually just refer people to the last commit number before the merge of #15 and provide some backstory in a blog post.

That said, I eventually do want auto-generated documentation, and it would be nice if some markdown file from this repo just got incorporated into the docs, however that happens.

Part of this process should involve re-writing the README, but I'll raise another issue for that.

update binder links

these are really outdated. Now that we don't depend on Octave, we can use the default Binder settings and just build the dependency set from requirements.txt.

test cases for events

  • test we can simulate pruning (and what comes with it, such as the following; a test sketch follows this list):

    • extension of the lifespan of plants (?)
    • decreased yield in the years immediately following a pruning event
    • increased yield in subsequent years
  • test we can simulate intercropping of the same crop

    • increase in plot yield as new plants reach maturity
    • decrease in lifespan of plants (David has some data and a semi-functional regression of this relationship for coffee trees)
  • test we can simulate intercropping of a different crop (this totally changes the dynamic, though)

  • test we can simulate a wildfire event which wipes out a proportion of crops and requires replanting the following year

    • this raises the question of whether the software will carry informational messages that direct farmers: if their farm burns down, there is likely some work needed to restore the soil before replanting, and/or the soil quality could impact future yields. Will simulating these events warrant these types of messages/adjustments?
  • test we can simulate the impacts of pests on plot health (i.e. proportion of plants that stay alive, lifespan of plants) and crop yield

  • test we can simulate drought and how it impacts plot health and crop yield

  • test we can simulate a large storm and how it impacts plot health and crop yield

  • test we can simulate adding/removing an irrigation system and how it impacts plot health and crop yield
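
A minimal, self-contained sketch of the pruning test from the first bullet, using a toy yield model as a stand-in (toy_yields is invented here; a real test would call the simulation code instead):

    from typing import List, Optional


    def toy_yields(years: int, prune_year: Optional[int] = None) -> List[float]:
        """Placeholder yield model: flat baseline, dip the year of pruning, boost after."""
        yields = [1.0] * years
        if prune_year is not None:
            yields[prune_year] = 0.5                  # immediate drop
            for y in range(prune_year + 1, years):
                yields[y] = 1.2                       # later boost
        return yields


    def test_pruning_dips_then_boosts_yield():
        baseline = toy_yields(10)
        pruned = toy_yields(10, prune_year=3)
        assert pruned[3] < baseline[3]   # decreased yield right after the event
        assert pruned[5] > baseline[5]   # increased yield in subsequent years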

bug adding event

        start_year = datetime.datetime(2020, 1, 1)
        end_year = datetime.datetime(2021, 1, 1)
        events.append(Event("catastrophic overfertilization", impact=0.001,
                            scope={"type": "species", "def": "e14"},
                            start=start_year, end=end_year))

update packaging

update pyscaffold version, update workflows for new packaging procedures.

try fully migrating to pyproject.toml (but ... I do like development installations)

representative dataset

now that the CLI is successfully returning plots (somewhat) resembling the original:

  • tweak fakeData.py (probabilities, selections, etc.)
  • run a script that lets you sift through data options

until we find a demo dataset that somewhat resembles the data from the actual co-op.

create "main" method for CLI, implement argument parsing

cafelytics/src/Worksheet.m is like a precursor to a notebook.

I want the user to interact with this as follows:

python simulate_growth.py --farm ./data/farm.xlsx --growth ./data/growth.yml --strategy ./interventions/strategy1.yml --years 60
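
A minimal argparse sketch matching the invocation above; the option names come from the command line shown, while everything else (defaults, the hand-off function) is an assumption:

    import argparse


    def main(argv=None):
        parser = argparse.ArgumentParser(description="Simulate coffee farm growth.")
        parser.add_argument("--farm", required=True, help="path to farm data (e.g. xlsx)")
        parser.add_argument("--growth", required=True, help="path to growth assumptions (yaml)")
        parser.add_argument("--strategy", required=True, help="path to intervention strategy (yaml)")
        parser.add_argument("--years", type=int, default=60, help="number of years to simulate")
        args = parser.parse_args(argv)

        # hand off to the simulation (hypothetical function name)
        # run_simulation(args.farm, args.growth, args.strategy, args.years)
        return 0


    if __name__ == "__main__":
        raise SystemExit(main())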

Develop dataset

Based on data (and industry trends), develop an example/demo dataset to model and test the simulation. Then construct a co-op/plantation class and use the dataset to build and fill it.

Also: what are the most common units of measurement? What units can be constructed ambiguously (i.e. units/time; units/space) to extrapolate beyond coffee?

automatic dictionary var

the farm class is now updated to have a dictionary passed to it with tree attribute information (as opposed to it being defined in the class itself). Either: (1) something needs to be built so that the class can fall back on a default dictionary for testing, or (2) the class should instead be passed a filepath (however, this would break the current workflow pattern). A sketch of option (1) is below.
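
A minimal sketch of option (1); the default attribute names and the constructor signature here are assumptions, not the current Farm class:

    # hypothetical default tree attributes, used only when no dictionary is supplied
    DEFAULT_TREE_ATTRS = {
        "species": "borbon",
        "first_harvest_year": 3,
        "death_year": 30,
    }


    class Farm:
        def __init__(self, tree_attrs: dict = None):
            # fall back to a copy of the default so tests can instantiate Farm()
            self.tree_attrs = dict(tree_attrs) if tree_attrs else dict(DEFAULT_TREE_ATTRS)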

Test speeds, prepare for inverse problems

Test how long the example simulation takes, benchmark it.

See if caching with lru_cache has any impact whatsoever.

Get a sense of what time forward simulations will require.

Figure out how to collect input parameters in a way that is collapsible to a matrix of samples.
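
A minimal sketch of the kind of benchmarking meant here, using functools.lru_cache and time.perf_counter on a stand-in function (the real target would be the forward simulation):

    import time
    from functools import lru_cache


    @lru_cache(maxsize=None)
    def forward_simulation(years: int) -> float:
        # stand-in for the real simulation: deliberately a bit of work
        total = 0.0
        for _ in range(years):
            total += sum(i ** 0.5 for i in range(10_000))
        return total


    def benchmark(fn, *args, repeats: int = 3) -> float:
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        return (time.perf_counter() - start) / repeats


    print("cold:", benchmark(forward_simulation, 60, repeats=1))
    print("cached:", benchmark(forward_simulation, 60))  # lru_cache should make this nearly free

Note that lru_cache only helps when the same hashable arguments recur, which is exactly the question being asked.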

ingestion of user data

be able to take in a spreadsheet (to start), csv/tsv, parquet: something you can imagine a user dragging and dropping into streamlit eventually.

right now, some settings are just grabbed from hardcoded locations in an xlsx file.

that's not required; we can make settings available in a human-readable format such as yaml (see the sketch below).

assumptions about the growth of trees should be decoupled from farmer information.
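
A minimal sketch of yaml-based settings, assuming PyYAML and a hypothetical growth.yml layout; the keys shown are illustrative, not the actual format:

    import yaml  # PyYAML

    # hypothetical growth.yml:
    #   borbon:
    #     first_harvest_year: 3
    #     peak_yield_lbs: 8
    #   guate:
    #     first_harvest_year: 4
    #     peak_yield_lbs: 10


    def load_growth_settings(path: str) -> dict:
        """Read growth assumptions from yaml, decoupled from farmer data."""
        with open(path) as f:
            return yaml.safe_load(f)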

need bugfix on develop: 'Farm' object has no attribute 'firstHarvest'

    def test_farm_instantiation_defaults():
        test_farm = cf.Farm()

I added some default arguments to be able to instantiate this without arguments, and came across an immediate error during testing:
https://github.com/mathematicalmichael/cafelytics/pull/20/checks?check_run_id=962191696


>               elif ((age >= self.firstHarvest['year']) and (age <= self.death['year'])):
E               AttributeError: 'Farm' object has no attribute 'firstHarvest'

for now I'm going to take this test out and simply "pass" everything, just to exercise the basics of the testing workflow. See PR #20 for the last instance of an "X" in the test status next to each commit message before the checkmarks started.

string stripping

create some sort of global functions (or select them from re, the string module, etc.). String matching is a large component of the flow of control in these simulations, and there should be as many safety nets built in as possible (also see #41). A sketch of one such helper is below.
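
A minimal sketch of the kind of safety-net helper meant here (the name and exact behavior are assumptions):

    import re


    def normalize_label(raw: str) -> str:
        """Normalize user/spreadsheet strings before matching:
        trim, lowercase, collapse internal whitespace."""
        return re.sub(r"\s+", " ", raw.strip().lower())


    assert normalize_label("  Borbon \t Coffee ") == "borbon coffee"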

variable naming/labeling

for best practice: while translating, all significant numbers should be assigned to variables with descriptive names and/or the lines in which they are used should be commented. Many already have this, but, for example,

if n ==18:
    data(1,13)=20;

should be something like:

pruneThreshold = 18  # description of what 'pruneThreshold' means (fake var name)
if n == pruneThreshold:  # if the year is pruneThreshold, coffee plants explode
    data[1][13] = 20

units

use pint (Python units) to make sure we have proper unit agreement ("pounds/year"), etc.
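
A minimal sketch of how pint could enforce unit agreement (the quantities here are made up):

    import pint

    ureg = pint.UnitRegistry()

    harvest = 150 * ureg.pound / ureg.year        # per-plot yield
    metric = harvest.to(ureg.kilogram / ureg.year)

    print(metric)  # ~68.04 kilogram / year
    # mixing mismatched units raises pint.DimensionalityError instead of silently combining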

simulate cooperative

It won't necessarily be named main; something like simulate_cooperate(dataFrame, time, etc.).

It should not be doing data loading.
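
A minimal sketch of that separation: the function receives already-loaded data and does no I/O. The parameter names and the per-year placeholder are assumptions based on the note above:

    import pandas as pd


    def simulate_cooperate(data: pd.DataFrame, years: int, strategy: dict = None) -> pd.DataFrame:
        """Run the cooperative simulation on data that has already been loaded.

        Data loading (xlsx/csv/yaml parsing) happens in the CLI layer, not here.
        """
        results = []
        for year in range(years):
            # placeholder: per-year yield logic would go here
            results.append({"year": year, "total_yield": float(len(data))})
        return pd.DataFrame(results)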

set up automated linting

never done this before, but the idea is that when you push or submit a PR, your linting errors are either highlighted or automatically fixed.

set up dockerized application

  • dockerfile to build project as lightly as possible

  • test that docker build works in github actions (do the symlinking thing so that we run additional tests in docker python)

  • push to mindthegrow/cafelytics
