association-rosia / crop-yield-estimate

Harness the power of machine learning to forecast rice and wheat crop yields per acre in India, aiming to empower smallholder farmers, combat poverty and malnutrition, utilizing data from Digital Green surveys to revolutionize agriculture and promote sustainable practices in the face of climate change for enhanced global food security.

Home Page: https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge

License: MIT License

Jupyter Notebook 99.89% Python 0.10% Shell 0.01%
catboost crop-yield-prediction lightgbm machine-learning wandb xgboost zindi-competition digital-green-crop-yield-estimate-challenge

crop-yield-estimate's People

Contributors

baptisteurgell, louisreberga

Forkers

barde-s

crop-yield-estimate's Issues

/Users/louis/Projects/00 - RosIA/crop-yield-estimate/src/features/preprocessing.py:158: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)` X.loc[:, self.CORR_AREA_COLS] = X[self.CORR_AREA_COLS].divide(X[self.config.scale], axis='index')
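
The warning text itself points at plain column assignment as the alternative to `X.loc[:, cols] = ...`. A minimal sketch of that change follows. The feature column names and values are made up for illustration, and `corr_area_cols` and `scale` are stand-ins for `self.CORR_AREA_COLS` and `self.config.scale`; only `CropCultLand` comes from the config logged in this tracker.

```python
import pandas as pd

# Toy frame: "area_feature_1" and "area_feature_2" are hypothetical names.
X = pd.DataFrame({
    "CropCultLand": [2.0, 4.0],
    "area_feature_1": [10.0, 20.0],
    "area_feature_2": [100.0, 400.0],
})

corr_area_cols = ["area_feature_1", "area_feature_2"]  # stand-in for self.CORR_AREA_COLS
scale = "CropCultLand"                                 # stand-in for self.config.scale

# Plain column assignment instead of X.loc[:, cols] = ...
# Each listed column is divided row-wise by the scale column.
X[corr_area_cols] = X[corr_area_cols].divide(X[scale], axis="index")

print(X)
```

Whether this fully silences the warning depends on the installed pandas version, so it is worth re-running the preprocessing with warnings enabled after the change.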

With config:

wandb: colsample_bylevel: 0.9454975906395352
wandb: colsample_bynode: 0.6156319765770114
wandb: colsample_bytree: 0.5433985993245196
wandb: cv: 5
wandb: delna_thr: 0.24267359296653335
wandb: estimator_name: XGBoost
wandb: eval_metric: rmse
wandb: fillna: False
wandb: gamma: 6.540732951979319
wandb: learning_rate: 0.01852552810130359
wandb: limit_h: 5000
wandb: limit_l: 500
wandb: max_depth: 49
wandb: min_child_weight: 1.9795403458702547
wandb: n_estimators: 487
wandb: random_state: 42
wandb: reg_alpha: 9.360039137053167
wandb: reg_lambda: 36.499563187767954
wandb: scale: CropCultLand
wandb: subsample: 0.8456896788704847
wandb: task: regression

KeyError: "['Acre'] not in index"

Reproducibility:

python src/models/predict_model.py --ensemble_strategy classification --run_id m72e4kdc 13q4uafc denrfqm6 62cf2e1k

error:

Traceback (most recent call last):
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 142, in <module>
    main()
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 35, in main
    list_predict.append(predict(run_id))
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 95, in predict
    X_train = preprocessor.fit_transform(X_train)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 64, in fit_transform
    return self.fit(X).transform(X)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 56, in transform
    X = self.make_consistent(X)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 186, in make_consistent
    X = X[self.out_columns]
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/frame.py", line 3902, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6114, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6178, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['Acre'] not in index"

Preprocessing One Hot List

Currently, we split the list values into new columns based on the Train set.
Two problems:

  • If a value does not exist in the Test dataset, its column is not created there.
  • If a value was not encountered in the Train dataset, a new column is created in the Test dataset that did not exist before.

Proposed solutions:

  • Add all the Train dataset columns to Test and initialize them to 0. This requires keeping track of the columns created during One Hot List.
  • Ignore new values, i.e. drop the columns that were not created while encoding the Train dataset.
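
Both proposals can be combined by remembering the columns produced on Train and reindexing every later encoding onto them. The sketch below is one possible way to do that, not the project's actual implementation. `ListOneHotEncoder` is a hypothetical class, the list values are assumed to be stored as one space-separated string per row, and `ValueB` and `ValueC` are made-up values.

```python
import pandas as pd


class ListOneHotEncoder:
    """Hypothetical helper: one-hot encode a column of space-separated lists."""

    def __init__(self, column, sep=" "):
        self.column = column
        self.sep = sep          # assumption: values are joined with a single space
        self.columns_ = None    # columns seen during fit (Train)

    def _encode(self, X):
        # One indicator column per distinct list value, prefixed with the column name.
        dummies = X[self.column].fillna("").str.get_dummies(sep=self.sep)
        return dummies.add_prefix(self.column)

    def fit(self, X):
        # Keep track of the columns created on Train.
        self.columns_ = list(self._encode(X).columns)
        return self

    def transform(self, X):
        # reindex adds Train columns missing here (filled with 0)
        # and drops columns for values never seen during fit.
        dummies = self._encode(X).reindex(columns=self.columns_, fill_value=0)
        return X.drop(columns=self.column).join(dummies)


train = pd.DataFrame({"OrgFertilizers": ["FYM ValueB", "ValueB"]}, index=["ID_1", "ID_2"])
test = pd.DataFrame({"OrgFertilizers": ["ValueB", "ValueC"]}, index=["ID_3", "ID_4"])

encoder = ListOneHotEncoder("OrgFertilizers").fit(train)
train_encoded = encoder.transform(train)
test_encoded = encoder.transform(test)

print(test_encoded)
```

With this approach Train and Test always end up with the same columns in the same order, so later steps that select columns by name do not depend on which values happen to appear in a given subset.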

KeyError('OrgFertilizersFYM')

In the preprocessing, when I train a model on the data with "High" profitability, I get the error KeyError('OrgFertilizersFYM'). I think it happens because, during the one hot encoding of the lists, this column is not created since the value does not appear in the observations. This then raises an error when we manipulate the columns that are correlated with each other.

delete_outliers in preprocessor

We remove the outliers from the dataframe X, but we do not remove the corresponding yields.

X_train.index == y_train.index
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/arraylike.py", line 40, in __eq__
    return self._cmp_method(other, operator.eq)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7104, in _cmp_method
    raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

Reproduction in preprocessor.py:

if __name__ == '__main__':
    cst = get_constants()
    config = CYEConfigPreProcessor(deloutliers=True)
    processor = CYEDataPreProcessor(config=config)

    config = CYEConfigTransformer()
    transformer = CYETargetTransformer(config=config)
    df_train = pd.read_csv(cst.file_data_train, index_col='ID')

    labels = create_labels(
        y=df_train[cst.target_column],
        acre=df_train['Acre'],
        limit_h=5000,
        limit_l=500,
    )

    df_train_l = df_train[labels == 0].copy(deep=True)
    df_train_m = df_train[labels == 1].copy(deep=True)
    df_train_h = df_train[labels == 2].copy(deep=True)

    for df_train in [df_train]:
        X_train, y_train = df_train.drop(columns=cst.target_column), df_train[cst.target_column]
        y_train = transformer.fit_transform(X_train, y_train)
        X_train = processor.fit_transform(X_train)
        # y_train = transformer.inverse_transform(y_train)
        X_train.index == y_train.index
        # Test data
        X_test = pd.read_csv(cst.file_data_test, index_col='ID')
        # y_test = transformer.fit(X_test)
        X_test = processor.transform(X_test)
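
One way to keep X and y aligned is to compute the outlier mask once and apply it to both. The sketch below assumes that outliers are detected with a simple range check on one column and that X and y share the same index. `drop_outliers`, the bounds, and the toy values are hypothetical and do not reflect the rule used in `CYEDataPreProcessor`.

```python
import pandas as pd


def drop_outliers(X, y, column, lower, upper):
    """Hypothetical helper: drop rows where `column` falls outside [lower, upper]
    from both X and y, so their indexes stay identical."""
    mask = X[column].between(lower, upper)
    return X.loc[mask], y.loc[mask]


# Toy data; "Acre" is a column name from the project, the values are made up.
X_train = pd.DataFrame({"Acre": [0.2, 0.3, 50.0, 0.25]}, index=["a", "b", "c", "d"])
y_train = pd.Series([400.0, 600.0, 90000.0, 500.0], index=X_train.index)

X_clean, y_clean = drop_outliers(X_train, y_train, "Acre", lower=0.0, upper=10.0)

# The comparison from the reproduction no longer raises: lengths and labels match.
print((X_clean.index == y_clean.index).all())
```

If the preprocessor has to keep a scikit-learn style `fit_transform(X)` signature that only returns X, an alternative is to realign the target afterwards with `y_train = y_train.loc[X_train.index]`, provided the transformed target is still a pandas object carrying the original index.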
