association-rosia / crop-yield-estimate

Harness the power of machine learning to forecast rice and wheat crop yields per acre in India, using data from Digital Green surveys. The project aims to empower smallholder farmers, combat poverty and malnutrition, revolutionize agriculture, and promote sustainable practices in the face of climate change, for enhanced global food security.

Home Page: https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge

License: MIT License

Jupyter Notebook 99.89% Python 0.10% Shell 0.01%

catboost crop-yield-prediction lightgbm machine-learning wandb xgboost zindi-competition digital-green-crop-yield-estimate-challenge

crop-yield-estimate's People

Contributors

Stargazers

Forkers

barde-s

crop-yield-estimate's Issues

/Users/louis/Projects/00 - RosIA/crop-yield-estimate/src/features/preprocessing.py:158: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
  X.loc[:, self.CORR_AREA_COLS] = X[self.CORR_AREA_COLS].divide(X[self.config.scale], axis='index')

With config:

wandb: colsample_bylevel: 0.9454975906395352
wandb: colsample_bynode: 0.6156319765770114
wandb: colsample_bytree: 0.5433985993245196
wandb: cv: 5
wandb: delna_thr: 0.24267359296653335
wandb: estimator_name: XGBoost
wandb: eval_metric: rmse
wandb: fillna: False
wandb: gamma: 6.540732951979319
wandb: learning_rate: 0.01852552810130359
wandb: limit_h: 5000
wandb: limit_l: 500
wandb: max_depth: 49
wandb: min_child_weight: 1.9795403458702547
wandb: n_estimators: 487
wandb: random_state: 42
wandb: reg_alpha: 9.360039137053167
wandb: reg_lambda: 36.499563187767954
wandb: scale: CropCultLand
wandb: subsample: 0.8456896788704847
wandb: task: regression
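The warning itself suggests the fix: assign the whole columns with `df[cols] = ...` instead of writing through `.loc[:, cols]`. Below is a minimal sketch on a toy frame. The column names `AreaA` and `AreaB` are hypothetical stand-ins for `CORR_AREA_COLS`. `CropCultLand` is taken from `scale: CropCultLand` in the config above.

```python
import pandas as pd

# Hypothetical stand-ins for self.CORR_AREA_COLS and self.config.scale.
CORR_AREA_COLS = ["AreaA", "AreaB"]
SCALE_COL = "CropCultLand"

X = pd.DataFrame({"AreaA": [2, 4], "AreaB": [1, 3], "CropCultLand": [2, 4]})

# Replacing the columns via X[cols] = ... always sets new arrays, which is
# the behaviour the FutureWarning recommends, instead of X.loc[:, cols] = ...
X[CORR_AREA_COLS] = X[CORR_AREA_COLS].divide(X[SCALE_COL], axis="index")
```

With this form, the integer area columns become float ratios without relying on in-place setting semantics that pandas plans to change.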

KeyError: "['Acre'] not in index"

Reproducibility:

python src/models/predict_model.py --ensemble_strategy classification --run_id m72e4kdc 13q4uafc denrfqm6 62cf2e1k

Error:

Traceback (most recent call last):
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 142, in <module>
    main()
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 35, in main
    list_predict.append(predict(run_id))
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/src/models/predict_model.py", line 95, in predict
    X_train = preprocessor.fit_transform(X_train)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 64, in fit_transform
    return self.fit(X).transform(X)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 56, in transform
    X = self.make_consistent(X)
  File "/Users/titou/Documents/RosIA/projets/crop-yield-estimate/./src/features/preprocessing.py", line 186, in make_consistent
    X = X[self.out_columns]
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/frame.py", line 3902, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6114, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6178, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['Acre'] not in index"
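The traceback shows that `make_consistent` expects an `Acre` column, but the frame it receives has none. One way to see the full mismatch (expected columns that are missing, plus columns nobody expects) is to compare the incoming columns with `self.out_columns` just before `X = X[self.out_columns]`. The sketch below is a debugging aid only. `report_column_mismatch` is a hypothetical helper, and the sample frame and its `OtherFeature` column are made up.

```python
import pandas as pd


def report_column_mismatch(X: pd.DataFrame, expected: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (missing, unexpected) columns relative to `expected`."""
    present = set(X.columns)
    expected_set = set(expected)
    # Columns the preprocessor will try to select but that are absent from X.
    missing = [c for c in expected if c not in present]
    # Columns present in X that the strict selection would silently drop.
    unexpected = [c for c in X.columns if c not in expected_set]
    return missing, unexpected


# Made-up frame that has lost 'Acre' before reaching the preprocessor.
X = pd.DataFrame({"CropCultLand": [1.0], "OtherFeature": [3]})
missing, unexpected = report_column_mismatch(X, ["Acre", "CropCultLand"])
```

Here `missing` is `["Acre"]`, which points to the step upstream of `make_consistent` where the column is dropped or never added.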

Preprocessing One Hot List

Currently, the list values are split into new columns based on the Train set.
Two problems:

If a value does not appear in the Test dataset, its column is not created.
If a value was not encountered in the Train dataset, a new column is created in the Test dataset that did not exist before.

Proposed solutions:

Add all the Train dataset columns to Test and initialize them to 0. This requires keeping track of the columns created during One Hot List.
Ignore new values, i.e. drop the Test columns that were not created while encoding the Train dataset.
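Both proposed solutions can be applied in one step with `DataFrame.reindex`. First record the one-hot columns produced on Train. Then reindex the encoded Test frame to exactly those columns: missing ones are added and filled with 0, and columns for unseen values are dropped.

This is a minimal sketch. It assumes the lists are stored as space-separated strings. The column `OrgFertilizers`, its values and the `one_hot_list` helper are hypothetical, not the project's code.

```python
import pandas as pd

# Hypothetical list-valued column, stored as space-separated strings.
train = pd.DataFrame({"OrgFertilizers": ["FYM VermiCompost", "VermiCompost", "FYM"]})
test = pd.DataFrame({"OrgFertilizers": ["VermiCompost", "Ghanajeevamrit"]})


def one_hot_list(s: pd.Series) -> pd.DataFrame:
    # One indicator column per value found in the lists, prefixed with the column name.
    return s.str.get_dummies(sep=" ").add_prefix(s.name)


train_oh = one_hot_list(train["OrgFertilizers"])
train_columns = list(train_oh.columns)  # keep track of the columns created on Train

# Columns missing from Test are added with 0; values never seen on Train are dropped.
test_oh = one_hot_list(test["OrgFertilizers"]).reindex(columns=train_columns, fill_value=0)
```

Storing `train_columns` on the preprocessor at `fit` time and reindexing in `transform` keeps Train and Test aligned, whatever values each split contains.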

KeyError('OrgFertilizersFYM')

In the preprocessing, when I train a model on the data in the "High" yield class, I get the error KeyError('OrgFertilizersFYM'). I think this comes from the one-hot encoding of the lists: the column is not created because the value does not appear in those observations. The error is then raised when the preprocessing manipulates the columns that are correlated with each other.
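If that diagnosis is right, one remedy is to build the one-hot vocabulary on the full training set before splitting it by yield class. Each subset is then reindexed to that vocabulary, so `OrgFertilizersFYM` exists (as zeros) even when no "High" row contains it. This is a sketch under stated assumptions. It assumes space-separated list strings and label 2 for "High". The data, the `label` column and the `encode` helper are hypothetical.

```python
import pandas as pd

# Hypothetical training data: yield class label (2 = "High") and a list column.
train = pd.DataFrame({
    "label": [0, 2, 2],
    "OrgFertilizers": ["FYM", "VermiCompost", "VermiCompost"],
})


def encode(s: pd.Series, columns=None) -> pd.DataFrame:
    # One-hot encode the list values; optionally align to a fixed set of columns.
    out = s.str.get_dummies(sep=" ").add_prefix(s.name)
    return out if columns is None else out.reindex(columns=columns, fill_value=0)


# Vocabulary computed on the full training set, before the split by yield class.
all_columns = list(encode(train["OrgFertilizers"]).columns)

high = train[train["label"] == 2]
naive_high = encode(high["OrgFertilizers"])                   # FYM column is missing here
aligned_high = encode(high["OrgFertilizers"], all_columns)    # FYM column present as zeros
```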

delete_outliers in preprocessor

Outliers are removed from the DataFrame X, but the corresponding yields are not removed.

X_train.index == y_train.index
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/arraylike.py", line 40, in __eq__
    return self._cmp_method(other, operator.eq)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/crop-yield-estimate-env/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7104, in _cmp_method
    raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

Reproducibility (preprocessor.py):

if __name__ == '__main__':
    cst = get_constants()
    config = CYEConfigPreProcessor(deloutliers=True)
    processor = CYEDataPreProcessor(config=config)

    config = CYEConfigTransformer()
    transformer = CYETargetTransformer(config=config)
    df_train = pd.read_csv(cst.file_data_train, index_col='ID')

    labels = create_labels(
        y=df_train[cst.target_column],
        acre=df_train['Acre'],
        limit_h=5000,
        limit_l=500,
    )

    df_train_l = df_train[labels == 0].copy(deep=True)
    df_train_m = df_train[labels == 1].copy(deep=True)
    df_train_h = df_train[labels == 2].copy(deep=True)

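    # Only the full training set is processed here; df_train_l/m/h are not used in this repro.
    for df_train in [df_train]:
        X_train, y_train = df_train.drop(columns=cst.target_column), df_train[cst.target_column]
        y_train = transformer.fit_transform(X_train, y_train)
        # With deloutliers=True, outlier rows are removed from X_train but not from y_train.
        X_train = processor.fit_transform(X_train)
        # y_train = transformer.inverse_transform(y_train)
        X_train.index == y_train.index  # raises ValueError: Lengths must match to compare
        # Test data
        X_test = pd.read_csv(cst.file_data_test, index_col='ID')
        # y_test = transformer.fit(X_test)
        X_test = processor.transform(X_test)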
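The preprocessor only receives X, so one minimal way to keep the target in sync is to re-select `y_train` by the index that survives preprocessing. The sketch below assumes both objects keep the `ID` index, as `read_csv(..., index_col='ID')` suggests. It also assumes the transformed target is still a `pd.Series` carrying that index. The toy data are made up, and dropping `id_c` stands in for `processor.fit_transform` with `deloutliers=True`. It uses `Index.equals` rather than `==`, because `==` compares element-wise and raises when the lengths differ.

```python
import pandas as pd

# Made-up features and target sharing the same ID index.
X = pd.DataFrame({"Acre": [1.0, 2.0, 50.0]}, index=["id_a", "id_b", "id_c"])
y = pd.Series([500.0, 900.0, 20000.0], index=X.index, name="Yield")

# Stand-in for processor.fit_transform(X) removing an outlier row from X only.
X_clean = X.drop(index=["id_c"])

# Keep only the targets whose IDs survived preprocessing, in the same order.
y_aligned = y.loc[X_clean.index]

# Index.equals returns a single bool instead of raising on a length mismatch.
in_sync = X_clean.index.equals(y_aligned.index)
```

An alternative design would be to let the outlier step accept and return y as well, but re-indexing afterwards leaves the existing transformer interface untouched.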
