benchmarks's People

Contributors

aclyde11, adpartin, andrew-weisman, brettin, brettinanl, bvanessen, crstngc, georgezakinih, gounley, hongjuny, hyoo, j-woz, jmohdyusof, levinas, mshukla1, ncollier, pbalapra, rajeeja, talathi, tipizen, yngtodd, zhuyitan

benchmarks's Issues

Missing parameter causing errors in NT3 (P1B4)

While running Pilot 1 NT3 with the command python nt3_baseline_keras2.py --conf nt3_perf_bench_model.txt, I ran into the following error, caused by a missing parameter:

Traceback (most recent call last):
  File "nt3_baseline_keras2.py", line 290, in <module>
    main()
  File "nt3_baseline_keras2.py", line 286, in main
    run(gParameters)
  File "nt3_baseline_keras2.py", line 101, in run
    X_train, Y_train, X_test, Y_test = load_data(train_file, test_file, gParameters)
  File "nt3_baseline_keras2.py", line 70, in load_data
    if gParameters['add_noise']:
KeyError: 'add_noise'

The issue is caused by these lines:

# TODO: Add better names for noise boolean, make a feature for both RNA seq and label noise together
# check if noise is on (this is for label)
if gParameters['add_noise']:
    # check if we want noise correlated with a feature
    if gParameters['noise_correlated']:
        Y_train, y_train_noise_gen = candle.label_flip_correlated(Y_train,
                                                                  gParameters['label_noise'], X_train,
                                                                  gParameters['feature_col'],
                                                                  gParameters['feature_threshold'])
    # else add uncorrelated noise
    else:
        Y_train, y_train_noise_gen = candle.label_flip(Y_train, gParameters['label_noise'])
# check if noise is on for RNA-seq data
elif gParameters['noise_gaussian']:
    X_train = candle.add_gaussian_noise(X_train, 0, gParameters['std_dev'])

It seems that the candle parser being used never includes the parameters checked here:

# Initialize parameters
gParameters = candle.finalize_parameters(nt3Bmk)

Is there a different way to run this (perhaps with different flags) that avoids the issue? Commenting out lines 68-82 in nt3_baseline_keras2.py obviously works, but I was not sure whether parameters such as 'add_noise' will ever make it through to NT3. If not, then maybe commenting out these lines permanently will save others some trouble?
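
A less drastic alternative to commenting the lines out is to read the noise flags with dict.get and a default, so a configuration that never defines them simply skips the noise branches. This is a minimal sketch rather than the benchmark's actual fix: the helper name noise_mode is hypothetical, and treating a missing key as False is an assumption.

```python
def noise_mode(params):
    """Return which noise branch would run for a parameter dict.

    Missing keys are treated as False (an assumed default), so a config
    that never mentions noise falls through to "none" instead of
    raising KeyError.
    """
    if params.get("add_noise", False):
        if params.get("noise_correlated", False):
            return "label_correlated"
        return "label_uncorrelated"
    if params.get("noise_gaussian", False):
        return "gaussian"
    return "none"


# A config without any noise keys no longer fails.
print(noise_mode({"epochs": 1}))
print(noise_mode({"add_noise": True, "noise_correlated": False}))
```

The same pattern could be applied directly to the three gParameters lookups in load_data without changing the candle calls themselves.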

uno does not use CANDLE_DATA_DIR when --use_exported_data option is used

echo $CANDLE_DATA_DIR
/homes/brettin/Singularity/workspace/data_dir
ls $CANDLE_DATA_DIR
uno_input_data.h5

ERROR MESSAGE (note that I ran uno from /homes/brettin/Singularity/workspace)

OSError: /homes/brettin/Singularity/workspace/uno_input_data.h5 does not exist

CANDLE_CONFIG

[Global_Params]
train_sources=['CCLE', 'GDSC', 'CTRP', 'ALMANAC']
#export_data='uno_input_data.h5'
use_exported_data='uno_input_data.h5'
test_sources=['train']
cell_types=None
cell_features=['rnaseq']
drug_features=['descriptors']
dense=[1000, 1000, 1000, 1000, 1000]
dense_feature_layers=[1000, 1000, 1000]
activation='relu'
loss='mse'
optimizer='adamax'
scaling='std'
dropout=.1
epochs=1
batch_size=32
val_split=0.2
cv=1
max_val_loss=1.0
learning_rate=0.0001
base_lr=None
agg_dose='AUC'
residual=False
reduce_lr=True
warmup_lr=True
batch_normalization=False
feature_subsample=0
rng_seed=2018
no_gen=False
verbose=False

preprocess_rnaseq='source_scale'
gpus=[0]
use_landmark_genes=True
no_feature_source=True
no_response_source=True
cp=True
save_path='save/uno'
output_dir='output/uno'
single=True
on_memory_loader=True

[Monitor_Params]
timeout=-1
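
For reference, the behaviour I expected can be sketched as follows: a relative use_exported_data value is resolved against CANDLE_DATA_DIR rather than the current working directory. The helper name resolve_data_path is hypothetical and is not part of uno or candle; falling back to the unchanged path when the variable is unset is also an assumption.

```python
import os


def resolve_data_path(filename, env=None):
    """Resolve a data file name against CANDLE_DATA_DIR.

    Absolute paths are returned unchanged. Relative paths are joined to
    CANDLE_DATA_DIR when that variable is set; otherwise they are left
    relative to the current working directory (assumed fallback).
    """
    env = os.environ if env is None else env
    if os.path.isabs(filename):
        return filename
    data_dir = env.get("CANDLE_DATA_DIR")
    if data_dir:
        return os.path.join(data_dir, filename)
    return filename


# The mapping stands in for the real environment in this example.
env = {"CANDLE_DATA_DIR": "/homes/brettin/Singularity/workspace/data_dir"}
print(resolve_data_path("uno_input_data.h5", env))
```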

save argument does not get overwritten by command line arg

The save parameter is defined and assigned a value in uno_default_params.txt.

When --save is specified on the command line, it does not override the value in the default_params.txt file.

When I hack default_utils.py and add
parser.add_argument('--save', ...
it seems to work.
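
The precedence I would expect (command line over defaults file) can be sketched with plain argparse. This is not how default_utils.py is written; the function name parse_with_file_defaults is hypothetical and the dict stands in for the parsed contents of uno_default_params.txt.

```python
import argparse


def parse_with_file_defaults(file_defaults, argv):
    """Parse argv, letting command-line values override file defaults."""
    parser = argparse.ArgumentParser()
    # The option has to be registered with the parser for a value given
    # on the command line to be picked up at all.
    parser.add_argument("--save", type=str, default=None)
    # Values from the defaults file replace the parser-level defaults...
    parser.set_defaults(**file_defaults)
    # ...and anything given explicitly in argv replaces those.
    return parser.parse_args(argv)


file_defaults = {"save": "save/uno"}  # stands in for uno_default_params.txt
print(parse_with_file_defaults(file_defaults, []).save)
print(parse_with_file_defaults(file_defaults, ["--save", "save/custom"]).save)
```

With the option registered, the file value is used only when --save is absent from the command line.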

Issue running drug example

When I try to run

python uno_baseline_keras2.py --config_file uno_by_drug_example.txt

I am getting the following error:

Traceback (most recent call last):
  File "uno_baseline_keras2.py", line 555, in <module>
    main()
  File "uno_baseline_keras2.py", line 551, in main
    run(params)
  File "uno_baseline_keras2.py", line 309, in run
    use_exported_data=args.use_exported_data,
  File "/home/z1835018/code/uncertainty/Benchmarks/Pilot1/Uno/uno_data.py", line 999, in load
    self.save_to_cache(cache, params)
  File "/home/z1835018/code/uncertainty/Benchmarks/Pilot1/Uno/uno_data.py", line 698, in save_to_cache
    os.mkdir(dirname)
FileNotFoundError: [Errno 2] No such file or directory: ''
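
The empty string in the error is what os.path.dirname returns for a path with no directory component, so a cache prefix given as a bare name would produce exactly this failure. That this is what happens inside save_to_cache is an assumption, since that code is not shown here. A guard along these lines avoids the problem; the helper name ensure_parent_dir is hypothetical.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the parent directory of path, if it names one.

    os.path.dirname returns '' for a bare file name, and os.mkdir('')
    raises FileNotFoundError, so the empty case is skipped.
    """
    dirname = os.path.dirname(path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    return dirname


# Bare name: there is no directory to create, and no error is raised.
print(repr(ensure_parent_dir("uno_cache")))

# Nested prefix: intermediate directories are created as needed.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_parent_dir(os.path.join(tmp, "cache", "uno"))
    print(os.path.isdir(created))
```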

Uno - use_exported_data not prepending CANDLE_DATA_DIR

[1]+ nohup singularity exec --nv ../images/uno-tensorflow:2.8.2-gpu-20220624.sif train.sh 0 $CANDLE_DATA_DIR /homes/brettin/Singularity/workspace/configs/uno_auc_model.txt &
(base) brettin@lambda7:~/Singularity/workspace/top21_uno$ tail -f nohup.out
'timeout': -1,
'train_bool': True,
'train_sources': ['CCLE', 'GDSC', 'CTRP', 'NCI60'],
'use_exported_data': 'top21_uno_v2.h5',
'use_filtered_genes': False,
'use_landmark_genes': True,
'val_split': 0.2,
'verbose': False,
'warmup_lr': True}
Params: {'train_sources': ['CCLE', 'GDSC', 'CTRP', 'NCI60'], 'use_exported_data': 'top21_uno_v2.h5', 'shuffle': True, 'test_sources': ['train'], 'cell_types': None, 'cell_features': ['rnaseq'], 'drug_features': ['descriptors'], 'dense': [1000, 1000, 1000, 1000, 1000], 'dense_feature_layers': [1000, 1000, 1000], 'activation': 'relu', 'loss': 'mse', 'optimizer': 'adamax', 'scaling': 'std', 'dropout': 0.1, 'epochs': 400, 'batch_size': 32, 'val_split': 0.2, 'cv': 1, 'max_val_loss': 1.0, 'learning_rate': 0.0001, 'base_lr': None, 'agg_dose': 'AUC', 'residual': False, 'reduce_lr': True, 'warmup_lr': True, 'batch_normalization': False, 'feature_subsample': 0, 'rng_seed': 2018, 'no_gen': False, 'verbose': False, 'preprocess_rnaseq': 'source_scale', 'gpus': [0], 'use_landmark_genes': True, 'no_feature_source': True, 'no_response_source': True, 'cp': True, 'save_path': 'save/uno', 'output_dir': '/homes/brettin/Singularity/workspace/top21_uno/output/uno/EXP000/RUN000', 'single': True, 'on_memory_loader': True, 'timeout': -1, 'train_bool': True, 'profiling': False, 'experiment_id': 'EXP000', 'run_id': 'RUN000', 'logfile': None, 'ckpt_restart_mode': 'auto', 'ckpt_checksum': False, 'ckpt_skip_epochs': 0, 'ckpt_directory': './save', 'ckpt_save_best': True, 'ckpt_save_best_metric': 'val_loss', 'ckpt_save_weights_only': False, 'ckpt_save_interval': 0, 'ckpt_keep_mode': 'linear', 'ckpt_keep_limit': 1000000, 'by_cell': None, 'by_drug': None, 'cell_subset_path': '', 'drug_subset_path': '', 'drug_median_response_min': -1, 'drug_median_response_max': 1, 'dense_cell_feature_layers': None, 'dense_drug_feature_layers': None, 'use_filtered_genes': False, 'feature_subset_path': '', 'cell_feature_subset_path': '', 'drug_feature_subset_path': '', 'es': False, 'tb': False, 'tb_prefix': 'tb', 'partition_by': None, 'cache': None, 'export_csv': None, 'export_data': None, 'growth_bins': 0, 'initial_weights': None, 'save_weights': None, 'config_file': 
'/homes/brettin/Singularity/workspace/configs/uno_auc_model.txt', 'data_type': <class 'numpy.float32'>}
/usr/local/Benchmarks/Pilot1/Uno/uno_baseline_keras2.py:14: DeprecationWarning: Please use pearsonr from the scipy.stats namespace, the scipy.stats.stats namespace is deprecated.
from scipy.stats.stats import pearsonr
WARNING:tensorflow:From /usr/local/Benchmarks/Pilot1/Uno/uno_baseline_keras2.py:269: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

Traceback (most recent call last):
  File "/usr/local/Benchmarks/Pilot1/Uno/uno_baseline_keras2.py", line 676, in <module>
    main()
  File "/usr/local/Benchmarks/Pilot1/Uno/uno_baseline_keras2.py", line 672, in main
    run(params)
  File "/usr/local/Benchmarks/Pilot1/Uno/uno_baseline_keras2.py", line 272, in run
    loader.load(
  File "/usr/local/Benchmarks/Pilot1/Uno/uno_data.py", line 1142, in load
    with pd.HDFStore(use_exported_data, "r") as store:
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/pytables.py", line 591, in __init__
    self.open(mode=mode, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/pytables.py", line 740, in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tables/file.py", line 300, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tables/file.py", line 750, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 368, in tables.hdf5extension.File._g_new
  File "/usr/local/lib/python3.8/dist-packages/tables/utils.py", line 143, in check_file_access
    raise OSError(f"{path} does not exist")
OSError: /homes/brettin/Singularity/workspace/top21_uno/top21_uno_v2.h5 does not exist

(base) brettin@lambda7:~/Singularity/workspace/top21_uno$ echo $CANDLE_DATA_DIR
/homes/brettin/Singularity/workspace/data_dir
(base) brettin@lambda7:~/Singularity/workspace/top21_uno$ ls $CANDLE_DATA_DIR
Pilot1 top21_uno_v2.h5 uno_input_data.h5

Error with running Uno code

Hi,

I got the following error running the Uno code. This used to work earlier, so I believe changes in Benchmarks/common/candle/__init__.py resulted in this error.

Traceback (most recent call last):
  File "uno_baseline_keras2.py", line 8, in <module>
    import candle
  File "/lus/theta-fs0/projects/datascience/memani/uno-121422/Benchmarks-master/common/candle/__init__.py", line 175, in <module>
    raise Exception("No backend has been specified.")
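
A quick way to narrow this down is to check which deep learning framework, if any, is already loaded at the point where candle is imported. The sketch below assumes the check in __init__.py depends on a framework having been imported first; that code is not shown here, so treat this purely as a diagnostic aid. The helper name detect_backend and the list of candidate module names are hypothetical.

```python
import sys


def detect_backend(loaded_modules=None):
    """Return the first known framework name found among loaded modules.

    Only inspects what is already imported; it never imports a
    framework itself. Returns None when none of the candidates is loaded.
    """
    loaded = sys.modules if loaded_modules is None else loaded_modules
    for name in ("keras", "tensorflow", "torch"):  # assumed candidates
        if name in loaded:
            return name
    return None


# Run this just before `import candle` to see what is already loaded.
print(detect_backend())
```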

model_name required

With the updates in candle_lib, model_name is now a required hyperparameter. This breaks UNO and might break other Benchmarks as well.

Globus download option

Investigate availability of Globus downloads for CANDLE input data. Collaborate with ExaLearn?

Settable data location

Do not force big input data to reside inside a git clone. This forces code and big data to reside on the same FS (I use a soft link to get around this), and is likely to trigger git mistakes.

create logging standard

./Pilot1/P1B3/p1b3.py:import logging
./Pilot1/P1B3/p1b3_baseline_keras2.py:import logging
./Pilot1/Uno_UQ/uno_holdoutUQ_data.py:import logging
./Pilot1/Uno_UQ/uno_inferUQ_keras2.py:import logging
./Pilot1/Uno_UQ/uno_trainUQ_keras2.py:import logging
./Pilot1/Uno_UQ/data_utils_/uno_combined_data_loader.py:import logging
./Pilot1/Uno_UQ/data_utils_/uno.py:import logging
./Pilot1/P1B2/p1b2.py:import logging
./Pilot1/P1B2/p1b2_baseline_neon.py:import logging
./Pilot1/TC1/tc1.py:import logging
./Pilot1/Combo/combo_baseline_keras2.py:import logging
./Pilot1/Combo/combo_dose.py:import logging
./Pilot1/Combo/combo.py:import logging
./Pilot1/UnoMT/utils/data_processing/label_encoding.py:import logging
./Pilot1/UnoMT/utils/data_processing/dataframe_scaling.py:import logging
./Pilot1/UnoMT/utils/data_processing/response_dataframes.py:import logging
./Pilot1/UnoMT/utils/data_processing/drug_dataframes.py:import logging
./Pilot1/UnoMT/utils/data_processing/cell_line_dataframes.py:import logging
./Pilot1/UnoMT/utils/datasets/drug_qed_dataset.py:import logging
./Pilot1/UnoMT/utils/datasets/drug_target_dataset.py:import logging
./Pilot1/UnoMT/utils/datasets/cl_class_dataset.py:import logging
./Pilot1/UnoMT/utils/datasets/drug_resp_dataset.py:import logging
./Pilot1/UnoMT/utils/miscellaneous/file_downloading.py:import logging
./Pilot1/UnoMT/networks/initialization/encoder_init.py:import logging
./Pilot1/UnoMT/unoMT.py:import logging
./Pilot1/P1B1/p1b1.py:import logging
./Pilot1/Uno/uno_mixedprecision_tfkeras.py:import logging
./Pilot1/Uno/uno_baseline_keras2.py:import logging
./Pilot1/Uno/uno_data.py:import logging
./Pilot1/Uno/uno.py:import logging
./Pilot2/P2B1/p2b1_baseline_keras2.py:import logging
./common/candle_keras/__init__.py:from keras_utils import LoggingCallback
./common/generic_utils.py:import logging
./common/default_utils.py:import logging
./common/candle/__init__.py:    from keras_utils import LoggingCallback
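
As a starting point for discussion, a shared helper that every benchmark calls instead of configuring logging on its own could look like the sketch below. The helper name get_logger, the format string, and the logger naming scheme are all assumptions, not an agreed standard.

```python
import logging

# Assumed common format; the actual standard is still to be decided.
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s: %(message)s"


def get_logger(name, level=logging.INFO):
    """Return a named logger with one stream handler in a shared format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    # Do not pass records on to the root logger, so a message is not
    # printed twice when the root logger has handlers of its own.
    logger.propagate = False
    return logger


log = get_logger("Pilot1.Uno")
log.info("loading data")
```

Each of the files listed above would then replace its own logging setup with a call such as get_logger("Pilot1.Uno").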

Download-only option

Allow user to invoke Benchmark in download-only mode, which will simply download the input data if it does not exist. This is necessary on supercomputers. This mode should not import keras or any other modules not required for data download.
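
A minimal sketch of what such a mode could look like, using only the standard library. The flag names --download_only, --data_url and --data_file are hypothetical, as are the function names; the point is that the download path returns before any framework import would happen.

```python
import argparse
import os
from urllib.request import urlretrieve


def ensure_data(url, dest, fetch=urlretrieve):
    """Download url to dest unless dest already exists.

    Returns True if a download was performed, False if the file was
    already present. The fetch argument allows swapping the downloader.
    """
    if os.path.exists(dest):
        return False
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    fetch(url, dest)
    return True


def train(data_file):
    # Framework imports would live here (for example `import keras`), so
    # that download-only runs never load them.
    raise NotImplementedError("training is outside the scope of this sketch")


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--download_only", action="store_true")  # hypothetical flag
    parser.add_argument("--data_url", required=True)             # hypothetical flag
    parser.add_argument("--data_file", required=True)            # hypothetical flag
    args = parser.parse_args(argv)

    ensure_data(args.data_url, args.data_file)
    if args.download_only:
        return 0  # stop before anything heavy is imported
    return train(args.data_file)


if __name__ == "__main__":
    raise SystemExit(main())
```

On a supercomputer, the download step could then be run once on a login node with --download_only, and the compute job started afterwards without needing network access.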

P2B2 tf/keras API

Traceback (most recent call last):
  File "p2b1_baseline_keras2.py", line 298, in <module>
    main()
  File "p2b1_baseline_keras2.py", line 294, in main
    run(gParameters)
  File "p2b1_baseline_keras2.py", line 231, in run
    molecular_model.compile(optimizer=opt, loss=loss_func, metrics=['mean_squared_error', 'mean_absolute_error'])
  File "/nfs/gce/software/custom/linux-ubuntu18.04-x86_64/anaconda3/rolling/envs/candle-tf1/lib/python3.7/site-packages/keras/engine/training.py", line 95, in compile
    self.optimizer = optimizers.get(optimizer)
  File "/nfs/gce/software/custom/linux-ubuntu18.04-x86_64/anaconda3/rolling/envs/candle-tf1/lib/python3.7/site-packages/keras/optimizers.py", line 873, in get
    str(identifier))
ValueError: Could not interpret optimizer identifier: <tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f06fa323d90>

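This might be due to the code mixing the TensorFlow Keras (tf.keras) API with the standalone Keras API: the optimizer object in the error comes from tensorflow.python.keras, while compile is being called in the standalone keras package.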

Display of available command line parameters

The suggestion is to allow the developer to "mask" which command-line parameters are displayed when a user passes the --help option. Many of the default command-line parameters are not used by a given DNN.
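
With plain argparse, one way to do this is to pass help=argparse.SUPPRESS for the options that should stay hidden. The sketch below is an illustration only: build_parser, the visible set, and the three example options are hypothetical and do not reflect how the candle parser is built.

```python
import argparse


def build_parser(visible):
    """Build a parser whose --help lists only the options named in visible."""
    options = {
        "epochs": "number of training epochs",
        "batch_size": "mini-batch size",
        "conv": "convolution layer sizes",
    }
    parser = argparse.ArgumentParser(prog="benchmark")
    for name, text in options.items():
        # argparse.SUPPRESS hides the option from the help output only.
        shown = text if name in visible else argparse.SUPPRESS
        parser.add_argument("--" + name, help=shown)
    return parser


parser = build_parser(visible={"epochs", "batch_size"})
print(parser.format_help())
# Masked options are hidden from --help but still accepted.
print(parser.parse_args(["--conv", "32"]).conv)
```

Each benchmark could then declare the set of parameters it actually uses and leave the rest masked.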
