mmalekzadeh / dana
DANA: Dimension-Adaptive Neural Architecture (UbiComp'21) (ACM IMWUT)
Home Page: https://arxiv.org/abs/2008.02397
License: MIT License
Hi,
When I try to train the DANA model in Google Colab using the notebook from the repo, I get the following error:
I have tried printing the various shapes.
Note that Google Colab now ships TensorFlow 2.6; most likely you trained with an older version. Still, the shape of X does look a little strange.
I have not changed anything in your notebook (except for setting from_logits=False).
Regards & thanks
Kapil
Hi,
I am using the 13 August version of your paper.
The DAT algorithm from the paper is:
When I read this algorithm, my understanding is the following.
For every iteration in an epoch, we construct a few batches (e.g. B = 5). For each batch we:
a) compute the loss for the batch (to which dimension randomization has been applied);
b) compute the gradient for this batch;
c) accumulate the gradients and, most importantly, do not apply them yet.
Once all the batches (i.e. all 5 of them) have been processed, we apply the accumulated gradients.
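A minimal pure-Python sketch of this reading of DAT, assuming a linear model with squared-error loss and a plain SGD update (illustrative choices, not taken from the paper). The per-batch dimension randomization is omitted so that only the accumulate-then-apply structure is shown.

```python
# Sketch of DAT as read from the paper: per-batch gradients are accumulated
# and applied once after B batches. The linear model, squared-error loss and
# plain SGD update are illustrative assumptions; dimension randomization is omitted.
from typing import List, Tuple

Batch = List[Tuple[List[float], float]]  # list of (features, target) pairs


def batch_loss_grad(w: List[float], batch: Batch) -> List[float]:
    """Gradient of the mean squared error of a linear model on one batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * err * xk / len(batch)
    return grad


def accumulated_step(w: List[float], batches: List[Batch], lr: float) -> List[float]:
    """Steps a) to c) for every batch, then a single update with the averaged gradient."""
    accum = [0.0] * len(w)
    for batch in batches:
        g = batch_loss_grad(w, batch)                  # a) + b) gradient of this batch's loss
        accum = [a + gk for a, gk in zip(accum, g)]   # c) accumulate, do not apply yet
    avg = [a / len(batches) for a in accum]           # average over the B batches
    return [wi - lr * gi for wi, gi in zip(w, avg)]   # apply once, after all batches


w = accumulated_step([0.0, 0.0], [[([1.0, 0.0], 1.0)], [([0.0, 1.0], 2.0)]], lr=0.1)
print(w)  # [0.1, 0.2]
```

The weights are only touched in the last line of accumulated_step, so every batch in the step sees the same parameters.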
Now, when I look at the code, I see the following:
for epoch in range(num_epochs):
    ## Training
    train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
    train_dataset = iter(train_dataset.shuffle(len(X_train)).batch(batch_size))
    n_iterations_per_epoch = len(X_train)//(batch_size*n_batch_per_train_setp)
    epoch_loss_avg = tf.keras.metrics.Mean()
    for i in range(n_iterations_per_epoch):
        rnd_order_H = np.random.permutation(len(H_combinations))
        rnd_order_W = np.random.permutation(len(W_combinations))
        n_samples = 0.
        with tf.GradientTape() as tape:
            accum_loss = tf.Variable(0.)
            for j in range(n_batch_per_train_setp):
                try:
                    X, Y = next(train_dataset)
                except StopIteration:  # no batches left in this epoch
                    break
                X = X.numpy()
                sample_weight = [data_class_weights[y] for y in Y.numpy()]
                ### Dimension Randomization
                ####### Random Sensor Selection
                rnd_H = H_combinations[rnd_order_H[j%len(rnd_order_H)]]
                X = X[:,:,rnd_H,:]
                ####### Random Sampling Rate Selection
                rnd_W = W_combinations[rnd_order_W[j%len(rnd_order_W)]]
                X = tf.image.resize(X, (rnd_W, len(rnd_H)))
                logits = model(X)
                # the loss of every batch is summed inside the same tape
                accum_loss = accum_loss + loss_fn(Y, logits, sample_weight)
                n_samples = n_samples + 1.
        # one gradient of the accumulated loss, averaged over the batches, applied once
        gradients = tape.gradient(accum_loss, model.trainable_weights)
        gradients = [g*(1./n_samples) for g in gradients]
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        epoch_loss_avg.update_state(accum_loss*(1./n_samples))
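The dimension-randomization step in this loop (random sensor selection, then resizing the time axis to rnd_W) can be sketched for a single sample in pure Python. Since tf.image.resize resizes the sensor axis to len(rnd_H), which it already has, only the time axis actually changes, and X ends up with shape (batch, rnd_W, len(rnd_H), channels). Assumptions: a sample is a list of time steps by sensor channels (the trailing channel axis is dropped), and resampling uses linear interpolation with half-pixel centers, intended to resemble the default bilinear method of tf.image.resize but not guaranteed to match it exactly. select_sensors and resample_time are hypothetical helpers.

```python
# Pure-Python sketch of dimension randomization for one sample.
# select_sensors / resample_time are hypothetical helpers, not repo code.
from typing import List, Sequence

Sample = List[List[float]]  # [time step][sensor channel]


def select_sensors(x: Sample, rnd_H: Sequence[int]) -> Sample:
    """Random sensor selection: keep only the channels listed in rnd_H."""
    return [[row[h] for h in rnd_H] for row in x]


def resample_time(x: Sample, rnd_W: int) -> Sample:
    """Linearly resample the time axis to rnd_W steps (half-pixel centers)."""
    in_W = len(x)
    scale = in_W / rnd_W
    out = []
    for t in range(rnd_W):
        src = (t + 0.5) * scale - 0.5           # source position of output step t
        src = min(max(src, 0.0), in_W - 1.0)    # clamp to the valid index range
        lo = int(src)
        hi = min(lo + 1, in_W - 1)
        frac = src - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(x[lo], x[hi])])
    return out


x = [[float(t), 10.0 * t, 100.0] for t in range(8)]  # toy data: 8 steps, 3 sensors
y = resample_time(select_sensors(x, [0, 2]), 4)
print(len(y), len(y[0]))  # 4 2
```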
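If I understood the code properly, you accumulate the loss of all the batches inside a single GradientTape, then compute the gradient of that accumulated loss once, scale it by 1/n_samples, and apply it.
So the flow in the code seems different from the algorithm in the paper, although the end result may be the same.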
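Because differentiation is linear, the gradient of a summed loss equals the sum of the per-batch gradients, so summing the losses inside one tape and summing per-batch gradients should give the same update in exact arithmetic. The pure-Python check below illustrates this under illustrative assumptions (a linear model with squared error and hypothetical data); the gradient of the accumulated loss is estimated with central differences so it is computed independently of the per-batch formula.

```python
# Compare two ways of combining B batches into one update:
#   (1) the flow in the code: sum the batch losses, take ONE gradient, divide by B;
#   (2) the reading of the paper: take per-batch gradients, sum them, divide by B.
# Linear model, squared error, and the data below are illustrative assumptions.
from typing import Callable, List, Tuple

Batch = List[Tuple[List[float], float]]  # list of (features, target) pairs


def batch_loss(w: List[float], batch: Batch) -> float:
    """Mean squared error of a linear model on one batch."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in batch) / len(batch)


def batch_grad(w: List[float], batch: Batch) -> List[float]:
    """Analytic gradient of batch_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += 2.0 * err * xk / len(batch)
    return grad


def numerical_grad(f: Callable[[List[float]], float], w: List[float],
                   eps: float = 1e-6) -> List[float]:
    """Central-difference gradient of f at w."""
    grad = []
    for k in range(len(w)):
        wp, wm = list(w), list(w)
        wp[k] += eps
        wm[k] -= eps
        grad.append((f(wp) - f(wm)) / (2 * eps))
    return grad


def compare_flows(w: List[float], batches: List[Batch]) -> Tuple[List[float], List[float]]:
    n = len(batches)

    def accum_loss(v: List[float]) -> float:
        return sum(batch_loss(v, b) for b in batches)

    # (1) accumulate the loss, then differentiate once
    g_code = [g / n for g in numerical_grad(accum_loss, w)]
    # (2) differentiate each batch, then accumulate the gradients
    per_batch = [batch_grad(w, b) for b in batches]
    g_paper = [sum(col) / n for col in zip(*per_batch)]
    return g_code, g_paper


batches = [[([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0)], [([2.0, 0.0], 3.0)]]
g_code, g_paper = compare_flows([0.5, -0.3], batches)
print(g_code, g_paper)  # equal up to finite-difference and rounding error
```

One practical difference remains: with a single tf.GradientTape, the intermediate activations of all the batches are kept until tape.gradient is called, so peak memory grows with the number of batches per step.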
Would appreciate it if you could clarify/confirm.
Regards & thanks
Kapil
Hi,
First of all, thank you for an excellent paper and also for developing a very good corresponding code base.
I tried to run your notebook in Google Colab and observed a few issues:
# Keras imports required by this snippet
from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, Reshape, LSTM, Dropout, Dense

def Ordonez2016DeepOriginal(inp_shape, out_shape):
    nb_filters = 64
    drp_out_dns = .5
    nb_dense = 128

    inp = Input(inp_shape)
    # four convolutions along the time axis only, kernel (5, 1), no padding
    x = Conv2D(nb_filters, kernel_size = (5,1),
               strides=(1,1), padding='valid', activation='relu')(inp)
    x = Conv2D(nb_filters, kernel_size = (5,1),
               strides=(1,1), padding='valid', activation='relu')(x)
    x = Conv2D(nb_filters, kernel_size = (5,1),
               strides=(1,1), padding='valid', activation='relu')(x)
    x = Conv2D(nb_filters, kernel_size = (5,1),
               strides=(1,1), padding='valid', activation='relu')(x)
    # merge the sensor and filter axes into one feature axis for the LSTMs
    x = Reshape((x.shape[1],x.shape[2]*x.shape[3]))(x)
    act = LSTM(nb_dense, return_sequences=True, activation='tanh', name="lstm_1")(x)
    act = Dropout(drp_out_dns, name= "dot_1")(act)
    act = LSTM(nb_dense, activation='tanh', name="lstm_2")(act)
    act = Dropout(drp_out_dns, name= "dot_2")(act)
    # the output layer already applies softmax, i.e. it returns probabilities
    out_act = Dense(out_shape, activation='softmax', name="act_smx")(act)
    model = keras.models.Model(inputs=inp, outputs=out_act)
    return model
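The layer shapes of this model follow from simple arithmetic: each Conv2D with kernel (5, 1), stride 1 and 'valid' padding shortens the time axis by 4 and leaves the sensor axis unchanged, so four of them remove 16 time steps, and the input needs at least 17 time steps. The sketch below walks through the shapes (batch dimension omitted), assuming inp_shape is (time steps, sensor channels, 1) as the X[:,:,rnd_H,:] indexing suggests; ordonez_shapes is a hypothetical helper, and the input (128, 12, 1) with 6 classes is made up for illustration.

```python
# Shape walk-through for Ordonez2016DeepOriginal (batch dimension omitted).
# Assumes inp_shape = (time_steps, sensor_channels, 1); ordonez_shapes is hypothetical.
from typing import List, Tuple


def ordonez_shapes(inp_shape: Tuple[int, int, int], out_shape: int,
                   nb_filters: int = 64, nb_dense: int = 128,
                   n_conv: int = 4, kernel_len: int = 5) -> List[Tuple[str, tuple]]:
    W, H, _ = inp_shape
    shapes = [("input", tuple(inp_shape))]
    for i in range(n_conv):
        # 'valid' padding, stride 1, kernel (kernel_len, 1):
        # the time axis shrinks by kernel_len - 1, the sensor axis is unchanged
        W -= kernel_len - 1
        if W < 1:
            raise ValueError("too few time steps for the convolution stack")
        shapes.append((f"conv_{i + 1}", (W, H, nb_filters)))
    shapes.append(("reshape", (W, H * nb_filters)))  # sensor and filter axes merged
    shapes.append(("lstm_1", (W, nb_dense)))         # return_sequences=True
    shapes.append(("lstm_2", (nb_dense,)))           # last hidden state only
    shapes.append(("act_smx", (out_shape,)))         # softmax over the classes
    return shapes


for name, shape in ordonez_shapes((128, 12, 1), 6):
    print(name, shape)
```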
def standard_training(model, X_train, Y_train, X_val, Y_val, data_class_weights,
                      batch_size=128, num_epochs=128, save_dir=None):
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
Since your model's last layer, out_act, already applies a softmax, I think you should use
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
TensorFlow 2.6 now generates a warning to flag exactly this kind of situation.
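To see why the flag matters: with from_logits=True the loss applies a softmax to its input, so feeding it the output of a softmax layer applies the softmax twice. The sketch below compares the two modes using a simplified, hypothetical re-implementation (sparse_ce is not the Keras source and omits its numerical safeguards); the logits are made up.

```python
# Why from_logits matters when the last layer already applies softmax.
# sparse_ce is a simplified, hypothetical re-implementation for illustration;
# it omits the numerical safeguards of the real Keras loss.
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sparse_ce(label: int, output: List[float], from_logits: bool) -> float:
    """Cross-entropy for one sample; from_logits=True applies softmax to output first."""
    probs = softmax(output) if from_logits else output
    return -math.log(probs[label])


logits = [2.0, 0.5, -1.0]
probs = softmax(logits)                          # what a softmax output layer produces
correct = sparse_ce(0, probs, from_logits=False)
double = sparse_ce(0, probs, from_logits=True)   # softmax applied a second time
print(round(correct, 4), round(double, 4))
```

The double softmax squeezes the probabilities toward uniform: with three classes, even a perfectly confident output [1, 0, 0] gives a loss of -log(e / (e + 2)) ≈ 0.55, so the training loss cannot approach zero.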
After making this change, the test accuracy of the standard model is 0.9213 and the training loss is zero... but of course there is always an element of randomness here.
This issue of from_logits=True is also present when you use the DANA model.
However, I am not able to run the DANA model because of another problem. I am creating a separate issue for that.
Regards & thanks
Kapil