
dana's People

Contributors

mmalekzadeh


dana's Issues

Error when training DANA model

Hi,

When I tried to train the DANA model using the notebook from the repo in Google Colab, I got the following error:

(screenshot: dana-training)

I have tried printing the various shapes.

Note that the TensorFlow version in Google Colab is now 2.6; you most likely used an older version for training. But the shape of X does indeed look a little strange.
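
For what it's worth, one quick way to rule out a version mismatch is to check, and if needed pin, the TensorFlow version in Colab (the version you originally trained with is only a guess on my part, not something stated in the repo):

import tensorflow as tf
print(tf.__version__)        # Colab currently ships 2.6.x

# to test against an older release (2.4.1 is a guess, not confirmed anywhere):
# !pip install tensorflow==2.4.1
# then restart the Colab runtime and re-run the notebook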

I have not changed anything in your notebook (except for setting from_logits=False).

Regards & thanks
Kapil

DAT - Potential difference between the paper & the code

Hi,

I am using the 13th August copy of your paper.

The algorithm DAT from the paper is:

(image: the DAT algorithm from the paper)

When I read this algorithm, I get the following impression.

For every iteration in an epoch, we construct a few batches, e.g. B = 5, and for each batch we:

a) compute the loss for the given batch (with dimension randomization applied);
b) compute the gradient for this batch;
c) accumulate the gradients and, most importantly, do not apply them yet.

Once all B batches (i.e. the 5 of them) have been run, we apply the accumulated gradients (as sketched below).
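
A minimal sketch of my reading of the paper (hypothetical: model, loss_fn, optimizer, B and next_randomized_batch stand in for the notebook's objects and are not the repo's actual names):

import tensorflow as tf

# hypothetical names standing in for the notebook's objects
accum_grads = [tf.zeros_like(w) for w in model.trainable_weights]
for j in range(B):                          # B randomized batches per step
    X, Y = next_randomized_batch()          # dimension randomization applied here
    with tf.GradientTape() as tape:
        loss = loss_fn(Y, model(X))
    grads = tape.gradient(loss, model.trainable_weights)
    # accumulate per-batch gradients; most importantly, do NOT apply them yet
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]
# only after all B batches: apply the averaged, accumulated gradients
optimizer.apply_gradients(zip([g / B for g in accum_grads],
                              model.trainable_weights))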

Now, when I look at the code, I see the following:

for epoch in range(num_epochs):
    ## Training
    train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
    train_dataset = iter(train_dataset.shuffle(len(X_train)).batch(batch_size))
    n_iterations_per_epoch = len(X_train)//(batch_size*n_batch_per_train_setp)
    epoch_loss_avg = tf.keras.metrics.Mean()

    for i in range(n_iterations_per_epoch):
        rnd_order_H = np.random.permutation(len(H_combinations))
        rnd_order_W = np.random.permutation(len(W_combinations))
        n_samples = 0.
        with tf.GradientTape() as tape:
            accum_loss = tf.Variable(0.)
            for j in range(n_batch_per_train_setp):
                try:
                    X, Y = next(train_dataset)
                except:
                    break
                X = X.numpy()
                sample_weight = [data_class_weights[y] for y in Y.numpy()]

                ### Dimension Randomization
                ####### Random Sensor Selection
                rnd_H = H_combinations[rnd_order_H[j%len(rnd_order_H)]]
                X = X[:,:,rnd_H,:]
                ####### Random Sampling Rate Selection
                rnd_W = W_combinations[rnd_order_W[j%len(rnd_order_W)]]
                X = tf.image.resize(X, (rnd_W, len(rnd_H)))

                logits = model(X)
                accum_loss = accum_loss + loss_fn(Y, logits, sample_weight)
                n_samples = n_samples + 1.
        gradients = tape.gradient(accum_loss, model.trainable_weights)
        gradients = [g*(1./n_samples) for g in gradients]
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        epoch_loss_avg.update_state(accum_loss*(1./n_samples))

If I understood the code properly, you:

  • accumulate the loss over the 5 batches;
  • then compute the gradient of this accumulated loss;
  • then average the gradients;
  • then apply the gradients.

The flow in the code differs from the algorithm in the paper, but the end result may well be the same: since differentiation is linear, the gradient of the summed loss equals the sum of the per-batch gradients.
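
A toy sketch of this equivalence (illustrative only: a single scalar weight rather than the real model):

import tensorflow as tf

w = tf.Variable(2.0)
batches = [tf.constant(1.0), tf.constant(3.0)]   # two toy "batch losses"

# (a) paper-style: per-batch gradients, accumulated, applied once at the end
accum = tf.zeros_like(w)
for x in batches:
    with tf.GradientTape() as tape:
        loss = x * w ** 2
    accum += tape.gradient(loss, w)

# (b) repo-style: accumulate the loss inside one tape, differentiate once
with tf.GradientTape() as tape:
    total = batches[0] * w ** 2 + batches[1] * w ** 2

print(accum.numpy(), tape.gradient(total, w).numpy())  # both print 16.0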

Would appreciate it if you could clarify/confirm.

Regards & thanks
Kapil

Potential bug - Usage of from_logits=True on a model with a softmax-activated output

Hi,

First of all, thank you for an excellent paper and also for developing a very good corresponding code base.

I tried to run your notebook in Google Colab and observed a few issues.

def Ordonez2016DeepOriginal(inp_shape, out_shape):   
    nb_filters = 64 
    drp_out_dns = .5 
    nb_dense = 128 
    
    inp = Input(inp_shape)

    x = Conv2D(nb_filters, kernel_size = (5,1),
              strides=(1,1), padding='valid', activation='relu')(inp)    
    x = Conv2D(nb_filters, kernel_size = (5,1),
              strides=(1,1), padding='valid', activation='relu')(x)
    x = Conv2D(nb_filters, kernel_size = (5,1), 
              strides=(1,1), padding='valid', activation='relu')(x)
    x = Conv2D(nb_filters, kernel_size = (5,1), 
              strides=(1,1), padding='valid', activation='relu')(x)    
    x = Reshape((x.shape[1],x.shape[2]*x.shape[3]))(x)
    act = LSTM(nb_dense, return_sequences=True, activation='tanh', name="lstm_1")(x)        
    act = Dropout(drp_out_dns, name= "dot_1")(act)
    act = LSTM(nb_dense, activation='tanh', name="lstm_2")(act)        
    act = Dropout(drp_out_dns, name= "dot_2")(act)
    out_act = Dense(out_shape, activation='softmax',  name="act_smx")(act)
    
    model = keras.models.Model(inputs=inp, outputs=out_act)
    return model

def standard_training(model, X_train, Y_train, X_val, Y_val, data_class_weights,
                      batch_size=128, num_epochs=128, save_dir=None):
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

Since your model's last layer, i.e. out_act, uses softmax, I think you should use

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

TensorFlow 2.6 now generates a warning to catch exactly this kind of situation.
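
For illustration, a minimal sketch of the two consistent pairings (toy numbers, not from the notebook):

import tensorflow as tf

y = tf.constant([1])
probs = tf.constant([[0.1, 0.9]])   # what a softmax output layer produces

# consistent pairing 1: probabilities + from_logits=False
sce_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
# consistent pairing 2: raw scores + from_logits=True (softmax happens in the loss)
logits = tf.math.log(probs)         # scores whose softmax recovers probs
sce_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

print(float(sce_probs(y, probs)))    # ~0.105, i.e. -log(0.9)
print(float(sce_logits(y, logits)))  # same value
# Passing probs with from_logits=True applies softmax a second time,
# flattening the distribution and distorting the loss.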

I changed it, and the testing accuracy of the standard model is 0.9213 with a training loss of zero... but of course there is always an element of randomness here.

This from_logits=True issue is also present when you use the DANA model.

However, I am not able to run the DANA model because of another problem; I am creating a separate issue for that.

Regards & thanks
Kapil
