How to deal with categorical variables in this method (closed)

jsyoon0823 commented on August 17, 2024

How can categorical variables be handled in this method?

Comments (8)

richardwu commented on August 17, 2024

I've also tried the same thing with categorical variables, and the model seems to suffer from mode collapse (within a categorical variable). I'm looking for some guidance on this.

redfungus commented on August 17, 2024

Hi!

Thank you for providing the code to your paper.
I've noticed that in the implementation, the reconstruction loss only handles continuous variables, whereas the paper considers both cases.
Do you happen to have the code for the categorical variables too?

HaoXiao2018 commented on August 17, 2024

I don't have the code, but these are the tips from the author.

On top of one-hot encoding, to make categorical variables work:
(1) Activation function: use a softmax activation for each spanned categorical vector (instead of the sigmoid activation).
(2) Change the MSE loss to the cross-entropy loss for the categorical variables.
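Tip (1) can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: it assumes the categorical variables are already one-hot encoded, that continuous columns are scaled to [0, 1], and that `cont_cols` and `cat_spans` (hypothetical names) give the continuous column indices and the (start, end) range of each one-hot block. Continuous columns keep the sigmoid, while each block gets its own softmax so that its entries sum to 1.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mixed_activation(logits, cont_cols, cat_spans):
    """Sigmoid on continuous columns, a separate softmax on each one-hot span."""
    out = np.zeros_like(logits, dtype=float)
    # Continuous columns (assumed scaled to [0, 1]) keep the sigmoid output
    out[:, cont_cols] = 1.0 / (1.0 + np.exp(-logits[:, cont_cols]))
    # Each categorical variable is normalized only over its own one-hot block
    for start, end in cat_spans:
        out[:, start:end] = softmax(logits[:, start:end])
    return out


# Hypothetical layout: columns 0-1 continuous, 2-4 a 3-level category, 5-6 a 2-level category
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 7))
out = mixed_activation(logits, cont_cols=[0, 1], cat_spans=[(2, 5), (5, 7)])
```

Applying one softmax per block, rather than one over the whole output row, keeps the variables independent: each row then holds a probability distribution per categorical variable.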

jsyoon0823 commented on August 17, 2024

Thanks, HaoXiao2018.
This is exactly what I did for the categorical variables.
As described in the paper, I used cross-entropy for categorical variables and MSE for continuous variables.
But the more important thing is that we need to use a separate output layer (with a softmax activation).
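The mixed reconstruction loss described above might look like the following NumPy sketch. It assumes GAIN's mask convention (`m == 1` observed, `m == 0` missing), a generator output `g` that has already gone through per-block softmax, and hypothetical helper names; the unweighted sum of the two terms is also an assumption. In a GAIN-style training loop this term would take the place of the plain MSE reconstruction term in the generator loss.

```python
import numpy as np


def mixed_reconstruction_loss(x, m, g, cont_cols, cat_spans, eps=1e-8):
    """Reconstruction loss computed on observed entries only (m == 1 means observed).

    x: data with categorical variables one-hot encoded (missing entries filled with 0)
    g: generator output (sigmoid on continuous columns, softmax per one-hot block)
    """
    # MSE over observed continuous entries, normalized by how many are observed
    mc = m[:, cont_cols]
    diff = mc * (x[:, cont_cols] - g[:, cont_cols])
    mse = np.sum(diff ** 2) / max(np.sum(mc), 1.0)

    # Cross-entropy per categorical variable, over the rows where it is observed
    ce_total, n_obs = 0.0, 0.0
    for start, end in cat_spans:
        observed = m[:, start]  # the mask is repeated across the whole one-hot block
        ce_rows = -np.sum(x[:, start:end] * np.log(g[:, start:end] + eps), axis=1)
        ce_total += np.sum(observed * ce_rows)
        n_obs += np.sum(observed)
    ce = ce_total / max(n_obs, 1.0)
    return mse + ce


# Hypothetical example: 1 continuous column and one 3-level categorical variable
x = np.array([[0.2, 1.0, 0.0, 0.0],
              [0.7, 0.0, 0.0, 0.0]])  # row 2's category is missing (all zeros)
m = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
g = np.array([[0.3, 0.8, 0.1, 0.1],
              [0.7, 0.2, 0.3, 0.5]])
loss = mixed_reconstruction_loss(x, m, g, cont_cols=[0], cat_spans=[(1, 4)])
```

Because row 2's category is masked out, only row 1 contributes to the cross-entropy term.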

redfungus commented on August 17, 2024

So, is it possible to have both categorical and continuous features in a dataset?

jsyoon0823 commented on August 17, 2024

Yes. You can modify the code based on HaoXiao2018's comments above and then apply GAIN to your mixed-type data.
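Preparing mixed-type data for this could look roughly like the sketch below. It is not part of the GAIN repository. It assumes categorical columns are stored as numeric codes with `NaN` marking missing values and that continuous columns are already scaled to [0, 1]. `encode_mixed` and `decode_categories` are hypothetical helpers: the first one-hot encodes each categorical column, repeats its mask across the block, and records the spans; the second maps each imputed block back to a category by argmax.

```python
import numpy as np


def encode_mixed(data, cat_cols):
    """One-hot encode categorical columns (numeric codes, NaN = missing).

    Returns the encoded matrix, its mask (1 = observed), the continuous column
    indices, the (start, end) span of each categorical block, and each block's levels.
    """
    n, d = data.shape
    blocks, masks, cont_idx, spans, levels = [], [], [], [], []
    col = 0
    for j in range(d):
        v = data[:, j]
        obs = ~np.isnan(v)
        if j in cat_cols:
            lv = np.unique(v[obs])
            # NaN never equals a level, so missing rows become all-zero blocks
            blocks.append((v[:, None] == lv[None, :]).astype(float))
            masks.append(np.repeat(obs[:, None].astype(float), len(lv), axis=1))
            spans.append((col, col + len(lv)))
            levels.append(lv)
            col += len(lv)
        else:
            blocks.append(np.nan_to_num(v)[:, None])  # missing values filled with 0
            masks.append(obs[:, None].astype(float))
            cont_idx.append(col)
            col += 1
    return np.hstack(blocks), np.hstack(masks), cont_idx, spans, levels


def decode_categories(imputed, spans, levels):
    """Map each one-hot block of an imputed matrix back to a category via argmax."""
    return np.column_stack([lv[np.argmax(imputed[:, s:e], axis=1)]
                            for (s, e), lv in zip(spans, levels)])


data = np.array([[0.1, 0.0, 2.0],
                 [0.5, 1.0, np.nan],
                 [np.nan, 0.0, 1.0]])
x, m, cont_idx, spans, levels = encode_mixed(data, cat_cols={1, 2})
# Round-trip check on the encoded data; in practice, decode the imputed output instead
decoded = decode_categories(x, spans, levels)
```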

samyakag commented on August 17, 2024

If I have categorical data, do I also have to change the dimensions of all the network layers, etc.?

jsyoon0823 commented on August 17, 2024

Network parameters should be adjusted for each dataset.
Even for datasets that only have continuous variables, these hyperparameters should be tuned to maximize performance.
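At minimum, one-hot encoding changes the data width `dim`, and the layer sizes follow from it. The sketch below is illustrative only. It assumes a GAIN-style layout in which the generator sees the data concatenated with the mask, and the discriminator sees the imputed data concatenated with the hint, giving an input width of `2 * dim`. It also assumes two hidden layers of width `h_dim`, defaulting to `dim`. The weight names are hypothetical, and `h_dim` remains a hyperparameter to tune, as the reply says.

```python
def gain_layer_shapes(n_cont, cat_levels, h_dim=None):
    """Weight-matrix shapes for a GAIN-style generator and discriminator."""
    dim = n_cont + sum(cat_levels)  # width after one-hot encoding
    h_dim = dim if h_dim is None else h_dim  # assumed default; tune per dataset
    return {
        "G_W1": (2 * dim, h_dim),  # generator input: data + mask
        "G_W2": (h_dim, h_dim),
        "G_W3": (h_dim, dim),      # one output unit per encoded column
        "D_W1": (2 * dim, h_dim),  # discriminator input: imputed data + hint
        "D_W2": (h_dim, h_dim),
        "D_W3": (h_dim, dim),
    }


# 2 continuous columns plus categorical variables with 3 and 4 levels -> dim = 9
shapes = gain_layer_shapes(2, [3, 4])
```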
