
Comments (6)

pfistfl commented on June 3, 2024

@JackyP have you seen any example of how to use tabnet with categorical variables?

My go-to would be

if (type == "factor") {
    tensorflow::tf$feature_column$indicator_column(
      tensorflow::tf$feature_column$categorical_column_with_vocabulary_list(id, levels[[1]][[id]])
    )
}

but for the data, neither matrix(features[, get(x)]) nor converting to integer seems to work.
It would be great to have an example where this is used, but I cannot find anything I could work with.

from mlr3keras.

JackyP commented on June 3, 2024

@pfistfl

The original TabNet paper describes mapping categorical features to trainable embeddings, and both tf-tabnet (which we are using) and fast.ai tabular use embeddings to fit categorical variables.

def get_columns():
  """Get the representations for all input columns."""

  columns = []
  if float_columns:
    columns += [tf.feature_column.numeric_column(ci) for ci in float_columns]
  if int_columns:
    columns += [tf.feature_column.numeric_column(ci) for ci in int_columns]
  if str_columns:
    # pylint: disable=g-complex-comprehension
    columns += [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_hash_bucket(
                ci, hash_bucket_size=int(3 * num)),
            dimension=1) for ci, num in zip(str_columns, str_nuniques)
    ]
  if bool_columns:
    # pylint: disable=g-complex-comprehension
    columns += [
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_hash_bucket(
                ci, hash_bucket_size=3),
            dimension=1) for ci in bool_columns
    ]
  return columns
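For context, `str_nuniques` in the snippet above holds the number of unique values per string column, and the hash bucket size is set to roughly three times that count to keep hash collisions rare. A minimal pure-Python sketch of how those inputs could be derived from raw column data (the `data` dict and variable names here are illustrative, not from the original code):

```python
# Sketch: derive per-column unique counts and hash-bucket sizes,
# mirroring the 3x-nunique heuristic in the snippet above.
data = {
    "color": ["red", "blue", "red", "green"],
    "shape": ["circle", "square", "circle", "circle"],
}

str_columns = list(data)
str_nuniques = [len(set(data[c])) for c in str_columns]

# hash_bucket_size = int(3 * num), as in the embedding_column call above
bucket_sizes = {c: int(3 * n) for c, n in zip(str_columns, str_nuniques)}

print(str_nuniques)   # [3, 2]
print(bucket_sizes)   # {'color': 9, 'shape': 6}
```

With a bucket count of about 3x the cardinality, most distinct values land in their own bucket, at the cost of a slightly larger embedding lookup table.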

Your make_tf_feature_cols is very similar to Google's original code example; would the str_columns handling from it work?

EDIT: No, it does not...


pfistfl commented on June 3, 2024

So I understand that TabNet basically expects a list of feature_columns.
This seems to work as long as all feature_columns are numeric.
I then thought that an indicator_column (or embedding_column) would work for categorical variables.
Doing this, I can build the tabnet learner, but fitting fails.

Possible reasons:

  • The 'x' data is in the wrong format
  • TabNet does not work as I thought it would

Additional observation: tf.feature_column.numeric_column seems to have a shape, while tf.feature_column.embedding_column does not?
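Regarding that observation: a numeric_column carries an explicit shape attribute, while an embedding_column's output width is determined by its dimension argument and an indicator_column's by its vocabulary size. The practical consequence for TabNet is the one that surfaces later in this thread: an indicator (one-hot) column contributes one input feature per vocabulary entry, not one feature total. A small dependency-free sketch of that count (the column specs are illustrative, not TF objects):

```python
# Sketch: computing TabNet's `num_features` when indicator (one-hot)
# columns are mixed with numeric ones. Tuples are (name, kind, vocabulary).
columns = [
    ("Sepal.Length", "numeric", None),
    ("Sepal.Width",  "numeric", None),
    ("Petal.Length", "numeric", None),
    ("Species", "indicator", ["setosa", "versicolor", "virginica"]),
]

# A numeric column contributes 1 feature; an indicator column
# contributes one feature per vocabulary entry.
num_features = sum(1 if kind == "numeric" else len(vocab)
                   for _, kind, vocab in columns)
print(num_features)  # 6  (3 numerics + 3 one-hot Species levels)
```

This is the same count used as num_features = 3 + length(levels(iris$Species)) in the working example further down the thread.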


JackyP commented on June 3, 2024

My understanding of TensorFlow falls a little short, so this might be grasping at straws, but:

  • What type is indicator_column vs numeric_column?
  • It looks like the embedding_column example in py-tabnet requires wrapping the whole thing so as to train the embedding...

Otherwise I'm also stuck.


JackyP commented on June 3, 2024

Minimal working example:


  library("reticulate")
  library("tensorflow")
  library("keras")
  # keras::install_keras(extra_packages = c("tensorflow-hub", "tabnet==0.1.4.1"))
  
  use_implementation("tensorflow")
  
  tabnet <- import("tabnet") #0.1.4.1
  
  col_names = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Species')
  
  feature_columns <- lapply(col_names, function(x) { 
    if(x == "Species") {
      tf$feature_column$indicator_column(
        tf$feature_column$categorical_column_with_vocabulary_list(
          x, levels(iris$Species))
        ) 
    } else {
      tf$feature_column$numeric_column(x)  
    }
  })
  
  # The trick is that the one-hot categorical counts as multiple columns for num_features
  model = tabnet$TabNetRegressor(feature_columns, num_regressors=1, num_features = 3 + length(levels(iris$Species)),
                                 feature_dim=4, output_dim=4,
                                 num_decision_steps=2, relaxation_factor=1.0,
                                 sparsity_coefficient=1e-5, batch_momentum=0.98,
                                 virtual_batch_size=NULL, norm_type='group',
                                 num_groups=1)
  model %>% compile(
    loss='mean_squared_error',
    optimizer=optimizer_adam()
  )

  x <- lapply(col_names, function(x) { as.matrix(iris[x])})
  names(x) <- col_names
  
  y <- model.matrix(~ 0 + iris$Petal.Width)
  
  model %>%
    fit(x, y, epochs=100, verbose=2)


pfistfl commented on June 3, 2024

Thank you very much!
Perfect! I think I got it now, see #16

