Hi, this issue is related to ALBERT and especially the V2 models, specifically the [...]


[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models about albert HOT 5 OPEN

LysandreJik commented on May 20, 2024 8

[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models

from albert.

Comments (5)

LysandreJik commented on May 20, 2024 1

Hi @insop, to add the module scope, I added the following line at line 194 of modeling.py:

with tf.variable_scope("module"):

This results in the __init__ method of AlbertModel beginning as follows:

[...]
    config = copy.deepcopy(config)
    if not is_training:
      config.hidden_dropout_prob = 0.0
      config.attention_probs_dropout_prob = 0.0

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]

    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

    with tf.variable_scope("module"):
      with tf.variable_scope(scope, default_name="bert"):
        with tf.variable_scope("embeddings"):
          # Perform embedding lookup on the word ids.
          (self.word_embedding_output,
[...]


insop commented on May 20, 2024

(a line has to be added to the scope so that the modeling scope begins with module, same as the HUB module).

Hi @LysandreJik

I tried to add the module scope, but I could not get it working with a single line.
Below is how I added the module scope and got your compare_albert.py working. It doesn't look good, but it works.

Could you tell me how you added the module scope?

Thank you,

$ diff -uN a.py b.py
--- a.py        2019-11-25 01:07:50.000000000 -0800
+++ b.py        2019-11-25 01:08:23.000000000 -0800
@@ -1,4 +1,4 @@
-def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0):
+def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0, add_scope='module'):
   """Compute the union of the current variables and checkpoint variables."""
   assignment_map = {}
   initialized_variable_names = {}
@@ -8,8 +8,15 @@
     name = var.name
     m = re.match("^(.*):\\d+$", name)
     if m is not None:
-      name = m.group(1)
-    name_to_variable[name] = var
+      # add 'module' scope name to match tf hub module
+      if add_scope is not None:
+          name = 'module/' + m.group(1)
+      else:
+          name = m.group(1)
+      # NOTE: store name as value for scope matching
+      # since 'var' value was not used
+      name_to_variable[name] = m.group(1)
+  
   init_vars = tf.train.list_variables(init_checkpoint)
   init_vars_name = [name for (name, _) in init_vars]
 
@@ -20,7 +27,7 @@
   else:
     assignment_map = collections.OrderedDict()
 
-  for name in name_to_variable:
+  for name, old_name in name_to_variable.items():
     if name in init_vars_name:
       tvar_name = name
     elif (re.sub(r"/group_\d+/", "/group_0/",
@@ -50,7 +57,12 @@
       if not group_matched:
         assignment_map[0][tvar_name] = name
     else:
-      assignment_map[tvar_name] = name
+      if add_scope is not None:
+        # add 'module' scope name to match tf hub module
+        # <'module/'+ xxx, xxx>
+        assignment_map[tvar_name] = old_name
+      else:
+        assignment_map[tvar_name] = name
     initialized_variable_names[name] = 1
     initialized_variable_names[six.ensure_str(name) + ":0"] = 1
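
The core idea of the diff above can be sketched in isolation: the assignment map uses the checkpoint variable name (with the "module/" prefix) as the key and the graph variable name (without it) as the value, matching the `<'module/' + xxx, xxx>` comment. The sketch below is pure Python and deliberately ignores the `group_N` handling of the real function. `build_prefixed_assignment_map` and the variable names in the example are hypothetical and only for illustration.

```python
import collections
import re


def build_prefixed_assignment_map(graph_var_names, ckpt_var_names, prefix="module/"):
    """Map checkpoint names to graph names, trying `prefix + name` first.

    Hypothetical helper. Returns (assignment_map, unmatched_graph_names).
    """
    ckpt_names = set(ckpt_var_names)
    assignment_map = collections.OrderedDict()
    unmatched = []
    for name in graph_var_names:
        # Strip a trailing ":<index>" suffix, as the original regex does.
        m = re.match(r"^(.*):\d+$", name)
        base = m.group(1) if m is not None else name
        if prefix + base in ckpt_names:
            # Key: name in the checkpoint. Value: name in the current graph.
            assignment_map[prefix + base] = base
        elif base in ckpt_names:
            assignment_map[base] = base
        else:
            unmatched.append(base)
    return assignment_map, unmatched


# Illustrative names only; not read from a real checkpoint.
graph_vars = [
    "bert/embeddings/word_embeddings:0",
    "bert/pooler/dense/kernel:0",
    "global_step:0",
]
ckpt_vars = [
    "module/bert/embeddings/word_embeddings",
    "module/bert/pooler/dense/kernel",
]

mapping, missing = build_prefixed_assignment_map(graph_vars, ckpt_vars)
for ckpt_name, graph_name in mapping.items():
    print(ckpt_name, "->", graph_name)
print("unmatched:", missing)
```

Returning the unmatched names makes it easy to spot variables that would silently keep their random initialization, which is one possible cause of output differences between the two models.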


insop commented on May 20, 2024

Hi @LysandreJik
Thanks a lot, it works like a charm!


LysandreJik commented on May 20, 2024

Great to hear, please let me know if you manage to convert the v2 models/reproduce the results!


insop commented on May 20, 2024

Hi @LysandreJik

I have run your script (compare_albert.py, with a different input sentence, see below) for the v2 models.
In my run, the large model shows a larger difference, though not as large as the one you reported for xlarge.
For xlarge, the difference seems okay.

~~I have a question about running SQuAD, but I will post it in the other open issue where I saw you.~~

I thought I saw you on another post, but I was mistaken.
Were you able to run run_squad_sp.py?
(To avoid digressing here, I can find another way to communicate if you were able to run run_squad_sp.py without any issues.)

Thank you,


$ python -c 'import tensorflow as tf; print(tf.__version__)'
1.15.0

// one change I've made is this
# Create inputs
#input_sentence = "this is nice".lower()
input_sentence = "The most difficult thing is the decision to act, the rest is merely tenacity. The fears are paper tigers. You can do anything you decide to do. You can act to change and control your life; and the procedure, the process is its own reward.".lower()


model: base

Comparing the HUB and TF1 layers
-- pooled            1.5154481e-05
-- full transformer  3.1471252e-05


model: large

Comparing the HUB and TF1 layers
-- pooled            0.014360733
-- full transformer  0.014184952


model: xlarge

Comparing the HUB and TF1 layers
-- pooled            1.6540289e-06
-- full transformer  4.9889088e-05

model: xxlarge

Comparing the HUB and TF1 layers
-- pooled            2.5779009e-05
-- full transformer  1.8566847e-05
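
To make the outlier easier to see, the numbers above can be checked against a cutoff. This is only a sketch: the values are copied from the output above, while the `1e-3` tolerance and the `models_over_tolerance` helper are assumptions for illustration and do not come from compare_albert.py.

```python
# Differences copied from the run above: (pooled, full transformer).
reported = {
    "base": (1.5154481e-05, 3.1471252e-05),
    "large": (0.014360733, 0.014184952),
    "xlarge": (1.6540289e-06, 4.9889088e-05),
    "xxlarge": (2.5779009e-05, 1.8566847e-05),
}

TOLERANCE = 1e-3  # assumed cutoff, not taken from compare_albert.py


def models_over_tolerance(results, tolerance=TOLERANCE):
    """Return the models whose largest reported difference exceeds the cutoff."""
    return [model for model, diffs in results.items() if max(diffs) > tolerance]


flagged = models_over_tolerance(reported)
print(flagged)  # ['large']
```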
