
Huggingface fine-tuning - how to build a custom model on top of a pre-trained model

Please help me understand the cause of the issue below, and how to build a Keras model for fine-tuning on top of a pre-trained model from Huggingface.

Objective

Create a custom model for DistilBERT fine-tuning on top of TFDistilBertForSequenceClassification from Huggingface.

Input shape to the model

From the shape of the tokenizer output, I assumed the input shape is (2, None, 256), since [input_ids, attention_mask] would go into the model.

The tokenizer output:

from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

max_sequence_length = 256
tokens = tokenizer(
    " ".join(["token"] * max_sequence_length), 
    truncation=True,
    padding=True,
    max_length=max_sequence_length,
    return_tensors="tf"
)
print(tokens)
---
{
  'input_ids':      <tf.Tensor: shape=(1, 256), dtype=int32, numpy=array([[  101, 19204, 19204, 19204, 19204, 19204, 19204, 19204, 19204, ...]], dtype=int32)>, 
  'attention_mask': <tf.Tensor: shape=(1, 256), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]], dtype=int32)>
}

Pretrained Model

from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
for layer in model.layers:
    if layer.name == "distilbert":
        layer.trainable = False
model.summary()
---
Model: "tf_distil_bert_for_sequence_classification_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_99 (Dropout)         multiple                  0         
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880

Custom model

I added a Keras Dense layer on top of the pre-trained model using Sequential.

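import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

seq = Sequential([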
   model,
   Dense(
       name="output_softmax", 
       units=2, 
       activation="softmax"
   )
])
seq.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam()
)

Problem

The documentation of the base Layer class says that the build method creates the weights.

build(self, input_shape): This method can be used to create weights that depend on the shape(s) of the input(s), using add_weight(). call() will automatically build the layer (if it has not been built yet) by calling build().

I ran the method but got the following error.

seq.build(input_shape=(2, None, max_sequence_length))
---
...
ValueError: You cannot build your model by calling `build` if your layers do not support float-type inputs. Instead, in order to instantiate and build your model, `call` your model on real tensor data (of the correct type).

Following the error message, I fed the tokenizer output to the model and got another error.

seq(tokens)
---
TypeError: Failed to convert 'TFSequenceClassifierOutput(loss=None, logits=TensorShape([1, 2]), hidden_states=None, attentions=None)' to a shape: ''logits''could not be converted to a dimension. A shape should either be single dimension (e.g. 10), or an iterable of dimensions (e.g. [1, 10, None]).

Environment

python --version
---
Python 3.7.10

print(tf.__version__)
---
2.5.0

print(transformers.__version__)
---
4.8.2
asked Oct 19 '25 09:10 by mon


1 Answer

Instead of using Sequential or the build method, did you try the Keras Functional API? I have tried it and it worked with other pre-trained models. For example (a rough sketch):

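import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from transformers import TFDistilBertForSequenceClassification

# Assumption: distilbertlayer is the pre-trained model from the question
distilbertlayer = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

def custom():
    # input_ids and attention_mask are integer tensors of shape (batch, 256)
    x = Input(shape=(256,), dtype=tf.int32)
    y = Input(shape=(256,), dtype=tf.int32)
    # The model returns an output object; element 0 holds the logits tensor
    out = distilbertlayer(x, attention_mask=y)[0]
    out = Dense(2, activation='softmax')(out)
    mod = tf.keras.models.Model([x, y], out)
    return mod

custommodel = custom()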

From the information given, I think the error is caused by the wrong type of output being passed to the custom Dense layer. As a suggestion, you could try passing a different output from distilbertlayer to the custom Dense layer, for example:

out = distilbertlayer([x, y])
# Assumes `out` is, or has been unwrapped to, a tensor of shape (batch, sequence, hidden)
out = Dense(2, activation='softmax')(out[:, 0])

However, you first need to understand the output format of distilbertlayer.
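
To make the point about the output format concrete, here is a minimal sketch of the two shapes that are likely involved. It uses NumPy zero arrays as stand-ins, not real model outputs, and the sizes (batch 1, sequence length 256, hidden size 768, 2 labels) are assumptions taken from the question and from the distilbert-base-uncased configuration. A sequence-classification head returns logits of shape (batch, num_labels), while a bare encoder returns hidden states of shape (batch, sequence_length, hidden_size), so the [:, 0] slice selects the first ([CLS]) token only in the second case.

```python
import numpy as np

# Assumed sizes: batch, sequence length, hidden size, number of labels
batch, seq_len, hidden, num_labels = 1, 256, 768, 2

# Stand-in for the `logits` field of a sequence-classification output
logits = np.zeros((batch, num_labels))
# Stand-in for the last hidden state of a bare encoder
last_hidden_state = np.zeros((batch, seq_len, hidden))

# On hidden states, [:, 0] keeps the vector at the first ([CLS]) position
cls_vector = last_hidden_state[:, 0]
# On logits, [:, 0] keeps only the score of label 0
label0_score = logits[:, 0]

print(cls_vector.shape)    # (1, 768)
print(label0_score.shape)  # (1,)
```

So if distilbertlayer is the classification model from the question, its logits can probably go to the Dense layer unsliced, whereas the [:, 0] slice makes sense when distilbertlayer yields per-token hidden states.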

answered Oct 22 '25 05:10 by Vinura