I am using sentence-transformers for semantic search, but sometimes it does not capture the contextual meaning and returns wrong results, e.g. BERT has problems with context/semantic search in Italian.
By default, each sentence embedding is a 768-dimensional vector. How do I increase that dimension so the model can capture the contextual meaning more deeply?
code:
# Load the BERT Model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')
# Setup a Corpus
# A corpus is a list with documents split by sentences.
sentences = ['Absence of sanity',
'Lack of saneness',
'A man is eating food.',
'A man is eating a piece of bread.',
'The girl is carrying a baby.',
'A man is riding a horse.',
'A woman is playing violin.',
'Two men pushed carts through the woods.',
'A man is riding a white horse on an enclosed ground.',
'A monkey is playing drums.',
'A cheetah is running behind its prey.']
# Each sentence is encoded as a 1-D vector with 768 columns
sentence_embeddings = model.encode(sentences) ### how to increase vector dimension
print('Sample BERT embedding vector - length', len(sentence_embeddings[0]))
print('Sample BERT embedding vector - note includes negative values', sentence_embeddings[0])
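For reference, here is a minimal sketch of how you can confirm the model's actual output dimension and run a cosine-similarity search over the corpus above (the query string is just an illustration; util.cos_sim is called util.pytorch_cos_sim in older sentence-transformers versions):

from sentence_transformers import util

# The model reports its true output dimension: 768 for bert-base models
print('Embedding dimension:', model.get_sentence_embedding_dimension())

# Encode a query and rank the corpus sentences by cosine similarity
query_embedding = model.encode('A man is eating pasta.', convert_to_tensor=True)
corpus_embeddings = model.encode(sentences, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True)[:3]:
    print(sentences[int(idx)], float(scores[idx]))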
Unfortunately, the only way to INCREASE the dimension of the embedding in a meaningful way is to retrain the model. :(
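To see why, note that sentence-transformers does let you bolt a Dense projection layer onto the pooling output to change the output dimension, but that layer starts with random weights, so without further training the extra dimensions carry no meaning. A sketch (the base model name and the target size of 1024 are arbitrary assumptions):

from torch import nn
from sentence_transformers import SentenceTransformer, models

# Compose a model whose pooled output is projected from 768 to 1024 dimensions
word_model = models.Transformer('bert-base-uncased')
pooling = models.Pooling(word_model.get_word_embedding_dimension())
dense = models.Dense(in_features=pooling.get_sentence_embedding_dimension(),
                     out_features=1024, activation_function=nn.Tanh())
model = SentenceTransformer(modules=[word_model, pooling, dense])
# The Dense layer is untrained: embeddings are now 1024-d, but no more "meaningful"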
However, maybe this is not what you need; consider fine-tuning a model instead.
I suggest you take a look at sentence-transformers from UKPLab. They provide pretrained sentence-embedding models for over 100 languages, and the best part is that you can fine-tune those models, as in the sketch below.
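For Italian specifically, a multilingual pretrained model is usually the better starting point, and you can fine-tune it on your own labeled sentence pairs. A minimal sketch (the model name and the toy training pairs are assumptions, not values tested on your data; real training needs far more pairs and epochs):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# A pretrained multilingual model that covers Italian
model = SentenceTransformer('distiluse-base-multilingual-cased')

# Toy fine-tuning data: sentence pairs with a similarity label in [0, 1]
train_examples = [
    # "A man is eating." / "A man is eating some bread."
    InputExample(texts=['Un uomo sta mangiando.', 'Un uomo mangia del pane.'], label=0.8),
    # "A man is eating." / "A woman is playing the violin."
    InputExample(texts=['Un uomo sta mangiando.', 'Una donna suona il violino.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune the model on the pairs
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)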
Good Luck!