
New posts in huggingface-tokenizers

How does one set the pad token correctly (not to eos) during fine-tuning, so the model does not stop predicting EOS?
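The usual pitfall behind this question: setting `tokenizer.pad_token = tokenizer.eos_token` makes padding and EOS share an id, and masking pad positions in the labels then also masks real EOS. A minimal sketch of the alternative, using a toy in-memory tokenizer and a tiny randomly initialized model (all names, vocab entries, and sizes are illustrative, nothing is downloaded):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import GPT2Config, GPT2LMHeadModel, PreTrainedTokenizerFast

# Toy local tokenizer and model; vocabulary and sizes are illustrative.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=Tokenizer(
        WordLevel({"[UNK]": 0, "</s>": 1, "a": 2}, unk_token="[UNK]")
    ),
    eos_token="</s>",
)
model = GPT2LMHeadModel(
    GPT2Config(vocab_size=len(tokenizer), n_layer=1, n_head=1, n_embd=8, n_positions=16)
)

# Avoid tokenizer.pad_token = tokenizer.eos_token: if padding and EOS share an
# id, masking pad positions in the labels also masks real EOS tokens, so the
# model never learns to emit EOS. Register a dedicated pad token instead:
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))  # grow embeddings to cover [PAD]
model.config.pad_token_id = tokenizer.pad_token_id
```

With a distinct pad id, the data collator can mask padding in the labels while EOS remains a normal training target.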

What is the difference between len(tokenizer) and tokenizer.vocab_size?
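The short answer: `vocab_size` reports only the base vocabulary, while `len(tokenizer)` also counts tokens added afterwards. This can be seen without downloading anything, using a toy WordLevel tokenizer (vocabulary entries are illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import PreTrainedTokenizerFast

# Tiny in-memory vocabulary; nothing is downloaded.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=Tokenizer(
        WordLevel({"[UNK]": 0, "hello": 1, "world": 2}, unk_token="[UNK]")
    )
)

print(tokenizer.vocab_size)        # 3: base vocabulary only
tokenizer.add_tokens(["newtoken"])
print(tokenizer.vocab_size)        # still 3: vocab_size ignores added tokens
print(len(tokenizer))              # 4: base vocabulary plus added tokens
```

This is why `model.resize_token_embeddings(len(tokenizer))`, not `tokenizer.vocab_size`, is the right size to use after adding tokens.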

How can I make sentence-BERT throw an exception if the text exceeds max_seq_length, and what is the max possible max_seq_length for all-MiniLM-L6-v2?

Huggingface MarianMT translators lose content, depending on the model

How to add new special token to the tokenizer?
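A minimal sketch of adding a special token, built on a toy in-memory tokenizer (vocabulary and the `<NEW>` token name are illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Toy WordLevel tokenizer built in memory; nothing is downloaded.
backend = Tokenizer(WordLevel({"[UNK]": 0, "hello": 1}, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(tokenizer_object=backend)

# Register a new special token. Special tokens are matched before
# pre-tokenization, so they are never split into pieces.
tokenizer.add_special_tokens({"additional_special_tokens": ["<NEW>"]})
ids = tokenizer("hello <NEW>")["input_ids"]  # "<NEW>" encodes as one id
```

Any model trained with this tokenizer then needs its embedding matrix enlarged with `model.resize_token_embeddings(len(tokenizer))`.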

Tokenizer.from_file() HuggingFace: Exception: data did not match any variant of untagged enum ModelWrapper

Loading checkpoint shards takes too long

What is so special about special tokens?

pip on Docker image cannot find Rust - even though Rust is installed

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation
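That warning appears when `truncation=True` is requested but neither `max_length` nor `tokenizer.model_max_length` is set (tokenizers built from raw files default `model_max_length` to a huge sentinel value). A sketch of both fixes with a toy tokenizer (vocabulary is illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

backend = Tokenizer(WordLevel({"[UNK]": 0, "a": 1}, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(tokenizer_object=backend)

# Option 1: pass an explicit max_length alongside truncation.
ids = tokenizer("a a a a", truncation=True, max_length=2)["input_ids"]

# Option 2: set a default once; truncation=True alone then suffices.
tokenizer.model_max_length = 2
ids2 = tokenizer("a a a a", truncation=True)["input_ids"]
```

Either way the four input tokens are truncated to two, and the warning no longer fires.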

HuggingFace BERT sentiment analysis

HuggingFace AutoModelForCausalLM "decoder-only architecture" warning, even after setting padding_side='left'

BERT - Is it necessary to add new tokens to the tokenizer in a domain-specific environment?

Strange results with HuggingFace transformers[marianmt] translation of larger texts

resize_token_embeddings on a pretrained model with a different embedding size
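`resize_token_embeddings` changes only the vocabulary dimension of the (tied) input/output embeddings; the hidden size stays fixed. A sketch with a tiny randomly initialized GPT-2 config (all sizes are illustrative, nothing is downloaded):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model; all sizes are illustrative.
model = GPT2LMHeadModel(
    GPT2Config(vocab_size=10, n_layer=1, n_head=1, n_embd=8, n_positions=16)
)
# Embedding matrix starts at shape (10, 8): vocab_size x hidden size.

# Grow the vocabulary dimension: existing rows are kept, new rows are freshly
# initialized, and the hidden size (8) is unchanged.
model.resize_token_embeddings(12)
# Embedding matrix is now shape (12, 8).
```

Shrinking works the same way, by passing a smaller vocabulary size; rows beyond the new size are dropped.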