PyTorch equivalent of TensorFlow Keras StringLookup?

I'm working with PyTorch now, and I'm missing one layer: tf.keras.layers.StringLookup, which helped with processing IDs. Is there a workaround to do something similar in PyTorch?

An example of the functionality I'm looking for:

import tensorflow as tf

vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "a", "b"]])
# Index 0 is reserved for out-of-vocabulary tokens, so "a" maps to 1
layer = tf.keras.layers.StringLookup(vocabulary=vocab)
layer(data)

Outputs:
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[1, 3, 4],
       [4, 1, 2]])>
jiwidi asked Oct 17 '25 11:10


2 Answers

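You can use the torchnlp package, which provides a LabelEncoder. Install it with:

pip install pytorch-nlp

Then build the encoder from your data:

from torchnlp.encoders import LabelEncoder

data = ["a", "c", "d", "e", "d"]
# Index 0 is reserved for the 'unknown' label, so real labels start at 1
encoder = LabelEncoder(data, reserved_labels=['unknown'], unknown_index=0)

enl = encoder.batch_encode(data)

print(enl)

Outputs:
tensor([1, 2, 3, 4, 3])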
Damir Devetak answered Oct 20 '25 00:10


You can use collections.Counter along with torchtext's vocab factory function to construct a lookup from your vocabulary. Calling the resulting object on a list of tokens returns their indices, so you can pass sequences to it and collect the encodings into a tensor:

from collections import Counter

import torch
from torchtext.vocab import vocab

tokens = ["a", "b", "c", "d"]
samples = [["a", "c", "d"], ["d", "a", "b"]]

# Build string lookup
lookup = vocab(Counter(tokens))

# Look up each sample (a list of tokens) and stack the results
print(torch.tensor([lookup(s) for s in samples]))

Outputs:
tensor([[0, 2, 3],
        [3, 0, 1]])
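
If pulling in an extra package is not an option, the same lookup can be sketched with a plain dictionary. This is a minimal illustration, not a library API: the helper names build_lookup and encode, and the num_oov_indices and oov_index parameters, are made up for this example. It reserves index 0 for out-of-vocabulary tokens, so the ids line up with the StringLookup output in the question rather than with the 0-based ids above.

```python
# Hypothetical helpers for illustration; not part of PyTorch or torchtext.

def build_lookup(vocabulary, num_oov_indices=1):
    """Map each token to an integer id, leaving the first ids free for OOV tokens."""
    return {token: index + num_oov_indices for index, token in enumerate(vocabulary)}


def encode(batch, lookup, oov_index=0):
    """Encode a batch (a list of token lists) as nested lists of integer ids."""
    return [[lookup.get(token, oov_index) for token in row] for row in batch]


vocab = ["a", "b", "c", "d"]
data = [["a", "c", "d"], ["d", "a", "b"]]

lookup = build_lookup(vocab)
encoded = encode(data, lookup)
print(encoded)  # [[1, 3, 4], [4, 1, 2]]
```

Wrapping the result in torch.tensor(encoded) should give an int64 tensor of shape (2, 3). Tokens missing from the vocabulary map to 0 instead of raising an error.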
iacob answered Oct 20 '25 01:10


