I'm working with PyTorch now, and I'm missing a layer I had in TensorFlow: tf.keras.layers.StringLookup, which helped with processing string ids. Is there any workaround to do something similar with PyTorch?
An example of the functionality I'm looking for:
vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "a", "b"]])
layer = tf.keras.layers.StringLookup(vocabulary=vocab)
layer(data)
Outputs:
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[1, 3, 4],
       [4, 1, 2]])>
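For reference, StringLookup's default behavior reserves index 0 for out-of-vocabulary tokens, which is why known tokens start at 1. That mapping can be sketched in plain Python (a minimal sketch; the names `table` and `string_lookup` are my own):

```python
# Plain-Python sketch of tf.keras.layers.StringLookup's default behavior:
# index 0 is reserved for out-of-vocabulary tokens, known tokens start at 1.
vocab = ["a", "b", "c", "d"]
table = {tok: idx + 1 for idx, tok in enumerate(vocab)}

def string_lookup(batch):
    """Map each string to its index; unknown strings map to 0."""
    return [[table.get(tok, 0) for tok in row] for row in batch]

data = [["a", "c", "d"], ["d", "a", "b"]]
print(string_lookup(data))  # [[1, 3, 4], [4, 1, 2]]
```

Wrapping the result in torch.tensor(...) gives the same int64 tensor that TF returns.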
You can use the torchnlp package (installed with pip install pytorch-nlp):
from torchnlp.encoders import LabelEncoder
data = ["a", "c", "d", "e", "d"]
encoder = LabelEncoder(data, reserved_labels=['unknown'], unknown_index=0)
enl = encoder.batch_encode(data)
print(enl)
tensor([1, 2, 3, 4, 3])
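To see where those indices come from, the mapping LabelEncoder builds can be reproduced in plain Python (a sketch, assuming index 0 is the reserved unknown label and the remaining tokens are numbered by first appearance):

```python
data = ["a", "c", "d", "e", "d"]

# Reserve index 0 for "unknown", then assign indices by first appearance,
# mirroring LabelEncoder(data, reserved_labels=['unknown'], unknown_index=0).
index = {"unknown": 0}
for tok in data:
    index.setdefault(tok, len(index))

encoded = [index.get(tok, 0) for tok in data]
print(encoded)  # [1, 2, 3, 4, 3]
```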
You can use collections.Counter along with torchtext's vocab object to construct a lookup from your vocabulary. You can then easily pass sequences to it and get their encodings as a tensor:
from torchtext.vocab import vocab
from collections import Counter
import torch

tokens = ["a", "b", "c", "d"]
samples = [["a", "c", "d"], ["d", "a", "b"]]

# Build string lookup; Counter preserves insertion order, so "a" gets index 0
lookup = vocab(Counter(tokens))

>>> torch.tensor([lookup(s) for s in samples])
tensor([[0, 2, 3],
        [3, 0, 1]])