When I created my NLP model, I used the Keras tokenizer to tokenize my training data, so every word in the training data has a number associated with it. Now I want to run the model in an Android app, so I converted it to the tflite format. When the user gives me a text input in the app, I need to convert it into an array of numbers using the same tokens I used for the training data. I am unable to do so because the tflite file contains only the model and not the tokenizer. How can I do this?
You need to migrate the tokenizer's vocabulary from Python to Android. Use the word_index property of tf.keras.preprocessing.text.Tokenizer. This is a dict of ( word , index ) pairs, which you need to export as a JSON file.
import json

with open('android/word_dict.json', 'w') as file:
    json.dump(tokenizer.word_index, file)
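One detail that may matter before exporting: word_index holds every word seen during fitting, even when the Tokenizer was created with num_words, and only indices below num_words are emitted by texts_to_sequences. The sketch below trims the dict accordingly. The helper name trim_word_index and the sample vocabulary are hypothetical, and the step is only needed if num_words was set at training time.

```python
import json


def trim_word_index(word_index, num_words=None):
    """Keep only the entries whose index is below num_words.

    Hypothetical helper: with num_words=None the vocabulary is returned unchanged.
    """
    if num_words is None:
        return dict(word_index)
    return {word: idx for word, idx in word_index.items() if idx < num_words}


# Stand-in for tokenizer.word_index (made-up values for illustration).
word_index = {"the": 1, "movie": 2, "great": 3, "boring": 4}

# Serialize the trimmed vocabulary; this string is what would go into word_dict.json.
payload = json.dumps(trim_word_index(word_index, num_words=4))
```

If no OOV token was configured, words dropped here are simply skipped at lookup time, which is what the Android side should do as well.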
Now we parse the JSON file in Android and build a HashMap<String, Integer>. Each word of the user's input is then looked up in this map, which gives the int[] that is the input for our model. I have discussed the whole process in this blog post: Text Classification in Android with TensorFlow
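The lookup on the Android side has to reproduce what the Keras Tokenizer did in Python. As a reference to port and compare against, here is a minimal Python sketch of that logic. It assumes the Tokenizer was created with its default settings (lowercasing, the default filter characters, splitting on spaces) and that sequences were padded with the pad_sequences defaults (zeros added and tokens dropped at the front). The function name text_to_ids and the sample vocabulary are hypothetical, and max_len is assumed to be positive.

```python
import json

# Default filter characters of the Keras Tokenizer (assumed unchanged at training time).
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_ids(text, word_index, max_len, oov_index=None):
    """Convert raw text into a fixed-length list of word indices."""
    # Lowercase, then replace every filter character with a space.
    table = str.maketrans({ch: " " for ch in KERAS_DEFAULT_FILTERS})
    cleaned = text.lower().translate(table)
    words = [w for w in cleaned.split(" ") if w]

    ids = []
    for word in words:
        idx = word_index.get(word, oov_index)
        if idx is not None:  # unknown words are skipped when there is no OOV index
            ids.append(idx)

    # Keep the last max_len ids and left-pad with zeros.
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Stand-in for the contents of word_dict.json (made-up values).
word_index = json.loads('{"hello": 1, "world": 2}')
ids = text_to_ids("Hello, world! foo", word_index, max_len=5)
```

Running the same sentences through this function and through the Android code is a quick way to check that both produce identical arrays before feeding them to the tflite model.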
I found a Keras layer called tensorflow.keras.layers.experimental.preprocessing.TextVectorization.
This layer performs the text tokenization itself.
Because it can be added to the model, the tokenizer is saved and loaded together with the model. It was used in the NLP model presented at the TensorFlow Dev Summit 2020.
Link to the talk: https://www.youtube.com/watch?v=aNrqaOAt5P4&list=LLyOAs3oTHjtkbQ9pqG0MYIQ&index=5&t=616s