TensorFlow: Inspecting Hash Buckets for Categorical and Feature Columns

I am fitting a linear classifier on fairly wide, sparse data, using a number of categorical columns with hash buckets and crossed feature columns as the feature columns.

Later I want to use the model's weights/coefficients in a custom serving infrastructure. I know how to extract the weights from the model, but for the columns mentioned above they are indexed by the already hashed feature values.

I can reconstruct a hash table (value -> hashed value) for simple categorical columns using tf.string_to_hash_bucket_fast, but I am having trouble doing the same for crossed feature columns.

For a pair of values from the two categorical columns that make up a crossed column, how can I find out which bucket they will fall into?

Sergio Kozlov asked Nov 15 '25 19:11



1 Answer

After inspecting the source code, I found that the simplest way is to construct an input layer for input data consisting of all the distinct values (or their combinations) in the column.

As a result you get a dense tensor of 0s and 1s: each row corresponds to one distinct value, and the 1 sits in the column whose index is the hash bucket number (I've verified this for categorical columns; it should be the same for crossed columns).

Here is the example code (for both Categorical Column and Crossed Column):

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.feature_column.input_layer)

# One row per distinct value (or value combination) to inspect.
# tf.constant is used because the second positional argument of tf.Variable is `trainable`, not a dtype.
features = {
    'sex': tf.constant(['male', 'female', 'female', 'male']),
    'nationality': tf.constant(['belgian', 'french', 'belgian', 'belgian']),
}

# Hashed categorical column, wrapped in an indicator column to get a one-hot encoding
sex_hashed_raw = tf.feature_column.categorical_column_with_hash_bucket('sex', 10)
sex_hashed = tf.feature_column.indicator_column(sex_hashed_raw)

# Crossed column, wrapped the same way
crossed_sn_raw = tf.feature_column.crossed_column(['sex', 'nationality'], hash_bucket_size=20)
crossed_sn = tf.feature_column.indicator_column(crossed_sn_raw)

layer_s = tf.feature_column.input_layer(features, [sex_hashed])
layer_sn = tf.feature_column.input_layer(features, [crossed_sn])

with tf.Session() as sess:
    # Row i has a 1 in the column whose index is the hash bucket of row i
    print(sess.run(layer_s))
    print(sess.run(layer_sn))
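
To turn the printed 0/1 matrices into a lookup table (value or value pair -> bucket number), something like the following plain-Python sketch might help. It assumes every row contains exactly one 1, which holds when each row carries a single value per feature. The helper name buckets_from_indicator and the sample rows are hypothetical and only illustrate the shape of the data; they are not real TensorFlow output.

```python
def buckets_from_indicator(keys, indicator_rows):
    """Map each key to the column index of the single 1 in its row."""
    table = {}
    for key, row in zip(keys, indicator_rows):
        # Collect the positions of all 1s in this row (works for 1 and 1.0).
        hot = [i for i, x in enumerate(row) if x == 1]
        if len(hot) != 1:
            raise ValueError("expected exactly one 1 in the row for %r" % (key,))
        table[key] = hot[0]
    return table


# Made-up rows for illustration only; real rows would come from
# sess.run(layer_sn), in the same order as the input values.
keys = [("male", "belgian"), ("female", "french")]
rows = [[0, 0, 1, 0], [1, 0, 0, 0]]
lookup = buckets_from_indicator(keys, rows)
print(lookup)
```

With real data, keys would be the distinct values (or pairs) fed to the input layer, and the resulting dictionary can be joined against the extracted weights by bucket index.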
Sergio Kozlov answered Nov 18 '25 09:11

