I see it in `__init__` of e.g. the Adam optimizer: `self._set_hyper('beta_1', beta_1)`. There are also `_get_hyper` and `_serialize_hyperparameter` throughout the code. I don't see these in Keras optimizers - are they optional? When should or shouldn't they be used when creating custom optimizers?
They enable setting and getting Python literals (`int`, `str`, etc.), callables, and tensors. Usage is for convenience and consistency: anything set via `_set_hyper` can be retrieved via `_get_hyper`, avoiding repeated boilerplate code. I've implemented Keras AdamW in all major TF & Keras versions, and will use it as reference.
`t_cur` is a `tf.Variable`. Each time we "set" it, we must invoke `K.set_value`; if we do `self.t_cur = 5`, this will destroy the `tf.Variable` and wreck optimizer functionality. If instead we used `model.optimizer._set_hyper('t_cur', 5)`, it'd set it appropriately - but this requires it to have been defined via `_set_hyper` previously.
Both `_get_hyper` & `_set_hyper` enable programmatic treatment of attributes - e.g., we can make a for-loop with a list of attribute names to get or set using just `_get_hyper` and `_set_hyper`, whereas otherwise we'd need to code conditionals and typechecks. Also, `_get_hyper(name)` requires that `name` was previously set via `_set_hyper`.
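To make the two points above concrete, here is a minimal plain-Python sketch. It is not TensorFlow code: `ToyVariable` and `ToyOptimizer` are hypothetical stand-ins, and the toy wraps values in a variable immediately, which is a simplification of what the real optimizer does. It only illustrates why updating in place matters and how a name-keyed getter/setter allows looping over attributes.

```python
class ToyVariable:
    """Hypothetical stand-in for tf.Variable: a mutable box around one value."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value


class ToyOptimizer:
    """Toy model only; it does not reproduce the real optimizer's logic."""

    def __init__(self, **hypers):
        self._hyper = {}
        for name, value in hypers.items():
            self._set_hyper(name, value)

    def _set_hyper(self, name, value):
        if name in self._hyper:
            # Already registered: update in place so held references stay valid.
            self._hyper[name].assign(value)
        else:
            self._hyper[name] = ToyVariable(value)

    def _get_hyper(self, name):
        # Raises KeyError if `name` was never registered via _set_hyper.
        return self._hyper[name].value


opt = ToyOptimizer(t_cur=0, beta_1=0.9)

# A reference held elsewhere, e.g. by an update op that was built earlier.
held = opt._hyper["t_cur"]
opt._set_hyper("t_cur", 5)
in_place_ok = held.value == 5  # the held reference sees the new value

# Contrast: rebinding a plain attribute leaves the old object stale.
opt.t_plain = ToyVariable(0)
stale = opt.t_plain
opt.t_plain = 5  # the name now points at an int; `stale` is untouched
rebinding_breaks = stale.value == 0

# Programmatic get/set over attribute names, with no per-attribute branching.
updates = {"t_cur": 10, "beta_1": 0.8}
for name, value in updates.items():
    opt._set_hyper(name, value)
snapshot = {name: opt._get_hyper(name) for name in updates}
```

In the toy, `held` plays the role of whatever part of the training graph captured the variable before the update; rebinding the attribute would leave that part reading the old value.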
`_get_hyper` enables typecasting via `dtype=`. Ex: `beta_1_t` in default Adam is cast to the same numeric type as `var` (e.g. layer weight), which is required for some ops. Again a convenience, as we could typecast manually (`math_ops.cast`).
`_set_hyper` enables the use of `_serialize_hyperparameter`, which retrieves the Python values (`int`, `float`, etc.) of callables, tensors, or already-Python values. The name stems from the need to convert tensors and callables to Python values for e.g. pickling or JSON-serializing - but it can also be used as a convenience for seeing tensor values in Graph execution.
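The dispatch described above can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `ToyVariable` is a hypothetical stand-in for a tensor or variable, and `serialize_hyperparameter` is a free function written for this example.

```python
import json


class ToyVariable:
    """Hypothetical stand-in for a tensor/variable holding one number."""

    def __init__(self, value):
        self._value = value

    def read(self):
        return self._value


def serialize_hyperparameter(hyper, name):
    """Return a plain Python value for hyper[name], whatever form it is stored in."""
    value = hyper[name]
    if callable(value):
        return value()       # callables are evaluated
    if isinstance(value, ToyVariable):
        return value.read()  # variables are read out
    return value             # already a plain Python value


hyper = {
    "learning_rate": lambda: 0.001,  # stored as a callable
    "beta_1": ToyVariable(0.9),      # stored as a variable
    "decay": 0.0,                    # stored as a plain float
}

config = {name: serialize_hyperparameter(hyper, name) for name in hyper}
encoded = json.dumps(config)  # works because every value is now a plain float
```

Calling `json.dumps(hyper)` directly would fail, since neither the lambda nor the `ToyVariable` is JSON-serializable; that is the gap the serialization step fills.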
Lastly, everything instantiated via `_set_hyper` gets assigned to the `optimizer._hyper` dictionary, which is then iterated over in `_create_hypers`. The `else` in the loop casts all Python numerics to tensors - so `_set_hyper` will not create `int`, `float`, etc. attributes. Worth noting is the `aggregation=` kwarg, whose documentation reads: "Indicates how a distributed variable will be aggregated". This is the part that is a bit more than "for convenience" (lots of code to replicate).
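A rough plain-Python sketch of that loop, assuming the branch structure described above: entries that are already variables or callables are skipped, and everything else is wrapped. `ToyVariable` and `create_hypers` are hypothetical names for this example, and the sketch leaves out `add_weight`, `aggregation=`, and any distribution-strategy handling.

```python
class ToyVariable:
    """Hypothetical stand-in for a tf.Variable created by add_weight."""

    def __init__(self, value):
        self.value = value


def create_hypers(hyper):
    """Wrap every plain Python value in `hyper` in a ToyVariable, in place."""
    for name, value in sorted(hyper.items()):
        if isinstance(value, ToyVariable) or callable(value):
            continue  # already a variable, or evaluated lazily on access
        else:
            # Plain Python numerics end up here and stop being int/float.
            hyper[name] = ToyVariable(value)


existing = ToyVariable(0)
schedule = lambda: 0.001
hyper = {"beta_1": 0.9, "learning_rate": schedule, "t_cur": existing}

create_hypers(hyper)

kinds = {name: type(value).__name__ for name, value in hyper.items()}
```

After the call, `hyper["beta_1"]` is no longer a `float`, while the callable and the pre-existing variable are left as they were.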
`_set_hyper` has a limitation: it does not allow specifying `dtype`. If the `add_weight` approach in `_create_hypers` is desired with a dtype, then `add_weight` should be called directly.
When to use vs. not use: use if the attribute is used by the optimizer via TensorFlow ops - i.e. if it needs to be a `tf.Variable`. For example, `epsilon` is set as a regular Python attribute, as it's never needed as a tensor variable.