I was looking at the scikit-learn implementation of sklearn.model_selection.train_test_split. Sklearn usually has high-quality code, so I read it from time to time to learn good practices. But recently I found code like this:
def train_test_split(*arrays, **options):
    """
    ...
    """
    n_arrays = len(arrays)
    if n_arrays == 0:
        raise ValueError("At least one array required as input")

    test_size = options.pop('test_size', None)
    train_size = options.pop('train_size', None)
    random_state = options.pop('random_state', None)
    stratify = options.pop('stratify', None)
    shuffle = options.pop('shuffle', True)

    if options:
        raise TypeError("Invalid parameters passed: %s" % str(options))
    # ...
I was wondering why this kind of approach was chosen. It seems like an anti-pattern to me, but I assume the sklearn developers knew what they were doing, so I am probably missing something. Why not simply:
def train_test_split(*arrays, test_size=None, train_size=None, ...):
    # ...
Are there any advantages to in-function unpacking?
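For reference, here is a minimal runnable sketch of what I mean. Parameters declared after *arrays are keyword-only (PEP 3102), so callers must pass them by name:

def train_test_split(*arrays, test_size=None, train_size=None,
                     random_state=None, stratify=None, shuffle=True):
    # Sketch only: the real implementation would do the splitting here.
    if len(arrays) == 0:
        raise ValueError("At least one array required as input")
    return arrays

train_test_split([1, 2, 3], [4, 5, 6], test_size=0.5)  # valid call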
I guess it is to control the TypeError and give a more understandable error message!
With your proposed approach, the error would be:
TypeError: train_test_split() got an unexpected keyword argument 'not_a_valid_kw'
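To make the contrast concrete, here is a minimal runnable sketch (hypothetical helper names, not sklearn's actual code) showing both error styles side by side:

def split_with_kwargs(*arrays, **options):
    test_size = options.pop('test_size', None)
    if options:
        # The message is fully under our control and can list every
        # leftover key at once, not just the first offending one.
        raise TypeError("Invalid parameters passed: %s" % str(options))
    return arrays, test_size

def split_with_keywords(*arrays, test_size=None):
    return arrays, test_size

try:
    split_with_kwargs([1, 2, 3], not_a_valid_kw=0)
except TypeError as e:
    print(e)  # Invalid parameters passed: {'not_a_valid_kw': 0}

try:
    split_with_keywords([1, 2, 3], not_a_valid_kw=0)
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'not_a_valid_kw'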
To condense all the answers:
1) Function headers are less verbose, needing fewer declared arguments.
2) The code can be extended with new options without redefining every invocation of the function (see the sketch after this list).
3) As johnashu suggested, it allows better control over errors, as they can be customized based on the particular entry of **kwargs being missing or wrongly formatted.
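To illustrate point 2, a hypothetical sketch (the function names are made up, not sklearn's): a wrapper that forwards **options untouched keeps working unchanged when the inner function grows a new option:

import random

def _inner_split(data, test_size=0.25, shuffle=True):
    # Suppose 'shuffle' was added in a later version: the wrapper below
    # and all of its callers keep working without any signature change.
    items = list(data)
    if shuffle:
        random.shuffle(items)
    n_test = int(len(items) * test_size)
    return items[n_test:], items[:n_test]

def split_wrapper(data, **options):
    # No option names are repeated here, so nothing goes stale when
    # _inner_split gains new parameters.
    return _inner_split(data, **options)

train, test = split_wrapper(range(10), test_size=0.3, shuffle=False)
print(train, test)  # [3, 4, 5, 6, 7, 8, 9] [0, 1, 2]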