Reading the PySpark documentation, I know that a foreach is done like this:
def f(x): print(x)
sc.parallelize([1, 2, 3, 4, 5]).foreach(f)
But, what if I use a function with several arguments?
An example:
def f(x, arg1, arg2, arg3):
    print(x * arg1 + arg2 + arg3)
The point is to use something similar to this syntax:
sc.parallelize([1, 2, 3, 4, 5]).foreach(f(arg1=11,arg2=21,arg3=31))
You can make a partial function:
from functools import partial
sc.parallelize([1, 2, 3, 4, 5]).foreach(
partial(f, arg1=11, arg2=21, arg3=31)
)
partial takes a function plus a sequence of positional (*args) and keyword (**kwargs) arguments, and returns a new function that, when called, invokes the original function f with those arguments already filled in.
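You can see this behavior outside of Spark with a minimal sketch (here f returns instead of prints, purely so the result can be checked):

```python
from functools import partial

def f(x, arg1, arg2, arg3):
    return x * arg1 + arg2 + arg3

# Freeze the three extra arguments; g now takes only x,
# which is exactly the shape foreach expects.
g = partial(f, arg1=11, arg2=21, arg3=31)

print(g(1))  # 1*11 + 21 + 31 = 63
print(g(5))  # 5*11 + 21 + 31 = 107
```

An equivalent alternative, if you prefer not to import functools, is a lambda: `sc.parallelize([1, 2, 3, 4, 5]).foreach(lambda x: f(x, 11, 21, 31))`.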