I am new to Spark and have a small question. Suppose I write some PySpark code that also contains plain Python code, as shown below:
```python
from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

now = datetime.now()
current_time = now.strftime("%H:%M:%S")
print("Current Time =", current_time)
df = spark.read.format("csv").option("delimiter", ",").load("countries.csv")
df = df.withColumn("C_DT", lit(current_time))
print("new column added")
```
Here, is datetime.now() run once, or does each executor run it? And who runs the print commands: the executors or the driver?
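Both print calls and datetime.now() are executed on the Spark driver. current_time is computed once there as an ordinary Python string, and lit(current_time) embeds that value as a constant in the DataFrame's query plan. The executors only see it when the next action (for example show(), count() or a write) runs the plan and actually adds the column to each row, so every row gets the same driver-side timestamp.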
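When print("new column added") runs, withColumn has only returned a new DataFrame with an updated schema and query plan. No rows have been processed for the new column yet, because transformations like withColumn are lazy and only run when an action is called.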