 

Distinguish PySpark and Pandas DataFrames in Python type hints (PyCharm)

In PyCharm, type hints do not seem to trigger a warning when a pyspark.sql.DataFrame is used in place of a pandas.DataFrame, or vice versa.

For example, the following code generates no warnings at all:

from pyspark.sql import DataFrame as SparkDataFrame
from pandas import DataFrame as PandasDataFrame

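def test_pandas_to_spark(a: PandasDataFrame) -> SparkDataFrame:
    return a  # returns a pandas DataFrame despite the SparkDataFrame return hint

def test_spark_to_pandas(b: SparkDataFrame) -> PandasDataFrame:
    return b.toPandas()

# passes a pandas DataFrame where a Spark DataFrame is expected
test_spark_to_pandas(PandasDataFrame({'a': [1, 2, 3]}))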

Is this a known issue, and is it possible to fix?

Note: I do have the PySpark stubs installed (pyspark-stubs==2.4.0.post2).

asked by Robert Muil, Jan 30 '26 07:01


1 Answer

There is now a library called pandas-stubs (pip install pandas-stubs) that provides type stubs for pandas, which static type checkers can pick up on.

answered by amin_nejad, Feb 01 '26 20:02


