Troubleshooting PySparkTypeError: [NOT_ITERABLE]
Ever encountered a confusing error message like 'Column is not iterable' while working with PySpark? Here's a relatable scenario:
You're trying to find the highest salary from a list of employees using PySpark. You've got your DataFrame set up, but when you run your code, the error pops up, leaving you scratching your head.
The reason? You forgot a tiny but essential step: importing the 'max' function! Without `from pyspark.sql.functions import max`, Python falls back to its builtin `max`, which expects an iterable. A PySpark Column isn't iterable, so you get `PySparkTypeError: [NOT_ITERABLE] Column is not iterable`.
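You can see the same failure mode without Spark at all. This is a minimal pure-Python sketch: the `Column` class here is a hypothetical stand-in for `pyspark.sql.Column`, which likewise doesn't support iteration, so the builtin `max` raises the same "not iterable" `TypeError`.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column (not iterable)."""
    pass

try:
    # Builtin max expects an iterable (or multiple arguments), so this fails
    max(Column())
except TypeError as err:
    print(err)  # 'Column' object is not iterable
```

This is exactly what happens in the Spark scenario: with no import, `max(usr_data['salary'])` calls the builtin on a single Column object.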
Here's the fix, in plain language:
# Don't forget to import the 'max' function!
from pyspark.sql.functions import max
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Now, let's find that maximum salary:
data = [("John", 2500.0), ("Alice", 3000.0), ("Bob", 2200.0)]
usr_data = spark.createDataFrame(data, ["name", "salary"])
usr_data.show()
+-----+------+
| name|salary|
+-----+------+
| John|2500.0|
|Alice|3000.0|
|  Bob|2200.0|
+-----+------+
max_sal = usr_data.agg(max(usr_data['salary']).alias('max_sal'))
max_sal.show()
+-------+
|max_sal|
+-------+
| 3000.0|
+-------+
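A common way to avoid this bug entirely is to import the functions module under an alias (e.g. `from pyspark.sql import functions as F`) so names like `max`, `min`, and `sum` never shadow Python's builtins. The shadowing mechanics can be shown in plain Python; the `max` function below is a hypothetical stand-in for a module-level function like PySpark's:

```python
import builtins

def max(column):
    """Hypothetical stand-in for a library max() that shadows the builtin."""
    return f"MAX({column})"

# The local definition shadows the builtin:
print(max("salary"))  # MAX(salary)

# The original builtin is still reachable through the builtins module:
print(builtins.max([2500.0, 3000.0, 2200.0]))  # 3000.0
```

With the alias style, `F.max("salary")` and the builtin `max([1, 2, 3])` coexist without conflict.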