Troubleshooting PySparkTypeError: [NOT_ITERABLE]

Ever encountered a confusing error message like 'Column is not iterable' while working with PySpark? Here's a relatable scenario:

You're trying to find the highest salary from a list of employees using PySpark. You've got your DataFrame set up, but when you run your code, the error pops up and leaves you scratching your head.


The reason? A tiny but essential step was skipped: importing PySpark's 'max' function. Without that import, the bare name max refers to Python's builtin max, which expects an iterable; a PySpark Column isn't one, so the call fails with this error.
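To see where the "not iterable" wording comes from, here's a minimal plain-Python sketch that needs no Spark at all. The Column class below is a hypothetical stand-in for pyspark.sql.Column (the real class raises a similar TypeError when you try to iterate it); the point is that Python's builtin max() tries to iterate its single argument.

```python
# Hypothetical stand-in for pyspark.sql.Column -- not the real class.
# It supports no iteration, just like a Spark column expression.
class Column:
    def __init__(self, name):
        self.name = name

try:
    # Builtin max(), NOT pyspark.sql.functions.max -- it attempts
    # iter(Column(...)) and fails:
    max(Column("salary"))
except TypeError as err:
    print(err)  # "'Column' object is not iterable"
```

The real PySpark error message differs in detail, but the mechanism is the same: the wrong max is being called.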


Here's the fix, in plain language:


# Don't forget to import the 'max' function!
from pyspark.sql.functions import max

# Assumes an active SparkSession named `spark` (as in a notebook); otherwise:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()

# Now, let's build the DataFrame:
data = [("John", 2500.0), ("Alice", 3000.0), ("Bob", 2200.0)]
usr_data = spark.createDataFrame(data, ["name", "salary"])
usr_data.show()



+-----+------+
| name|salary|
+-----+------+
| John|2500.0|
|Alice|3000.0|
|  Bob|2200.0|
+-----+------+


max_sal = usr_data.agg(max(usr_data['salary']).alias('max_sal'))
max_sal.show()

Which prints the expected result:

+-------+
|max_sal|
+-------+
| 3000.0|
+-------+

By adding that one line, you're back on track, finding the max salary without an obstacle.🚀
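One caution worth knowing: importing max this way shadows Python's builtin max for the rest of the file, which is why many PySpark codebases prefer `from pyspark.sql import functions as F` and write `F.max(...)`. The sketch below illustrates the shadowing in plain Python, with a hypothetical stand-in for the column function so it runs without Spark installed:

```python
import builtins

# Hypothetical stand-in for pyspark.sql.functions.max, used here only
# to show the name-shadowing effect without needing Spark.
def max(col):
    return f"MAX({col})"

# After the import, the bare name `max` is the column function...
print(max("salary"))                            # MAX(salary)

# ...and the original builtin is still reachable via the builtins module:
print(builtins.max([2500.0, 3000.0, 2200.0]))   # 3000.0
```

Using the F-prefix style sidesteps the problem entirely: F.max for the column aggregate, plain max for ordinary Python lists.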
