Troubleshooting PySparkTypeError: [NOT_ITERABLE]

Ever encountered a confusing error message like 'Column is not iterable' while working with PySpark? Here's a relatable scenario:

You're trying to find the highest salary from a list of employees using PySpark. You've got your DataFrame set up, but when you run your code, the error pops up and leaves you scratching your head.


The reason? A tiny but essential step was skipped: importing PySpark's 'max' function. Without that import, the bare name max refers to Python's builtin max, which expects an iterable; a PySpark Column isn't one, so the call fails with this error.
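To see where the "not iterable" wording comes from, here's a minimal plain-Python sketch that needs no Spark at all. The Column class below is a hypothetical stand-in for pyspark.sql.Column (the real class raises a similar TypeError when you try to iterate it); the point is that Python's builtin max() tries to iterate its single argument.

```python
# Hypothetical stand-in for pyspark.sql.Column -- not the real class.
# It supports no iteration, just like a Spark column expression.
class Column:
    def __init__(self, name):
        self.name = name

try:
    # Builtin max(), NOT pyspark.sql.functions.max -- it attempts
    # iter(Column(...)) and fails:
    max(Column("salary"))
except TypeError as err:
    print(err)  # "'Column' object is not iterable"
```

The real PySpark error message differs in detail, but the mechanism is the same: the wrong max is being called.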


Here's the fix, in plain language:


# Don't forget to import the 'max' function!
from pyspark.sql.functions import max

# Assumes an active SparkSession named `spark` (as in a notebook); otherwise:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()

# Now, let's build the DataFrame:
data = [("John", 2500.0), ("Alice", 3000.0), ("Bob", 2200.0)]
usr_data = spark.createDataFrame(data, ["name", "salary"])
usr_data.show()



+-----+------+
| name|salary|
+-----+------+
| John|2500.0|
|Alice|3000.0|
|  Bob|2200.0|
+-----+------+


max_sal = usr_data.agg(max(usr_data['salary']).alias('max_sal'))
max_sal.show()

Which prints the expected result:

+-------+
|max_sal|
+-------+
| 3000.0|
+-------+

By adding that one line, you're back on track, finding the max salary without an obstacle.🚀
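One caution worth knowing: importing max this way shadows Python's builtin max for the rest of the file, which is why many PySpark codebases prefer `from pyspark.sql import functions as F` and write `F.max(...)`. The sketch below illustrates the shadowing in plain Python, with a hypothetical stand-in for the column function so it runs without Spark installed:

```python
import builtins

# Hypothetical stand-in for pyspark.sql.functions.max, used here only
# to show the name-shadowing effect without needing Spark.
def max(col):
    return f"MAX({col})"

# After the import, the bare name `max` is the column function...
print(max("salary"))                            # MAX(salary)

# ...and the original builtin is still reachable via the builtins module:
print(builtins.max([2500.0, 3000.0, 2200.0]))   # 3000.0
```

Using the F-prefix style sidesteps the problem entirely: F.max for the column aggregate, plain max for ordinary Python lists.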
