Handling a PySpark TypeError: “unsupported operand type(s) for +: ‘int’ and ‘str’”
You’ve been working with PySpark, trying to calculate the rolling sum of a column in your DataFrame, and everything looks set up correctly. However, when you run your code, you hit the dreaded TypeError: “unsupported operand type(s) for +: ‘int’ and ‘str’”.
The Reason:
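The error says that Python tried to add an integer (int) and a string (str). At first glance this suggests a type mismatch between columns in your DataFrame, but that is not what happened here.
In this scenario, you forgot a tiny but essential step: importing the ‘sum’ function from ‘pyspark.sql.functions’! Without that import, the name ‘sum’ refers to Python’s built-in sum, which iterates over the column name character by character and tries to add the first character to its integer starting value of 0.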
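The behaviour can be reproduced without Spark at all. The following is a minimal sketch in plain Python; the variable name message is only for illustration. It passes the column name to the built-in sum, just as the failing code does, and captures the resulting error text.

```python
# Python's built-in sum() iterates over its argument and adds each item to 0.
# Given a column name (a plain string), the first step is 0 + 'i', which fails.
try:
    sum('insurance_amount')
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # unsupported operand type(s) for +: 'int' and 'str'
```

The error is raised before Spark is ever involved, which is why the traceback does not point at the DataFrame or its schema.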
Here is the code that triggers the error, run against the sample hospital data:

from pyspark.sql.window import Window as w
from pyspark.sql.functions import col
df_hospital = spark.read.options(header='True', inferSchema='True').csv('C:/Users/hp/Python_Sums_20thfeb/pyspark/hospital.csv')
ovr_sp = w.partitionBy('department').orderBy('discharge_date')
# 'sum' was never imported from pyspark.sql.functions, so this line calls Python's built-in sum and raises the TypeError
df1 = df_hospital.withColumn("amount_sum", sum('insurance_amount').over(ovr_sp))
df1.show()
Output of the above code:
TypeError: unsupported operand type(s) for +: 'int' and 'str'

The Fix:
The fix is straightforward. You need to import the ‘sum’ function from ‘pyspark.sql.functions’ before using it.
from pyspark.sql import SparkSession
from pyspark.sql.window import Window as w
# Import Spark's sum; this deliberately shadows Python's built-in sum
from pyspark.sql.functions import sum
spark = SparkSession.builder.appName("ReadHospitalData").getOrCreate()
df_hospital = spark.read.options(header='True', inferSchema='True').csv('C:/Users/hp/Python_Sums_20thfeb/pyspark/hospital.csv')
# Make sure the amount column is numeric before summing
df_hospital = df_hospital.withColumn("insurance_amount", df_hospital["insurance_amount"].cast("double"))
# Running total of insurance_amount within each department, ordered by patient_id
window_spec = w.partitionBy('department').orderBy('patient_id')
df1 = df_hospital.withColumn("amount_sum", sum('insurance_amount').over(window_spec))
df1.show()
Now the code runs without the TypeError, and df1.show() displays the DataFrame with the new amount_sum column.

Conclusion:
In this scenario, the error occurred because the ‘sum’ function was used without importing it, so Python’s built-in sum was called on the column name instead. The solution was to import the ‘sum’ function from the ‘pyspark.sql.functions’ module. However, the error message “unsupported operand type(s) for +: ‘int’ and ‘str’” can also appear when an arithmetic operation genuinely mixes an integer (int) with a string (str). In such cases, it is essential to ensure that the values are of compatible data types before performing operations.
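When it is unclear which ‘sum’ a script or notebook is currently using, a quick check can help. The following is a rough sketch in plain Python; describe_sum is a hypothetical helper written for this post, not part of PySpark. It compares the given function against the built-in and otherwise reports the module the function comes from.

```python
import builtins


def describe_sum(fn):
    """Hypothetical helper: say whether fn is Python's built-in sum."""
    if fn is builtins.sum:
        return "built-in sum: will fail on a column name"
    # Any other callable: report the module it was defined in, if known.
    module = getattr(fn, "__module__", "unknown")
    return f"other callable from module {module}"


# In a fresh session with no PySpark import, 'sum' is still the built-in.
print(describe_sum(sum))  # built-in sum: will fail on a column name
```

After importing ‘sum’ from ‘pyspark.sql.functions’, the same call should report a module under pyspark instead. A common alternative that avoids shadowing the built-in is to import the module under an alias (for example, from pyspark.sql import functions as F) and call F.sum.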