Pandas is a powerful Python library for data manipulation and analysis. Calculating the average (mean) of a column in a Pandas DataFrame is a frequently needed task. This article demonstrates two efficient methods to accomplish this: using the df.mean() method and the df.describe() method.
Calculating the Mean with df.mean()
The df.mean() method offers a direct way to compute the average of the numeric columns in your DataFrame. When the DataFrame also holds non-numeric columns, pass numeric_only=True; pandas 2.0 and later raise a TypeError otherwise. To obtain the average of a specific column, select the column using bracket or dot notation and then call mean() on it.
Here’s an example:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'Score': [85, 92, 78, 88]}
df = pd.DataFrame(data)
# Average age using bracket notation
average_age = df['Age'].mean()
print(f"Average age: {average_age}")
# Average score using dot notation
average_score = df.Score.mean()
print(f"Average score: {average_score}")
This will produce:
Average age: 26.25
Average score: 85.75
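The same method can also be called on the whole DataFrame to get every column average at once. The following is a minimal sketch that reuses the sample data above; the variable names column_means and selected_means are only illustrative. It assumes that skipping the string column 'Name' is the desired behavior.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'Score': [85, 92, 78, 88],
})

# numeric_only=True tells pandas to skip the string column 'Name'
column_means = df.mean(numeric_only=True)
print(column_means)

# Alternative: select the numeric columns explicitly before averaging
selected_means = df[['Age', 'Score']].mean()
print(selected_means)
```

Both calls return a Series indexed by column name, so a single average can be read with column_means['Age'].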
Importantly, mean() excludes missing values (NaN) from the calculation by default (skipna=True). However, if your column contains non-numeric data such as strings, you’ll encounter a TypeError. Always ensure your column contains only numeric values before using this method.
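To make the two cases concrete, here is a small sketch. The Series ages and raw_scores are made-up illustration data, not part of the DataFrame above, and converting with pd.to_numeric(errors='coerce') is one possible way to clean a text column, assuming that unparseable entries should be treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
ages = pd.Series([25, np.nan, 22, 28])

# Default behavior: NaN is skipped, so this is (25 + 22 + 28) / 3
mean_skipping_nan = ages.mean()
print(mean_skipping_nan)

# With skipna=False, a single NaN makes the result NaN
mean_with_nan = ages.mean(skipna=False)
print(mean_with_nan)

# Hypothetical column stored as text, with one unparseable entry
raw_scores = pd.Series(['85', '92', 'n/a', '88'])

# errors='coerce' turns values that cannot be parsed into NaN,
# which mean() then skips
clean_scores = pd.to_numeric(raw_scores, errors='coerce')
clean_mean = clean_scores.mean()
print(clean_mean)
```

Coercing silently drops bad entries from the average, so it is worth checking how many values became NaN (for example with clean_scores.isna().sum()) before trusting the result.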
Exploring Descriptive Statistics with df.describe()
The df.describe()
method generates a comprehensive summary of your DataFrame’s descriptive statistics. This includes the mean, count, standard deviation, minimum, maximum, and quartiles for each numeric column. While providing more than just the average, it’s a handy way to obtain the mean alongside other valuable statistical measures.
Using the same DataFrame:
import pandas as pd
# Sample DataFrame (same as before)
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28],
        'Score': [85, 92, 78, 88]}
df = pd.DataFrame(data)
# Descriptive statistics
summary_stats = df.describe()
print(summary_stats)
This will output a table like this:
         Age      Score
count   4.00   4.000000
mean   26.25  85.750000
std     3.50   5.909033
min    22.00  78.000000
25%    24.25  83.250000
50%    26.50  86.500000
75%    28.50  89.000000
max    30.00  92.000000
The means of ‘Age’ and ‘Score’ appear in the mean row. Remember that df.describe() summarizes only numeric columns by default, which is why ‘Name’ is absent from the table.
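Because describe() returns an ordinary DataFrame, the averages can be pulled out of the summary with label-based indexing instead of being read off the printed table. A brief sketch, again using the sample data; the names mean_row and average_age are only illustrative:

```python
import pandas as pd

# Same sample data as before
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 28],
    'Score': [85, 92, 78, 88],
})

summary_stats = df.describe()

# The 'mean' row holds the average of every numeric column
mean_row = summary_stats.loc['mean']
print(mean_row)

# A single value: row label first, then column label
average_age = summary_stats.loc['mean', 'Age']
print(f"Average age: {average_age}")
```

If only the average is needed, calling mean() directly avoids computing the other statistics.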
In summary, both df.mean() and df.describe() provide effective ways to compute column averages in Pandas DataFrames. Select the method that best suits your needs: df.mean() for just the average, or df.describe() for a broader statistical overview. Always handle potential data type errors before applying these methods.