Pandas DataFrames are essential for data manipulation in Python. Adding new columns is a common task, and Pandas offers several efficient ways to achieve this. This article explores four key methods, highlighting their strengths and weaknesses to help you choose the best approach for your situation.
[] Operator Method: The Quick and Easy Way
This is the simplest method, ideal for adding columns based on existing data or straightforward calculations. You directly assign values to a new column using square brackets.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
df = pd.DataFrame(data)
df['City'] = ['New York', 'London', 'Paris']
print(df)
Limitations: The new column is always appended as the last column, so you can't choose its position. A list or array must match the DataFrame's length, although a single scalar value is broadcast to every row.
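To make the length requirement concrete, here is a minimal sketch using the same sample data. The column names `Country` and `Age_Next_Year` and the variable `length_error` are made up for illustration. It shows a scalar assignment, a column calculated from existing data, and what happens when a list of the wrong length is assigned.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

# A single scalar value is repeated for every row.
df['Country'] = 'Unknown'

# A column calculated from an existing column.
df['Age_Next_Year'] = df['Age'] + 1

# A list with the wrong number of values is rejected with a ValueError.
length_error = None
try:
    df['City'] = ['New York', 'London']  # only 2 values for 3 rows
except ValueError as exc:
    length_error = exc

print(df)
print(type(length_error).__name__)
```

The failed assignment leaves the DataFrame as it was, so `City` does not appear in the printed result.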
df.insert() Method: Precise Column Placement
df.insert() provides more control, letting you specify the column's index (position). It takes three required arguments: the position, the column name, and the data. Unlike df.assign(), it modifies the DataFrame in place and returns None.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
df = pd.DataFrame(data)
df.insert(1, 'City', ['New York', 'London', 'Paris'])
print(df)
Best for: Situations where the column’s order is critical.
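Two behaviors of `df.insert()` are easy to miss: it changes the DataFrame in place rather than returning a new one, and by default it refuses to insert a column whose name already exists. The sketch below illustrates both; the `ID` column and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

# insert() works in place, so there is nothing useful to capture.
result = df.insert(0, 'ID', [101, 102, 103])
print(result)            # None
print(list(df.columns))  # ['ID', 'Name', 'Age']

# Inserting a name that already exists raises a ValueError by default.
duplicate_error = None
try:
    df.insert(1, 'Age', [1, 2, 3])
except ValueError as exc:
    duplicate_error = exc

print(type(duplicate_error).__name__)
```

Because of the in-place behavior, writing `df = df.insert(...)` would replace the DataFrame with `None`.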
df.assign() Method: Adding Multiple Columns Efficiently
df.assign() is particularly useful for adding multiple columns at once or creating new columns based on calculations. Importantly, it returns a *new* DataFrame, leaving the original unchanged.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
df = pd.DataFrame(data)
df = df.assign(City=['New York', 'London', 'Paris'], Age_Squared=df['Age']**2)
print(df)
Best for: Multiple column additions and calculated columns. Because the original DataFrame is left unchanged, it also guards against accidental overwrites.
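As a small sketch of the "returns a new DataFrame" behavior, the example below keeps the result in a separate variable (`new_df`, a name chosen here for illustration) and uses callables so that a later column can build on an earlier one within the same call.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

# Each lambda receives the DataFrame as built so far, so the second
# column can refer to the first one created in the same assign() call.
new_df = df.assign(
    Age_Squared=lambda d: d['Age'] ** 2,
    Age_Squared_Plus_One=lambda d: d['Age_Squared'] + 1,
)

print(list(df.columns))      # original is untouched
print(list(new_df.columns))  # new DataFrame has both extra columns
```

Callables are also handy in method chains, where the intermediate DataFrame has no variable name to refer to.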
df.loc[] Method: Conditional Column Creation
df.loc[] is an indexer, used with square brackets rather than called like a function. It offers the most flexibility, allowing conditional column creation based on row selection and boolean indexing.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
df = pd.DataFrame(data)
df.loc[df['Age'] < 30, 'Age_Group'] = 'Young'
df.loc[df['Age'] >= 30, 'Age_Group'] = 'Older'
print(df)
Best for: Adding columns based on complex conditions; requires familiarity with boolean indexing.
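One thing to watch for with this approach: rows that match none of the conditions are left as missing values (NaN). The sketch below first shows that gap, then one possible way to avoid it by setting a default before overriding selected rows. The variable name `missing` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]})

# Only rows with Age < 30 get a value; Bob's row stays NaN.
df.loc[df['Age'] < 30, 'Age_Group'] = 'Young'
missing = df['Age_Group'].isna().tolist()
print(missing)  # [False, True, False]

# One option: assign a default to every row, then override the matches.
df['Age_Group'] = 'Older'
df.loc[df['Age'] < 30, 'Age_Group'] = 'Young'
print(df)
```

Setting the default first means every row ends up with a value even if the conditions do not cover all cases.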
Conclusion: The optimal method depends on your specific needs. The [] operator is quick for simple additions, df.insert() controls column position, df.assign() handles multiple or calculated columns efficiently, and df.loc[] enables conditional column creation. Choose the method that best balances readability and functionality for your task.