Pandas DataFrames are a cornerstone of data manipulation in Python. Frequently, you’ll need to modify individual cells within your DataFrame. This article covers the two recommended ways to do this using the DataFrame’s index labels, .at and .loc, and explains why the older .set_value() method should be avoided.
Setting Cell Values with .at
The .at accessor offers a highly efficient way to access and modify a single cell in a DataFrame using its row and column labels. Its speed makes it ideal for single-value assignments.
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Set the value of the cell at row 'B' and column 'col1' to 10
df.at['B', 'col1'] = 10
print(df)
This will output:
   col1  col2
A     1     4
B    10     5
C     3     6
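.at is optimized for speed and simplicity when dealing with single cells, so it expects exactly one row label and one column label. Passing lists of labels or assigning several values at once is not supported: recent pandas versions raise an error (the exact exception type varies by version), so use .loc for multi-cell assignments.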
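As a rough illustration of single-cell work with .at, the sketch below reads one cell, writes back a value derived from it, and then updates one cell per row in a loop. It reuses the sample DataFrame from above; the variable name current is chosen only for this example.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['A', 'B', 'C'])

# .at works for reading a single cell as well as writing one
current = df.at['B', 'col1']

# Write back a value derived from the current one
df.at['B', 'col1'] = current * 10

# Update one cell per row; each step touches exactly one (row, column) pair
for label in df.index:
    df.at[label, 'col2'] += 1

print(df)
```

For a transformation that applies to a whole column, a vectorized assignment is usually the better choice; the loop is shown only to demonstrate repeated single-cell access.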
Setting Cell Values with .loc
The .loc accessor provides greater flexibility. It allows label-based indexing for rows and columns and can handle assignments of single values or arrays to multiple cells simultaneously.
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Set the value of the cell at row 'A' and column 'col2' to 100
df.loc['A', 'col2'] = 100
print(df)
# Setting multiple values
df.loc[['A', 'B'], 'col1'] = [5, 15]  # Assigns 5 to row 'A' and 15 to row 'B'
print(df)
This will first output:
   col1  col2
A     1   100
B     2     5
C     3     6
And then:
   col1  col2
A     5   100
B    15     5
C     3     6
.loc is the most versatile method, suitable for both single-cell updates and more complex scenarios involving multiple rows and columns. For single-cell changes, however, .at is generally faster.
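To make the "multiple rows and columns" case more concrete, here is a minimal sketch using the same sample DataFrame. It assigns one scalar to a block of cells and then uses a boolean condition to pick the rows to update. The threshold and replacement values are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['A', 'B', 'C'])

# Set a block of cells (rows 'A' and 'C', both columns) to a single value
df.loc[['A', 'C'], ['col1', 'col2']] = 0

# Set 'col2' only in rows where 'col1' is greater than 1 (here: row 'B')
df.loc[df['col1'] > 1, 'col2'] = -1

print(df)
```

Neither of these assignments can be expressed with .at, which is the practical reason to reach for .loc whenever more than one cell is involved.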
The Deprecated .set_value() Method
The .set_value() method was deprecated in pandas 0.21 and removed in pandas 1.0, so it only works on old installations. Use .at or .loc instead for compatibility with current releases, and avoid .set_value() in new code.
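When migrating old code, a call to .set_value() usually maps directly onto a .at assignment. The sketch below shows one possible way to do that; set_cell is a hypothetical helper name introduced here for illustration and is not part of pandas.

```python
import pandas as pd


def set_cell(df, row_label, col_label, value):
    """Hypothetical helper (not a pandas API): label-based single-cell
    assignment, standing in for the old df.set_value(row, col, value)."""
    df.at[row_label, col_label] = value
    return df


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=['A', 'B', 'C'])

# Legacy call, unavailable in current pandas releases:
#     df.set_value('B', 'col1', 10)
# Equivalent update written with .at:
set_cell(df, 'B', 'col1', 10)

print(df.at['B', 'col1'])
```

In most codebases it is simpler to replace each legacy call with the .at assignment directly; a wrapper like this is only worth keeping if many call sites need to change at once.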
In summary, both .at and .loc offer effective ways to modify individual cells in a Pandas DataFrame. Prefer .at for its speed and simplicity with single values, and .loc for its flexibility when working with multiple cells or more complex modifications.