Efficient Row Iteration in Pandas DataFrames

July 18, 2025 - By admin


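The Pandas DataFrame is a cornerstone of data processing in Python. Although Pandas excels at vectorized operations, some tasks still require processing rows one at a time. This article looks at the most efficient ways to iterate over DataFrame rows and weighs the strengths and weaknesses of each.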

`iterrows()`: Row-by-Row Iterator

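`iterrows()` is the most straightforward option: it yields each row as an (index, Series) pair. It is convenient for simple tasks, but because it builds a new Series for every row, it is usually slower than the alternatives on larger DataFrames.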


```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Each iteration yields the index label and the row as a Series
for index, row in df.iterrows():
    print(f"Index: {index}, Row: {row}")
```
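Because `iterrows()` builds a Series for each row, and a Series holds a single dtype, a row that mixes integer and float columns comes back with every value upcast to float. A minimal sketch with two illustrative columns, `a` and `b`, comparing it with `itertuples()`, which reads each column separately:

```python
import pandas as pd

# Illustrative frame: one integer column and one float column
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# iterrows() builds one Series per row; a Series has a single dtype,
# so the integer value in column "a" is upcast to float64
_, first_row = next(df.iterrows())
print(first_row.dtype)   # float64
print(first_row["a"])    # 1.0

# itertuples() pulls values column by column, so "a" stays an integer
first_tuple = next(df.itertuples(index=False))
print(first_tuple.a)     # 1
```

If the original column types matter inside the loop, this is one more reason to prefer `itertuples()`.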

`itertuples()`: Optimized Row Iteration

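For better performance, especially on large datasets, `itertuples()` is the recommended choice. It yields each row as a namedtuple, so columns can be read as attributes by name (for example `row.col1`), and it avoids the per-row Series construction that slows down `iterrows()`.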


```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Each row is a namedtuple; columns are available as attributes
for row in df.itertuples():
    print(f"Col1: {row.col1}, Col2: {row.col2}")
```
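A few details of `itertuples()` are worth knowing: the first field holds the index label and is named `Index`; column names that are not valid Python identifiers are renamed to positional names; and `index=False` together with `name=None` yields plain tuples. The extra column `"col 3"` below is an illustrative assumption, added only to show the renaming.

```python
import pandas as pd

# Same data as above, plus an illustrative column whose name
# ("col 3") is not a valid Python identifier
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col 3": [7, 8, 9]})

# Default: the first field is the index label, exposed as "Index"
for row in df.itertuples():
    print(row.Index, row.col1, row.col2)

# Invalid identifiers are renamed to positional names; with the index
# field included, "col 3" sits at position 3, so it becomes "_3"
first = next(df.itertuples())
print(first._fields)   # ('Index', 'col1', 'col2', '_3')

# index=False drops the index field; name=None yields plain tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)      # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```

Plain tuples are handy when the row is immediately unpacked, for example `for c1, c2, c3 in df.itertuples(index=False, name=None)`.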

`apply()`: Applying a Function to Each Row

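`apply()` with `axis=1` passes each row to a function and collects the results. It is a concise, readable way to express row-wise logic that fits naturally into a function. It is not vectorized, however: Pandas still loops over the rows in Python and, by default, hands each one to the function as a Series, so it does not deliver the speedups of true column-wise operations.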


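```python
import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

# Receives one row (as a Series) and returns a single value
def process_row(row):
    return row['col1'] * 2 + row['col2']

# axis=1 applies the function to each row rather than each column
df['result'] = df.apply(process_row, axis=1)
print(df)
```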
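When the function only needs the values and not the column labels, `DataFrame.apply` also accepts `raw=True`, which passes each row as a NumPy array instead of a Series and so skips the per-row Series construction. A minimal sketch with the same data; `process_row_raw` is a hypothetical helper that indexes by position, so it assumes the column order `col1`, `col2`.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Hypothetical helper: with raw=True each row arrives as a NumPy array,
# so values are read by position (0 -> col1, 1 -> col2)
def process_row_raw(row):
    return row[0] * 2 + row[1]

df["result"] = df.apply(process_row_raw, axis=1, raw=True)
print(df["result"].tolist())  # [6, 9, 12]
```

The trade-off is readability: positional access breaks silently if the columns are reordered, so label-based access with the default `raw=False` is the safer choice when performance is not critical.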

When to Avoid Iteration

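Before resorting to row-by-row iteration, consider whether the task can be done with vectorized operations. The strength of Pandas is operating on entire columns at once, which is usually much faster than looping over rows in Python. Consider iteration only when vectorization is impossible or impractical, and in that case prefer `itertuples()` for speed or `apply()` for readability.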
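As an illustration, the row-wise calculation from the `apply()` example can be written as a single column expression, and simple conditional logic can often be moved to `numpy.where`. The `label` column and the threshold of 8 are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Vectorized: operates on whole columns at once, no Python-level row loop
df["result"] = df["col1"] * 2 + df["col2"]

# Row-wise version for comparison; it produces the same values
looped = [row.col1 * 2 + row.col2 for row in df.itertuples()]
assert df["result"].tolist() == looped

# Conditional logic: np.where picks a value per row from a boolean mask
# (the threshold 8 is an arbitrary example)
df["label"] = np.where(df["result"] > 8, "high", "low")
print(df)
```

Both approaches give identical results here; the vectorized form simply pushes the loop down into compiled NumPy code, which is where the performance advantage comes from.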
