Efficiently Extracting and Sorting Unique Values in Pandas DataFrames

Pandas is a powerful Python library for data manipulation and analysis. A common task involves extracting unique values from a DataFrame column and then sorting them. This article explores two efficient methods to accomplish this.

Extracting Unique Values with the unique() Method

The unique() method provides a concise way to obtain the distinct values of a Pandas Series (a single column). It returns them in order of first appearance, without sorting. The result is a NumPy array for NumPy-backed dtypes, or an ExtensionArray for extension dtypes such as categorical.


import pandas as pd

data = {'col1': ['A', 'B', 'A', 'C', 'B', 'D'],
        'col2': [1, 2, 1, 3, 2, 4]}
df = pd.DataFrame(data)

unique_values = df['col1'].unique()
print(unique_values)  # Output: ['A' 'B' 'C' 'D']

This code creates a sample DataFrame and then calls unique() on the 'col1' column. The output is a NumPy array of the unique values in order of first appearance.
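Because the sample values above already first appear in alphabetical order, the "first appearance" behaviour is easy to miss. The following minimal sketch uses a hypothetical numeric Series (the name `scores` and its values are illustrative assumptions, not part of the original example) to show that unique() does not sort and that it returns a NumPy array for a NumPy-backed dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that first-appearance order differs from sorted order.
scores = pd.Series([30, 10, 30, 20, 10])

unique_scores = scores.unique()

print(unique_scores)        # [30 10 20]
print(type(unique_scores))  # <class 'numpy.ndarray'>
```

The value 30 comes first because it is seen first, not because of its size.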

Extracting Unique Values with the drop_duplicates() Method

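The drop_duplicates() method offers more flexibility, particularly when dealing with multiple columns. Although it is primarily used to remove duplicate rows from a DataFrame, it also works on a single column (a Series), returning a new Series that by default keeps the first occurrence of each value along with its original index label.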


import pandas as pd

data = {'col1': ['A', 'B', 'A', 'C', 'B', 'D'],
        'col2': [1, 2, 1, 3, 2, 4]}
df = pd.DataFrame(data)

unique_values = df['col1'].drop_duplicates().values
print(unique_values)  # Output: ['A' 'B' 'C' 'D']

This example applies drop_duplicates() directly to the 'col1' Series. The .values attribute converts the result to a NumPy array (the pandas documentation recommends .to_numpy() for new code). The unique values appear in the order of their first occurrence in the DataFrame.
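The extra flexibility mentioned above can be sketched as follows. The DataFrame `pairs_df` and its values are hypothetical, altered from the article's sample so that 'A' is paired with two different numbers. The sketch shows drop_duplicates() extracting unique combinations of two columns, and the keep='last' option retaining the last occurrence of each value instead of the first.

```python
import pandas as pd

# Hypothetical data: 'A' appears with both 1 and 9, and ('B', 2) appears twice.
pairs_df = pd.DataFrame({'col1': ['A', 'B', 'A', 'C', 'B', 'D'],
                         'col2': [1, 2, 9, 3, 2, 4]})

# Unique (col1, col2) combinations; only the repeated ('B', 2) row is dropped.
unique_pairs = pairs_df[['col1', 'col2']].drop_duplicates()
print(unique_pairs)

# Keep the last occurrence of each value in a single column.
last_seen = pairs_df['col1'].drop_duplicates(keep='last')
print(last_seen.tolist())  # ['A', 'C', 'B', 'D']
```

Note that the surviving rows keep their original index labels, which can be useful for tracing each unique value back to a row in the source data.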

Sorting Unique Values

Both methods above return unique values in order of first appearance, not in sorted order. To sort them, use NumPy's sort() function or the Pandas sort_values() method.


import pandas as pd
import numpy as np

data = {'col1': ['A', 'B', 'A', 'C', 'B', 'D'],
        'col2': [1, 2, 1, 3, 2, 4]}
df = pd.DataFrame(data)

# Using unique() and sort()
unique_values = np.sort(df['col1'].unique())
print(unique_values)  # Output: ['A' 'B' 'C' 'D']

# Using drop_duplicates() and sort_values()
unique_values = df['col1'].drop_duplicates().sort_values().values
print(unique_values)  # Output: ['A' 'B' 'C' 'D']

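This showcases sorting with both approaches. np.sort() works on the NumPy array returned by unique(), while sort_values() is called on the Pandas Series returned by drop_duplicates(). Both yield a sorted array. Because the sample values already first appear in alphabetical order, the output here matches the unsorted results shown earlier. For descending order with sort_values(), pass ascending=False.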
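To make the effect of sorting visible, here is a minimal sketch using a hypothetical Series (`fruit` and its values are illustrative assumptions) whose values do not first appear in alphabetical order. It also shows one way to get descending order from each approach: reversing the result of np.sort() with a slice, and passing ascending=False to sort_values().

```python
import numpy as np
import pandas as pd

# Hypothetical data whose first-appearance order is not alphabetical.
fruit = pd.Series(['pear', 'apple', 'pear', 'fig', 'apple'])

unsorted_unique = fruit.unique()
ascending = np.sort(fruit.unique())

# Descending order: reverse the sorted array, or sort the Series in descending order.
descending_np = np.sort(fruit.unique())[::-1]
descending_pd = fruit.drop_duplicates().sort_values(ascending=False).to_numpy()

print(list(unsorted_unique))  # ['pear', 'apple', 'fig']
print(list(ascending))        # ['apple', 'fig', 'pear']
print(list(descending_np))    # ['pear', 'fig', 'apple']
print(list(descending_pd))    # ['pear', 'fig', 'apple']
```

The results are wrapped in list() only to keep the printed output independent of the underlying array type.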

In summary, both unique() and drop_duplicates() efficiently extract unique values. unique() is the more direct choice for a single column, while drop_duplicates() also handles combinations of several columns and keeps the result as a Pandas object with its original index. Neither method sorts its output, so apply np.sort() or sort_values() when you need a particular order.
