Counting unique lines in a file is a common task in Linux. This article presents two efficient command-line methods: using `sort` and `uniq`, and using `awk`.
## Counting Unique Lines with `sort` and `uniq`
This method combines `sort` and `uniq` for a straightforward approach. `sort` arranges the lines (alphabetically by default), a prerequisite for `uniq`, which collapses and counts only consecutive identical lines. The `-c` option makes `uniq` prefix each line with its count.
To count unique lines in `file.txt`:
```shell
sort file.txt | uniq -c
```
This displays each unique line with its count. To get just the total number of unique lines, pipe the output to `wc -l`:
```shell
sort file.txt | uniq -c | wc -l
```
Example:

If `file.txt` contains:

```
apple
banana
apple
orange
banana
apple
```
then `sort file.txt | uniq -c` outputs:

```
3 apple
2 banana
1 orange
```
And `sort file.txt | uniq -c | wc -l` outputs `3`.
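To see why sorting is a prerequisite, try running `uniq -c` on the unsorted file. Because `uniq` only collapses *consecutive* duplicates, the same line can appear several times in its output. A quick sketch, recreating the sample file above:

```shell
# Recreate the sample file with the same contents as file.txt above
printf 'apple\nbanana\napple\norange\nbanana\napple\n' > file.txt

# Without sort, no two identical lines are adjacent here,
# so uniq counts every line separately:
uniq -c file.txt
#   1 apple
#   1 banana
#   1 apple
#   1 orange
#   1 banana
#   1 apple

# The total is therefore 6, not 3:
uniq -c file.txt | wc -l
# → 6
```

Sorting first groups identical lines together, which is exactly what `uniq` needs.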
## Counting Unique Lines with `awk`
`awk` offers a flexible solution, particularly useful for more complex scenarios. This method uses an associative array to track unique lines and their counts.
To count unique lines and display them with their counts:
```shell
awk '{count[$0]++} END {for (line in count) print count[line], line}' file.txt
```
This script increments `count[$0]` for each input line, using the line itself as the array key. The `END` block then iterates over the array, printing each line's count followed by the line.
To obtain only the total count of unique lines:
```shell
awk '{count[$0]++} END {print length(count)}' file.txt
```
This uses `length(count)` to output the array's size, which is the number of unique lines. Note that calling `length()` on an array is a GNU awk extension not specified by POSIX, so it may not work in every `awk` implementation.
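Since `length()` on an array is not guaranteed to be portable, a POSIX-friendly sketch keeps an explicit counter instead, incrementing it only the first time each line is seen:

```shell
# Sample input with the same contents as file.txt above
printf 'apple\nbanana\napple\norange\nbanana\napple\n' > file.txt

# seen[$0]++ evaluates to 0 (false) the first time a line appears,
# so !seen[$0]++ is true exactly once per distinct line
awk '!seen[$0]++ { n++ } END { print n }' file.txt
# → 3
```

This works in any POSIX `awk` and avoids building the second pass over the array entirely.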
Example:

Using the same `file.txt`, the first `awk` command produces the same counts as the `sort | uniq -c` method, though possibly in a different order, since `awk` does not guarantee any particular iteration order over array keys. The second `awk` command outputs `3`, indicating three unique lines.
Choose the method that best suits your needs: `sort` and `uniq` are simpler for basic tasks, while `awk` provides greater flexibility for complex scenarios.
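As a closing aside, when you only need the total and not the per-line counts, a common shorthand (not covered above) folds both steps into one flag: `sort -u` deduplicates while sorting, so piping it to `wc -l` gives the unique-line count directly:

```shell
# Sample input with the same contents as file.txt above
printf 'apple\nbanana\napple\norange\nbanana\napple\n' > file.txt

# sort -u emits each distinct line exactly once; wc -l counts them
sort -u file.txt | wc -l
# → 3
```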