Bash Scripting

Mastering Pattern Matching in Bash

Regular Expressions with the =~ Operator

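Bash’s =~ operator, available inside [[ ]], matches a string against a POSIX extended regular expression. Regular expressions offer far greater flexibility than simple wildcard matching, allowing you to define complex patterns for string manipulation and validation. The operator returns true if the regular expression on the right matches any part of the string on the left, unless the pattern is anchored with ^ or $. The pattern must be left unquoted: since Bash 3.2, any quoted part of the pattern is matched as a literal string rather than as a regular expression.


string="This is a test string with 123 digits"

if [[ "$string" =~ test ]]; then  # Unanchored, so it matches anywhere in the string
  echo "The string contains 'test'"
fi

if [[ "$string" =~ digits$ ]]; then  # $ anchors the match to the end of the string
  echo "The string ends with 'digits'"
fi

if [[ "$string" =~ [0-9]+ ]]; then  # Matches one or more digits
  echo "The string contains digits"
fi

# Keep complex patterns in a variable and expand it unquoted.
# The dot before the top-level domain is escaped (\.) so it matches a literal dot.
email="user@example.com"
email_re='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
if [[ "$email" =~ $email_re ]]; then
  echo "The string looks like an email address"
fi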
  

The examples above demonstrate basic regular expression usage. For more complex patterns, consult a comprehensive regular expression tutorial or reference. The flexibility of regular expressions makes them ideal for tasks like validating email addresses, IP addresses, or other structured data within strings.
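
As a rough illustration of the IP address case mentioned above, here is a minimal sketch of an IPv4 check. The function name is_ipv4 and the helper variables are hypothetical, chosen only for this example. The pattern is built from a single octet sub-pattern and expanded unquoted so that Bash treats it as a regular expression. It rejects octets above 255 and octets with leading zeros, which may or may not be what your use case wants.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if $1 is a dotted-quad IPv4 address
# whose four octets are each in the range 0-255.
is_ipv4() {
  # One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  # ^ and $ anchor the pattern so the whole string must be an address.
  local ipv4_re="^$octet\.$octet\.$octet\.$octet$"

  # $ipv4_re is deliberately unquoted so it is treated as a regex.
  [[ $1 =~ $ipv4_re ]]
}

for candidate in 192.168.1.10 256.1.1.1 10.0.0; do
  if is_ipv4 "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

With the three sample values, only 192.168.1.10 is reported as valid.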

Wildcard Matching with the * Operator

The * wildcard, simpler than regular expressions, matches zero or more characters. It’s frequently used in file globbing and basic conditional checks. While less powerful, it’s efficient for straightforward scenarios.


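# Expands to all .txt files in the current directory. If none match, the
# array holds the literal string '*.txt' unless the nullglob option is set.
files=(*.txt)

if [[ "$filename" == *.log ]]; then  # Pattern is unquoted so * acts as a wildcard
  echo "This is a log file"
fi

if [[ "$variable" == pre*suf ]]; then
  echo "The variable starts with 'pre' and ends with 'suf'"
fi
  

The first example showcases file globbing; the others demonstrate basic pattern matching within conditional statements. Note that ==, not =~, is used for wildcard matching, and that the pattern on the right must be left unquoted: a quoted "*.log" is compared as a literal string. Unlike a regular expression with =~, a wildcard pattern must match the entire string, not just part of it.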
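
Quoting is what decides whether * is a wildcard or an ordinary character inside [[ ]]. The sketch below shows the difference; the two helper functions, matches_glob and matches_literal, are hypothetical names used only for this demonstration.

```shell
#!/usr/bin/env bash

# Hypothetical helpers for illustration only.
# matches_glob: the pattern ($2) is expanded unquoted, so * and ? are wildcards.
matches_glob() {
  [[ $1 == $2 ]]
}

# matches_literal: the pattern is quoted, so every character is compared literally.
matches_literal() {
  [[ $1 == "$2" ]]
}

filename="server.log"

matches_glob "$filename" '*.log' && echo "glob: '$filename' matches *.log"
matches_literal "$filename" '*.log' || echo "literal: '$filename' is not the string *.log"
matches_glob "$filename.bak" '*.log' || echo "glob: '$filename.bak' does not match *.log"
```

All three messages are printed. The last one shows that a wildcard pattern has to cover the whole string, so a trailing .bak prevents the match.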

Extracting Subpatterns

Regular expressions can extract portions of a matching string directly. Wildcards cannot, but a wildcard match can be combined with parameter expansion to reach the same result.

Regular Expressions (with =~)

Regular expressions use capturing groups, defined with parentheses (), to isolate specific parts of the matched string. These captured groups are accessible via the BASH_REMATCH array.


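string="My user ID is 12345"
re='ID is ([0-9]+)'  # Kept in a variable so the spaces need no escaping
if [[ "$string" =~ $re ]]; then  # $re is unquoted so it is treated as a regex
  user_id="${BASH_REMATCH[1]}"
  echo "User ID: $user_id"
fi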
  

([0-9]+) captures one or more digits, stored in ${BASH_REMATCH[1]}.
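
A pattern can contain several capturing groups. They are numbered from left to right starting at 1, while ${BASH_REMATCH[0]} holds the entire matched text. The following sketch, built around a hypothetical parse_date function and an invented sample sentence, pulls the year, month, and day out of a YYYY-MM-DD date. It checks only the shape of the date, not whether the month or day is valid.

```shell
#!/usr/bin/env bash

# Hypothetical helper: find the first YYYY-MM-DD date in $1 and print its parts.
# Returns non-zero if the string contains no such date.
parse_date() {
  local re='([0-9]{4})-([0-9]{2})-([0-9]{2})'

  # Unquoted $re is treated as a regex; the match may occur anywhere in $1.
  [[ $1 =~ $re ]] || return 1

  # Index 0 is the whole match; 1-3 are the groups in left-to-right order.
  echo "match=${BASH_REMATCH[0]} year=${BASH_REMATCH[1]} month=${BASH_REMATCH[2]} day=${BASH_REMATCH[3]}"
}

parse_date "Backup finished on 2024-10-26 at noon"
# prints: match=2024-10-26 year=2024 month=10 day=26
```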

Wildcards (with *)

Wildcard matching doesn’t directly support subpattern extraction. Instead, you need string manipulation techniques after a basic match.


filename="my_report_2024-10-26.txt"
if [[ "$filename" == my_report_*.txt ]]; then  # Unquoted so * acts as a wildcard
  date="${filename%.*}"  # Remove the '.txt' extension
  date="${date##*_}"     # Remove everything before the last '_'
  echo "Report date: $date"
fi
  

This example uses parameter expansion to achieve subpattern extraction, demonstrating a less elegant but effective approach for simpler scenarios.
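
The four trimming operators behind this technique are # and ## (remove the shortest or longest matching prefix) and % and %% (remove the shortest or longest matching suffix). Here is a small sketch of how they might be combined; the report_date function and the sample path are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the date part of a path such as
# /some/dir/my_report_2024-10-26.tar.gz, or fail if the name does not fit.
report_date() {
  local base=${1##*/}                      # longest prefix matching */ removed: drops the directory
  [[ $base == my_report_*.* ]] || return 1 # wildcard check on the bare file name
  local stem=${base%%.*}                   # longest suffix matching .* removed: drops all extensions
  echo "${stem##*_}"                       # longest prefix matching *_ removed: keeps the text after the last _
}

path="/var/log/my_report_2024-10-26.tar.gz"
base=${path##*/}

echo "full extension:  ${base#*.}"    # shortest prefix matching *. removed -> tar.gz
echo "last extension:  ${base##*.}"   # longest prefix matching *. removed  -> gz
echo "report date:     $(report_date "$path")"
```

The choice between the single and doubled operator matters as soon as the separator appears more than once, as the two extension lines show.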

Bash offers versatile pattern-matching capabilities. Choose the method—regular expressions or wildcards—that best suits your needs, balancing power and simplicity.
