Identify a string of consecutive characters in bash script
While processing a data file for a client, I needed to determine whether a variable I was reading contained a string of identical characters (which, in my case, indicated some form of dummy or false record).
Today's tutorial shows you how to identify this situation so you can then process it however you need.
To start with, let's create a variable made up of identical characters:
a="1111111"
My next step was to split the variable into separate lines, one per character. sed can do this for you (GNU sed is assumed here, since it interprets \n in the replacement as a newline):
echo "${a}" | sed "s/\(.\)/\1\n/g"
Your output now looks like this:
# echo "${a}" | sed "s/\(.\)/\1\n/g"
1
1
1
1
1
1
1

#
The output above ends with a blank line, so let's remove it by adding grep -v to our command:
echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$"
# echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$"
1
1
1
1
1
1
1
#
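If your sed does not turn \n in the replacement into a newline (POSIX leaves that behaviour unspecified), one possible alternative for the splitting step is fold. This is only a sketch, not part of the original pipeline; it assumes single-byte characters, and the variable names sample and chars are made up for the illustration:

```shell
sample="aaaa"   # hypothetical value, used only for this sketch

# fold -w 1 wraps the input after every character, so each one ends up
# on its own line and there is no trailing blank line to filter out
chars=$(printf '%s\n' "$sample" | fold -w 1)

printf '%s\n' "$chars"
```

The rest of the pipeline (sort, uniq, wc -l) can be attached to this output in the same way as to the sed version.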
Much better! Now we can use sort and uniq to reduce the output to its unique lines:
echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq
# echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq
1
#
Now only the unique characters are displayed. The last step is to count them so we can make a decision based on the result:
echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq | wc -l
This also displays 1, but this time it is the count of unique lines. Put that output into a variable like this:
COUNT1=$(echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq | wc -l)
Lastly, we can write our if statement:
if [[ $COUNT1 -eq 1 ]]
then
    echo "Variable contains a single unique character"
else
    echo "Variable contains multiple different characters"
fi
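If you need the same check for several variables, the steps can be wrapped in a small function. The sketch below reuses the pipeline from this tutorial unchanged; the function name is_single_char is hypothetical, and GNU sed is assumed for the \n handling:

```shell
# is_single_char is a hypothetical helper name, not part of any library.
# It returns success (0) when its argument consists of one repeated character.
is_single_char() {
    local count
    # Same pipeline as in the tutorial: split, drop blanks, count unique lines
    count=$(printf '%s\n' "$1" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq | wc -l)
    # An empty string gives a count of 0, so it is not treated as a match
    [ "$count" -eq 1 ]
}

if is_single_char "111111111"; then
    echo "Variable contains a single unique character"
else
    echo "Variable contains multiple different characters"
fi
```

Returning an exit status instead of printing the count lets the function sit directly in an if statement, which keeps the calling code short.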
To prove the point, let's change our variable:
# a="123411"
# echo "${a}" | sed "s/\(.\)/\1\n/g" | grep -v "^$" | sort | uniq | wc -l
4
#
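As a different technique from the sed pipeline, a single grep with a back-reference can answer the same yes/no question. This is a sketch only: the function name is_repeated_char and the sample values are made up, and it assumes the value does not contain newlines:

```shell
# is_repeated_char is a hypothetical helper name, not from the original script
is_repeated_char() {
    # \(.\) captures the first character and \1* allows only repeats of it;
    # -x requires the whole line to match and -q suppresses the output
    printf '%s\n' "$1" | grep -qx '\(.\)\1*'
}

for value in "1111111" "123411" "x" ""; do
    if is_repeated_char "$value"; then
        echo "'$value': single repeated character"
    else
        echo "'$value': mixed or empty"
    fi
done
```

This version does not tell you how many different characters were found, so the counting pipeline is still the better fit if you need that number.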