Linux/UNIX - Using grep With Regular Expressions
Searching for Lines Containing Patterns
The grep command is perfect in these situations, and we explore some of its capabilities here.
grep – Global Regular Expression Print
Linux and UNIX systems offer three variants of the grep command:
- grep
- egrep
- fgrep
grep supports basic regular expression characters. egrep supports the more advanced (extended) regular expression characters, and fgrep searches for fixed strings rather than patterns.
The basic characters supported by grep are:
- [….], [^….], ^, $, ., *, \
Here is a brief description of these special characters:
- A list of characters enclosed by [ and ] matches any single character in that list (if the first character is the caret ^ then it matches any single character not in the list)
- The caret ^ at the start of a string matches the empty string at the beginning of a line
- The dollar sign $ at the end of a string matches the empty string at the end of a line
- The period . matches any single character.
- The asterisk * matches zero or more occurrences of the previous character
- The backslash \ is an escape character: it removes the special meaning of the character that follows it
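The characters described above can be tried out on a small piece of input. The following is a minimal sketch, assuming a made-up word list held in a shell variable called sample (the variable and its contents are illustrative only and are not part of the examples in this article):

```shell
# Hypothetical sample input: one word (or phrase) per line.
sample='cat
cot
cut
ct
coot
c.t
a cat sat'

# [aou] matches exactly one character from the list.
printf '%s\n' "$sample" | grep 'c[aou]t'    # cat, cot, cut, a cat sat

# [^aou] matches exactly one character that is NOT in the list.
printf '%s\n' "$sample" | grep 'c[^aou]t'   # c.t

# ^ and $ anchor the pattern to the start and end of the line;
# the period matches any single character.
printf '%s\n' "$sample" | grep '^c.t$'      # cat, cot, cut, c.t

# o* matches zero or more occurrences of the letter o.
printf '%s\n' "$sample" | grep '^co*t$'     # cot, ct, coot

# The backslash makes the period match a literal dot only.
printf '%s\n' "$sample" | grep 'c\.t'       # c.t
```

Single quotes are used around each pattern so that the shell passes characters such as $ and \ to grep unchanged.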
Search for a pattern anywhere in a line
The following example matches all lines in the ps -ef output that have sh anywhere in them:
[ptr@srva ~]$ ps -ef | grep "sh"
root 139 7 0 13:40 ? 00:00:00 [pdflush]
root 140 7 0 13:40 ? 00:00:00 [pdflush]
root 2393 1 0 13:41 ? 00:00:00 /usr/sbin/sshd
root 2849 2779 0 14:00 ? 00:00:00 /bin/sh /usr/bin/startkde
root 2903 2849 0 14:00 ? 00:00:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "/usr/bin/dbus-launch
--exit-with-session /etc/X11/xinit/Xclients"
root 3062 3061 0 14:00 pts/1 00:00:00 /bin/bash
root 3089 2393 0 14:01 ? 00:00:02 sshd: root@pts/2
root 3093 3089 0 14:01 pts/2 00:00:00 -bash
ptr 3123 3122 0 14:02 pts/2 00:00:00 -bash
root 5055 2393 0 14:28 ? 00:00:00 sshd: root@pts/3
root 5063 5055 0 14:28 pts/3 00:00:00 -bash
ptr 15980 3123 0 15:11 pts/2 00:00:00 grep sh
[ptr@srva ~]$
Search for a pattern at the beginning of a line
The following example matches all lines in the ps -ef output that start with the string ptr:
[ptr@srva ~]$ ps -ef | grep "^ptr"
ptr 3123 3122 0 14:02 pts/2 00:00:00 -bash
ptr 3256 3123 0 14:17 pts/2 00:00:00 ps -ef
ptr 3257 3123 0 14:17 pts/2 00:00:00 grep ^ptr
[ptr@srva ~]$
Search for a pattern at the end of a line
The following example matches all lines in the ps -ef output that end in sh:
[ptr@srva ~]$ ps -ef | grep "sh$"
root 3062 3061 0 14:00 pts/1 00:00:00 /bin/bash
root 3093 3089 0 14:01 pts/2 00:00:00 -bash
ptr 3123 3122 0 14:02 pts/2 00:00:00 -bash
root 5063 5055 0 14:28 pts/3 00:00:00 -bash
[ptr@srva ~]$
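The two anchors can also be combined in a single pattern. The sketch below assumes a few abridged, made-up process lines stored in a variable called procs (hypothetical data, loosely modelled on the ps -ef output above) so that the result does not depend on what is running on your system:

```shell
# Hypothetical, abridged process listing: owner, PID, command.
procs='ptr 3123 -bash
root 3062 /bin/bash
root 2393 /usr/sbin/sshd
ptr 3257 grep ^ptr'

# Lines that start with ptr.
printf '%s\n' "$procs" | grep '^ptr'        # the two ptr lines

# Lines that end in sh.
printf '%s\n' "$procs" | grep 'sh$'         # the two bash lines

# Lines that start with root AND end in sh: .* matches whatever
# lies between the two anchored parts.
printf '%s\n' "$procs" | grep '^root.*sh$'  # root 3062 /bin/bash
```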
Search for a pattern containing a range of characters
The following example matches all lines that contain a digit in the range 0 to 6, followed by any single character, followed by a “d”:
[ptr@srva ~]$ ls -l /etc | grep "[0-6].d"
drwxr-xr-x 4 root root 4096 May 10 2012 dbus-1
drwxr-xr-x 2 root root 4096 Feb 2 17:35 default
drwxr-xr-x 2 root root 4096 May 10 2012 depmod.d
drwxr-xr-x 3 root root 4096 May 10 2012 dev.d
-rw-r--r-- 1 root root 178 Mar 6 2011 dhcp6c.conf
-rw-rw-r-- 1 root disk 0 Mar 6 2011 dumpdates
lrwxrwxrwx 1 root root 10 May 10 2012 rc0.d -> rc.d/rc0.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc1.d -> rc.d/rc1.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc2.d -> rc.d/rc2.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc3.d -> rc.d/rc3.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc4.d -> rc.d/rc4.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc5.d -> rc.d/rc5.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc6.d -> rc.d/rc6.d
[ptr@srva ~]$
We can see that the first 6 matching lines match on the digit at the end of the modification time, followed by a space and the d from the first letter of the file/directory name.
Search for a pattern containing a dot
If we want to match only the lines that contain a number followed by “.d”, we need to escape the dot “.” with a backslash:
[ptr@srva ~]$ ls -l /etc | grep "[0-6]\.d"
lrwxrwxrwx 1 root root 10 May 10 2012 rc0.d -> rc.d/rc0.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc1.d -> rc.d/rc1.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc2.d -> rc.d/rc2.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc3.d -> rc.d/rc3.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc4.d -> rc.d/rc4.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc5.d -> rc.d/rc5.d
lrwxrwxrwx 1 root root 10 May 10 2012 rc6.d -> rc.d/rc6.d
[ptr@srva ~]$
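The difference between the unescaped and the escaped dot can be reproduced without access to /etc. This is a minimal sketch, assuming three listing lines copied into a variable called listing (the variable name is illustrative):

```shell
# Three sample lines in the style of the ls -l /etc output above.
listing='drwxr-xr-x 2 root root 4096 Feb  2 17:35 default
lrwxrwxrwx 1 root root   10 May 10  2012 rc3.d -> rc.d/rc3.d
-rw-r--r-- 1 root root  178 Mar  6  2011 dhcp6c.conf'

# Unescaped dot: any character may sit between the digit and the d,
# so "5 d" and "1 d" (time or year followed by the name) match too.
printf '%s\n' "$listing" | grep -c '[0-6].d'     # 3

# Escaped dot: only a literal "." is accepted.
printf '%s\n' "$listing" | grep -c '[0-6]\.d'    # 1

# A dot inside brackets is also literal, which avoids the backslash.
printf '%s\n' "$listing" | grep -c '[0-6][.]d'   # 1
```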
Search for a pattern in a specific “field”
In the following scenario we would like to match all long listing entries for files in /etc that have a size beginning with a 2. The files in /etc that matched this requirement at the time of carrying out this challenge were as follows:
-rw-r--r-- 1 root root 2562 May 24 2008 a2ps-site.cfg
-rw-r--r-- 1 root root 298 Mar 28 2007 anacrontab
-rw-r--r-- 1 root root 2518 Jul 22 2011 DIR_COLORS
-rw-r--r-- 1 root root 2420 Jul 22 2011 DIR_COLORS.xterm
-rw-r--r-- 1 root root 22060 Jan 7 2007 fb.modes
lrwxrwxrwx 1 root root 22 May 10 2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r-- 2 root root 241 Feb 16 13:36 hosts
-rw-r--r-- 1 root root 235 Feb 3 09:47 hosts.allow
-rw-r--r-- 1 root root 293 Jul 22 2011 idmapd.conf
-rw-r--r-- 1 root root 28 Oct 8 2006 ld.so.conf
-rw-r--r-- 1 root root 2506 Jan 31 16:54 libuser.conf
-rw-r--r-- 1 root root 262 Jul 4 2011 lisarc
-rw-r--r-- 1 root root 293 Jan 7 2007 mailcap
-rw-r--r-- 1 root root 2706 Jul 22 2011 multipath.conf
-rw-r--r-- 1 root root 25 Jan 31 13:28 pam_smb.conf
-rw-r--r-- 1 root root 2431 Feb 2 17:45 passwd
-rw------- 1 root root 2489 Feb 2 13:37 passwd-
-rw-r--r-- 1 root root 2875 Jan 7 2007 pinforc
-rw-r--r-- 1 root root 220 May 4 2011 quotagrpadmins
-rw-r--r-- 1 root root 290 May 4 2011 quotatab
-rw-r--r-- 1 root root 27 Aug 29 2011 redhat-release
-rw-r--r-- 1 root root 216 Apr 3 2010 sestatus.conf
-rw-r--r-- 1 root root 21851 Jan 6 2007 slrn.rc
-rw-r--r-- 1 root root 2643 Jan 7 2007 tux.mime.types
-rw-r--r-- 1 root root 2657 May 4 2011 warnquota.conf
The first command we put together is:
[ptr@srva ~]$ ls -l /etc | grep "root   2"
-rw-r--r-- 1 root root 22060 Jan 7 2007 fb.modes
-rw-r--r-- 1 root root 21851 Jan 6 2007 slrn.rc
[ptr@srva ~]$
This matches only two of the lines we are after. The pattern “root   2” has exactly 3 spaces between the string root and the 2. The challenge here is that we need the string root to indicate which number in the line we are trying to match (otherwise the pattern could match a 2 anywhere in the line and not just in the size column), but the number of spaces between the string root and the 2 varies. Some lines have 3, some have 4, some have 5, and so on.
This is a job for the asterisk *. The asterisk repeats the previous character zero or more times, so a space followed by an asterisk absorbs however many spaces pad the column. The following example matches the string root followed by zero or more spaces and then a 2:
[ptr@srva ~]$ ls -l /etc | grep "root *2"
-rw-r--r-- 1 root root 2562 May 24 2008 a2ps-site.cfg
-rw-r--r-- 1 root root 298 Mar 28 2007 anacrontab
-rw-r--r-- 1 root root 2518 Jul 22 2011 DIR_COLORS
-rw-r--r-- 1 root root 2420 Jul 22 2011 DIR_COLORS.xterm
-rw-r--r-- 1 root root 22060 Jan 7 2007 fb.modes
lrwxrwxrwx 1 root root 22 May 10 2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r-- 2 root root 241 Feb 16 13:36 hosts
-rw-r--r-- 1 root root 235 Feb 3 09:47 hosts.allow
-rw-r--r-- 1 root root 293 Jul 22 2011 idmapd.conf
-rw-r--r-- 1 root root 28 Oct 8 2006 ld.so.conf
-rw-r--r-- 1 root root 2506 Jan 31 16:54 libuser.conf
-rw-r--r-- 1 root root 262 Jul 4 2011 lisarc
-rw-r--r-- 1 root root 293 Jan 7 2007 mailcap
-rw-r--r-- 1 root root 2706 Jul 22 2011 multipath.conf
-rw-r--r-- 1 root root 25 Jan 31 13:28 pam_smb.conf
-rw-r--r-- 1 root root 2431 Feb 2 17:45 passwd
-rw------- 1 root root 2489 Feb 2 13:37 passwd-
-rw-r--r-- 1 root root 2875 Jan 7 2007 pinforc
-rw-r--r-- 1 root root 220 May 4 2011 quotagrpadmins
-rw-r--r-- 1 root root 290 May 4 2011 quotatab
-rw-r--r-- 1 root root 27 Aug 29 2011 redhat-release
-rw-r--r-- 1 root root 216 Apr 3 2010 sestatus.conf
-rw-r--r-- 1 root root 21851 Jan 6 2007 slrn.rc
-rw-r--r-- 1 root root 2643 Jan 7 2007 tux.mime.types
-rw-r--r-- 1 root root 2657 May 4 2011 warnquota.conf
[ptr@srva ~]$
Now we get all of the files we wanted to match.
Now we add a new file to /etc that is called root2. Running the same command as above will result in this file being matched too:
[ptr@srva ~]$ ls -l /etc | grep "root *2"
-rw-r--r-- 1 root root 2562 May 24 2008 a2ps-site.cfg
-rw-r--r-- 1 root root 298 Mar 28 2007 anacrontab
-rw-r--r-- 1 root root 2518 Jul 22 2011 DIR_COLORS
-rw-r--r-- 1 root root 2420 Jul 22 2011 DIR_COLORS.xterm
-rw-r--r-- 1 root root 22060 Jan 7 2007 fb.modes
lrwxrwxrwx 1 root root 22 May 10 2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r-- 2 root root 241 Feb 16 13:36 hosts
-rw-r--r-- 1 root root 235 Feb 3 09:47 hosts.allow
-rw-r--r-- 1 root root 293 Jul 22 2011 idmapd.conf
-rw-r--r-- 1 root root 28 Oct 8 2006 ld.so.conf
-rw-r--r-- 1 root root 2506 Jan 31 16:54 libuser.conf
-rw-r--r-- 1 root root 262 Jul 4 2011 lisarc
-rw-r--r-- 1 root root 293 Jan 7 2007 mailcap
-rw-r--r-- 1 root root 2706 Jul 22 2011 multipath.conf
-rw-r--r-- 1 root root 25 Jan 31 13:28 pam_smb.conf
-rw-r--r-- 1 root root 2431 Feb 2 17:45 passwd
-rw------- 1 root root 2489 Feb 2 13:37 passwd-
-rw-r--r-- 1 root root 2875 Jan 7 2007 pinforc
-rw-r--r-- 1 root root 220 May 4 2011 quotagrpadmins
-rw-r--r-- 1 root root 290 May 4 2011 quotatab
-rw-r--r-- 1 root root 27 Aug 29 2011 redhat-release
-rw-r--r-- 1 root root 0 Mar 17 15:34 root2
-rw-r--r-- 1 root root 216 Apr 3 2010 sestatus.conf
-rw-r--r-- 1 root root 21851 Jan 6 2007 slrn.rc
-rw-r--r-- 1 root root 2643 Jan 7 2007 tux.mime.types
-rw-r--r-- 1 root root 2657 May 4 2011 warnquota.conf
[ptr@srva ~]$
This is because the asterisk (*) represents zero or more of the previous character. To ensure that we get at least one space before the 2 we must add an extra space (space, space, asterisk):
[ptr@srva ~]$ ls -l /etc | grep "root  *2"
Now we will get the correct lines.
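The effect of the extra space can be checked on a few lines of sample input. The sketch below assumes four made-up listing lines in a variable called listing; the file named other and its size are hypothetical and only serve as a line that should never match:

```shell
# Sample lines with padded size columns, including the root2 file.
listing='-rw-r--r-- 1 root root  2431 Feb  2 17:45 passwd
-rw-r--r-- 1 root root    25 Jan 31 13:28 pam_smb.conf
-rw-r--r-- 1 root root     0 Mar 17 15:34 root2
-rw-r--r-- 1 root root  1024 Mar  1 09:00 other'

# One space + asterisk: zero or more spaces, so the file name
# root2 (no space at all before the 2) matches as well.
printf '%s\n' "$listing" | grep -c 'root *2'    # 3

# Two spaces + asterisk: one mandatory space, then zero or more,
# so only the lines whose size begins with a 2 match.
printf '%s\n' "$listing" | grep -c 'root  *2'   # 2
```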
This command line could be improved further to cater for other directories where there may be varying owners of files:
[ptr@srva ~]$ ls -l | grep "[a-z]  *2[0-9]* [A-Z]"
-rw-rw-rw- 1 ptr ptr 29 Feb 1 2016 f10
-rw-r--r-- 1 ptr ptr 27719 Mar 17 17:33 rpmpkgs
-rw-r--r-- 1 root root 27719 Mar 17 17:33 rpmpkgs.1
-rw-r--r-- 1 root root 27719 Mar 17 17:33 rpmpkgs.2
-rw-r--r-- 1 root root 27719 Mar 17 17:33 rpmpkgs.3
-rw-r--r-- 1 root root 29989 Mar 17 17:33 rpmpkgs.4
-rw-r--r-- 1 ptr ptr 261 Mar 17 17:33 vboxadd-install.log
[ptr@srva ~]$
The above pattern looks for lines that contain a lowercase letter (from the end of the group owner column), followed by one or more spaces, followed by a 2 and then zero or more digits (a size of one or more digits beginning with a 2). This must be followed by one space (the column separator between the size column and the modification time column) and finally by an uppercase letter (the first letter of the month), which ensures that what follows the number is the modification time column.
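To see why the trailing uppercase letter matters, the pattern can be run with and without it. This is a sketch under stated assumptions: the listing variable and the files notes2 and misc are made up for illustration, and LC_ALL=C is set because the meaning of ranges such as [a-z] can vary between locales:

```shell
# Sample lines with differing owners and padded columns.
listing='-rw-rw-rw- 1 ptr  ptr      29 Feb  1  2016 f10
-rw-r--r-- 1 ptr  ptr   27719 Mar 17 17:33 rpmpkgs
-rw-r--r-- 1 root root    512 Mar 17 17:33 notes2
-rw-r--r-- 1 root root   4096 Jan  2 10:00 misc'

# Full pattern: only the two sizes that begin with a 2 match.
printf '%s\n' "$listing" | LC_ALL=C grep -c '[a-z]  *2[0-9]* [A-Z]'   # 2

# Without the uppercase letter, "Jan  2 " in the last line also
# matches, because the day of the month looks like a size.
printf '%s\n' "$listing" | LC_ALL=C grep -c '[a-z]  *2[0-9]* '        # 3
```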
Command Line Options for grep
The grep command offers a lot of options. Here are a few of them:
-r Search a directory recursively
-l Display names of files with matching lines
-i Ignore case
-v Match lines that do not contain the pattern
-c Display a count of matching lines for each file instead of the lines themselves
The following example shows a list of filenames for files in and below the directory /etc that contain the pattern centos1:
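These options can be tried safely in a scratch directory. The following is a minimal sketch, assuming a temporary directory created with mktemp and two made-up files (a.txt and sub/b.txt are hypothetical names); note that -r is provided by GNU and BSD grep but is not part of the POSIX specification:

```shell
# Create a throwaway directory tree and remove it when the shell exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
printf 'Name=alpha\nname=beta\nother\n' > "$dir/a.txt"
printf 'nothing here\n' > "$dir/sub/b.txt"

# -i: ignore case, so both Name=alpha and name=beta are printed.
grep -i 'name=' "$dir/a.txt"

# -v: print the lines that do NOT contain the (case-sensitive)
# pattern: Name=alpha and other.
grep -v 'name' "$dir/a.txt"

# -c: print only the number of matching lines (here 1).
grep -c 'name' "$dir/a.txt"

# -r with -l: search below the directory and print only the names
# of files that contain a match (here the path to sub/b.txt).
grep -rl 'here' "$dir"
```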
[root@centos1 ~]# grep -rl centos1 /etc/*
/etc/default/grub
/etc/fstab
/etc/grub2.cfg
/etc/hostname
/etc/lvm/archive/centos_centos1_00000-1405482984.vg
/etc/lvm/archive/SalesVG_00000-1684602635.vg
/etc/lvm/archive/SalesVG_00001-1891684174.vg
/etc/lvm/archive/SalesVG_00002-134759568.vg
/etc/lvm/backup/centos_centos1
/etc/lvm/backup/SalesVG
/etc/mtab
[root@centos1 ~]#
The following example shows the lines from the set output that contain the string name= in any combination of upper and lower case:
[root@centos1 ~]# set | grep -i name=
HOSTNAME=centos1.ptr.local
LOGNAME=root
local remote_opts="--username= --config-dir= --no-auth-cache";
--no-auth-cache --username=
[root@centos1 ~]#
The following example shows all who output lines that do not contain the pattern root:
[root@centos1 ~]# who
ptr :0 2017-03-31 11:01 (:0)
ptr pts/0 2017-03-31 11:02 (:0)
root pts/1 2017-03-31 11:07 (1.0.0.116)
[root@centos1 ~]# who | grep -v root
ptr :0 2017-03-31 11:01 (:0)
ptr pts/0 2017-03-31 11:02 (:0)
[root@centos1 ~]#
The following example shows how many lines in each file in and below the /etc/lvm directory contain the pattern centos1:
[root@centos1 ~]# grep -cr centos1 /etc/lvm
/etc/lvm/archive/centos_centos1_00000-1405482984.vg:4
/etc/lvm/archive/SalesVG_00000-1684602635.vg:1
/etc/lvm/archive/SalesVG_00001-1891684174.vg:1
/etc/lvm/archive/SalesVG_00002-134759568.vg:2
/etc/lvm/backup/centos_centos1:4
/etc/lvm/backup/SalesVG:7
/etc/lvm/lvm.conf:0
/etc/lvm/lvmlocal.conf:0
/etc/lvm/profile/cache-mq.profile:0
/etc/lvm/profile/cache-smq.profile:0
/etc/lvm/profile/command_profile_template.profile:0
/etc/lvm/profile/metadata_profile_template.profile:0
/etc/lvm/profile/thin-generic.profile:0
/etc/lvm/profile/thin-performance.profile:0
[root@centos1 ~]#
Boost Your Linux/Unix System Administrator Toolbox
grep is a hugely powerful tool that a Linux or UNIX system administrator cannot live without. egrep extends this to provide even more potential.
We will take a look at egrep and fgrep in some later articles.