How to find unique lines of text in Linux with uniq
The uniq command is fast, flexible, and excellent at what it does. However, like many Linux commands, it has a few quirks, which is fine as long as you know about them. If you dive in without a little insider knowledge, you might find yourself scratching your head at the results. We'll point out these quirks as we go.
The uniq command is a classic single-purpose tool, designed to do one thing and do it well. That also makes it particularly well suited to working with pipes and playing its part in command pipelines. One of its most frequent collaborators is sort, because uniq needs sorted input to work on.
Let's fire it up!
Running uniq without options
We have a text file that contains the lyrics of the Robert Johnson song "I Believe I'll Dust My Broom." Let's see what uniq makes of it.
We'll type the following to pipe the output into less:

uniq dust-my-broom.txt | less
We get the entire song, including the duplicate lines, in less:
That doesn't appear to be a list of unique lines, nor a list of duplicate lines.
Right, because this is the first quirk. If you run uniq with no options, it removes only adjacent duplicate lines, printing one copy of each. The reason you still see duplicate lines is that, for uniq to consider a line a duplicate, it must be adjacent to its duplicate, which is where sort comes into play.
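A quick way to see the adjacency rule for yourself is with a tiny throwaway file (the file name and contents here are invented for illustration):

```shell
# A duplicate that is NOT adjacent to its twin.
printf 'apple\nbanana\napple\n' > fruit.txt

# uniq alone keeps both "apple" lines, because they aren't
# next to each other.
uniq fruit.txt
# apple
# banana
# apple

# Sorting first makes the duplicates adjacent, so uniq collapses them.
sort fruit.txt | uniq
# apple
# banana
```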
Sorting the file groups the duplicate lines together so that uniq treats them as duplicates. We'll use sort on the file, pipe the sorted output into uniq, and then pipe the final result into less.
To do this, we write the following:
sort dust-my-broom.txt | uniq | less
A sorted list of lines appears in less.
The line "I believe I'll dust my broom" certainly appears in the song more than once. In fact, it's repeated twice within the first four lines of the song.
So why is it showing up in a list of unique lines? Because the first time a line appears in the file, it's unique; only the subsequent entries are duplicates. You can think of it as a list of the first occurrence of each unique line.
Let's use sort again and redirect the output into a new file. That way, we don't have to use sort in every command.
We write the following command:
sort dust-my-broom.txt > sorted.txt
To count how many times each line appears in the file, we use the -c (count) option:

uniq -c sorted.txt | less

Each line begins with the number of times that line appears in the file. However, you'll notice that the first line is blank. This tells you that there are five blank lines in the file.
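Here's a minimal, self-contained illustration of -c, using a made-up file rather than the song lyrics:

```shell
# Made-up sample data.
printf 'blues\nblues\nblues\nrock\njazz\n' > genres.txt

# -c prefixes each line with its number of (adjacent) occurrences.
uniq -c genres.txt
# The counts are right-aligned, roughly:
#   3 blues
#   1 rock
#   1 jazz
```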
If you want the output sorted in numerical order, you can feed the output from uniq into sort. In our example, we'll use the -r (reverse) and -n (numeric sort) options and pipe the results into less.
We write the following:
uniq -c sorted.txt | sort -rn | less
The list is arranged in descending order according to the frequency of occurrence of each line.
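The same pipeline works on any file. With invented sample data:

```shell
# Invented sample data.
printf 'b\na\nb\nc\nb\na\n' > letters.txt

# Sort so duplicates are adjacent, count them, then order the
# counts from highest to lowest.
sort letters.txt | uniq -c | sort -rn
# The most frequent line ("b", 3 times) appears first.
```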
List only duplicate lines
If you only want to see the repeated lines in a file, you can use the -d (repeated) option. No matter how many times a line is duplicated in the file, it is listed only once.
To use this option, write the following:
uniq -d sorted.txt
The duplicated lines are listed for us. You'll notice the blank line at the top, which means the file contains duplicate blank lines; it isn't a spacer inserted by uniq to cosmetically pad out the listing.
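On a small invented file, -d behaves like this:

```shell
printf 'one\none\none\ntwo\nthree\nthree\n' > sample.txt

# -d prints one copy of each line that occurs more than once.
uniq -d sample.txt
# one
# three
```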
We can combine the -d (repeated) and -c (count) options and pipe the output through sort. This gives us a sorted list of the lines that appear at least twice.
Enter the following to use this option:
uniq -d -c sorted.txt | sort -rn
List all duplicate lines
If you want to see a list of every duplicated line, with an entry for each time a line appears in the file, you can use the -D (all duplicate lines) option.
To use this option, type the following:
uniq -D sorted.txt | less
The listing contains an entry for each duplicate line.
If you use the --group option, uniq prints every line, separating the groups with a blank line either before each group (prepend), after each group (append), or both before and after (both).
We'll use append as our modifier, so we type the following:
uniq --group=append sorted.txt | less
Groups are separated by blank lines for easy reading.
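Note that -D and --group are GNU coreutils extensions, so they may be missing from BSD or busybox versions of uniq. A self-contained sketch with made-up data:

```shell
printf 'red\nred\nblue\ngreen\ngreen\n' > colors.txt

# -D prints every occurrence of each duplicated line.
uniq -D colors.txt
# red
# red
# green
# green

# --group=append prints ALL lines (duplicated or not), with a
# blank line after each group.
uniq --group=append colors.txt
```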
Check a certain number of characters
By default, uniq checks the entire length of each line. However, if you want to restrict the checks to a certain number of characters, you can use the -w (check chars) option.
In this example, we'll repeat the last command but limit the comparisons to the first three characters. To do so, type the following command:
uniq -w 3 --group=append sorted.txt | less
The results and groupings we receive are quite different.
All of the lines that start with "I b" are grouped together, because those portions of the lines are identical, so they're considered duplicates.
Likewise, all of the lines that start with "I'm" are treated as duplicates, even when the rest of the text differs.
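Here is the same effect on a four-line invented file:

```shell
printf "I believe\nI bet\nI'm lonesome\nI'm gone\n" > phrases.txt

# Compare only the first three characters of each line: "I b"
# matches "I b", and "I'm" matches "I'm".
uniq -w 3 phrases.txt
# I believe
# I'm lonesome
```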
Ignore a certain number of characters
There are cases where it can be useful to skip a certain number of characters at the start of each line, such as when the lines in a file are numbered. Or, say you need uniq to jump over a timestamp and start checking the lines from the sixth character instead of from the first.
Below is a version of our sorted file with numbered lines.
If we want uniq to skip the first three characters of each line and start its comparisons after them, we can use the -s (skip chars) option by typing the following:
uniq -s 3 -d -c numbered.txt
The lines are detected as duplicates and counted correctly. Notice that the line numbers shown are those of the first occurrence of each duplicate.
You can also skip fields (runs of characters separated by whitespace) instead of characters. We'll use the -f (fields) option to tell uniq which fields to ignore.
We type the following to tell uniq to ignore the first field:
uniq -f 1 -d -c numbered.txt
We get the same results as when we told uniq to skip three characters at the start of each line.
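Both options can be sketched on a tiny invented numbered file (note that -s counts characters, while -f counts whitespace-separated fields):

```shell
# Line numbers differ, but the text repeats.
printf '1 hello\n2 hello\n3 world\n' > numbered-demo.txt

# Skip the first 2 characters ("1 ", "2 ", ...) before comparing.
uniq -s 2 -d -c numbered-demo.txt
# Reports "hello" as duplicated, with a count of 2.

# Skip the first field instead; same result for this file.
uniq -f 1 -d -c numbered-demo.txt
```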
Ignoring the case
By default, uniq is case-sensitive. If the same letter appears in both uppercase and lowercase, uniq considers the lines to be different.
For example, see the output of the following command:
uniq -d -c sorted.txt | sort -rn
The lines "I believe I'll dust my broom" and "I Believe I'll dust my broom" are not treated as duplicates because of the difference in case on the "B" in "believe."
However, if we include the -i (ignore case) option, these lines will be treated as duplicates. We type the following:
uniq -d -c -i sorted.txt | sort -rn
The lines are now treated as duplicates and grouped.
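A minimal illustration with invented data:

```shell
printf 'Believe\nbelieve\nBELIEVE\ndust\n' > case-demo.txt

# Case-sensitive (the default): no two adjacent lines match
# exactly, so -d finds no duplicates.
uniq -d case-demo.txt
# (no output)

# Case-insensitive: the three variants collapse into one group,
# and the first occurrence is reported.
uniq -d -i case-demo.txt
# Believe
```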
Linux provides you with a multitude of special-purpose utilities. Like many of them, uniq isn't a tool you'll use every day.
That's why a big part of becoming proficient in Linux is remembering which tool will solve your current problem, and where you can find it again. A little practice helps, too.
Or you can always search How-To Geek; we probably have an article about it.