With GNU uniq, which supports the -w option:
$ cat data
zikla13:Oct:20:22:34
zikla13:Oct:5:00:31
zikla14:Oct:17:22:01
zikla14:Oct:12:23:35
zikla14:Oct:12:23:34
zikla14:Oct:12:00:11
zikla14:Oct:11:23:52
zikla14:Oct:5:22:22
zilka13:Oct:13:23:48
zilka13:Oct:11:00:28
zilka13:Oct:9:22:40
$ uniq -c -w7 data
      2 zikla13:Oct:20:22:34
      6 zikla14:Oct:17:22:01
      3 zilka13:Oct:13:23:48
As pointed out in the comments, that assumes the first field is always seven characters. It is in your example, but if it isn't in real life, I don't think there's a way to do it with uniq (and if you don't have GNU uniq, even -w won't work). So here's a perl solution:
$ perl -ne '/(.*?):(.*)/;unless (exists $x{$1}){$x{$1}=[0,$2];push @x, $1};$x{$1}[0]++;END{printf("%8d %s:%s\n",$x{$_}[0],$_,$x{$_}[1]) foreach @x}' <data
       2 zikla13:Oct:20:22:34
       6 zikla14:Oct:17:22:01
       3 zilka13:Oct:13:23:48
Here's how that works:
$ perl -ne
Run perl, looping over the input one line at a time without printing each line by default (-n), and use the next argument as the script (-e).
/(.*?):(.*)/
Split the input line into the stuff before the first colon and the stuff after the first colon, into $1 and $2. split would work here as well.
unless (exists $x{$1}){$x{$1}=[0,$2];push @x, $1}
The hash %x is going to be used to uniquify the lines and the array @x to keep them in order (you could just use sort keys %x, but that assumes perl's sort will sort in the same way as the input is sorted). So if we've never seen the current "key" (the stuff before the first colon), initialize a hash entry for the key and push the key on @x. The hash entry for each key is a two-element array containing the count and the first value seen after the colon, so the output can contain that value.
$x{$1}[0]++
Increment the count.
END{
Start a block that will be run after all the input has been read.
printf("%8d %s:%s\n",$x{$_}[0],$_,$x{$_}[1])
Print the count, right-aligned and padded with spaces to eight characters, then a space, the "key", a colon, and the stuff from after the colon.
foreach @x}
Do that for each key seen, in order, and end the END block.
<data
Read from the file called data in the current directory to get the input. You could also just pipe into perl if you have some other command or pipeline producing the data.