Typically I identify a few traits in a log file that I want to distinguish between and get percentages for. This is easy to do with sed: replace every line with just the matched trait, then count the occurrences of each. For example, to distinguish between Linux and Windows hits in a log file, you could do:
$ cat some.log | sed -r 's/.*(Windows|Linux).*/\1/' | sort | uniq -c | sort -rn
23940 Windows
12390 Linux
This gets you the absolute count for each trait you are looking for, but not the percentages, so it's not ideal yet.
It seems awk cannot easily loop over the lines twice (first to compute the total, then to output percentages), but with a small hack we can first add a line on top that shows the sum of all matched traits:
$ ... | awk ' END '
Total 36330
Windows 23940
Linux 12390
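The body of the awk program is not spelled out in the command above. A minimal sketch that produces the same output, assuming the input is in the `<count> <name>` form that `uniq -c` emits, could buffer every line while summing the counts and then print the total first in the `END` block. The function name `add_total` and the array names are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: prepend a "Total" line to "<count> <name>" input.
# The awk body is an assumption, not necessarily the original program.
add_total() {
  awk '
    # Buffer each line and add its count to the running sum.
    { count[NR] = $1; name[NR] = $2; sum += $1 }
    END {
      # The total goes on top, followed by the buffered lines as "<name> <count>".
      print "Total", sum
      for (i = 1; i <= NR; i++) print name[i], count[i]
    }'
}

printf '23940 Windows\n12390 Linux\n' | add_total
```

Buffering the lines in arrays is what lets a single pass over the input print the sum before the lines it was computed from.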
Finally, now that we have the total, we can easily compute and print the percentages:
$ ... | awk '!max'
Total 36330 100.00%
Windows 23940 65.90%
Linux 12390 34.10%
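Only the `!max` pattern of this awk program is visible above. One plausible completion, offered as a sketch rather than the original code, uses `!max` to capture the count from the first line (the total) and then prints every line with its share of that value. The function name `add_percent` is hypothetical, and the input is assumed to be `<name> <count>` lines with the total on top:

```shell
# Hypothetical helper: append a percentage column, relative to the first line.
# The awk body after "!max" is an assumption.
add_percent() {
  awk '
    # max is still unset on the first line, so this rule fires once, on the total.
    !max { max = $2 }
    # Print name, count, and the count as a percentage of the total.
    { printf "%s %d %.2f%%\n", $1, $2, 100 * $2 / max }'
}

printf 'Total 36330\nWindows 23940\nLinux 12390\n' | add_percent
```

Because `!max` is true only while `max` is unset or zero, the rule relies on the total being the first line, which is why the previous step puts it on top.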
The combined one-liner would then be:
cat some.log | sed -r 's/.*(Windows|Linux).*/\1/' | sort | uniq -c | sort -rn | awk ' END ' | awk '!max'
Where some.log is the file you want to inspect, and Windows|Linux is a pipe-delimited list of terms to match on/distinguish.
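To make the two parameters explicit, the whole pipeline can be wrapped in a small function. This is a sketch under assumptions: the function name `trait_percentages`, its argument order, and both awk bodies are illustrative, not taken from the original one-liner. It also deviates in one place: sed is called with `-n` and the `p` flag so that lines matching none of the terms are dropped. With the plain substitution, such lines pass through unchanged and get counted as traits of their own.

```shell
# Hypothetical wrapper: trait_percentages <file> <pipe-delimited terms>
trait_percentages() {
  file=$1
  pattern=$2
  # Reduce each matching line to the matched term; drop non-matching lines.
  sed -nr "s/.*($pattern).*/\1/p" "$file" |
    sort | uniq -c | sort -rn |
    # Put a "Total" line on top (assumed awk body).
    awk '{ count[NR] = $1; name[NR] = $2; sum += $1 }
         END { print "Total", sum; for (i = 1; i <= NR; i++) print name[i], count[i] }' |
    # Append each count as a percentage of the total (assumed awk body).
    awk '!max { max = $2 } { printf "%s %d %.2f%%\n", $1, $2, 100 * $2 / max }'
}

# Usage with a small throwaway log file.
log=$(mktemp)
printf 'GET / Windows NT\nGET / X11; Linux\nGET / Windows NT\nbot\n' > "$log"
trait_percentages "$log" 'Windows|Linux'
rm -f "$log"
```

Note that `-r` is the GNU sed spelling for extended regular expressions, as used in the one-liner; other sed implementations may expect `-E` instead.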
Should you want to drop the Total line from the final output, since it becomes slightly irrelevant, you can append | tail -n +2 to the one-liner.