JDiskReport has a tab to display disk usage by file type, but the type data is based on file extensions, not actual content.
Otherwise, here's a script that uses the file command to determine types from the actual file contents:
$ ./disk_usage_by_file_type -c /dir/to/analyze
Collecting file type data, please wait ...
Done. Now run 'disk_usage_by_file_type -s' to print disk usage.
(This will take a while if the directory is big.)
$ ./disk_usage_by_file_type -s
...
154 Mb : application|pdf; charset=binary
170 Mb : video|x-msvideo; charset=binary
227 Mb : application|x-iso9660-image; charset=binary
690 Mb : application|octet-stream; charset=binary
810 Mb : audio|mpeg; charset=binary
To list all files of a given type along with their sizes, sorted by file size:
$ ./disk_usage_by_file_type -d 'image|jpeg' | sort -n
...
590: /share/pictures/screenshot.jpg
1017: /share/pictures/cd_cover/Wheel cutout+drop.jpg
16496: /share/pictures/photos/landscape.jpg
17642: /share/pictures/photos/contrast.jpg
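The script itself is not reproduced here, so the following is only a rough Python sketch of the same idea, not the original implementation. It assumes the file command is installed and on the PATH. The function names (`mime_type`, `scan`, `usage_by_type`, `files_of_type`) are made up for illustration. Unlike the script above, it does not cache the collected data between runs, and it prints types with the usual `/` separator rather than `|`.

```python
#!/usr/bin/env python3
"""Sketch: disk usage per content type, as detected by file(1)."""
import os
import subprocess
import sys
from collections import defaultdict


def mime_type(path):
    """Ask file(1) for the MIME type and charset of a single file."""
    result = subprocess.run(
        ["file", "--brief", "--mime", "--", path],
        capture_output=True, text=True, errors="replace", check=False,
    )
    return result.stdout.strip() or "unknown"


def scan(root, classify=mime_type):
    """Yield (type, size_in_bytes, path) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, devices and similar non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            yield classify(path), size, path


def usage_by_type(root, classify=mime_type):
    """Return a dict mapping each detected type to its total size in bytes."""
    totals = defaultdict(int)
    for kind, size, _path in scan(root, classify):
        totals[kind] += size
    return dict(totals)


def files_of_type(root, wanted, classify=mime_type):
    """Return (size, path) pairs whose type starts with wanted, smallest first."""
    return sorted(
        (size, path)
        for kind, size, path in scan(root, classify)
        if kind.startswith(wanted)
    )


# Only runs when a directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    totals = usage_by_type(sys.argv[1])
    # Print the smallest totals first, so the biggest consumers end up last.
    for kind, size in sorted(totals.items(), key=lambda kv: kv[1]):
        print(f"{size / 2**20:8.1f} MiB : {kind}")
```

Saved under a name of your choice and run with a directory as its only argument, it prints one total per detected type. Calling `files_of_type("/dir/to/analyze", "image/jpeg")` from Python gives a per-file listing similar to the one above. Spawning file once per file is slow on large trees. Batching many paths into a single file invocation would be faster, but is left out to keep the sketch short.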