I created an account here so I could thank @BobC for his answer (and his question.) It was the catalyst I needed to solve our longstanding issue with Solr logs.
I modified BobC's script to optimize it a bit for the logrotate use case (using `$xfer_block_size` for `ibs`, and an arbitrarily large (8M) `obs`, followed by a `tr -d "\000"` to eliminate the remaining nulls) and then used it in the `firstaction` section of my `logrotate` config.
My solution is slightly hacky, I guess, but it's much better than having to bounce critical production services when an 80+ GB log file threatens to fill up the disk...
This is what I ended up with:
```bash
#! /bin/bash
# truncat.sh
# Adapted from @BobC's script http://superuser.com/a/836950/539429
#
# Efficiently cat log files that have been previously truncated.
# They are sparse -- many null blocks before the interesting content.
# This script skips the null blocks in bulk (except for the last)
# and then uses tr to filter the remaining nulls.
#
for f in $@; do
  fields=( `stat -c "%o %B %b %s" $f` )
  xfer_block_size=${fields[0]}     # optimal I/O transfer size (%o)
  alloc_block_size=${fields[1]}    # size in bytes of each block reported by %b
  blocks_alloc=${fields[2]}        # number of blocks actually allocated (%b)
  size_bytes=${fields[3]}          # apparent size in bytes (%s)

  bytes_alloc=$(( $blocks_alloc * $alloc_block_size ))

  alloc_in_xfer_blocks=$(( ($bytes_alloc + ($xfer_block_size - 1))/$xfer_block_size ))
  size_in_xfer_blocks=$(( ($size_bytes + ($xfer_block_size - 1))/$xfer_block_size ))
  null_xfer_blocks=$(( $size_in_xfer_blocks - $alloc_in_xfer_blocks ))
  null_xfer_bytes=$(( $null_xfer_blocks * $xfer_block_size ))
  non_null_bytes=$(( $size_bytes - $null_xfer_bytes ))

  if [ "$non_null_bytes" -gt "0" -a "$non_null_bytes" -lt "$size_bytes" ]; then
    # Skip the leading null blocks in bulk; tr filters out any stragglers.
    cmd="dd if=$f ibs=$xfer_block_size obs=8M skip=$null_xfer_blocks "
    $cmd | tr -d "\000"
  else
    cat $f
  fi
done
```
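Outside of `logrotate`, the same script works for one-off cleanups. A minimal usage sketch (the output path and the `gzip` step here are just illustrations, not part of the setup below):

```bash
# Recover only the real log data from a sparse, previously-truncated log,
# compress it, then truncate the live file in place.
/usr/local/bin/truncat.sh /var/log/solr/2015_10_12-025600113.start.log \
    | gzip > /var/tmp/2015_10_12.start.log.gz
> /var/log/solr/2015_10_12-025600113.start.log   # truncate the live log
```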
Using larger blocks makes `dd` orders of magnitude faster. `dd` makes a first cut, then `tr` trims the rest of the nulls. As a point of reference, for an 87 GiB sparse file (containing 392 MiB of data):
```
# ls -l 2015_10_12-025600113.start.log
-rw-r--r-- 1 solr solr 93153627360 Dec 31 10:34 2015_10_12-025600113.start.log
# du -shx 2015_10_12-025600113.start.log
392M    2015_10_12-025600113.start.log
#
# time truncat.sh 2015_10_12-025600113.start.log > test1
93275+1 records in
45+1 records out
382055799 bytes (382 MB) copied, 1.53881 seconds, 248 MB/s

real    0m1.545s
user    0m0.677s
sys     0m1.076s

# time cp --sparse=always 2015_10_12-025600113.start.log test2

real    1m37.057s
user    0m8.309s
sys     1m18.926s

# ls -l test1 test2
-rw-r--r-- 1 root root 381670701 Dec 31 10:07 test1
-rw-r--r-- 1 root root 93129872210 Dec 31 10:11 test2
# du -shx test1 test2
365M    test1
369M    test2
```
When I let `logrotate` process this using `copytruncate`, it took most of an hour and resulted in a fully-materialized non-sparse file -- which then took over an hour to `gzip`.
Here's my final `logrotate` solution:
```
/var/log/solr/rotated.start.log {
    rotate 14
    daily
    missingok
    dateext
    compress
    create
    firstaction
        # this actually does the rotation. At this point we expect
        # an empty rotated.start.log file.
        rm -f /var/log/solr/rotated.start.log
        # Now, cat the contents of the log file (skipping leading nulls)
        # onto the new rotated.start.log
        for i in /var/log/solr/20[0-9][0-9]_*.start.log ; do
            /usr/local/bin/truncat.sh $i >> /var/log/solr/rotated.start.log
            > $i  # truncate the real log
        done
    endscript
}
```
The hacky bit is that when you first set this up, you have to create an empty `rotated.start.log` file, otherwise `logrotate` will never pick it up and run the `firstaction` script.
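For completeness, that one-time bootstrap can be as simple as the following (the `solr:solr` ownership and mode are assumptions; match whatever user actually writes the logs):

```bash
# One-time setup so logrotate has a rotated.start.log to pick up.
touch /var/log/solr/rotated.start.log
chown solr:solr /var/log/solr/rotated.start.log   # assuming Solr runs as the "solr" user
chmod 644 /var/log/solr/rotated.start.log
```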
I did see your `logrotate` bug ticket, for which a fix was released in `logrotate 3.9.0`. Unfortunately, if I'm reading it correctly, the implemented fix only addresses part of the problem. It correctly copies the sparse log file to create another sparse file. But as you observed, that's not really what we want; we want the copy to exclude all the irrelevant null blocks and retain just the log entries. After the `copytruncate`, `logrotate` still has to `gzip` the file, and `gzip` does not handle sparse files efficiently (it reads and processes every null byte).
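If you want to see that effect for yourself, a rough comparison is to compress the sparse file directly versus piping it through the script first (commands only; the timings will obviously depend on your hardware and file sizes):

```bash
# gzip has to read and deflate every null byte of the sparse file...
time gzip -c 2015_10_12-025600113.start.log > direct.gz

# ...whereas skipping the holes first leaves it only the real log data.
time /usr/local/bin/truncat.sh 2015_10_12-025600113.start.log | gzip -c > skipped.gz
```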
Our solution is better than the `copytruncate` fix in `logrotate 3.9.x` because it results in clean logs that can be compressed easily.