RAIDZ1 performance vs. conventional RAID5
Why do I get lower read than write performance? Shouldn't write converge to the speed of 3 disks and read to the speed of 4 disks, like a RAID5?
See this thread on ServerFault:
RAIDZ with one parity drive will give you a single disk's IOPS performance, but n-1 times aggregate bandwidth of a single disk.
And this comment:
I have a significant amount of experience with this, and can confirm for you that in most situations, RAIDZ is NOT going to outperform the same number of disks thrown into a traditional RAID5/6 equivalent array.
Your disks can sustain about 145 MB/s sequentially, so with one parity disk you have three data disks' worth of bandwidth, which puts the theoretical result at roughly 3 × 145 MB/s ≈ 435 MB/s. I would say that pretty closely matches your results.
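If you want to sanity-check the per-disk figure yourself, a plain raw read with dd gives a rough number; the device name below is just an example:

    # Rough sequential read speed of a single member disk (read-only, non-destructive).
    # /dev/sdb is an example device name - substitute one of your pool disks.
    dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct
    # Expected RAIDZ1 streaming throughput: (4 - 1) data disks * ~145 MB/s = ~435 MB/s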
L2ARC cache for sequential reads
Why doesn't the L2ARC kick in? After multiple reads with no other data being read, I would have expected read performance similar to the 1 GB/s of the SSD RAID0.
Have a look at this mailing list post:
Is ARC satisfying the caching needs?
and this reply by Marty Scholes:
Are some of the reads sequential? Sequential reads don't go to L2ARC.
So, your main reasons are:
- Your (random) load is already served from ARC and L2ARC is not needed (because your data set was always the same and fits into ARC completely). The idea behind this: ARC is much faster than L2ARC (RAM vs. SSD), so ARC is always the first choice for reads; you need L2ARC only when your active data is too big for memory, yet random access on spinning disks would be too slow.
- Your benchmark was sequential in nature and thus not served from L2ARC. The idea behind this: sequential reads would poison the cache, because a single large file read fills it completely and evicts millions of small blocks from other users (ZFS is optimized for concurrent random access by many users), while giving you no speedup on the first read. A second read would be faster, but normally you do not read large files twice. You can modify this behavior with ZFS tunables (see the sketch after this list).
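If you want to check the first point and experiment with the second, a minimal sketch on ZFS on Linux looks like the following; pool/fs is a placeholder, and the module parameter resets on reboot unless you also set it in /etc/modprobe.d:

    # High ARC hit ratios mean L2ARC simply has nothing left to do
    # (arcstat ships with ZFS on Linux; 1-second interval, 5 samples).
    arcstat 1 5

    # Allow prefetched (sequential/streaming) buffers into L2ARC.
    # The default of 1 skips them; 0 lets them in.
    echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch

    # Make sure the dataset may use L2ARC at all (all / metadata / none).
    zfs set secondarycache=all pool/fs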
Various questions
Should I use part of the SSDs for ZIL?
A separate SLOG device will only help for random synchronous writes, nothing else. Testing this is quite simple: set the sync property of your benchmark file system to disabled (zfs set sync=disabled pool/fs), then benchmark again. If your performance is now suddenly great, you will benefit from a SLOG. If it does not change much, you won't.
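As a minimal sketch of that test (pool/fs is a placeholder, and sync=disabled sacrifices crash safety, so switch it back when you are done):

    # Turn off synchronous write semantics for the benchmark file system only.
    zfs set sync=disabled pool/fs

    # ... rerun your benchmark here ...

    # Restore the default behavior afterwards.
    zfs set sync=standard pool/fs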
PS: Why does cache look like a pool named cache, not something that belongs to the pool data?
I think it is that way because those extra devices (spares, caches and slog devices) can also consist of multiple vdevs. For example, if you have a mirrored slog device, you would have the same three levels as your normal disks (log - mirror - disk1/disk2).
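For illustration, cache and log devices are attached with their own top-level keyword, which is why they show up as separate sections next to the data vdevs; the pool name data comes from your question, the device paths are placeholders:

    # L2ARC (cache) devices are simply striped; they cannot be mirrored.
    zpool add data cache /dev/disk/by-id/ssd-cache-1

    # A SLOG can be mirrored, which gives the three levels mentioned above:
    # log -> mirror -> device1/device2.
    zpool add data log mirror /dev/disk/by-id/ssd-log-1 /dev/disk/by-id/ssd-log-2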