This is almost certainly an implementation-specific performance bottleneck. You're correct that, all other things being equal, a larger MTU means lower per-packet overhead and thus higher performance. But other things are rarely equal. Here, some part of the forwarding or packet-moving code probably has a base buffer size of 5KB or something like that, so once a packet no longer fits in a single buffer, all of a sudden you're asking the system to do twice as much work for it.
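To make the "twice as much work" point concrete, here is a minimal sketch of the arithmetic. The 5KB figure is only the guess from above, and `BUFFER_SIZE` and `buffers_needed` are hypothetical names for illustration, not anything taken from a real forwarding stack.

```python
import math

# Hypothetical internal buffer size in bytes (the "5KB or so" guess).
BUFFER_SIZE = 5 * 1024


def buffers_needed(packet_size: int, buffer_size: int = BUFFER_SIZE) -> int:
    """Number of fixed-size buffers required to hold one packet."""
    return math.ceil(packet_size / buffer_size)


# A packet one byte over the buffer size needs a second buffer.
for size in (1500, 4096, 5120, 5121, 9000):
    print(size, buffers_needed(size))
```

Under this assumption, every packet up to 5120 bytes fits in one buffer, and anything from 5121 bytes upward needs two, which is where the per-packet work would double.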
One way to test this theory would be to increase the MTU even further. If you see a big drop at 5KB, but your throughput then improves again as you go beyond that, you've almost certainly hit a buffer-size threshold somewhere in the code path.
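As a rough illustration of what that drop-then-recover pattern would look like, here is a toy cost model. Everything in it is an assumption: the buffer size is the 5KB guess, and the cost constants are made-up units, not measurements. It only shows the shape to look for when you sweep the MTU.

```python
import math

BUFFER_SIZE = 5 * 1024   # hypothetical internal buffer size, in bytes
PER_PACKET_COST = 1.0    # made-up fixed cost of handling any packet
PER_BUFFER_COST = 1.0    # made-up cost of handling each buffer


def relative_throughput(mtu: int) -> float:
    """Bytes moved per unit of work in this toy model (not real throughput)."""
    buffers = math.ceil(mtu / BUFFER_SIZE)
    return mtu / (PER_PACKET_COST + PER_BUFFER_COST * buffers)


# Sweep MTUs on both sides of the assumed threshold.
for mtu in (1500, 4000, 5120, 5200, 7000, 9000):
    print(mtu, round(relative_throughput(mtu)))
```

In this model the figure climbs up to 5120, falls sharply just past it, then climbs again as the MTU keeps growing. If your real measurements have that same sawtooth shape, a buffer-size threshold is the likely cause.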