There is no uncertainty in the answer and therefore no speculation. If you pick a set of benchmarks, fix the GPU, and profile them, you get a fixed answer.
Well, yes. You could do that, if you had the right tools. It's not particularly easy to do power benchmarks without instrumenting the hardware yourself. You can run a simulation if you are NVIDIA or a partner and have access to the processor's Verilog. But power consumption depends strongly on layout and implementation details, so any answer is very much subject to change.
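For what it's worth, here is a rough sketch of the crude, non-instrumented route: polling the board's own power sensor through `nvidia-smi` while a benchmark runs, then integrating to get energy. This is my assumption about what a "poor man's" measurement would look like, not something anyone in this thread has proposed. The helper names (`read_power_watts`, `sample_power`, `energy_joules`) are made up for illustration. The sensor reports whole-board power at a coarse update rate, so it says nothing about individual instructions, and some GPUs do not expose it at all.

```python
import subprocess
import time


def read_power_watts(gpu_index=0):
    """Return board power draw in watts as reported by nvidia-smi.

    Assumes nvidia-smi is on PATH and the GPU exposes power.draw;
    on GPUs that report "[N/A]" the float() conversion will raise.
    """
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip().splitlines()[0])


def sample_power(duration_s, interval_s=0.1, read=read_power_watts):
    """Collect (seconds_since_start, watts) pairs for about duration_s."""
    samples = []
    start = time.monotonic()
    while True:
        now = time.monotonic()
        samples.append((now - start, read()))
        if now - start >= duration_s:
            break
        time.sleep(interval_s)
    return samples


def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

You would start the benchmark in another process, call `sample_power(...)` for its duration, and compare `energy_joules(...)` against an idle baseline taken the same way. Treat the result as a board-level estimate only.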
You're talking about optimisation, but optimisation of what? If you're writing GPU code, the optimal thing will nearly always be to minimise the total number of instructions executed rather than fiddle with the instruction mix; you have no real way to change the mix without changing the functionality of the code anyway. If you're doing hardware design there are different answers, but you haven't said which you mean.