Different CPU (IA-32, ARM9 etc.) operations should be equivalent in their nature (move, read, write data etc.).
They should be, but they are not. A CPU architecture implements these basic operations the way its designers envisioned them, and this can differ greatly depending on what the designers were trying to accomplish. Something done in one instruction on one CPU may require many instructions on another.
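To make that concrete, here is a rough sketch using two made-up instruction sets (neither corresponds to a real CPU). The hypothetical guest machine has a single `ADDM` instruction that adds one memory cell to another; the hypothetical host machine can only do arithmetic on registers, so the same work takes four instructions. All names (`ADDM`, `LOAD`, `STORE`, `translate`, `run_host`) are assumptions for illustration.

```python
# Toy illustration only: both instruction sets below are invented.
# Guest: memory-to-memory machine, one instruction adds two memory cells.
# Host: load/store machine, arithmetic works on registers only.

def translate(guest_instr):
    """Expand one guest instruction into an equivalent list of host instructions."""
    op, dst, src = guest_instr
    if op == "ADDM":  # guest semantics: mem[dst] = mem[dst] + mem[src]
        return [
            ("LOAD", "r0", dst),   # r0 = mem[dst]
            ("LOAD", "r1", src),   # r1 = mem[src]
            ("ADD", "r0", "r1"),   # r0 = r0 + r1
            ("STORE", "r0", dst),  # mem[dst] = r0
        ]
    raise ValueError(f"unknown guest instruction: {op}")


def run_host(program, memory):
    """Execute host instructions against a dict-based memory."""
    regs = {"r0": 0, "r1": 0}
    for op, a, b in program:
        if op == "LOAD":
            regs[a] = memory[b]
        elif op == "ADD":
            regs[a] = regs[a] + regs[b]
        elif op == "STORE":
            memory[b] = regs[a]
        else:
            raise ValueError(f"unknown host instruction: {op}")
    return memory


memory = {0x10: 5, 0x11: 7}
host_program = translate(("ADDM", 0x10, 0x11))
run_host(host_program, memory)
print(len(host_program), memory[0x10])  # prints: 4 12
```

One guest instruction became four host instructions here, and a real translator would also have to reproduce side effects such as status flags, which adds still more work.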
Could we simply convert an executable file, then execute it? Why is it so resource dependent anyway (why do I need a powerful CPU to emulate another CPU)?
If all you want to do is emulate the CPU, this can be done, and relatively easily. "Converting" an executable file on the fly is called "dynamic recompilation", and many emulators already do it. Typically, though, one wants to emulate an entire platform. This includes hardware other than the CPU, and sometimes that hardware is difficult to emulate (the Atari 2600 TIA, for example), poorly documented (the NES PPU video hardware, or even current GPU hardware), or both. CPUs always function in the context of a platform, and software usually expects the CPU and platform together to behave in a certain way. Emulating the whole platform, often under strict timing requirements, is the hard and resource-intensive part.
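The timing point can be sketched with a toy platform. Everything below is hypothetical: the two-instruction CPU, the cycle costs, the `VideoChip` class, and the scanline length are assumptions chosen for illustration, not a model of any real console. The idea it shows is that after every emulated CPU instruction the emulator must also advance the other hardware by the same number of cycles, because the program relies on a write landing at a particular moment in the video frame.

```python
# Toy platform: a CPU and a video chip that must stay in step.
# All values and names are illustrative assumptions.
CYCLES_PER_SCANLINE = 76  # assumed scanline length in CPU cycles


class VideoChip:
    def __init__(self):
        self.cycle_in_line = 0
        self.color = 0
        self.lines = []  # colour in effect when each scanline finished

    def tick(self, cycles):
        """Advance the chip one cycle at a time, closing scanlines as needed."""
        for _ in range(cycles):
            self.cycle_in_line += 1
            if self.cycle_in_line == CYCLES_PER_SCANLINE:
                self.lines.append(self.color)
                self.cycle_in_line = 0


class Machine:
    COST = {"NOP": 2, "SETCOLOR": 3}  # assumed cycle cost per instruction

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.video = VideoChip()

    def step(self):
        op, arg = self.program[self.pc]
        self.pc += 1
        if op == "SETCOLOR":
            # Simplification: the write takes effect before its cycles elapse.
            self.video.color = arg
        # Keep the video chip synchronized with the CPU after every instruction.
        self.video.tick(self.COST[op])

    def run(self):
        while self.pc < len(self.program):
            self.step()


# 38 NOPs fill exactly one scanline, then the colour changes for the next one.
program = [("NOP", None)] * 38 + [("SETCOLOR", 5)] + [("NOP", None)] * 37
machine = Machine(program)
machine.run()
print(machine.video.lines)  # prints: [0, 5]
```

If the emulator ran the CPU far ahead and only updated the video chip occasionally, the colour change could land on the wrong scanline. Doing this bookkeeping on every instruction, for every device, is a large part of what makes accurate platform emulation expensive.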