Superscalar is a term that describes the architecture of a CPU or a single core: a superscalar core can issue and execute more than one instruction per clock cycle.
To execute an instruction, a CPU has to perform a number of steps, and each step is handled by a specific dedicated part, or "execution unit", of the CPU. The simplest CPUs run an instruction from step 0 through step X (X varies with the type of instruction) and finish all the steps completely before starting the next instruction.
A pipelined CPU improves on that by feeding a new instruction into the pipeline at step 0 before the previous instruction has completed all its steps. A superscalar CPU takes the overlap further and issues more than one instruction into the pipeline per cycle, spreading them across duplicated execution units. So the CPU is more fully utilized and faster (and much more complex) because its execution units are kept busy as much as possible.
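To make the difference concrete, here's a toy cycle-count model. It's purely illustrative (the five steps and the "2-wide" issue width are assumptions, and real pipelines stall on dependencies, branches, and cache misses), but it shows why the overlap pays off:

```c
#include <stdio.h>

int main(void) {
    const int steps = 5;  /* toy pipeline: fetch, decode, execute, memory, write-back */
    const int n = 100;    /* instructions in the stream */

    /* Simple CPU: run every step of one instruction before starting the next. */
    int sequential = n * steps;

    /* Pipelined: feed a new instruction into step 0 each cycle; once the
     * pipeline is full, one instruction finishes every cycle. */
    int pipelined = steps + (n - 1);

    /* 2-wide superscalar, best case: two independent instructions enter the
     * pipeline per cycle, so two finish every cycle once it's full. */
    int superscalar = steps + (n / 2 - 1);

    printf("sequential:  %3d cycles\n", sequential);   /* 500 */
    printf("pipelined:   %3d cycles\n", pipelined);    /* 104 */
    printf("superscalar: %3d cycles\n", superscalar);  /*  54 */
    return 0;
}
```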
A workload achieves parallelism if it can be divided into units such that more than one worker (an execution unit, a core, or a whole machine) can make progress on it at the same time. Superscalar architectures attempt to parallelize the execution of a single instruction stream within a single core/CPU.
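You can sometimes see this instruction-level parallelism from ordinary code. A sketch (the actual speedup depends on your compiler and CPU, and compilers generally won't reorder floating-point additions for you at default settings): summing with one accumulator chains every add onto the previous result, while two accumulators give the core two independent chains it can run on two adders at once.

```c
#include <stdio.h>
#include <stddef.h>

/* One accumulator: each addition depends on the previous result, so the
 * core's extra execution units sit idle waiting on the chain. */
double sum_single(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the two chains are independent, so a superscalar core
 * can keep two floating-point adders busy in the same cycle. */
double sum_dual(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        s0 += a[i];
    return s0 + s1;
}

int main(void) {
    double a[1000];
    for (size_t i = 0; i < 1000; i++)
        a[i] = (double)i;
    printf("%f %f\n", sum_single(a, 1000), sum_dual(a, 1000));
    return 0;
}
```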
Parallel processing is similar in concept, but we are no longer talking about a CPU's instruction stream; instead there is some (likely large) set of data that needs to be processed. How do you split the workload across some number of CPUs, or even separate nodes connected by a network, running the same or different programs, so that it is processed most efficiently? What is needed to ensure the CPUs or systems don't step on each other and corrupt data? Those are the kinds of questions parallel processing asks.
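Here's a minimal sketch of that idea using POSIX threads (an assumption: a shared-memory machine rather than networked nodes, and a trivially splittable workload). The data is cut into disjoint chunks, each thread writes only to its own partial-result slot, and the main thread combines the results at the end, so no worker steps on another's data. Compile with `cc -pthread`.

```c
#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

#define N       1000000
#define WORKERS 4

static double data[N];
static double partial[WORKERS];   /* one result slot per worker: nothing shared */

struct task { size_t begin, end, slot; };

static void *sum_chunk(void *arg) {
    struct task *t = arg;
    double s = 0.0;
    for (size_t i = t->begin; i < t->end; i++)
        s += data[i];
    partial[t->slot] = s;         /* each thread writes only its own slot */
    return NULL;
}

int main(void) {
    pthread_t threads[WORKERS];
    struct task tasks[WORKERS];

    for (size_t i = 0; i < N; i++)
        data[i] = 1.0;

    /* Split the data into disjoint chunks, one per worker. */
    size_t chunk = N / WORKERS;
    for (int w = 0; w < WORKERS; w++) {
        tasks[w].begin = w * chunk;
        tasks[w].end   = (w == WORKERS - 1) ? N : (w + 1) * chunk;
        tasks[w].slot  = (size_t)w;
        pthread_create(&threads[w], NULL, sum_chunk, &tasks[w]);
    }

    /* Combine the partial results once every worker has finished. */
    double total = 0.0;
    for (int w = 0; w < WORKERS; w++) {
        pthread_join(threads[w], NULL);
        total += partial[w];
    }
    printf("total = %f\n", total);   /* 1000000.000000 */
    return 0;
}
```

The same splitting idea scales up to separate machines, except the chunks travel over a network and the "don't step on each other" guarantees come from message passing or distributed coordination instead of per-thread memory slots.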