Just a fine point of detail first: most codecs start as specs defining the format and the methodology by which encoding and decoding occur, including API specs where applicable. Implementations of the encoder and decoder are then written against that spec, and not all implementations are created equal. For instance, the CoreAVC H.264 decoder is multithreaded, whereas many other H.264 decoders are not.
V4L2 is primarily used for creating/capturing video on Linux systems; it's the kernel API that webcams and video capture/TV cards expose. It doesn't encode video itself: it hands raw (or, with some hardware, already-compressed) frames to userspace, and applications then encode those frames to their taste with a separate library. Decoding is likewise done by a different package (libavcodec, perhaps). See more on Wikipedia.
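To make that concrete, here's a minimal sketch of how an application talks to V4L2: open the device node and ask the driver what it can do. This assumes a camera at /dev/video0; real capture code would go on to negotiate a format and map buffers.

```c
/* Minimal sketch: query a V4L2 capture device's capabilities.
 * Assumes a camera at /dev/video0; error handling kept short. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/video0");
        return 1;
    }

    struct v4l2_capability cap;
    memset(&cap, 0, sizeof(cap));
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) {
        perror("VIDIOC_QUERYCAP");
        close(fd);
        return 1;
    }

    /* The driver reports what the hardware can do; V4L2 itself just
     * delivers frames to userspace -- it doesn't encode them. */
    printf("driver: %s, card: %s\n", cap.driver, cap.card);
    if (cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)
        printf("device supports video capture\n");

    close(fd);
    return 0;
}
```

Note that everything past this point (compressing the frames, muxing them into a container) happens in whatever encoder library the application chooses, not in V4L2.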
ALSA drives the audio codec chips in your hardware, but is itself the support stack for all audio operations in Linux, so wherever you hear sound, ALSA is what's making that happen. ALSA also handles audio input (mic/line-in), but what it delivers is raw PCM samples; any compression is done by a separate codec library. I've not worked much with the capture side myself.
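As a rough sketch of that division of labor, here's what reading raw PCM from the default ALSA capture device looks like with alsa-lib (link with -lasound). The device name "default" and the 44.1 kHz stereo parameters are just illustrative choices.

```c
/* Minimal sketch: capture one second of raw PCM from the default
 * ALSA device. The raw samples would then be handed to a separate
 * encoder library -- ALSA itself doesn't compress anything. */
#include <stdio.h>
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_CAPTURE, 0) < 0) {
        fprintf(stderr, "cannot open capture device\n");
        return 1;
    }

    /* 16-bit little-endian, 2 channels, 44.1 kHz, ~0.5 s latency */
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                       SND_PCM_ACCESS_RW_INTERLEAVED,
                       2, 44100, 1, 500000);

    short buf[44100 * 2];  /* one second of interleaved stereo frames */
    snd_pcm_sframes_t got = snd_pcm_readi(pcm, buf, 44100);
    if (got > 0)
        printf("captured %ld frames of raw PCM\n", (long)got);

    snd_pcm_close(pcm);
    return 0;
}
```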
In your example, I don't believe that V4L2 would be involved at all, since you are using an H.264 encoder, which likely provides much better quality and compression than anything the capture hardware itself would produce. ALSA will be used to play your audio back, but will not be involved in encoding the video.
When you play any sound, that's ALSA at work. When you use your webcam, that's V4L2 at work.