Since you're using a very recent version of ffmpeg, use the command below:
ffmpeg -y -i ./images/video%04d.png -i music.wav -profile:v baseline -shortest -vcodec libx264 -s 720x480 -acodec aac -movflags +faststart video_file.mp4
-profile:v is set to the lowest common denominator, baseline, for broad compatibility. With -movflags +faststart, the MOOV box is shifted to the head of the file, so the whole MP4 does not need to be downloaded before playback can start.
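If you want to confirm that the MOOV box really ended up ahead of the media data, something like the following rough check may help. It is only a heuristic sketch: the function name moov_before_mdat is made up for illustration, and it assumes that the first occurrence of each four-character tag in the file is the actual box header (and that your grep supports -a, -b and -o, as GNU grep does).

```shell
# Heuristic check: does the 'moov' box appear before 'mdat' in an MP4 file?
# Assumption: the first occurrence of each tag is the real box header.
moov_before_mdat() {
  file=$1
  # Byte offset of the first match of each tag (grep prints "offset:match").
  moov=$(LC_ALL=C grep -abo -F 'moov' "$file" | head -n 1 | cut -d: -f1)
  mdat=$(LC_ALL=C grep -abo -F 'mdat' "$file" | head -n 1 | cut -d: -f1)
  # Return 2 if either tag is missing from the file.
  [ -n "$moov" ] && [ -n "$mdat" ] || return 2
  # Succeed only when 'moov' starts before 'mdat'.
  [ "$moov" -lt "$mdat" ]
}
```

For example, moov_before_mdat video_file.mp4 && echo "moov is at the head" should print the message for a file written with -movflags +faststart.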
By default, ffmpeg assigns a framerate of 25 to image sequences unless specified otherwise. Since Dec 2015, the native AAC encoder is no longer marked as experimental, so -strict experimental is not needed with -acodec aac.
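Because -shortest ends the output at the shorter of the two inputs, it can help to know how long the image sequence will run at a given framerate. Here is a small sketch of that arithmetic; the function name sequence_duration_ms and the count of 250 frames are hypothetical, chosen only for illustration.

```shell
# Duration in milliseconds of a numbered image sequence.
# Arguments: frame count, framerate numerator, framerate denominator (default 1).
sequence_duration_ms() {
  frames=$1
  num=$2
  den=${3:-1}
  # duration = frames / (num/den) seconds; integer math truncates the result.
  echo $(( frames * den * 1000 / num ))
}

sequence_duration_ms 250 25          # prints 10000
sequence_duration_ms 250 30000 1001  # prints 8341
```

So the same 250 images last 10 seconds at the default framerate of 25, but only about 8.3 seconds at 30000/1001.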
Both the size value of 720x480 and the HTML5 video size of 640x480 make me think you're dealing with NTSC source material. If so, use:
ffmpeg -y -framerate 30000/1001 -i ./images/video%04d.png -i music.wav -vf "scale=640x480,setsar=1" -profile:v baseline -shortest -vcodec libx264 -acodec aac -movflags +faststart video_file.mp4