This is because when we say a word like "ooh", our lips stay in the "oo" shape for a while, even after the sound itself has stopped. Junior animators often snap back to a default mouth position just a couple of frames after the sound ends, instead of holding the mouth shape for longer.
Try it yourself. Say the word "shoo" and see how long your lips hold the "oo" shape. It's probably about 6-8 frames longer than the sound itself lasts.
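If you time your mouth shapes in a script or a spreadsheet, the same idea can be written down as a small rule of thumb. The sketch below is only an illustration: the `MouthKey` record, the `hold_ends` function and the frame numbers are all made up for this example, and the default of 6 extra frames is just the low end of the range suggested above. Each shape is held a few frames past the end of its sound, but never into the frame where the next shape begins.

```python
from dataclasses import dataclass


@dataclass
class MouthKey:
    shape: str      # mouth shape label, e.g. "oo" (hypothetical naming)
    start: int      # first frame on which the shape is held
    sound_end: int  # last frame on which the sound is audible


def hold_ends(keys, extra_frames=6):
    """Return the last frame on which each mouth shape should be held.

    Each shape is held `extra_frames` past the end of its sound,
    clamped so it never runs into the start of the next shape.
    """
    ends = []
    for i, key in enumerate(keys):
        end = key.sound_end + extra_frames
        if i + 1 < len(keys):
            # Stop one frame before the next shape takes over.
            end = min(end, keys[i + 1].start - 1)
        ends.append(end)
    return ends


# "shoo": the "sh" runs straight into the "oo", so only the "oo" gets extended.
keys = [MouthKey("sh", 10, 13), MouthKey("oo", 14, 20)]
print(hold_ends(keys))  # [13, 26]
```

Treat the numbers as a starting point and adjust by eye against the audio; the right hold depends on the character and the delivery of the line.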
