According to the Motor Theory of Speech Perception (MTSP), listeners perceive speech by way of the articulatory gestures they would perform themselves in producing a similar signal. The theory postulates a module that extracts gestural information from the signal. The gestures constitute the event perceived.
According to the Modulation Theory (MDT), speech is modulated voice. Listeners perceive it by demodulating the signal. The properties of the voice convey nonlinguistic information while the linguistically coded information is conveyed by its modulation. The modulation pattern constitutes the linguistic event perceived.
The theories agree in requiring a linkage or mapping between perception and production. According to MDT, phonetically labeled links between exteroception and proprioception (mirror and echo neurons) are established in the brain during speech acquisition. This set of links embodies the knowledge of the relation. MDT thus describes the very device that MTSP would need in order to be implemented, but it makes recruiting the motor system redundant. Demodulation is also necessary in speechreading and in perceiving sign language, where a face or body is 'modulated' instead of a voice. In audiovisual speech perception, there are two percepts: a normally dominant vocal one and a gestural one that need not agree with it. MTSP accounts for only one of these. It is concluded that all the specific claims of MTSP are false, while MDT rests on 'first principles'.
2007, pp. 17-20.