OHSU Technology #0665 "Voice Transformation (High Resolution)"

Inventor: Alexander Kain, Ph.D.




The essence of the system is the following:

1.      Parallel recordings are obtained of a source speaker (to be transformed) and a target speaker (to be mimicked). "Parallel" means that both speakers read exactly the same text.

2.      An automatic alignment system finds the correspondence between the phonemes in the two sets of recordings.

3.      The target speaker's speech is analyzed using Linear Predictive Coding (LPC), producing LPC coefficients and LPC residuals. Both contain information about the speaker.

4.      A first mapping is trained that maps spectral envelopes of the source speaker onto the corresponding spectral envelopes of the target speaker.

5.      A second mapping is computed between the target LPC coefficients and the target LPC residuals.

6.      During operation, the first mapping computes target LPC coefficients from input source LPC coefficients. The second mapping computes target LPC residuals from the computed target LPC coefficients. LPC synthesis is used to generate speech from the computed target LPC coefficients and the computed target LPC residuals.
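
The LPC analysis and synthesis in steps 3 and 6 can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to estimate LPC coefficients; this summary does not say which estimator the system uses. The input frame is synthetic noise-driven data rather than speech, and all function names are hypothetical. The sketch shows that the coefficients and the residual together are enough to rebuild the frame.

```python
import random

def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns a[1..order] for the predictor x_hat[n] = sum_k a[k] * x[n - k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # index 0 is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def predict(history, n, coeffs):
    """Linear prediction of sample n from the samples before it."""
    return sum(c * history[n - k]
               for k, c in enumerate(coeffs, start=1) if n - k >= 0)

def lpc_residual(frame, coeffs):
    """Analysis (inverse filter): e[n] = x[n] - x_hat[n]."""
    return [frame[n] - predict(frame, n, coeffs) for n in range(len(frame))]

def lpc_synthesize(residual, coeffs):
    """Synthesis: x[n] = e[n] + sum_k a[k] * x[n - k]."""
    out = []
    for n in range(len(residual)):
        out.append(residual[n] + predict(out, n, coeffs))
    return out

# Synthetic stand-in for one speech frame (assumption: noise through a
# stable two-pole resonator, not real speech).
rng = random.Random(0)
frame = []
for n in range(1000):
    prev1 = frame[n - 1] if n >= 1 else 0.0
    prev2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(rng.gauss(0.0, 1.0) + 1.3 * prev1 - 0.8 * prev2)

coeffs = lpc_coefficients(frame, order=4)
residual = lpc_residual(frame, coeffs)
rebuilt = lpc_synthesize(residual, coeffs)
max_error = max(abs(x - y) for x, y in zip(frame, rebuilt))

print("LPC coefficients:", [round(c, 3) for c in coeffs])
print("max reconstruction error:", max_error)
```

In the described system, the coefficients and residual passed to the synthesis step would come from the two trained mappings rather than from analysis of the same frame, as they do in this round-trip check.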


Competitive Advantages

The key invention and improvement over earlier systems is the use of target LPC residuals. Older systems do not use this information, yet it adds considerably to the quality of the imitation. For the "foreign accent reduction" system, the plan is to train this method on one speaker with an Asian Indian accent and one speaker with a US American accent, and then test it on speech from the former speaker that was not used during training.
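
A toy sketch can show where the residual prediction sits in the chain at run time. It uses a nearest-neighbor lookup as a stand-in for both mappings, purely for illustration; this summary does not specify the form of either mapping, and plain Euclidean distance on raw LPC coefficients is a simplification. The class name, the function names, and every number below are hypothetical.

```python
def nearest(query, keys):
    """Index of the key with the smallest squared Euclidean distance to query."""
    return min(range(len(keys)),
               key=lambda i: sum((q - k) ** 2 for q, k in zip(query, keys[i])))

class LookupMapping:
    """Stand-in mapping: return the stored output whose stored input is nearest."""

    def __init__(self, inputs, outputs):
        if not inputs or len(inputs) != len(outputs):
            raise ValueError("need equally many, and at least one, input/output pairs")
        self.inputs = [list(v) for v in inputs]
        self.outputs = [list(v) for v in outputs]

    def __call__(self, vector):
        return self.outputs[nearest(vector, self.inputs)]

# Made-up aligned training frames (assumption: 2 coefficients, 3 residual samples).
source_coeffs = [[0.9, -0.2], [0.1, 0.3], [-0.5, 0.4]]
target_coeffs = [[1.1, -0.4], [0.2, 0.1], [-0.3, 0.2]]
target_residuals = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]

# First mapping: source coefficients -> target coefficients.
first_mapping = LookupMapping(source_coeffs, target_coeffs)
# Second mapping: target coefficients -> target residual.
second_mapping = LookupMapping(target_coeffs, target_residuals)

def transform_frame(source_frame_coeffs):
    """Chain both mappings; the outputs would then feed LPC synthesis."""
    predicted_coeffs = first_mapping(source_frame_coeffs)
    predicted_residual = second_mapping(predicted_coeffs)
    return predicted_coeffs, predicted_residual

# A source frame that was not part of the training pairs.
held_out_source = [0.8, -0.1]
coeffs, residual = transform_frame(held_out_source)
print("predicted target coefficients:", coeffs)
print("predicted target residual:", residual)
```

The second lookup takes only the predicted target coefficients as input, so the residual is never copied from the source speaker.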




Dr. Kain’s website: 



to view samples:


“Improving the accuracy and quality of speaker transformation systems and designing speaker recognizability perceptual tests (transformation of natural speech: source, transformation, target; transformation of TTS synthesis voices: source, transformation, target)”




A. Kain, "High Resolution Voice Transformation", Ph.D. thesis, OGI School of Science & Engineering at Oregon Health & Science University, 2001.

