Spectral Parameters to Cued Speech Parameters Mapping: Multi-linear and GMM approaches (applied to French vowels)

Cued Speech (CS) is a visual communication system that uses hand shapes placed at different positions near the face, in combination with natural lip-reading, to enhance speech perception from visual input for deaf people. One important challenge, however, is speech communication between hearing people, who produce acoustic speech but do not practice CS, and deaf people with no residual hearing, who rely on lip-reading complemented by the CS code for speech perception. In this work, we apply a multi-linear regression (MLR) approach and a Gaussian Mixture Model (GMM)-based mapping approach to map acoustic spectral parameters to the hand position in CS and the accompanying lip shape. This work thereby contributes to the development of an automatic translation system in the framework of visual speech synthesis. The results show that the MLR approach estimates the lip parameters well from the spectral parameters, since there is a strong linear correlation between the two. However, the MLR approach performs poorly when estimating the hand position, since there is no relationship between the hand positions and the spectral parameters. By introducing an intermediate space, we show that a similar topological structure is the key to the MLR approach. To relax the linear constraint of the MLR approach, we apply the GMM-based mapping approach, which has both classification-partitioning and regression properties. From the viewpoint of machine learning theory, the GMM parameters are estimated separately with supervised, unsupervised, and semi-supervised training methods. The supervised training method shows high efficiency and good robustness. The Minimum Mean Square Error (MMSE) and Maximum A Posteriori (MAP) criteria are each used as the regression criterion in the GMM-based mapping approach. We show that the MLR approach is a special case of the GMM-based mapping approach when the number of Gaussians equals one.
Thus the GMM-based mapping approach can improve performance significantly over the MLR approach by increasing the number of Gaussians. Finally, a continuous transition, obtained by linear interpolation in the acoustic space, is introduced to compare the different mapping approaches used in this work. The comparison shows that the GMM-based mapping approach performs well thanks to its classification-partitioning property when the source and target data have “no relationship”, as in hand position estimation; it can also improve performance through its local regression property when the source and target data are strongly correlated, as in lip parameter estimation. In addition, we propose a direct prediction of lip geometry features from the natural image of the mouth region of interest (ROI), based on the 2D Discrete Cosine Transform (DCT) combined with Principal Component Analysis (PCA). The results show that the geometric lip features can be estimated with good accuracy using a reduced set of predictors derived from the DCT coefficients.
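
The relation between the two mapping approaches can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes scalar source and target features (the thesis maps multi-dimensional spectral parameters), uses invented toy data in which three classes have targets with no linear trend, and all function names (fit_component, fit_supervised, mmse_map) are hypothetical. Under the MMSE criterion the estimate is a posterior-weighted sum of per-Gaussian linear regressions, so a single Gaussian reduces to ordinary linear regression.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_component(xs, ys, weight):
    """Joint-Gaussian statistics of paired scalars (source x, target y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return {"w": weight, "mx": mx, "my": my, "sxx": sxx, "sxy": sxy}

def fit_supervised(xs, ys, labels):
    """Supervised training: one Gaussian per class label, weighted by class frequency."""
    comps = []
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        comps.append(fit_component([xs[i] for i in idx],
                                   [ys[i] for i in idx],
                                   len(idx) / len(xs)))
    return comps

def mmse_map(x, comps):
    """MMSE estimate E[y | x]: posterior-weighted sum of local linear regressions."""
    likes = [c["w"] * gaussian_pdf(x, c["mx"], c["sxx"]) for c in comps]
    total = sum(likes)
    return sum(l / total * (c["my"] + c["sxy"] / c["sxx"] * (x - c["mx"]))
               for l, c in zip(likes, comps))

# Toy data (assumption): three classes along the source axis whose targets
# (1, 5, 1) show no linear trend, loosely mimicking the hand-position case.
centers = [0.0, 4.0, 8.0]
targets = [1.0, 5.0, 1.0]
offsets = [-0.5, -0.25, 0.0, 0.25, 0.5]
xs, ys, labels = [], [], []
for k, (c, t) in enumerate(zip(centers, targets)):
    for d in offsets:
        xs.append(c + d)
        ys.append(t)
        labels.append(k)

one_gaussian = [fit_component(xs, ys, 1.0)]      # equivalent to linear regression
three_gaussians = fit_supervised(xs, ys, labels)

# Continuous transition: linear interpolation in the source space between
# the first two class centers, mapped by both models.
for step in range(5):
    t = step / 4
    x = (1 - t) * centers[0] + t * centers[1]
    print(x, mmse_map(x, one_gaussian), mmse_map(x, three_gaussians))
```

On this toy data the single-Gaussian model returns the global mean of the targets everywhere, because the fitted slope is zero, while the three-Gaussian model recovers each class target near its center and blends between neighbouring targets along the interpolated path.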

Additional Info

Field Value
Source https://theses.hal.science/tel-00935286
Author Ming, Zuheng
Maintainer CCSD
Last Updated May 7, 2026, 07:09 (UTC)
Created May 7, 2026, 07:09 (UTC)
Identifier NNT: 2013GRENT032
Language fr
Rights https://about.hal.science/hal-authorisation-v1/
contributor Grenoble Images Parole Signal Automatique (GIPSA-lab) ; Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS)
creator Ming, Zuheng
date 2013-06-24T00:00:00
harvest_object_id 18ac9936-ee65-4309-8a17-32fcda446162
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2026-03-31T00:00:00
set_spec type:THESE