
Real-Time Speaker Feature Identification on DSP


Type: Master's thesis
Student: Mirco Rossi
Advisors: Martin Kusserow, Oliver Amft


Until now, conversations have been monitored manually by observers or through interviews, or monitoring has been restricted to specially equipped rooms. The vision of this thesis is a tool that helps annotate spoken human communication anytime and anywhere. Carried by the user, the system should log speaker interactions in a conversation in real time. Speaker identification should rely on a single sensor input: a lapel microphone.

In this thesis, a real-time, text-independent, open-set speaker identification system was developed, evaluated, and tested on a DSP platform (TMS320C6713 DSK). The focus was on the algorithms. The system components had to be chosen to fit these constraints; in particular, the algorithms had to be real-time capable and of low complexity. For speech feature extraction, linear prediction cepstral coefficients (LPCC, 12 coefficients) were implemented; for speaker modeling and matching, a vector quantization (VQ) approach with 16 centroids per speaker model was used. In a final step, the system decides whether the speaker is identified by the best-matching speaker model or is unknown. A confidence measure was used for this task. If the tested speaker is classified as unknown, a model for that speaker is learned online.
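The processing chain described above (LPCC features, VQ speaker models, open-set decision, online enrollment) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: all function names are hypothetical, the codebooks are trained with plain k-means, and the confidence measure is replaced by a simple absolute threshold `max_distortion` on the VQ distortion, since the text does not specify the training method or the confidence measure that were actually used.

```python
import random

ORDER = 12          # LPCC coefficients per frame (from the text)
CODEBOOK_SIZE = 16  # centroids per speaker model (from the text)


def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpcc(r, order=ORDER):
    """LPC via Levinson-Durbin, then the LPC-to-cepstrum recursion.

    Assumes a non-silent frame (r[0] > 0).
    """
    a = [0.0] * (order + 1)  # a[1..order]: predictor coefficients
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    c = [0.0] * (order + 1)
    for n in range(1, order + 1):
        c[n] = a[n] + sum((m / n) * c[m] * a[n - m] for m in range(1, n))
    return c[1:]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(frames, codebook):
    """Mean squared distance from each feature vector to its nearest centroid."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def train_codebook(frames, size=CODEBOOK_SIZE, iters=10, seed=0):
    """Plain k-means (an assumption; the training method is not given)."""
    codebook = random.Random(seed).sample(frames, min(size, len(frames)))
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for f in frames:
            i = min(range(len(codebook)), key=lambda j: sq_dist(f, codebook[j]))
            clusters[i].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def identify(frames, models, max_distortion):
    """Best-matching speaker name, or None if the match is not trusted."""
    if not models:
        return None
    best = min(models, key=lambda name: distortion(frames, models[name]))
    return best if distortion(frames, models[best]) <= max_distortion else None


def log_segment(frames, models, max_distortion):
    """Identify the speaker of a segment; enroll a new model if unknown."""
    name = identify(frames, models, max_distortion)
    if name is None:
        name = f"speaker_{len(models) + 1}"  # hypothetical naming scheme
        models[name] = train_codebook(frames)
    return name
```

In use, each speech frame would be turned into a 12-dimensional vector with `lpcc(autocorr(frame, ORDER))`, and the vectors of one speech segment would be passed to `log_segment` together with the dictionary of known speaker models. A suitable value for `max_distortion` depends on the feature scale and would have to be tuned on real data.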

With 90 seconds of training time and 10 seconds of classification time, the system reached a normalized accuracy of 98%. However, identifying short speech segments requires a classification time shorter than 10 seconds, and 90 seconds of training time is too long for learning a new speaker online during a conversation. The DSP system therefore used 20 seconds of training time and 5 seconds of classification time, which results in a normalized accuracy of 81%. On the DSP platform, the speaker identification system can recognize 150 speakers in real time. Furthermore, a new speaker model is calculated in 5 seconds.

 


© 2012 ETH Zurich | Imprint | 22 April 2008