System for Automatic Transcription of Sessions of the Polish Senate

Details

Title: System for Automatic Transcription of Sessions of the Polish Senate
Authors: Marasek, K.; Koržinek, D.; Brocki, Ł.
Publication date: 2014
Keywords: large vocabulary speech recognition; language modeling; transcription; transliteration; subtitles
Language: English
Content provider: BazTech
Type: Article

Abstract

This paper describes the research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish-language Senate speeches. The system comprises several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some modules relied on existing tools and others had to be built from scratch, but in each case the authors used the most advanced techniques available to them at the time. Finally, several experiments were performed to compare the performance of more modern and more conventional technologies.
