Multi-dialect phrase-based speech recognition and machine translation for Qatari broadcast TV

This project developed a new set of methods for integrated semantic-parse-based automatic speech recognition and machine translation between Qatari broadcast TV (including Modern Standard Arabic, Qatari Arabic as spoken on Qatari TV, and dialects from across the Arab world as heard on Qatari satellite television talk shows) and English.

For more information, consult the menus at the top of this page, or follow any of these links:

People

  • Shown in the image:
    • Noha Selim, undergraduate in the College of Arts and Sciences, Qatar University
    • Dr. Eiman Mustafawi, Professor of Linguistics and Dean of the College of Arts and Sciences, Qatar University
    • Dr. Mohamed Elmahdy, Post-Doctoral Fellow at Qatar University
    • Dr. Mark Hasegawa-Johnson, Professor of Electrical and Computer Engineering, University of Illinois
    • Heba Al-Kababji, undergraduate in the College of Arts and Sciences, Qatar University
    • Afra Al-Qahtani, graduate of the College of Arts and Sciences, Qatar University
  • Not shown:
    • Mahmoud Abu-Nasser, graduate research assistant in Linguistics, University of Illinois
    • Rania Al-Sabbagh, graduate research assistant in Linguistics, University of Illinois
    • Elabbas Benmamoun, Professor of Linguistics, University of Illinois
    • C. Roxana Girju, Professor of Linguistics, University of Illinois
    • Po-Sen Huang, graduate research assistant in Electrical and Computer Engineering, University of Illinois
    • Austin Chen, graduate research assistant in Electrical and Computer Engineering, University of Illinois

Research

Investigators: Mark Hasegawa-Johnson, Eiman Mustafawi, Rehab Duwairi, Elabbas Benmamoun, and Roxana Girju. Funded by the Qatar National Research Fund. Research conducted at the University of Illinois and at Qatar University.

We propose novel algorithms for integrated semantic-parse-based automatic speech recognition and machine translation between Qatari broadcast TV (including both Modern Standard Arabic and Qatari Arabic as spoken on Qatari TV) and English, addressing two tasks in particular. First, we propose algorithms for learning the similarities and differences between Qatari Arabic (QA) and Modern Standard Arabic (MSA), for purposes of both automatic speech translation and speech-to-text machine translation, building on our research on the relative phonological, morphological, and syntactic systems of QA and MSA, and on the application of translation to interlingual semantic parse. Second, we propose an efficient and accurate speech-to-text translation system, building on our research in landmark-based and segment-based automatic speech recognition (ASR), and demonstrating a tight integration of segment-based ASR into a chart parser that generates machine translation lattices.

Cross-Dialect Transfer Learning

Motivation: the Arabic language can be viewed as a family of related languages, with vocabulary overlap between dialects of only about 67 percent, but with 90-95 percent overlap among their phoneme inventories. Training ASR for a regional dialect is difficult because training data are limited: while abundant training data exist for Modern Standard Arabic, there are few data for, say, Levantine Arabic. The idea of cross-dialect transfer learning is to transfer knowledge from one dialect to another, on the assumption that different dialects still have knowledge in common.

Proposed method: Because some phonemes are shared across dialects, we are testing a discriminative GMM training method that learns model parameters by maximizing an optimality criterion composed of three terms: (1) the mutual information between labeled data and their labels (the MMI criterion), (2) the negative conditional label entropy of unlabeled data, and (3) the affinity between in-dialect and cross-dialect acoustic models for phonemes believed to be similar. For example, if phoneme P in dialect A is believed to be similar to phoneme Q in dialect B, then the model parameters learned in dialect B are penalized for differences between the models of P and Q. This knowledge-transfer method can be further explored in other settings, such as transfer in feature space or in parameter space.
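As a concrete illustration, the affinity term above could be realized as a divergence penalty between the Gaussian parameters of a paired phoneme across two dialects. The sketch below uses a symmetric KL divergence between diagonal-covariance Gaussians; the function names, the choice of symmetric KL, and the weight `lam` are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def diag_gauss_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def affinity_penalty(mu_a, var_a, mu_b, var_b):
    """Symmetric KL used as the cross-dialect affinity term for a
    phoneme pair (P in dialect A, Q in dialect B) believed similar."""
    return (diag_gauss_kl(mu_a, var_a, mu_b, var_b)
            + diag_gauss_kl(mu_b, var_b, mu_a, var_a))

def objective(mmi_term, neg_cond_entropy, pairs, lam=0.1):
    """Three-term criterion: MMI on labeled data, plus the negative
    conditional label entropy of unlabeled data, minus a weighted
    affinity penalty summed over similar cross-dialect phoneme pairs.
    `pairs` is a list of (mu_a, var_a, mu_b, var_b) tuples."""
    penalty = sum(affinity_penalty(*p) for p in pairs)
    return mmi_term + neg_cond_entropy - lam * penalty
```

Under this formulation, identical cross-dialect models contribute zero penalty, so the criterion reduces to standard semi-supervised MMI training when the paired phonemes already agree.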

Qatari Arabic Corpus

Speech was recorded from four Qatari television programs in 2009-2011:

  • Al-Jazeera interviews: 207 minutes (multi-dialect recorded in Qatar; relatively formal)
  • LAKOM: 240 minutes (Moroccan dialect; not yet transcribed)
  • Sabah El-Doha: 110 minutes (multi-dialect recorded in Qatar; relatively informal)
  • Tesaneef: 550 minutes (Qatari dialect; extremely informal)

  • Nineteen hours of monaural broadcast speech audio:
    • 16 bits/sample in WAV format,
    • recorded at a 44.1 kHz sampling rate, but
    • downsampled to a 16 kHz sampling rate for distribution.
  • Fifteen hours of phonetic transcription:
    • Arabic script,
    • fully vowelized,
    • extended with Persian and Urdu characters in order to distinguish phonemes that are not part of the core Arabic orthography.
  • Fifteen hours of English gloss.
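The 44.1 kHz to 16 kHz downsampling step can be sketched as follows. The corpus documentation does not say which tool was used, so the choice of SciPy's polyphase resampler here is an assumption; since 16000/44100 reduces to 160/441, a single up-by-160, down-by-441 stage gives the exact target rate.

```python
import numpy as np
from scipy.signal import resample_poly

def downsample_44k1_to_16k(audio_44k1):
    """Resample mono audio from 44100 Hz to 16000 Hz.

    16000 / 44100 = 160 / 441, so one polyphase stage with
    up=160, down=441 hits the target rate exactly.
    """
    return resample_poly(audio_44k1, up=160, down=441)

# Example: one second of a 440 Hz tone at 44.1 kHz becomes
# exactly 16000 samples at 16 kHz.
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 440.0 * t)
resampled = downsample_44k1_to_16k(tone)
```

The polyphase approach applies an anti-aliasing FIR filter as part of the rate change, which matters here because broadcast audio at 44.1 kHz contains energy above the 8 kHz Nyquist limit of the 16 kHz output.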

Status: Some of the audio is redistributable, but much of it is not; we are working to determine the intellectual property rights.