4.3 Language Model

The accuracy of the recognition hypotheses produced by the acoustic model can be further enhanced using a language model. The acoustic model may produce several similar-sounding alternative words that the language model helps to disambiguate. Language models are also useful in limiting search time for beam-search-based acoustic models. N-gram models, which predict the probability of a word based on the previous $N-1$ words, are a common and effective approach. Current systems like Sphinx and HTK favor models with $N=3$, which are called trigrams. While there are alternatives to N-gram models that rely on grammar, syntax, subject-verb agreement and trigger words, N-gram models have the distinct advantage of being easy to train, since N-gram probabilities can be estimated automatically from a large corpus of text. A trigram model may be trained simply by using the equation:

\begin{displaymath}
P(w_{3}\vert w_{1},w_{2})=\frac{F(w_{1},w_{2},w_{3})}{F(w_{1},w_{2})}
\end{displaymath}

Here, $F(w_{1},w_{2},w_{3})$ is the frequency of occurrence of the trigram $(w_{1},w_{2},w_{3})$ in the training text and $F(w_{1},w_{2})$ is the frequency of occurrence of the bigram $(w_{1},w_{2})$. In practice, for a large vocabulary, not all possible trigrams will be present in the training corpus. In that case, bigram or unigram probabilities are used in place of trigram probabilities after scaling by a back-off weight, which accounts for the fact that the corresponding higher-order N-gram has not been seen and therefore has a lower probability of occurring.
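
To make the estimation concrete, the following Python sketch trains a trigram model from a token stream purely by counting, exactly as in the equation above. The toy corpus, the tokenization and the function name are illustrative placeholders, not part of any particular recognizer:

\begin{verbatim}
from collections import Counter

def train_trigram_model(tokens):
    """Estimate P(w3 | w1, w2) = F(w1, w2, w3) / F(w1, w2)
    from a list of tokens by maximum likelihood."""
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2, w3): count / bigram_counts[(w1, w2)]
        for (w1, w2, w3), count in trigram_counts.items()
    }

# Toy corpus; a real model is trained on millions of words.
corpus = "the cat sat on the mat the cat ran".split()
model = train_trigram_model(corpus)

# F(the, cat, sat) = 1 and F(the, cat) = 2, so the estimate is 0.5.
print(model[("the", "cat", "sat")])
\end{verbatim}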
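
The back-off idea can be sketched in the same style, loosely in the spirit of Katz back-off. The constant back-off weight below is a placeholder for illustration; real systems compute context-dependent weights chosen so that each conditional distribution still sums to one:

\begin{verbatim}
def backoff_probability(w1, w2, w3, trigrams, bigrams, unigrams,
                        backoff_weight=0.4):
    """Return P(w3 | w1, w2), backing off to shorter contexts
    when the longer N-gram was never seen in training."""
    if (w1, w2, w3) in trigrams:
        return trigrams[(w1, w2, w3)]
    if (w2, w3) in bigrams:
        # Unseen trigram: use the bigram estimate, discounted.
        return backoff_weight * bigrams[(w2, w3)]
    # Unseen bigram as well: fall back to the unigram estimate,
    # discounted once per level of back-off.
    return backoff_weight * backoff_weight * unigrams.get(w3, 0.0)
\end{verbatim}

Each level of back-off applies the discount again, reflecting the fact that every higher-order N-gram along the way was unseen.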
