Some fixes in LM part regarding ngram history length + MLE ngram #13
base: master
Conversation
lecture_note.tex
Outdated
@@ -3568,11 +3568,12 @@ \section{$n$-Gram Language Model}
conditional probability (Eq.~\eqref{eq:unidir_sentence}~(a)) is only conditioned
on the $n-1$ preceding symbols only, meaning
\begin{align*}
p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}).
% p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}). % history length should be n-1
please remove this commented line
done
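(Sanity check, not part of the patch: with the corrected indexing the history contains exactly n-1 symbols. For a trigram model, n = 3, the approximation would read as below; the example is mine, not taken from the notes.)

```latex
% Worked example (mine, not in lecture_note.tex): for n = 3 the history
% covers the n-1 = 2 preceding symbols,
\begin{align*}
p(w_k \mid w_{<k}) \approx p(w_k \mid w_{k-2}, w_{k-1}),
\end{align*}
% whereas the old indexing w_{k-n}, \ldots, w_{k-1} spans n = 3 symbols.
```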
lecture_note.tex
Outdated
\end{align*}
This results in
\begin{align*}
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n}, \ldots, w_{t-1}).
p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n+1}, \ldots, w_{t-1}). % history should have n-1 length
same here
done
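(Quick numerical check of the same off-by-one, again my own snippet rather than anything from the repo: an index range running from t-n to t-1 contains n positions, while t-n+1 to t-1 contains n-1.)

```python
# Off-by-one check (illustrative only, not part of the patch).
n, t = 3, 10
old_history = list(range(t - n, t))      # indices t-n .. t-1 -> [7, 8, 9], n symbols
new_history = list(range(t - n + 1, t))  # indices t-n+1 .. t-1 -> [8, 9], n-1 symbols
assert len(old_history) == n
assert len(new_history) == n - 1
```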
lecture_note.tex
Outdated
\subsection{Smoothing and Back-Off}

{\em Note that I am missing many references this section, as I am writing this
on my travel. I will fill in missing references once I'm back from my
travel.}

The biggest issue of having an $n$-gram that never occurs in the training corpus
is that any sentence containing the $n$-gram will be given a zero probability
is that any sentence containing such $n$-gram will be given a zero probability
such an
done
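(Side note on why the zero-probability issue matters; the toy code below is my own illustration, not text from the lecture notes or this patch. It builds a relative-frequency bigram model on a tiny corpus and shows that a single unseen bigram drives the probability of the whole sentence to zero.)

```python
from collections import Counter

# Tiny training corpus (illustrative only).
train = "the cat sat on the mat".split()

# Relative-frequency (MLE) bigram estimates: p(w | h) = c(h, w) / c(h).
bigram_counts = Counter(zip(train, train[1:]))
history_counts = Counter(train[:-1])

def p(word, history):
    if history_counts[history] == 0:
        return 0.0
    return bigram_counts[(history, word)] / history_counts[history]

def sentence_prob(words):
    # Product of bigram probabilities; sentence-boundary symbols omitted for brevity.
    prob = 1.0
    for history, word in zip(words, words[1:]):
        prob *= p(word, history)
    return prob

print(sentence_prob("the cat sat on the mat".split()))  # 0.25 -> every bigram was seen
print(sentence_prob("the cat sat on the dog".split()))  # 0.0  -> ("the", "dog") never occurs
```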
Given the definition of an n-gram, the text itself is 100% correct, but the formulas consistently use histories of length n instead of n-1, which is probably a typo. I have also added a small explanation of why the relative-frequency n-gram estimator is optimal from the MLE perspective.
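(The added explanation itself is not visible in this diff view; for reviewers, the standard argument goes roughly as follows. This is my paraphrase of the usual derivation, not the actual wording of the patch.)

```latex
% Sketch of the usual MLE argument (paraphrase, not the patch text).
% Let c(h, w) be the training count of word w after history h, and let
% \theta_{w|h} = p(w | h) be the model parameters. The log-likelihood is
\begin{align*}
\mathcal{L}(\theta) = \sum_{h, w} c(h, w) \log \theta_{w|h},
\qquad \text{subject to } \sum_{w} \theta_{w|h} = 1 \text{ for every } h.
\end{align*}
% A Lagrange multiplier \lambda_h per history and a zero derivative with
% respect to \theta_{w|h} give c(h, w) / \theta_{w|h} = \lambda_h; the
% sum-to-one constraint then forces \lambda_h = \sum_w c(h, w) = c(h), so
\begin{align*}
\theta^{\mathrm{MLE}}_{w|h} = \frac{c(h, w)}{c(h)},
\end{align*}
% i.e. the relative-frequency estimator maximizes the training likelihood.
```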