Some fixes in LM part regarding ngram history length + MLE ngram #13

Open
wants to merge 3 commits into master
Conversation

@uralik commented Dec 30, 2017

Given the definition of an n-gram, the surrounding text is correct, but the formulas condition on histories of length n rather than n-1, which is probably a typo. I have also added a short explanation of why the relative-frequency n-gram estimator is optimal from the MLE perspective.
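For reference, the standard MLE argument runs as follows (the counts c(h, w) and the multiplier lambda below are notation introduced here, not necessarily the notation used in the added text): maximizing the training log-likelihood under the normalization constraint yields the relative-frequency estimate.

\begin{align*}
&\max_{p(\cdot \mid h)} \sum_{w} c(h, w) \log p(w \mid h)
\quad \text{subject to} \quad \sum_{w} p(w \mid h) = 1, \\
&\text{stationarity of the Lagrangian:} \quad
\frac{c(h, w)}{p(w \mid h)} = \lambda
\;\Rightarrow\; p(w \mid h) = \frac{c(h, w)}{\lambda}, \\
&\sum_{w} p(w \mid h) = 1
\;\Rightarrow\; \lambda = \sum_{w} c(h, w) = c(h)
\;\Rightarrow\; \hat{p}(w \mid h) = \frac{c(h, w)}{c(h)},
\end{align*}

where h denotes the (n-1)-token history.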

lecture_note.tex Outdated
@@ -3568,11 +3568,12 @@ \section{$n$-Gram Language Model}
conditional probability (Eq.~\eqref{eq:unidir_sentence}~(a)) is only conditioned
on the $n-1$ preceding symbols only, meaning
\begin{align*}
-p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}).
+p(w_k | w_{<k}) \approx p(w_k | w_{k-n+1}, \ldots, w_{k-1}).
+% p(w_k | w_{<k}) \approx p(w_k | w_{k-n}, w_{k-n+1}, \ldots, w_{k-1}). % history length should be n-1
Contributor

please remove this commented line

Author

done
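As a sanity check on the corrected history length: with n = 3 (a trigram model), each word should be predicted from the two preceding words,

\begin{align*}
p(w_k | w_{<k}) \approx p(w_k | w_{k-2}, w_{k-1}),
\end{align*}

i.e. a history of n - 1 = 2 tokens, whereas the old formula listed w_{k-3}, w_{k-2}, w_{k-1}, a history of n = 3 tokens.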

lecture_note.tex Outdated
\end{align*}
This results in
\begin{align*}
-p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n}, \ldots, w_{t-1}).
+p(S) \approx \prod_{t=1}^T p(w_t | w_{t-n+1}, \ldots, w_{t-1}). % history should have n-1 length
Contributor

same here

Author

done
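A minimal runnable sketch of the corrected factorization together with the relative-frequency (MLE) estimate, assuming (n-1)-token histories padded with <s> / </s> markers; the function and variable names here are illustrative and do not come from the lecture note:

from collections import defaultdict
import math

def train_ngram_counts(sentences, n):
    """Count n-grams with (n-1)-token histories, padding with <s> and </s>."""
    history_counts = defaultdict(int)   # c(h)
    ngram_counts = defaultdict(int)     # c(h, w)
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])   # exactly n-1 tokens
            ngram_counts[(history, tokens[i])] += 1
            history_counts[history] += 1
    return history_counts, ngram_counts

def sentence_log_prob(sent, n, history_counts, ngram_counts):
    """log p(S) = sum_t log p(w_t | w_{t-n+1}, ..., w_{t-1}) under the MLE estimate."""
    tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
    logp = 0.0
    for i in range(n - 1, len(tokens)):
        history = tuple(tokens[i - n + 1:i])
        c_hw = ngram_counts[(history, tokens[i])]
        c_h = history_counts[history]
        if c_hw == 0:   # unseen n-gram: zero probability (motivates smoothing)
            return float("-inf")
        logp += math.log(c_hw / c_h)
    return logp

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
h_counts, ng_counts = train_ngram_counts(corpus, n=2)
print(sentence_log_prob(["the", "cat", "sat"], 2, h_counts, ng_counts))

With this two-sentence toy corpus, the seen sentence gets log-probability log(1/2), while any sentence containing an unseen bigram is driven to zero probability, which is exactly the issue the next hunk's subsection addresses.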

lecture_note.tex Outdated
\subsection{Smoothing and Back-Off}

{\em Note that I am missing many references this section, as I am writing this
on my travel. I will fill in missing references once I'm back from my
travel.}

The biggest issue of having an $n$-gram that never occurs in the training corpus
-is that any sentence containing the $n$-gram will be given a zero probability
+is that any sentence containing such $n$-gram will be given a zero probability
Contributor

such an $n$-gram

Author

done
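Continuing the sketch above (it reuses math, corpus, h_counts and ng_counts from the previous block), and since the changed sentence is about unseen n-grams forcing a zero probability, one generic remedy is add-one (Laplace) smoothing; this is only an illustration of the zero-probability fix, not the particular smoothing or back-off scheme the subsection describes:

def smoothed_log_prob(sent, n, history_counts, ngram_counts, vocab_size):
    """Add-one smoothing: p(w | h) = (c(h, w) + 1) / (c(h) + |V|), never zero."""
    tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
    logp = 0.0
    for i in range(n - 1, len(tokens)):
        history = tuple(tokens[i - n + 1:i])
        c_hw = ngram_counts.get((history, tokens[i]), 0)
        c_h = history_counts.get(history, 0)
        logp += math.log((c_hw + 1) / (c_h + vocab_size))
    return logp

# The bigram ("the", "ran") never occurs in the toy corpus, so the unsmoothed
# estimator assigns this sentence zero probability; the smoothed one does not.
vocab = {w for s in corpus for w in s} | {"</s>"}
print(smoothed_log_prob(["the", "ran"], 2, h_counts, ng_counts, len(vocab)))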
