Latest posts

Language Models

I have been learning NLP from the lectures by Professor Dan Jurafsky & Chris Manning on youtube. In this blog post we will develop a python class for building a language model.

What is a language model?

By a language model we mean the joint probability of a sequence of words, which we will refer to as a sentence here. Hence, if we have the sentence $s = x_1, x_2, x_3, ... ,x_n$, where $x_1, x_2,\ldots,x_n$ represents the sequence of words in the sentence, we need to find $p(x_1, x_2, ..., x_n)$. Now we need to make a modelling choice to determine how to compute $p(x_1,...,x_n)$.

Testing

In this blogpost we are going to perform some tests on the word2vec embeddings. We will use the pretrained word2vec model available from Google. It has a total of $300$ million words, we will be using only $1,00,000$ from that set.

Tests

  1. word2vec is advertised by the king, queen example, i.e., king

Web Analytics