By a language model we mean a model of the joint probability of a sequence of words, which we will refer to here as a sentence. Hence, if we have the sentence $s = x_1, x_2, x_3, \ldots, x_n$, where $x_1, x_2, \ldots, x_n$ is the sequence of words in the sentence, we need to find $p(x_1, x_2, \ldots, x_n)$. We now need to make a modelling choice that determines how to compute $p(x_1, \ldots, x_n)$.
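One classical modelling choice is to factorise the joint probability with the chain rule, $p(x_1,\ldots,x_n) = \prod_{i=1}^{n} p(x_i \mid x_1,\ldots,x_{i-1})$, and then truncate the conditioning context. The sketch below is a minimal illustration of that idea with a bigram (first-order Markov) approximation on a hypothetical three-sentence toy corpus; the corpus and the start symbol `<s>` are assumptions for the example, not part of the post.

```python
from collections import Counter

# Hypothetical toy corpus used only to illustrate the factorisation
# p(x_1,...,x_n) = prod_i p(x_i | x_1,...,x_{i-1}),
# approximated here by a bigram model: p(x_i | x_{i-1}).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

# Count unigrams and bigrams; a start symbol <s> is prepended so that
# the first word of a sentence is also conditioned on something.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    padded = ["<s>"] + sentence
    unigrams.update(padded[:-1])          # contexts
    bigrams.update(zip(padded[:-1], padded[1:]))  # (context, word) pairs

def sentence_probability(sentence):
    """p(x_1,...,x_n) under the bigram approximation (no smoothing)."""
    padded = ["<s>"] + sentence
    prob = 1.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

# p(the|<s>) * p(cat|the) * p(sat|cat) = 1 * 2/3 * 1/2 = 1/3
print(sentence_probability(["the", "cat", "sat"]))
```

An unsmoothed count model like this assigns probability zero to any unseen bigram, which is exactly the kind of limitation that motivates richer models.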
In this blog post we are going to perform some tests on the word2vec embeddings. We will use the pretrained word2vec model released by Google, trained on the Google News corpus. Its vocabulary contains about $3$ million words and phrases, each represented by a $300$-dimensional vector; we will use only the first $100{,}000$ of them.
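With gensim, restricting the load to the first entries of the file can be done with the `limit` parameter of `KeyedVectors.load_word2vec_format(path, binary=True, limit=100_000)` (the path to the downloaded model file is up to you). Since the full model is several gigabytes, the self-contained sketch below instead uses hypothetical toy vectors to show the kind of similarity test we will run on the real embeddings; the words and numbers are made up for illustration.

```python
from math import sqrt

# Hypothetical 4-dimensional toy vectors standing in for word2vec
# embeddings (the real Google News vectors are 300-dimensional).
embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.5, 0.9, 0.0, 0.1],
    "woman": [0.5, 0.0, 0.9, 0.1],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# In these toy vectors, "king" is constructed to lie closer to "queen"
# than to "woman", mimicking what we expect from real embeddings.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["woman"]))
```

The same `cosine_similarity` function applies unchanged to the real $300$-dimensional vectors once the model is loaded.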