How to Matrix-Multiply Text? - NLP

What would matrix multiplication look like if you had words instead of numbers?

Machine learning algorithms are built on linear algebra, calculus, probability, and statistics, so all of them process numbers. In Natural Language Processing, however, we need to process text. After all, people do not usually communicate in numbers.

Today we are going to learn about methods to convert words in a text into numbers. This process is called Feature Extraction. Consider the following text:

First, we select the unique words from the text; the resulting list is called the Vocabulary.
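To make this concrete, here is a minimal Python sketch of building a vocabulary. The original example text lives in the figure above and is not reproduced here, so the pangram below is only an illustrative stand-in:

    # Build a vocabulary: the sorted list of unique words in the text.
    # The sentence below is an assumed stand-in for the article's example text.
    text = "the quick brown fox jumps over the lazy dog"

    tokens = text.lower().split()        # simple whitespace tokenization
    vocabulary = sorted(set(tokens))     # unique words, in a fixed order

    print(vocabulary)
    # ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']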

Now that we have the words, let's convert them into numbers.

Integer Representation

One of the easiest methods is to simply assign an integer to each of them.

There we are. We just represented the words using numbers. But there is a subtle problem with this method. We know that 8 > 4 because numbers have values of their own, yet we have assigned 8 to the word “dog” and 4 to the word “fox”. There is no reason for “dog” to get a larger value than “fox”, so the ordering the integers imply does not make sense. Besides, the number 8 tells us nothing about “dog”, which means we are not capturing the semantics of the word.
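Here is a sketch of that integer encoding, reusing the stand-in vocabulary from above (the exact numbers differ from the figure; the assignment is arbitrary anyway):

    # Assign an arbitrary integer to each word in the vocabulary.
    vocabulary = ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
    word_to_id = {word: i for i, word in enumerate(vocabulary, start=1)}

    print(word_to_id["dog"], word_to_id["fox"])
    # 2 3  -- the values carry no meaning; which word gets the bigger number is arbitrary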

One-Hot Vector

Another method to represent a word is the one-hot vector. Here, we create a vector of size V (where V is the vocabulary size) for each word, put a 1 at the index of the given word, and 0 everywhere else.

In the diagram above, the word “fox” is represented by a vector rather than a single number. Each vector simply indicates whether or not it stands for a given word, by placing a 1 at that word's index and 0 elsewhere.
This is an improvement over the integer representation because it does not assume an implied ordering. However, it is still unable to capture word semantics: the vectors tell us nothing about how the words are related. If we calculate the Euclidean distance between any two distinct one-hot vectors, we always get √2, no matter how similar the words are. Besides, a large amount of memory is wasted storing a bunch of zeros.
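Here is a minimal sketch of one-hot encoding and its constant-distance problem, again using the stand-in vocabulary:

    import numpy as np

    vocabulary = ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
    word_to_id = {word: i for i, word in enumerate(vocabulary)}

    def one_hot(word):
        # Vector of size V with a 1 at the word's index and 0 elsewhere.
        vec = np.zeros(len(vocabulary))
        vec[word_to_id[word]] = 1.0
        return vec

    print(one_hot("fox"))    # [0. 0. 1. 0. 0. 0. 0. 0.]

    # Every pair of distinct words is exactly sqrt(2) apart -- no semantics captured.
    print(np.linalg.norm(one_hot("fox") - one_hot("dog")))   # 1.414...
    print(np.linalg.norm(one_hot("fox") - one_hot("lazy")))  # 1.414...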

Therefore we need a method that captures word semantics and is memory efficient too. That’s where word embeddings come to the rescue.

Word Embeddings

For some time now, we have been complaining that our methods do not capture the semantics of a word. Let's see what these semantics actually are.

If you are given the following five words, how would you rate them in terms of the “positivity” scale?

Here is my rating.

If a word gets a larger score, it is considered more positive, and vice versa. The word “worry” got -4.4, while the word “excited” got 3. The word “paper” sounded neutral to me and got a score of 0.1. Now the numerical value has some meaning attached to the word. This “positivity” of the word is part of its semantics, and the number is actually a vector with a single dimension.
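In code, this one-dimensional representation is just a single score per word; the scores below are the ones quoted above (the remaining words from the figure are omitted):

    # One-dimensional "embeddings": one positivity score per word.
    positivity = {
        "worry":   [-4.4],
        "excited": [3.0],
        "paper":   [0.1],
    }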

Let’s add another dimension, Gender.

We have added one more dimension, so the vectors are now two-dimensional. This new dimension, Gender, represents one more aspect of each word's meaning. We can see that similar words end up near each other in the vector space. The meanings of the words are embedded in these vectors, which is why they are called word embeddings.
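Here is a sketch with two-dimensional vectors of the form [positivity, gender]. The words and values are illustrative assumptions rather than the ones in the figure, but they show how distances now reflect meaning:

    import numpy as np

    # Each word is a 2-D vector: [positivity, gender] (illustrative values).
    embeddings = {
        "excited": np.array([3.0, 0.0]),
        "happy":   np.array([3.4, 0.0]),
        "worry":   np.array([-4.4, 0.0]),
        "king":    np.array([1.0, 1.0]),
        "queen":   np.array([1.0, -1.0]),
    }

    def distance(a, b):
        return np.linalg.norm(embeddings[a] - embeddings[b])

    # Unlike one-hot vectors, the distances now depend on meaning:
    print(distance("excited", "happy"))  # ~0.4  (similar words are close)
    print(distance("excited", "worry"))  # ~7.4  (dissimilar words are far apart)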

If we keep adding more dimensions, say 100, each word is represented by a vector in a hundred-dimensional space. However, in a real model the dimensions may not correspond to clean categorical meanings like “positivity” and “gender”. That is fine: with word embeddings, we do not need to know what each dimension refers to, as long as we have a vector for each word.

The question is, how do we get these embeddings? It is obviously not feasible to hand-assign a score to every word in the vocabulary across 100 categories; instead, the embeddings are learned from data. Some of the most popular methods for creating word embeddings are listed below (a short training sketch follows the lists):

Classical Methods

  • Word2Vec, with its two training architectures: Continuous Bag of Words (CBOW) and Continuous Skip-Gram
  • Global Vectors (GloVe)
  • fastText

Deep Learning Methods

  • BERT
  • ELMo
  • GPT-2
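As a sketch of how such embeddings are learned in practice, here is a minimal example using gensim's Word2Vec implementation (this assumes gensim 4.x is installed; the toy corpus is far too small to learn anything meaningful and only illustrates the API):

    from gensim.models import Word2Vec

    # A toy corpus: a list of tokenized sentences (assumed example data).
    sentences = [
        ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
        ["the", "dog", "sleeps", "while", "the", "fox", "runs"],
    ]

    # Train 100-dimensional skip-gram (sg=1) embeddings.
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

    vector = model.wv["fox"]    # the 100-dimensional embedding for "fox"
    print(vector.shape)         # (100,)
    print(model.wv.most_similar("fox", topn=3))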

Now we have an idea of how words are represented numerically, using vectors. All the matrix operations are performed on these vector representations, so we never have to literally multiply the text.
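To close the loop on the title: once every word has a vector, “multiplying text” just means multiplying matrices built from those vectors. For instance, multiplying a one-hot row vector by an embedding matrix selects that word's embedding row (a tiny numpy sketch with assumed values):

    import numpy as np

    vocabulary = ["dog", "fox", "lazy"]      # tiny assumed vocabulary
    embedding_matrix = np.array([            # one row per word, 2 dimensions each
        [0.9, 0.1],   # dog
        [0.8, 0.2],   # fox
        [-0.5, 0.3],  # lazy
    ])

    one_hot_fox = np.array([0.0, 1.0, 0.0])  # one-hot vector for "fox"

    # Matrix multiplication picks out the row for "fox":
    print(one_hot_fox @ embedding_matrix)    # [0.8 0.2]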

