Winter Internship Report
Designing a Part Of Speech Tagger
NIT Patna, Bihar
Under the Guidance of
Dr. A. K. Singh
Department of Computer Science & Engineering
INDIAN INSTITUTE OF TECHNOLOGY
(BANARAS HINDU UNIVERSITY)
VARANASI – 221005
Artificial Intelligence
Artificial Intelligence can be taken as a superset of machine learning, which is itself a superset of deep learning. Put simply, it is the technology that gives a machine human-like intelligence.
Natural Language Processing
A branch of Artificial Intelligence that deals with communicating with a machine/intelligent system in a natural language such as English or Hindi.
Machine Learning
Giving a computer the ability to learn without being explicitly programmed for that specific task: essentially, training a system on past data so that it can predict present/future outputs. It has two sub-branches –
Deep Learning
Machine learning is a superset of deep learning. Here the machines generate their features by themselves, essentially forming algorithms that mimic the human brain.
It is implemented through neural networks, whose basic functional unit is the perceptron.
The basic structure of a perceptron: at first, the weights are randomly assigned to the inputs. The perceptron then compares its output with the given output and changes the weights correspondingly.
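The training loop described above can be sketched as follows. This is my own minimal illustration, not code from the project: weights start random, the output is compared with the target, and the weights are adjusted by the error.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # weights randomly assigned at first
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            out = 1 if xi @ w + b > 0 else 0  # step activation
            err = target - out                # compare output with given output
            w += lr * err * xi                # change weights correspondingly
            b += lr * err
    return w, b

# Usage: learn the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]  # → [0, 0, 0, 1]
```

Because AND is linearly separable, this single perceptron converges to a correct separating line within a few epochs.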
A neural network with several hidden layers constitutes a deep network.
Feed-forward Neural Networks
Networks that are not cyclic in nature, i.e. the outputs are independent of each other.
Convolutional Neural Networks
Here, a neuron in a layer is connected only to a small region of the layer before it. It is a feed-forward neural network inspired by the visual cortex.
Recurrent Neural Networks (RNNs)
A neural network in which the present output depends on the previous outputs (this can be understood by analogy to dynamic programming).
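The dependence of the present output on previous ones can be sketched with the standard recurrence h_t = tanh(W_x·x_t + W_h·h_{t-1}). This is my own illustration with made-up weights, not the report's code:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, h0):
    # The hidden state carries information forward in time:
    # h_t = tanh(W_x @ x_t + W_h @ h_{t-1})
    h = h0
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h)  # h_t depends on h_{t-1}
        states.append(h)
    return states

# Two time steps of 2-dimensional input, 3-dimensional hidden state
W_x = np.ones((3, 2)) * 0.5
W_h = np.eye(3) * 0.5
h0 = np.zeros(3)
states = rnn_forward([np.array([1.0, 0.0]), np.array([0.0, 1.0])], W_x, W_h, h0)
```

Even though the second input mirrors the first, the second hidden state differs from the first because it also sees the previous state.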
Basic structure of an RNN.
There are some limitations with RNNs:
When the change in weight is very small, i.e. de/dw << 1, the new weight is almost equal to the old one (the vanishing-gradient problem). This limitation is addressed by another architecture known as Long Short Term Memory networks (LSTMs).

Long Short Term Memory Networks (LSTMs)
An RNN equipped to handle long-term dependencies.

Word2Vec
A model that predicts between a center word and context words in terms of word vectors. It comprises two models:
· Skip-Gram model
· Continuous Bag of Words model

Task
Designing a Part of Speech tagger.

Dataset
A merged Bhojpuri dataset containing Bhojpuri sentences and the corresponding labels for their words.
A sample of the dataset.

Tools used
· Python 3
· Keras
· TensorFlow backend

Approach
After gaining a thorough understanding of the topics listed above, I first took the Word2Vec embeddings of the words with their corresponding sentences: I extracted each sentence and created the vectors word by word, so the implementation can be viewed as a 2D array over sentences and words. I did the same with the labels, creating a 2D array of the labels corresponding to the words in each sentence. A dictionary is used to map the words to their corresponding labels. For the label vectors, the set of distinct tags was used to create one-hot vectors. There are 29 labels in total, namely: 'NNP', 'NN', 'PSP', 'NST', 'VM', 'JJ', 'RB', 'RP', 'CC', 'VAUX', 'SYM', 'RDP', 'QC', 'PRP', 'QF', 'NEG', 'DEM', 'RDP', 'WQ', 'INJ', 'CL', 'ECH', 'UT', 'INTF', 'UNK', 'NP', 'VGF', 'CCP', 'BLK'. Another dictionary is used to map the labels to these vectors. Now we take the training data, train the LSTM model on it, and predict on the test values. The vectors and labels of the test dataset were encoded in the same way and used as validation data.
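The dictionary-based one-hot encoding of labels described above can be sketched as follows. This is my own illustration with an abbreviated tag set; the variable names are not from the project:

```python
import numpy as np

# Abbreviated tag set for illustration (the project uses 29 tags)
tags = ['NNP', 'NN', 'PSP', 'VM', 'JJ']

# Dictionary mapping each label to its one-hot vector
tag_to_onehot = {t: np.eye(len(tags))[i] for i, t in enumerate(tags)}

# 2D structure: one list of labels per sentence, encoded word by word
sentence_labels = [['NNP', 'VM'], ['JJ', 'NN', 'PSP']]
encoded = [[tag_to_onehot[t] for t in sent] for sent in sentence_labels]
```

Each inner list then mirrors a sentence, with one one-hot vector per word, which matches the 2D sentences-by-words layout used for the word embeddings.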
A sequential model was taken. Since the longest sentence came out to be 226 words, the LSTM was trained with an input shape of 226×100 (vector size 100, maximum sentence length 226) with return_sequences set to True. 29 was passed to the Dense layer, as there are 29 different tags. After the LSTM is trained, an attention mechanism is applied.
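A minimal sketch of such a model in Keras with a TensorFlow backend, based on the shapes stated above (226×100 input, return_sequences=True, Dense output of 29). The number of LSTM units (64) and the softmax activation are my assumptions, not values from the report, and the attention mechanism is omitted here:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

model = Sequential([
    Input(shape=(226, 100)),              # max sentence length 226, vector size 100
    LSTM(64, return_sequences=True),      # 64 units is an assumption
    Dense(29, activation='softmax'),      # 29 tags; applied per time step on 3D input
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

With return_sequences=True, the LSTM emits one vector per time step, so the Dense layer produces a 29-way tag distribution for every word position, giving an output shape of (batch, 226, 29).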