We decided to approach this problem with a Latent Dirichlet Allocation (LDA) model. LDA represents each document as a mixture of latent topics, where each topic is a distribution over words. We first pre-processed our data by removing unnecessary text from the beginning and end of each document, such as the citations and acknowledgments. We then defined two functions. The first function read and opened all of our data files. The second function contained our model: it converted the words in each document into a document-term matrix, which was fed into the LDA model. We then used cosine similarity on the resulting topic distributions to determine which documents had the highest similarity scores, and those documents were picked as the recommendations. Next, we applied our model to our data. First, we used the glob function to collect all of our files. Then we split our data into training and testing sets: the training set contained all but one document, and the testing set contained the single held-out document, which simulated an article a user had just read. This setup could easily be changed if the problem instead became building a model that gives recommendations after a user has read ten articles.
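The pipeline above can be sketched as follows. This is a minimal illustration using scikit-learn's `CountVectorizer`, `LatentDirichletAllocation`, and `cosine_similarity`, not our exact code; the function names `read_documents` and `recommend` and all parameter values are illustrative assumptions.

```python
import glob

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def read_documents(pattern):
    # First function: read every file matched by the glob pattern
    # into a list of document strings.
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            texts.append(f.read())
    return texts


def recommend(train_texts, test_text, n_topics=5, top_k=3):
    # Second function: convert the documents into a document-term
    # matrix, fit LDA on the training set, then rank the training
    # documents by cosine similarity of their topic distributions
    # to the held-out (just "read") document.
    vectorizer = CountVectorizer(stop_words="english")
    train_matrix = vectorizer.fit_transform(train_texts)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    train_topics = lda.fit_transform(train_matrix)
    test_topics = lda.transform(vectorizer.transform([test_text]))

    scores = cosine_similarity(test_topics, train_topics)[0]
    # Indices of the top_k most similar training documents.
    return list(scores.argsort()[::-1][:top_k])
```

To simulate recommendations after ten read articles instead of one, `test_text` would become a list of ten documents and the similarity scores could be averaged across them.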