Assignment 4 — Feb 21

Assignment Link

I want to create an interactive database where users can navigate the top 1,000 quotes from Goodreads, clustered by psychological theme.

I love looking at quotes on Goodreads and believe there is psychological or spiritual utility in reading them. I often wonder what common themes and emotions run through the quotes and passages that resonate most.

So I want to arrange the quotes thematically in space and see which similarities and clusters emerge.

The input will be text data pulled from Goodreads. The output will be a ‘vectorized’ version of each quote: a position in two dimensions, with similar quotes placed close together. If the method is unsupervised, I may not have a say in what the arrangement is based on. Ideally, though, I would like control over the dimension of similarity: I want it to be based on the psychological similarity of the content. I wonder how I would control for this.
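To make the "vectorize by similarity" idea concrete, here is a minimal sketch using TF-IDF word weights and cosine similarity, written with only the Python standard library. The quotes are made-up placeholders, not real Goodreads data, and the function names are hypothetical. Note the assumption baked in: this measures word overlap, not psychological theme, so it is a baseline rather than an answer to the control question above.

```python
import math
import re
from collections import Counter

# Placeholder quotes, made up for illustration (not real Goodreads data).
quotes = [
    "Love is patient and love is kind",
    "Patient love heals every fear",
    "Fear is the enemy of courage",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document using TF-IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many quotes does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps weights positive even for very common words.
            w: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = tfidf_vectors(quotes)
sim_01 = cosine(vectors[0], vectors[1])  # these share "love" and "patient"
sim_02 = cosine(vectors[0], vectors[2])  # these share only the word "is"
```

With these placeholders, the first two quotes score as more similar than the first and third. Getting down to two dimensions would be a separate step on top of vectors like these (for example PCA, t-SNE, or UMAP), and basing similarity on psychological content would probably mean swapping the word counts for a different representation.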

I don’t know what kind of learning task it is, actually! One thing I could try is to develop my own psychological labels, which would make it a classification problem. But since I would ideally not start with labels, I think it is an unsupervised problem (clustering plus dimensionality reduction) rather than regression, which predicts a numeric target.
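One label-free way to frame the task is clustering. As a rough sketch, here is plain k-means on 2D points, assuming each quote has already been given a position. The coordinates are made-up numbers, the choice of k = 2 is arbitrary, and seeding the centroids with the first k points is a simplification to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on 2D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignments, centroids

# Hypothetical 2D positions for six quotes (made-up numbers).
points = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (1.0, 0.9), (0.15, 0.25), (0.85, 0.95)]
labels, centers = kmeans(points, k=2)
```

The algorithm never sees a label: it only groups points that sit close together. Naming each group with a psychological theme would still be a manual step afterwards.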

I think the hardest part of this project might be gathering and cleaning the data! And then obviously organizing 1,000 references in a visually interesting and coherent way will be hard too. As far as I understand machine learning, the task itself doesn’t seem super complicated as long as I find the right model and approach.
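For the cleaning step, a small sketch of what normalizing scraped quote text might look like, using only the standard library. The raw strings are hypothetical examples of scraping leftovers (HTML tags, escaped entities, curly quotation marks), and how the quotes are actually collected from Goodreads is left out here.

```python
import html
import re
import unicodedata

def clean_quote(raw):
    """Normalize one scraped quote string into plain, comparable text."""
    text = html.unescape(raw)                    # "&quot;" -> '"'
    text = re.sub(r"<[^>]+>", " ", text)         # drop leftover HTML tags
    text = unicodedata.normalize("NFKC", text)   # unify Unicode variants
    text = text.replace("“", '"').replace("”", '"').replace("’", "'")
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text.strip('"').strip()               # remove wrapping quote marks

def dedupe(quotes):
    """Drop empty strings and case-insensitive duplicates, keeping order."""
    seen, unique = set(), []
    for q in quotes:
        key = q.casefold()
        if q and key not in seen:
            seen.add(key)
            unique.append(q)
    return unique

# Hypothetical raw strings, as they might look after scraping.
raw_quotes = [
    "  “Hope is a quiet<br> kind of courage.”  ",
    "&quot;Hope is a quiet kind of courage.&quot;",
    "",
]
cleaned = dedupe([clean_quote(q) for q in raw_quotes])
```

Here the two differently formatted copies collapse into one clean quote and the empty string is dropped. Real data would likely need more rules, such as separating the author name from the quote text.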