For this project I wanted to experiment with using ML to ‘vectorize’ words in a two-dimensional semantic space. I wanted to visualize how I think language works.

As a starting point, I wanted to begin with one dimension and limit the set of words so that the output is semantically legible to the user. So I generated a one-dimensional space where users can enter animal names and see how they relate to one another spatially, by similarity. I also thought it would be interesting to not only enter animals but also click in the gaps between them to see which animal sits in that space. This uses the Llama LLM.

Screen Recording 2025-03-07 at 8.58.42 AM.mov
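
To make the 1D idea concrete, here is a rough sketch of the layout logic, not the actual code from my sketch. It assumes the LLM hands back a score between 0 and 1 for each animal (the `scores` values below are made up), and the function names `scoreToX`, `xToScore`, and `neighborsOf` are just placeholders I picked for illustration.

```javascript
// Hypothetical similarity scores in [0, 1]; in the real sketch these would come from the LLM.
const scores = { cat: 0.2, dog: 0.3, whale: 0.9 };

// Map a score in [0, 1] to an x position on a line of the given pixel width.
function scoreToX(score, width, margin = 20) {
  return margin + score * (width - 2 * margin);
}

// Inverse mapping: turn a clicked x position back into a score, clamped to [0, 1].
function xToScore(x, width, margin = 20) {
  const s = (x - margin) / (width - 2 * margin);
  return Math.min(1, Math.max(0, s));
}

// Find the animals immediately to the left and right of a clicked score.
// Either side is null when the click falls outside the placed animals.
function neighborsOf(score, allScores) {
  const sorted = Object.entries(allScores).sort((a, b) => a[1] - b[1]);
  let left = null;
  let right = null;
  for (const [name, s] of sorted) {
    if (s <= score) {
      left = name;
    } else {
      right = name;
      break;
    }
  }
  return { left, right };
}

// Example: a click in the middle of a 400px-wide line.
const clicked = xToScore(200, 400);
console.log(neighborsOf(clicked, scores));
```

The two neighbor names are what I would pass back to the model to ask which animal belongs in the gap.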

This was cool and, more to the point, the positions of the animals relative to one another feel meaningful. Then I tried to create the same thing in two dimensions using an LLM, but for whatever reason I couldn’t get the words to vectorize in two dimensions. After some digging I learned that I’d need a specific kind of model that lets me select the number of dimensions to vectorize words within, and found that UMAP is the most common choice for something like this.

My first attempt placed animals in two dimensions; however, the positions weren’t meaningful. I also had to preload a library of animals because, unlike an LLM, it can’t just generate an unknown animal name. Here’s what that looks like; you can see what I’m getting at and also why it’s not working.

Screen Recording 2025-03-07 at 9.02.14 AM.mov
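
One way I could put a number on "meaningful" is to check whether each animal's nearest neighbor in the original word vectors is still its nearest neighbor on screen. The sketch below is only an illustration under assumptions: the 3-number `embeddings` and both 2D layouts are toy values I made up, and `neighborAgreement` is a hypothetical helper, not something from UMAP or from my sketch.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Straight-line distance between two 2D points.
function euclidean(p, q) {
  return Math.hypot(p[0] - q[0], p[1] - q[1]);
}

// Return the candidate with the highest score (first one wins on ties).
function best(candidates, scoreFn) {
  return candidates.reduce((top, c) => (scoreFn(c) > scoreFn(top) ? c : top));
}

// Fraction of words whose nearest neighbor in the original vectors
// is also their nearest neighbor in the 2D layout (1 = perfect, 0 = none).
function neighborAgreement(embeddings, positions) {
  const names = Object.keys(embeddings);
  let agree = 0;
  for (const name of names) {
    const others = names.filter((n) => n !== name);
    const nnHigh = best(others, (n) => cosineSimilarity(embeddings[name], embeddings[n]));
    const nn2d = best(others, (n) => -euclidean(positions[name], positions[n]));
    if (nnHigh === nn2d) agree++;
  }
  return agree / names.length;
}

// Toy vectors (made up): cat and dog are similar, shark and whale are similar.
const embeddings = {
  cat: [1, 0.9, 0], dog: [0.9, 1, 0],
  shark: [0, 0.1, 1], whale: [0.1, 0, 1],
};
// A layout that keeps similar animals together, and one that does not.
const goodLayout = { cat: [0, 0], dog: [1, 0], shark: [10, 10], whale: [11, 10] };
const badLayout = { cat: [0, 0], shark: [1, 0], dog: [10, 10], whale: [11, 10] };

console.log(neighborAgreement(embeddings, goodLayout));
console.log(neighborAgreement(embeddings, badLayout));
```

A low score on my real data would suggest the 2D projection is scrambling the neighborhoods; a high score would suggest the word vectors themselves are the problem.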

Here is the code: https://editor.p5js.org/ajt521/sketches/4KwfU5Scg

*FYI it is in JavaScript because I had been working on it in code, but wanted to bring it into p5 for easier experimentation.

Main question / problem: I am struggling to figure out how to make the semantic space actually meaningful.