
enjalot's tweets

Looking in the mirror of years and 10,965 tweets.

Oh god, this could get embarrassing. The problem with powerful visualization tools is that they will show you things whether you wanted to see them or not. For most of my career I've used Twitter mainly in a professional capacity, but, as we'll see, my younger self was sometimes a bit flippant. Here goes nothing!

Before we begin the analysis, I should point out that each dot in the map above is a tweet, and every one of those tweets went through Latent Scope's four-step process:

  1. Embed - run each piece of text through an embedding model
  2. Project - run the high-dimensional embeddings through UMAP
  3. Cluster - run the 2-dimensional UMAP coordinates through HDBSCAN
  4. Label - ask an LLM to create a label by summarizing a list of texts taken from each cluster
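To make the data flow between the four steps concrete, here is a minimal Python sketch. It is not Latent Scope's actual implementation: `run_pipeline`, `Row`, and the `toy_*` functions are hypothetical, and the toy stand-ins take the place of a real embedding model, UMAP, HDBSCAN, and an LLM so the example runs with only the standard library.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Row:
    """One input row annotated by the pipeline (hypothetical shape)."""
    text: str
    x: float
    y: float
    cluster: int
    label: str


def run_pipeline(
    texts: Sequence[str],
    embed: Callable,
    project: Callable,
    cluster: Callable,
    label: Callable,
    samples_per_cluster: int = 10,
) -> list[Row]:
    embeddings = embed(texts)        # 1. Embed: text -> high-dimensional vectors
    coords = project(embeddings)     # 2. Project: vectors -> (x, y) points
    cluster_ids = cluster(coords)    # 3. Cluster: (x, y) points -> cluster index
    # 4. Label: pass up to samples_per_cluster texts from each cluster to the labeler
    members: dict[int, list[str]] = {}
    for text, cid in zip(texts, cluster_ids):
        members.setdefault(cid, []).append(text)
    labels = {cid: label(ts[:samples_per_cluster]) for cid, ts in members.items()}
    return [
        Row(text, x, y, cid, labels[cid])
        for text, (x, y), cid in zip(texts, coords, cluster_ids)
    ]


# Toy stand-ins (hypothetical) so the sketch runs without any models.
VOCAB = ["d3", "cluster"]


def toy_embed(texts):
    # Stand-in for an embedding model: count a few keywords per text.
    return [[float(t.lower().count(w)) for w in VOCAB] for t in texts]


def toy_project(embeddings):
    # Stand-in for UMAP: keep the first two dimensions as (x, y).
    return [(e[0], e[1]) for e in embeddings]


def toy_cluster(coords):
    # Stand-in for HDBSCAN: group points by whichever axis is larger.
    return [0 if x >= y else 1 for x, y in coords]


def toy_label(sample):
    # Stand-in for an LLM call: reuse the first sampled text as the label.
    return sample[0]


texts = ["Loving d3 transitions", "d3 v7 is out",
         "clustering with HDBSCAN", "cluster labels via LLM"]
rows = run_pipeline(texts, toy_embed, toy_project, toy_cluster, toy_label)
for row in rows:
    print(row.cluster, row.label, "|", row.text)
```

Passing each stage in as a function keeps the chaining separate from the choice of models. In a real run you might plug in, for example, umap-learn's `UMAP(n_components=2).fit_transform` for the projection and an HDBSCAN implementation's `fit_predict` for the clustering. HDBSCAN marks noise points with the cluster index `-1`, which this sketch would simply treat as one more group.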

So, at the end of this process, we have clusters carving up our tweets. Every row of our input data is annotated with a cluster index and label:

clusters is better than 10,000, but it's still quite a lot. Let's use some common metrics to explore our clusters, namely likes and retweets. We can sort our list of clusters by any of the column headers and see what's in each cluster:

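Sorting clusters by likes or retweets boils down to a group-by followed by a sort. Here is a minimal sketch, assuming each annotated row is a dict with hypothetical `cluster`, `label`, `likes`, and `retweets` keys; the toy rows and numbers are made up for illustration.

```python
from collections import defaultdict


def summarize_clusters(rows):
    """Aggregate tweet count, total likes, and total retweets per cluster."""
    stats = defaultdict(lambda: {"label": "", "count": 0, "likes": 0, "retweets": 0})
    for row in rows:
        s = stats[row["cluster"]]
        s["label"] = row["label"]  # every row in a cluster shares one label
        s["count"] += 1
        s["likes"] += row["likes"]
        s["retweets"] += row["retweets"]
    return [{"cluster": cid, **s} for cid, s in stats.items()]


def sort_clusters(summary, column, descending=True):
    """Sort cluster summaries by any column, like clicking a table header."""
    return sorted(summary, key=lambda s: s[column], reverse=descending)


rows = [
    {"cluster": 0, "label": "topic A", "likes": 40, "retweets": 10},
    {"cluster": 0, "label": "topic A", "likes": 25, "retweets": 5},
    {"cluster": 1, "label": "topic B", "likes": 30, "retweets": 20},
    {"cluster": 2, "label": "topic C", "likes": 2, "retweets": 0},
]
summary = summarize_clusters(rows)
by_likes = [s["cluster"] for s in sort_clusters(summary, "likes")]
by_retweets = [s["cluster"] for s in sort_clusters(summary, "retweets")]
print(by_likes, by_retweets)  # [0, 1, 2] [1, 0, 2]
```

Totals reward big clusters; dividing `likes` by `count` instead ranks clusters by how well a typical tweet in them did.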

As you can see from the first cluster in the list, I tweeted a lot about Observable, especially Observable Plot. That makes sense, since a big part of my job while I worked there was sharing my experience with these tools. The opportunity to work with Mike Bostock at Observable was the culmination of 10 years of investment in the D3.js community, which you can see in the next most popular cluster:

Ok, but now I feel a little like I'm bragging. Let's look at a cluster that represents my more unhinged thoughts. Hopefully none of these get me cancelled 🫣. Honestly, they will probably just make you 🙄.

Alright, if you're still reading, let's look at some clusters that are probably more relevant to your interests, like these four that are all about AI and clustering!

Filtering on metadata

Alright, you've read through a bunch of the content. Let's take a break and consider an aspect of this data that is probably relevant to anyone who wants to cluster their data, whether it's tweets or something else.

The tweets you get from the archive are one of three types: a "tweet", a "reply", or a "retweet". Most likely we are only interested in tweets, since they represent "original" thoughts. Still, you might want to analyze your reply game, or see whether there are patterns in what kind of stuff you retweet. So in that spirit, let's see what happens when we filter our overall data down to just one of those three categories:
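Filtering by type is a simple row filter, but it is also worth checking how the types are spread across clusters, since a cluster made mostly of replies or retweets tells a different story than one made of original tweets. A minimal sketch, assuming each annotated row carries hypothetical `cluster` and `type` fields:

```python
from collections import Counter

TWEET_TYPES = ("tweet", "reply", "retweet")


def filter_by_type(rows, tweet_type):
    """Keep only rows of one type: "tweet", "reply", or "retweet"."""
    if tweet_type not in TWEET_TYPES:
        raise ValueError(f"unknown tweet type: {tweet_type!r}")
    return [r for r in rows if r["type"] == tweet_type]


def type_mix_by_cluster(rows):
    """Count how many rows of each type land in each cluster."""
    mix = {}
    for r in rows:
        mix.setdefault(r["cluster"], Counter())[r["type"]] += 1
    return mix


rows = [
    {"cluster": 0, "type": "tweet"},
    {"cluster": 0, "type": "reply"},
    {"cluster": 0, "type": "reply"},
    {"cluster": 1, "type": "retweet"},
    {"cluster": 1, "type": "tweet"},
]
originals = filter_by_type(rows, "tweet")
mix = type_mix_by_cluster(rows)
print(len(originals), dict(mix[0]))  # 2 {'tweet': 1, 'reply': 2}
```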

If you still want to analyze your own tweets after all of this, the first step is to request your archive from Twitter. You can follow the instructions in this notebook to download and then process your tweets into a format that matches this analysis. The next step would be to run the CSV of your tweets through Latent Scope to get your clusters. Then you can look in your own tweet mirror!
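If you would rather write your own conversion than use the notebook, here is a minimal sketch of one way to do it. It assumes the archive stores tweets in a `data/tweets.js` file shaped like `window.YTD.tweets.part0 = [{"tweet": {...}}, ...]`, with fields such as `full_text`, `favorite_count`, `retweet_count`, and `in_reply_to_status_id_str`; archive layouts have changed over time, so check yours. The tweet/reply/retweet rule here (retweets start with `RT @`, replies have a reply-to ID) is a heuristic of our own, not necessarily what the notebook does, and the function names and CSV columns are hypothetical.

```python
import csv
import io
import json


def load_archive_tweets(js_text):
    """Parse tweets.js, assumed to be `window.YTD.tweets.part0 = [...]`."""
    # Drop the JavaScript assignment prefix and parse the JSON array.
    array_text = js_text[js_text.index("["):]
    return [item["tweet"] for item in json.loads(array_text)]


def tweet_type(tweet):
    """Heuristic (assumption): classify as tweet, reply, or retweet."""
    if tweet.get("full_text", "").startswith("RT @"):
        return "retweet"
    if tweet.get("in_reply_to_status_id_str"):
        return "reply"
    return "tweet"


def write_tweets_csv(tweets, out):
    """Write one CSV row per tweet with hypothetical column names."""
    fields = ["id", "created_at", "text", "type", "likes", "retweets"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for t in tweets:
        writer.writerow({
            "id": t.get("id_str", ""),
            "created_at": t.get("created_at", ""),
            "text": t.get("full_text", ""),
            "type": tweet_type(t),
            # Counts are assumed to be stored as strings in the archive.
            "likes": int(t.get("favorite_count", 0)),
            "retweets": int(t.get("retweet_count", 0)),
        })


sample = (
    'window.YTD.tweets.part0 = ['
    '{"tweet": {"id_str": "1", "full_text": "hello d3",'
    ' "favorite_count": "3", "retweet_count": "1"}},'
    '{"tweet": {"id_str": "2", "full_text": "@someone agreed",'
    ' "in_reply_to_status_id_str": "99", "favorite_count": "0", "retweet_count": "0"}},'
    '{"tweet": {"id_str": "3", "full_text": "RT @other: neat",'
    ' "favorite_count": "0", "retweet_count": "5"}}]'
)
buf = io.StringIO()
write_tweets_csv(load_archive_tweets(sample), buf)
print(buf.getvalue())
```

With a real archive you would read the file with `open(path, encoding="utf-8").read()` and write the output with `open("tweets.csv", "w", newline="", encoding="utf-8")`, since the `csv` module documentation recommends `newline=""` for files it writes.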

If you have questions feel free to get in touch on our Discord server!