A Better Mastodon Client
Last night I had an idea and went ahead and built it. I’d like to tell you about it. Find the source code here.
The Pain Point
I use Mastodon as my primary social media. I like it because of the sheer density of good info in my feed. So much good conversation happens on Mastodon. But my timeline is getting a little out of control.
Mastodon lets me follow hashtags, like #AI, at which point my timeline gets all toots that my server handled that were tagged accordingly. It’s not a huge amount per tag, but hachyderm is fairly large, so I get a good number of toots, probably 1,000-1,500 per day. It’s getting hard to keep up with.
I should be able to automate this!
A streamlit dashboard
So here’s my idea: a streamlit dashboard that
- downloads the latest toots in my timeline
- caches them in SQLite
- generates embeddings for each toot
- does k-means clustering to group them by similar topic
- uses an LLM to summarize each cluster of toots
- uses tailscale so I can view it on my phone
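The first two steps could look roughly like this. This is a sketch, not the actual core.py: `fetch_timeline` and `cache_toots` are hypothetical names, though the endpoint shown is Mastodon’s standard home-timeline API.

```python
import sqlite3

import requests


def fetch_timeline(instance, token, limit=40):
    # Pull the latest toots from the authenticated user's home timeline.
    resp = requests.get(
        f"https://{instance}/api/v1/timelines/home",
        headers={"Authorization": f"Bearer {token}"},
        params={"limit": limit},
    )
    resp.raise_for_status()
    return resp.json()


def cache_toots(db_path, toots):
    # Store toots in SQLite, skipping any we've already seen.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toots "
        "(id TEXT PRIMARY KEY, content TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO toots VALUES (?, ?, ?)",
        [(t["id"], t["content"], t["created_at"]) for t in toots],
    )
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM toots").fetchone()[0]
    conn.close()
    return count
```

The `INSERT OR IGNORE` means re-running the download is idempotent, which makes iterating on the rest of the pipeline painless.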
I chose streamlit because it’s quick and dirty. I figure this isn’t going to be great on the first pass, so streamlit should help me iterate quickly to make it work better for me.
The great thing about Mastodon is it’s completely open source, so the API is open and always will be, unlike Twitter/X or the other platforms that have been locking down. FWIW I do think the fediverse is the long-term right model for social media, for a variety of reasons.
A quick note — embeddings are a numeric representation of text that corresponds to the meaning of the text. I like to think of it as an “AI secret language”, in that it’s the representation that large language models use to work with the text. We’re using a clustering algorithm here to group similar toots, but there’s a lot of other things you can do with embeddings too!
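To make “similar meaning, nearby vectors” concrete, here’s a toy illustration using cosine similarity. The vectors here are made up for the example; in the real app they come from OpenAI’s embedding API and have far more dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    # Returns ~1.0 for vectors pointing the same way, ~0 for unrelated ones.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 4-dimensional "embeddings"; real ada-002 vectors have 1,536 dimensions.
ai_toot = np.array([0.9, 0.1, 0.0, 0.2])
ml_toot = np.array([0.8, 0.2, 0.1, 0.1])
cooking_toot = np.array([0.0, 0.1, 0.9, 0.7])
```

With real embeddings, the AI toot and the ML toot would score close together while the cooking toot lands far away, which is exactly what the clustering step exploits.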
I went from “oh! I have an idea” to a working solution in about 3.5 hours. I used GitHub Copilot, especially with the chat feature (CMD+I, type “create a SQLite DB with a toots table”). It’s incredible how quickly you can try out ideas.
If you want to take a peek:
- The UI (dashboard.py)
- The SQLite DB (core.py)
- Download timeline (core.py) — I used requests, no special client
- Generate embeddings (core.py) — I used OpenAI’s text-embedding-ada-002. It’s cheap and easy to set up.
- K-means clustering (science.py) — scikit-learn makes this super easy, just 4 lines.
- Summarize clusters (science.py) — I used gpt-3.5-turbo because it’s cheap-ish and good enough
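The clustering step really is only a few lines with scikit-learn. A minimal sketch — `cluster_toots` and the default `n_clusters` are my naming for illustration, not necessarily what science.py does:

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_toots(embeddings, n_clusters=8):
    # embeddings: (n_toots, n_dims) array; returns one cluster label per toot.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    return km.fit_predict(np.asarray(embeddings))
```

Each cluster’s toots can then be concatenated and handed to the LLM for a one-line summary.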
The streamlit dashboard displays each cluster in an expander container. When the dashboard loads you see a list of cluster descriptions and you can choose which to dive into.
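The display logic can be sketched like this. `group_by_cluster` is a hypothetical helper of my own naming; `st.expander` is the actual Streamlit widget, shown in comments so this sketch runs without Streamlit installed:

```python
from collections import defaultdict


def group_by_cluster(toots, labels):
    # Map each cluster label to the list of toots assigned to it.
    groups = defaultdict(list)
    for toot, label in zip(toots, labels):
        groups[label].append(toot)
    return dict(groups)


# In dashboard.py (roughly), each group becomes an expander whose title is
# the LLM-generated summary for that cluster:
#
#   for label, members in group_by_cluster(toots, labels).items():
#       with st.expander(summaries[label]):
#           for toot in members:
#               st.markdown(toot["content"], unsafe_allow_html=True)
```

The expander title is what lets you skim cluster summaries and only open the ones worth reading.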
The toots are displayed poorly, imo, it could use a lot of work. I’d also like to be able to favorite and retoot from this UI, at which point I could probably use it as my primary client for my right-after-I-wake-up browsing.
I’ve used it for a few hours and I like being able to skip over vast stretches of my timeline with relative confidence that I know what I’m skipping. I’m in control again.
On a more philosophical note, I like the idea of social media algorithms but I hate the implementations. Viewing social media in timeline order is far too noisy. Algorithms that curate my feed make it far more manageable. On the other hand, I don’t know how X or Instagram are curating my feed. As far as I can tell, they’re optimizing for their own profit, which feels manipulative. I want my feed to serve me, no other way.
What do you think? How could it be improved?
Next: I wrote a follow-up to this post, about open source and societal alignment.