A Better Mastodon Client
Last night I had an idea and went ahead and built it. I’d like to tell you about it. Find the source code here.
The Pain Point
I use Mastodon as my primary social media. I like it because of the sheer density of good info in my feed. So much good conversation happens on Mastodon. But my timeline is getting a little out of control.
Mastodon lets me follow hashtags, like #AI, at which point my timeline gets all toots that my server handled that were tagged accordingly. It’s not a huge amount per tag, but hachyderm is fairly large, so I get a good number of toots, probably 1,000-1,500 per day. It’s getting hard to keep up with.
I should be able to automate this!
A streamlit dashboard
So here’s my idea: a streamlit dashboard that
- downloads the latest toots in my timeline
- caches them in SQLite
- generates embeddings for each toot
- does k-means clustering to group them by similar topic
- uses an LLM to summarize each cluster of toots
- uses tailscale so I can view it on my phone
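The first two steps could look roughly like this. This is a sketch, not the actual core.py: `fetch_timeline` and `cache_toots` are hypothetical names, though the endpoint shown is Mastodon’s standard home-timeline API.

```python
import sqlite3

import requests


def fetch_timeline(instance, token, limit=40):
    # Pull the latest toots from the authenticated user's home timeline.
    resp = requests.get(
        f"https://{instance}/api/v1/timelines/home",
        headers={"Authorization": f"Bearer {token}"},
        params={"limit": limit},
    )
    resp.raise_for_status()
    return resp.json()


def cache_toots(db_path, toots):
    # Store toots in SQLite, skipping any we've already seen.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toots "
        "(id TEXT PRIMARY KEY, content TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO toots VALUES (?, ?, ?)",
        [(t["id"], t["content"], t["created_at"]) for t in toots],
    )
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM toots").fetchone()[0]
    conn.close()
    return count
```

The `INSERT OR IGNORE` means re-running the download is idempotent, which makes iterating on the rest of the pipeline painless.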
I chose streamlit because it’s quick and dirty. I figure this isn’t going to be great on the first pass, so streamlit should help me iterate quickly to make it work better for me.
The great thing about Mastodon is it’s completely open source, so the API is open and always will be, unlike Twitter/X or the other platforms that have been locking down. FWIW I do think the fediverse is the long-term right model for social media, for a variety of reasons.
A quick note — embeddings are a numeric representation of text that corresponds to the meaning of the text. I like to think of it as an “AI secret language”, in that it’s the representation that large language models use to work with the text. We’re using a clustering algorithm here to group similar toots, but there’s a lot of other things you can do with embeddings too!
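To make “similar meaning, nearby vectors” concrete, here’s a toy illustration using cosine similarity. The vectors here are made up for the example; in the real app they come from OpenAI’s embedding API and have far more dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    # Returns ~1.0 for vectors pointing the same way, ~0 for unrelated ones.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 4-dimensional "embeddings"; real ada-002 vectors have 1,536 dimensions.
ai_toot = np.array([0.9, 0.1, 0.0, 0.2])
ml_toot = np.array([0.8, 0.2, 0.1, 0.1])
cooking_toot = np.array([0.0, 0.1, 0.9, 0.7])
```

With real embeddings, the AI toot and the ML toot would score close together while the cooking toot lands far away, which is exactly what the clustering step exploits.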
I went from “oh! I have an idea” to a working solution in about 3.5 hours. I used GitHub Copilot, especially with the chat feature (CMD+I, type “create a SQLite DB with a toots table”). It’s incredible how quickly you can try out ideas.
If you want to take a peek:
- The UI (dashboard.py)
- The SQLite DB (core.py)
- Download timeline (core.py) — I used requests, no special client
- Generate embeddings (core.py) — I used OpenAI’s text-embedding-ada-002. It’s cheap and easy to set up.
- K-means clustering (science.py) — scikit-learn makes this super easy, just 4 lines.
- Summarize clusters (science.py) — I used gpt-3.5-turbo because it’s cheap-ish and good enough
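The clustering step really is only a few lines with scikit-learn. A minimal sketch — `cluster_toots` and the default `n_clusters` are my naming for illustration, not necessarily what science.py does:

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_toots(embeddings, n_clusters=8):
    # embeddings: (n_toots, n_dims) array; returns one cluster label per toot.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    return km.fit_predict(np.asarray(embeddings))
```

Each cluster’s toots can then be concatenated and handed to the LLM for a one-line summary.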
The streamlit dashboard displays each cluster in an expander container. When the dashboard loads you see a list of cluster descriptions and you can choose which to dive into.
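The display logic can be sketched like this. `group_by_cluster` is a hypothetical helper of my own naming; `st.expander` is the actual Streamlit widget, shown in comments so this sketch runs without Streamlit installed:

```python
from collections import defaultdict


def group_by_cluster(toots, labels):
    # Map each cluster label to the list of toots assigned to it.
    groups = defaultdict(list)
    for toot, label in zip(toots, labels):
        groups[label].append(toot)
    return dict(groups)


# In dashboard.py (roughly), each group becomes an expander whose title is
# the LLM-generated summary for that cluster:
#
#   for label, members in group_by_cluster(toots, labels).items():
#       with st.expander(summaries[label]):
#           for toot in members:
#               st.markdown(toot["content"], unsafe_allow_html=True)
```

The expander title is what lets you skim cluster summaries and only open the ones worth reading.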
The toots are displayed poorly, imo, it could use a lot of work. I’d also like to be able to favorite and retoot from this UI, at which point I could probably use it as my primary client for my right-after-I-wake-up browsing.
I’ve used it for a few hours and I like being able to skip over vast stretches of my timeline with relative confidence that I know what I’m skipping. I’m in control again.
On a more philosophical note, I like the idea of social media algorithms but I hate the implementations. Viewing social media in timeline order is far too noisy. Algorithms that curate my feed make it far more manageable. On the other hand, I don’t know how X or Instagram are curating my feed. As far as I can tell, they’re optimizing for their own profit, which feels manipulative. I want my feed to serve me, no other way.
What do you think? How could it be improved?
Next: I wrote a follow-up to this post, about open source and societal alignment.