A Better Mastodon Client

Last night I had an idea and went ahead and built it. I’d like to tell you about it. Find the source code here.

The Pain Point

I use Mastodon as my primary social media. I like it because of the sheer density of good info in my feed. So much good conversation happens on Mastodon. But my timeline is getting a little out of control.

Mastodon lets me follow hashtags, like #LLMs or #AI, at which point my timeline gets all toots that my server (hachyderm.io) handled that were tagged accordingly. That isn’t everything on the fediverse, but hachyderm is fairly large, so I still get a lot of toots, probably 1,000-1,500 per day. It’s getting hard to keep up with.

I should be able to automate this!

A streamlit dashboard

[Image: a festive party scene with a realistic Mastodon at the center of a crowded dance floor, surrounded by balloons and string lights, with a graphical user interface in the foreground.]

So here’s my idea: a streamlit dashboard that

  1. downloads the latest toots in my timeline
  2. caches them in SQLite
  3. generates embeddings for each toot
  4. does k-means clustering to group them by similar topic
  5. uses an LLM to summarize each cluster of toots
  6. can be viewed on my phone over tailscale
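To make step 2 concrete, here is a minimal sketch of a SQLite cache for toots. It is not the code in core.py: the table layout and the function names open_cache and cache_toots are assumptions for illustration. The only thing taken from Mastodon itself is that each status has an id, a created_at timestamp, and an HTML content field.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a cache database with a single toots table.

    The schema is hypothetical: one row per toot, keyed by status id,
    with a nullable column where an embedding could be stored later.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS toots (
            id TEXT PRIMARY KEY,
            created_at TEXT NOT NULL,
            content TEXT NOT NULL,
            embedding TEXT
        )
        """
    )
    return conn


def count_toots(conn):
    """Return how many toots are currently cached."""
    return conn.execute("SELECT COUNT(*) FROM toots").fetchone()[0]


def cache_toots(conn, toots):
    """Insert toots, silently skipping ids that are already cached.

    Returns the number of rows that were actually added.
    """
    before = count_toots(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO toots (id, created_at, content) VALUES (?, ?, ?)",
        [(t["id"], t["created_at"], t["content"]) for t in toots],
    )
    conn.commit()
    return count_toots(conn) - before
```

Because the insert ignores ids that already exist, the dashboard can re-download an overlapping slice of the timeline on every refresh without creating duplicates.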

I chose streamlit because it’s quick and dirty. I figure this isn’t going to be great on the first pass, so streamlit should help me iterate quickly to make it work better for me.

The great thing about Mastodon is it’s completely open source, so the API is open and always will be, unlike Twitter/X or the other platforms that have been locking down. FWIW I do think the fediverse is the right long-term model for social media, for a variety of reasons.
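As a rough illustration of how open the API is, here is a sketch of fetching the home timeline. The post’s code uses requests; this version uses only the standard library so it runs without extra installs. The endpoint (/api/v1/timelines/home), the limit and since_id parameters, and the Bearer token header come from the Mastodon API; the function names and the way the token is passed in are assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_timeline_request(server, access_token, limit=40, since_id=None):
    """Build a GET request for the authenticated user's home timeline.

    limit is capped at 40 by Mastodon; since_id restricts the results
    to statuses newer than the given id.
    """
    params = {"limit": str(limit)}
    if since_id is not None:
        params["since_id"] = since_id
    query = urllib.parse.urlencode(params)
    url = f"https://{server}/api/v1/timelines/home?{query}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)


def fetch_home_timeline(server, access_token, **kwargs):
    """Send the request and return the decoded list of status dicts."""
    request = build_timeline_request(server, access_token, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_home_timeline("hachyderm.io", token) with a valid access token would return one page of statuses; getting more than a day’s worth means calling it repeatedly and paging with the ids of the toots already seen.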

Embeddings

A quick note: embeddings are a numeric representation of text that corresponds to the meaning of the text. I like to think of it as an “AI secret language”, in that it’s the representation that large language models use to work with the text. We’re using a clustering algorithm here to group similar toots, but there are a lot of other things you can do with embeddings too!
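To give a feel for what “corresponds to the meaning” means in practice, here is a small sketch that compares embeddings with cosine similarity. The three-dimensional vectors are made up for illustration (real text-embedding-ada-002 vectors have 1,536 dimensions), and the toot texts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings; a real model would produce these vectors.
toy_embeddings = {
    "New open-weights LLM released": [0.9, 0.1, 0.0],
    "Fine-tuning a language model at home": [0.8, 0.2, 0.1],
    "Monday pizza night": [0.0, 0.1, 0.9],
}

llm_news = toy_embeddings["New open-weights LLM released"]
fine_tuning = toy_embeddings["Fine-tuning a language model at home"]
pizza = toy_embeddings["Monday pizza night"]

# Toots about similar topics point in similar directions.
similar = cosine_similarity(llm_news, fine_tuning)
different = cosine_similarity(llm_news, pizza)
```

Here similar comes out much higher than different, which is the property the clustering step relies on: toots about the same topic end up close together.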

Building It

[Image: a man and a Mastodon working together in a prehistoric landscape. The Mastodon pulls a wooden cart over rocky terrain while the man, dressed in red, guides a rope attached to the cart; waterfalls and a herd of Mastodons are visible in the background.]

I went from “oh! I have an idea” to a working solution in about 3.5 hours. I used GitHub Copilot, especially the chat feature (CMD+I, type “create a SQLite DB with a toots table”). It’s incredible how quickly you can try out ideas.

If you want to take a peek:

  • The UI (dashboard.py)
  • The SQLite DB (core.py)
  • Download timeline (core.py): I used requests, no special client.
  • Generate embeddings (core.py): I used OpenAI’s text-embedding-ada-002. It’s cheap and easy to set up.
  • K-means clustering (science.py): scikit-learn makes this super easy, just 4 lines.
  • Summarize clusters (science.py): I used gpt-3.5-turbo because it’s cheap-ish and good enough.
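With scikit-learn, the clustering step comes down to something like KMeans(n_clusters=k).fit_predict(embeddings). To show what that call is doing under the hood, here is a dependency-free sketch of the same idea. It is not the code in science.py: the function names are assumptions, the farthest-point initialization is swapped in to keep the example deterministic (scikit-learn uses a randomized k-means++ start by default), and the 2-D points stand in for real embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Return one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Two obvious groups of toy "embeddings".
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans(points, k=2)
```

The first three points end up sharing one label and the last three the other. The main knob is k: too few clusters lumps unrelated toots together, too many splits one conversation across several summaries.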

The streamlit dashboard displays each cluster as an expander container. When the dashboard loads you see a list of cluster descriptions and you can choose which to dive into.
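A layout like that can be sketched in a few lines with st.expander. This is a guess at the shape, not the contents of dashboard.py: group_by_cluster, render_clusters, and the summaries mapping are hypothetical names, and rendering toot HTML with unsafe_allow_html is an assumption about how the content might be shown.

```python
from collections import defaultdict


def group_by_cluster(toots, labels):
    """Bucket toots by their cluster label, preserving timeline order."""
    groups = defaultdict(list)
    for toot, label in zip(toots, labels):
        groups[label].append(toot)
    return dict(groups)


def render_clusters(groups, summaries):
    """Draw one collapsed expander per cluster, titled with its summary.

    summaries maps a cluster label to the LLM-written description.
    """
    # Imported here so the grouping helper works without Streamlit installed.
    import streamlit as st

    st.title("Timeline clusters")
    for label, toots in groups.items():
        with st.expander(f"{summaries[label]} ({len(toots)} toots)"):
            for toot in toots:
                # Toot content arrives as HTML from the Mastodon API.
                st.markdown(toot["content"], unsafe_allow_html=True)
                st.markdown("---")
```

Expanders start collapsed, so the page opens as a list of summaries and only the clusters you click on expand into full toots.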

[Screenshot: the dashboard showing a list of cluster summaries, each with a drop-down arrow to expand it. The summaries are: Apple faces a setback with Apple Watch Series 9 and Ultra 2 after losing a patent lawsuit; Considerations for livestreaming coding projects and code writing in the Project Jupyter ecosystem; Discovery of variable swapping and destructuring across multiple programming languages; Controversial Economic Policy; Food and sports in North Carolina; Monday pizza night with a touch of spooky weather.]

The toots are displayed poorly, imo; the rendering could use a lot of work. I’d also like to be able to favorite and retoot from this UI, at which point I could probably use it as my primary client for my right-after-I-wake-up browsing.

Conclusion

I’ve used it for a few hours and I like being able to skip over vast stretches of my timeline with relative confidence that I know what I’m skipping. I’m in control again.

On a more philosophical note, I like the idea of social media algorithms but I hate the implementations. Viewing social media in timeline order is far too noisy. Algorithms that curate my feed make it far more manageable. On the other hand, I don’t know how X or Instagram are curating my feed. As far as I can tell, they’re optimizing for their own profit, which feels manipulative. I want my feed to serve me, not the other way around.

What do you think? How could it be improved?

Next: I wrote a followup to this post, about open source and societal alignment.
