AI Engineering Primer
How do you get up to speed with AI engineering? Unfortunately, I don’t know of any good consolidated resources, so I’m going to attempt to make one here. My first pass at this focused more on what an AI engineer is and only feebly pointed at resources to get started. Let’s go!
The reason it’s difficult is that AI engineering is so new it’s bleeding edge. People still scoff at the idea that it’s even a title someone can hold. It’s moving so fast that 3 months is roughly equivalent to a decade, so any resources that do exist become obsolete within a few months.
Things to Avoid
Avoid: LangChain
LangChain is used pervasively in tutorials. It’s usually one of the first frameworks to implement a new prompting technique right after the paper comes out. However, nobody I know uses it in production. Many attempt to, but then replace it with either a LangChain competitor or their own code.
Instead:
- Hand-roll (has its own problems, but sometimes it’s easier than getting burnt repeatedly by solutions that almost work)
- LlamaIndex — direct LangChain competitor
- griptape — direct LangChain competitor, focused on DAG workflows & tools
- Haystack — oriented toward search, it’s more than a bare vector store
- DSPy — focused on automatic prompt optimization
- gradio — prototype apps quickly
- Vendor SDKs from Cohere, OpenAI and Anthropic are sometimes quite powerful.
There’s a very long list of other good options, both open source & proprietary. The reason LangChain doesn’t work is that the code isn’t structured well. It works seamlessly until you run into a case that they didn’t explicitly plan for. Experienced software engineers would say that LangChain doesn’t “compose well”.
Avoid: Prompt Influencers
There’s no shortage of people on LinkedIn or X that are hawking “one weird trick”, the magic prompt, or in one way or another trying to convince you that there are special words or phrases that magically make an LLM do your bidding. If it sounds like a salesman trying to sell you something, it’s definitely a salesman trying to sell you something. In fact, they’re almost always the sales type, and very rarely have any sort of engineering experience. Avoid.
Avoid: Traditional ML People
This is a contentious topic; I’ve written about it. They can be an asset, but beware of blindly taking advice from people who have been deep into traditional, pre-LLM machine learning.
Boring Advice
Advice: Use LLMs A Lot
They’re both amazingly intelligent and unexpectedly dumb. The only real way to know what you’re dealing with is to use them a lot, for everything. Yes, you do need to get burnt. Just do it in a way that doesn’t matter too much. The goal here is to develop an instinct. You should be able to tell yourself, “if I do X it’ll probably go poorly, but if I rephrase it as Y then I can be confident in what it says”.
Advice: Basic Design Patterns
You should know RAG inside & out, along with Chain of Thought (CoT) and the ReAct pattern. Skim the rest of this post for more leads.
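To make the RAG idea concrete, here’s a minimal sketch: embed documents, embed the query, retrieve the nearest chunk, and stuff it into the prompt. The bag-of-words “embedding” here is a stand-in for a real embedding model, so treat this as the shape of the pattern, not a production implementation.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Postgres supports vector indexes via the pgvector extension.",
    "The ReAct pattern interleaves reasoning steps with tool calls.",
]
question = "how do I add a vector index to postgres?"
context = retrieve(question, docs)[0]
# The retrieved chunk gets stuffed into the prompt (the "augmented" in RAG).
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

Swap in a real embedding model and a vector store and you have the core of every RAG system.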
Advice: Buy Apple Silicon
Better yet, get a gaming laptop with an NVIDIA graphics card and Linux. But if not, get a MacBook with an M-series chip (M1, M2, M3, etc.). Main memory and GPU memory are shared (unified memory), so you can rock some surprisingly big models, all local.
I’m a big advocate of local LLMs, especially for AI engineers. They’re worse than the big SOTA models, which means you learn the sharp edges faster; learn to properly distrust an LLM. Plus, you can send logs with passwords to a local model, but it’s highly unwise to send passwords to OpenAI, Anthropic, or any computer that isn’t your own.
Topics
Here are several large areas to learn about. Not all of them will be important to you.
Topic: New Models
As new models are released, their capabilities increase. As an AI engineer, it’s crucial you stay on top of this. You should know about the pre-training scaling laws that have brought LLMs into the public’s eye.
Ways that models improve:
- Benchmarks — MMLU, GSM8K, HellaSwag, HumanEval, etc. There are tons of these; they’re always improving, and you also shouldn’t trust them because they’re easily gamed. Yet you still have to pay attention and know what they mean. The Open LLM Leaderboard has a lot of good info.
- Context width — The size of the input. As this improves, RAG becomes easier. But LLMs also get worse at recall with bigger context, so it’s not a slam dunk.
- Reasoning — Models like o1 do CoT natively without prompting to achieve better reasoning scores.
- Model size — measured in number of parameters; 13B = 13 billion parameters. Bigger models are generally more capable, but smaller models are faster. And once you factor in test-time compute (TTC), a smaller model that thinks longer can come out smarter.
- Modalities — Beyond text, being able to take or emit other modalities like image, video, audio, etc. can be a game changer. As of today, Google seems to be leading with Gemini 2.0
- APIs — Occasionally new APIs & features enable wildly new things. e.g. Anthropic’s prompt caching enabled the Contextual Retrieval pattern for embeddings.
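As an example of an API feature unlocking a pattern: Contextual Retrieval prepends a short, LLM-generated blurb situating each chunk within the whole document before embedding it, and prompt caching makes it cheap to resend the full document for every chunk. A sketch of the prompt plumbing, with a fake stand-in for the actual LLM call:

```python
def situate_prompt(document: str, chunk: str) -> str:
    # The full document goes first so it can be prompt-cached across chunks;
    # only the chunk at the end varies per request.
    return (
        f"<document>\n{document}\n</document>\n"
        f"Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Give a short context situating this chunk within the overall document."
    )

def contextualize(chunk: str, blurb: str) -> str:
    # Prepend the generated context; this combined text is what gets embedded.
    return f"{blurb}\n{chunk}"

# fake_llm stands in for a real (cached) completion call to a vendor API.
fake_llm = lambda prompt: "From the Q3 finance report, revenue section."
chunk = "Revenue grew 3% over the previous quarter."
embedded_text = contextualize(chunk, fake_llm(situate_prompt("...full report...", chunk)))
```

Without caching, re-sending the whole document per chunk would be prohibitively expensive, which is why the API feature and the pattern arrived together.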
Most of this shows up in blog announcements from the AI labs and gets announced on X.
Topic: New Patterns
AI Engineering is still being figured out. If you go back far enough in programming history, languages didn’t even have control structures like `if`/`then` or `for` loops. It took time to figure that stuff out.
We’re in a similar spot with AI engineering, where the patterns are still emerging.
Check out Prompting Guide for a comprehensive list of current patterns. Also subscribe to Latent Space and read Simon Willison to keep up to date.
Topic: Infrastructure
Outside of the AI labs, you may want to watch some providers:
- Cerebras — Fast
- Groq — Fast (here’s a technical deep dive from a distributed systems perspective of how Groq works)
- Together.AI — Recommended place to rent GPUs
Additionally, pay attention to vector stores:
- Pinecone
- Qdrant
- pgvector — Postgres extension to treat it as just another SQL index on any table rather than a standalone database. This is a winning strategy, your SQL DB probably already has something like this. Use it.
- Redis — Classic NoSQL database. Watch this one, though: its creator, antirez, has been talking about some wildly different ideas where the index is more of a plain data structure. This might be the key to enabling a lot more patterns, like clustering. Follow antirez’s work for updates.
Also, look into edge compute. Ollama for personal computers, vLLM for Linux servers, but also pay attention to work being done to run LLMs on IoT devices and phones.
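For a quick taste of local inference, Ollama exposes a small HTTP API on localhost. A minimal sketch using only the standard library (assumes you’ve installed Ollama and pulled a model; `llama3.2` is just an example model name):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3.2") -> dict:
    # stream=False makes Ollama return one JSON object instead of a stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.2") -> str:
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # requires a running Ollama server
        return json.loads(resp.read())["response"]
```

Point the same function at a vLLM server’s OpenAI-compatible endpoint and not much changes, which is part of the appeal of local-first development.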
Topic: Model Development & Optimization
Generally, do not do this unless you know you need to. It’s often tempting to try to fine tune, but it’s usually a red herring.
Topics:
- LoRA — The cheapest form of fine-tuning
- Transfer Learning
- Model distillation
- Quantization — Make models smaller to take up less memory
- Memory bandwidth — LLMs are so large that it’s typically memory bandwidth slowing you down, not operations/sec.
- Transformer architecture
- Mixture of Experts (MoE) — I have a feeling this might be a key to further innovation soon.
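To see why quantization and memory bandwidth matter together: during generation, a decoder has to read every weight once per token, so a crude upper bound on tokens/sec is memory bandwidth divided by model size. A back-of-the-envelope sketch (the 13B size and 800 GB/s bandwidth are illustrative numbers, not a specific product):

```python
def model_bytes(params: float, bits_per_weight: int) -> float:
    # Total bytes the weights occupy at a given precision.
    return params * bits_per_weight / 8

def tokens_per_sec_upper_bound(params: float, bits_per_weight: int,
                               bandwidth_gb_s: float) -> float:
    # Every weight is read once per generated token, so memory
    # bandwidth caps decode throughput regardless of FLOPs.
    return bandwidth_gb_s * 1e9 / model_bytes(params, bits_per_weight)

# A 13B model on hardware with ~800 GB/s of memory bandwidth:
fp16 = tokens_per_sec_upper_bound(13e9, 16, 800)  # ~31 tokens/sec
int4 = tokens_per_sec_upper_bound(13e9, 4, 800)   # ~123 tokens/sec
```

This is why quantization speeds up inference even though it adds work: shrinking the weights shrinks the number of bytes that have to cross the memory bus per token.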
Topic: Evaluation & Testing
This is quickly evolving and there’s unfortunately not much here.
Topics:
- Benchmarks (see above)
- Robustness testing
- Mech Interp — There’s some exciting work being done here to understand how LLMs work on the inside. I’d say Anthropic is where the most interesting stuff happens.
- Compliance — This is a wide topic, definitely check out the EU AI Act.
- Alignment
Topic: Test Time Compute (TTC)
As I’m writing, this is a hot topic. The train-time scaling laws seem to be fading and the new promising area is having models “think” longer during inference (see o1). This also seems to be a significant key to agents.
Generally follow any of the sources below. The information is spread out.
Topic: Agents
There are two perspectives here:
- “Agent” is anything that uses tools
- “Agent” is autonomous and interacts with the world
The former isn’t very interesting; it’s just the ReAct pattern. The latter is an area of active research. Within agents you have topics like:
- Embodied vs disembodied agents
- Autonomy
- World models
- Agent Design & Orchestration
In my experience, present agents are like riding a unicycle. It’s possible to make them work, but it takes a lot of experience to not fall off. The main blocker to having them rolled out more broadly is reasoning & planning. I think Test Time Compute (TTC) might be part of the puzzle, others are betting on world models. In reality, it’s going to be a bit of everything; the whole field needs to evolve.
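The tool-use (“ReAct”) loop behind most agents is surprisingly small: the model alternates between emitting an action and receiving an observation until it emits a final answer. A toy sketch with a scripted stand-in for the model, so you can see the shape of the loop without any API calls:

```python
def calculator(expr: str) -> str:
    # A single toy tool; real agents register many. Empty builtins
    # keeps this eval limited to plain arithmetic expressions.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(model, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)          # model emits an action or final answer
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        tool, arg = step.split(":", 1)    # e.g. "calculator: 6 * 7"
        observation = TOOLS[tool.strip()](arg.strip())
        transcript += f"\n{step}\nObservation: {observation}"
    return "gave up"

# Scripted "model": first call the tool, then answer from the observation.
def scripted_model(transcript: str) -> str:
    if "Observation:" in transcript:
        return "Final: " + transcript.rsplit("Observation: ", 1)[1]
    return "calculator: 6 * 7"
```

Everything hard about agents lives in what this sketch fakes: getting a real model to pick good actions, recover from bad observations, and know when to stop.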
Sources
Primers
- Prompting Guide — Exhaustive coverage of individual topics. All prompting. Very useful for any AI engineer.
- Hugging Face docs — More oriented toward training new models
The AI labs’ documentation often also has good primers:
Courses
- Cohere’s LLM University
- DeepLearning.AI — “short” courses to know what’s out there
AI Labs
- OpenAI
- Anthropic
- Hugging Face – Not the typical lab, focused on open source and small models.
- Cohere – Caters to enterprises & RAG.
- Qwen
- DeepSeek
- Allen Institute for AI (Ai2)
People to Watch
- Simon Willison — READ EVERYTHING SIMON WRITES, also follow him on one of the social platforms: BlueSky, X, Mastodon, Github
- Nathan Lambert — Academic side, mostly RL. BlueSky, X, Github
- antirez — creator of Redis, he’s doing something interesting around vector indices — Bluesky, Github
- Eugene Yan
- hamel — Bluesky, X, Github
- Jason Liu — X, Github
- Chip Huyen — See her books — Bluesky, X, Github
News Venues & Newsletters
- The LocalLlama subreddit — Great coverage on new models & design patterns
- Alpha Signal — breakthroughs, models, repos & research
- The Rundown AI
- Interconnects — More academic. Has substack, podcast
- Latent Space — AI Engineer newsletter. More high level.
- Threat Prompt Newsletter — The security perspective
Github
This is a new one for me, but some people highly recommend following people on Github first and only then, maybe, following individual repos. It’s far better to follow people, because then you learn about new repos. Following repos gets noisy very fast, so only do that when you want to keep close tabs. Look for new repos, new ideas, and new trends.
See People to Watch for Github links.
Huggingface
[Huggingface](https://huggingface.co/) is like “Github for AI/ML models”. Typically, the code for the model is kept in Github and the model artifacts are hosted in Huggingface. The `transformers` library makes it very easy to download models off Huggingface and run them, or fine-tune them, or disassemble and use just the tokenizer, or steal the attention layers from an LLM to fine-tune an embedding model, etc.
Huggingface also offers hosting: you can run model inference there, and it’s not limited to inference; the Open LLM Leaderboard, for example, is hosted there too.
Additionally, a lot of papers are posted to huggingface (sometimes instead of arXiv). There seems to be a social networking aspect to it, where you can comment on papers, follow authors, etc. It’s safe to say that huggingface is a core part of the AI ecosystem. While it’s not an AI lab in the traditional sense, it’s in many ways just as critical to AI development, maybe more so.
Discussion
- The original bluesky conversation
If I forgot something, contact me, or use the Github repo for this blog to create an issue or PR. Or add to one of the discussion links.