AI Engineering Primer
How do you get up to speed with AI engineering? Unfortunately, I don’t know of any good consolidated resources, so I’m going to attempt to make one here. My first pass at this focused more on what an AI engineer is and only feebly pointed at resources to get started. Let’s go!
The reason it’s difficult is that AI engineering is so new it’s bleeding edge. People still scoff at the idea that it’s even a title someone can hold. It’s moving so fast that 3 months is roughly equivalent to a decade, so any resources that do exist become obsolete within a few months.
Things to Avoid
Avoid: LangChain
LangChain is used pervasively in tutorials. It’s usually one of the first frameworks to implement a new prompting technique right after the paper comes out. However, nobody I know uses it in production. Many attempt to, but then replace it with either a LangChain competitor or their own code.
Instead:
- Hand-roll (has its own problems, but sometimes it’s easier than getting burnt repeatedly by solutions that almost work)
- LlamaIndex — direct LangChain competitor
- griptape — direct LangChain competitor, focused on DAG workflows & tools
- Haystack — oriented toward search, it’s more than a bare vector store
- DSPy — focused on automatic prompt optimization
- gradio — prototype apps quickly
- Vendor SDKs from Cohere, OpenAI and Anthropic are sometimes quite powerful.
There’s a very long list of other good options, both open source & proprietary. The reason LangChain doesn’t work is that the code isn’t structured well. It works seamlessly until you run into a case that they didn’t explicitly plan for. Experienced software engineers would say that LangChain doesn’t “compose well”.
Avoid: Prompt Influencers
There’s no shortage of people on LinkedIn or X that are hawking “one weird trick”, the magic prompt, or in one way or another trying to convince you that there are special words or phrases that magically make an LLM do your bidding. If it sounds like a salesman trying to sell you something, it’s definitely a salesman trying to sell you something. In fact, they’re almost always the sales type, and very rarely have any sort of engineering experience. Avoid.
Avoid: Traditional ML People
This is a contentious topic; I’ve written about it. They can be an asset, but beware of blindly taking advice from people who have been deep into traditional, pre-LLM machine learning.
Boring Advice
Advice: Use LLMs A Lot
They’re both amazingly intelligent and unexpectedly dumb. The only real way to know what you’re dealing with is to use them a lot, for everything. Yes, you do need to get burnt. Just do it in a way that doesn’t matter too much. The goal here is to develop an instinct. You should be able to tell yourself, “if I do X it’ll probably go poorly, but if I rephrase it as Y then I can be confident in what it says”.
Advice: Basic Design Patterns
You should know RAG inside & out, along with Chain of Thought (CoT) and the ReAct pattern. Skim the rest of this post for more leads.
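To make the RAG idea concrete, here’s a minimal sketch: embed documents, embed the query, retrieve the nearest chunk, and stuff it into the prompt. The bag-of-words “embedding” here is a stand-in for a real embedding model, so treat this as the shape of the pattern, not a production implementation.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Postgres supports vector indexes via the pgvector extension.",
    "The ReAct pattern interleaves reasoning steps with tool calls.",
]
question = "how do I add a vector index to postgres?"
context = retrieve(question, docs)[0]
# The retrieved chunk gets stuffed into the prompt (the "augmented" in RAG).
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

Swap in a real embedding model and a vector store and you have the core of every RAG system.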
Advice: Buy Apple Silicon
Better yet, get a gaming laptop with an NVIDIA graphics card and Linux. But if not, get a MacBook with an M-series chip (M1, M2, M3, etc.). Main memory and GPU memory are shared (unified memory), so you can rock some surprisingly big models, all local.
I’m a big advocate of local LLMs, especially for AI engineers. They’re worse than the big SOTA models, which means you learn the sharp edges faster; learn to properly distrust an LLM. Plus, you can send logs with passwords to a local model, but it’s highly unwise to send passwords to OpenAI, Anthropic, or any computer that isn’t your own.
Topics
Here are several large areas to learn about. Not all of them will be important to you.
Topic: New Models
As new models are released, their capabilities increase. As an AI engineer, it’s crucial you stay on top of this. You should know about the pre-training scaling laws that have brought LLMs into the public’s eye.
Ways that models improve:
- Benchmarks — MMLU, GSM8K, HellaSwag, HumanEval, etc. There are tons of these; they’re always improving, and you also shouldn’t trust them because they’re easily gamed. Yet you still have to pay attention and know what they mean. The Open LLM Leaderboard has a lot of good info.
- Context width — The size of the input. As this improves, RAG becomes easier. But LLMs also get worse at recall with bigger context, so it’s not a slam dunk.
- Reasoning — Models like o1 do CoT natively without prompting to achieve better reasoning scores.
- Model size — measured in number of parameters; 13B = 13 billion parameters. Bigger models are generally more capable, but smaller models are faster. And once you factor in test-time compute (TTC), a smaller model that thinks longer can come out smarter.
- Modalities — Beyond text, being able to take or emit other modalities like image, video, audio, etc. can be a game changer. As of today, Google seems to be leading with Gemini 2.0
- APIs — Occasionally new APIs & features enable wildly new things. e.g. Anthropic’s prompt caching enabled the Contextual Retrieval pattern for embeddings.
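As an example of an API feature unlocking a pattern: Contextual Retrieval prepends a short, LLM-generated blurb situating each chunk within the whole document before embedding it, and prompt caching makes it cheap to resend the full document for every chunk. A sketch of the prompt plumbing, with a fake stand-in for the actual LLM call:

```python
def situate_prompt(document: str, chunk: str) -> str:
    # The full document goes first so it can be prompt-cached across chunks;
    # only the chunk at the end varies per request.
    return (
        f"<document>\n{document}\n</document>\n"
        f"Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Give a short context situating this chunk within the overall document."
    )

def contextualize(chunk: str, blurb: str) -> str:
    # Prepend the generated context; this combined text is what gets embedded.
    return f"{blurb}\n{chunk}"

# fake_llm stands in for a real (cached) completion call to a vendor API.
fake_llm = lambda prompt: "From the Q3 finance report, revenue section."
chunk = "Revenue grew 3% over the previous quarter."
embedded_text = contextualize(chunk, fake_llm(situate_prompt("...full report...", chunk)))
```

Without caching, re-sending the whole document per chunk would be prohibitively expensive, which is why the API feature and the pattern arrived together.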
Most of this shows up in blog announcements from the AI labs and gets announced on X.
Topic: New Patterns
AI Engineering is still being figured out. If you go back far enough in programming history, languages didn’t even have control structures like `if`/`then` or `for` loops. It took time to figure that stuff out.
We’re in a similar spot with AI engineering, where the patterns are still emerging.
Check out Prompting Guide for a comprehensive list of current patterns. Also subscribe to Latent Space and read Simon Willison to keep up to date.
Topic: Infrastructure
Outside of the AI labs, you may want to watch some providers:
- Cerebras — Fast
- Groq — Fast (here’s a technical deep dive from a distributed systems perspective of how Groq works)
- Together.AI — Recommended place to rent GPUs
Additionally, pay attention to vector stores:
- Pinecone
- Qdrant
- pgvector — Postgres extension to treat it as just another SQL index on any table rather than a standalone database. This is a winning strategy, your SQL DB probably already has something like this. Use it.
- Redis — Classic NoSQL database. Watch this one, though: its creator, antirez, has been talking about some wildly different ideas where the index is more of a plain data structure. This might be the key to enabling a lot more patterns, like clustering. Follow antirez’s work for updates.
Also, look into edge compute. Ollama for personal computers, vLLM for Linux servers, but also pay attention to work being done to run LLMs on IoT devices and phones.
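For a quick taste of local inference, Ollama exposes a small HTTP API on localhost. A minimal sketch using only the standard library (assumes you’ve installed Ollama and pulled a model; `llama3.2` is just an example model name):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3.2") -> dict:
    # stream=False makes Ollama return one JSON object instead of a stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.2") -> str:
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # requires a running Ollama server
        return json.loads(resp.read())["response"]
```

Point the same function at a vLLM server’s OpenAI-compatible endpoint and not much changes, which is part of the appeal of local-first development.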
Topic: Model Development & Optimization
Generally, do not do this unless you know you need to. It’s often tempting to try to fine tune, but it’s usually a red herring.
Topics:
- LoRA — The cheapest form of fine-tuning
- Transfer Learning
- Model distillation
- Quantization — Make models smaller to take up less memory
- Memory bandwidth — LLMs are so large that it’s typically memory bandwidth slowing you down, not operations/sec.
- Transformer architecture
- Mixture of Experts (MoE) — I have a feeling this might be a key to further innovation soon.
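To see why quantization and memory bandwidth matter together: during generation, a decoder has to read every weight once per token, so a crude upper bound on tokens/sec is memory bandwidth divided by model size. A back-of-the-envelope sketch (the 13B size and 800 GB/s bandwidth are illustrative numbers, not a specific product):

```python
def model_bytes(params: float, bits_per_weight: int) -> float:
    # Total bytes the weights occupy at a given precision.
    return params * bits_per_weight / 8

def tokens_per_sec_upper_bound(params: float, bits_per_weight: int,
                               bandwidth_gb_s: float) -> float:
    # Every weight is read once per generated token, so memory
    # bandwidth caps decode throughput regardless of FLOPs.
    return bandwidth_gb_s * 1e9 / model_bytes(params, bits_per_weight)

# A 13B model on hardware with ~800 GB/s of memory bandwidth:
fp16 = tokens_per_sec_upper_bound(13e9, 16, 800)  # ~31 tokens/sec
int4 = tokens_per_sec_upper_bound(13e9, 4, 800)   # ~123 tokens/sec
```

This is why quantization speeds up inference even though it adds work: shrinking the weights shrinks the number of bytes that have to cross the memory bus per token.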
Topic: Evaluation & Testing
This is quickly evolving and there’s unfortunately not much here.
Topics:
- Benchmarks (see above)
- Robustness testing
- Mech Interp — There’s some exciting work being done here to understand how LLMs work on the inside. I’d say Anthropic is where the most interesting stuff happens.
- Compliance — This is a wide topic, definitely check out the EU AI Act.
- Alignment
Topic: Test Time Compute (TTC)
As I’m writing, this is a hot topic. The train-time scaling laws seem to be fading and the new promising area is having models “think” longer during inference (see o1). This also seems to be a significant key to agents.
Generally follow any of the sources below. The information is spread out.
Topic: Agents
There are two perspectives here:
- “Agent” is anything that uses tools
- “Agent” is autonomous and interacts with the world
The former isn’t very interesting; it’s just the ReAct pattern. The latter is an area of active research. Within agents you have topics like:
- Embodied vs disembodied agents
- Autonomy
- World models
- Agent Design & Orchestration
In my experience, present agents are like riding a unicycle. It’s possible to make them work, but it takes a lot of experience to not fall off. The main blocker to having them rolled out more broadly is reasoning & planning. I think Test Time Compute (TTC) might be part of the puzzle, others are betting on world models. In reality, it’s going to be a bit of everything; the whole field needs to evolve.
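The tool-use (“ReAct”) loop behind most agents is surprisingly small: the model alternates between emitting an action and receiving an observation until it emits a final answer. A toy sketch with a scripted stand-in for the model, so you can see the shape of the loop without any API calls:

```python
def calculator(expr: str) -> str:
    # A single toy tool; real agents register many. Empty builtins
    # keeps this eval limited to plain arithmetic expressions.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(model, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)          # model emits an action or final answer
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        tool, arg = step.split(":", 1)    # e.g. "calculator: 6 * 7"
        observation = TOOLS[tool.strip()](arg.strip())
        transcript += f"\n{step}\nObservation: {observation}"
    return "gave up"

# Scripted "model": first call the tool, then answer from the observation.
def scripted_model(transcript: str) -> str:
    if "Observation:" in transcript:
        return "Final: " + transcript.rsplit("Observation: ", 1)[1]
    return "calculator: 6 * 7"
```

Everything hard about agents lives in what this sketch fakes: getting a real model to pick good actions, recover from bad observations, and know when to stop.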
Sources
Primers
- Prompting Guide — Exhaustive coverage of individual topics. All prompting. Very useful for any AI engineer.
- Hugging Face docs — More oriented toward training new models
The AI labs’ documentation often also has good primers:
Courses
- Cohere’s LLM University
- DeepLearning.AI — “short” courses to know what’s out there
AI Labs
- OpenAI
- Anthropic
- Hugging Face – Not the typical lab, focused on open source and small models.
- Cohere – Caters to enterprises & RAG.
- Qwen
- DeepSeek
- Allen Institute for AI (Ai2)
People to Watch
- Simon Willison — READ EVERYTHING SIMON WRITES, also follow him on one of the social platforms: BlueSky, X, Mastodon, Github
- Nathan Lambert — Academic side, mostly RL. BlueSky, X, Github
- antirez — creator of Redis, he’s doing something interesting around vector indices — Bluesky, Github
- Eugene Yan
- hamel — Bluesky, X, Github
- Jason Liu — X, Github
- Chip Huyen — See her books — Bluesky, X, Github
News Venues & Newsletters
- The LocalLlama subreddit — Great coverage on new models & design patterns
- Alpha Signal — breakthroughs, models, repos & research
- The Rundown AI
- Interconnects — More academic. Has substack, podcast
- Latent Space — AI Engineer newsletter. More high level.
- Threat Prompt Newsletter — The security perspective
Github
This is a new one for me, but some people highly recommend following people on Github first and only then, maybe, following individual repos. It’s far better to follow people, because then you learn about new repos. Following repos gets noisy very fast, so only do that when you want to keep close tabs. Look for new repos, new ideas, and new trends.
See People to Watch for Github links.
Huggingface
[Huggingface](https://huggingface.co/) is like “Github for AI/ML models”. Typically, the code for the model is kept in Github and the model artifacts are hosted in Huggingface. The `transformers` library makes it very easy to download models off Huggingface and run them, or fine-tune them, or disassemble and use just the tokenizer, or steal the attention layers from an LLM to fine-tune an embedding model, etc.
Huggingface also offers hosting: you can run model inference there, and it’s not limited to inference; the Open LLM Leaderboard, for example, is hosted there too.
Additionally, a lot of papers are posted to huggingface (sometimes instead of arXiv). There seems to be a social networking aspect to it, where you can comment on papers, follow authors, etc. It’s safe to say that huggingface is a core part of the AI ecosystem. While it’s not an AI lab in the traditional sense, it’s in many ways just as critical to AI development, maybe more so.
Discussion
- The original bluesky conversation
If I forgot something, contact me, or use the Github repo for this blog to create an issue or PR. Or add to one of the discussion links.