Large Language Models: A Series
Building products on LLMs and AI generally.
October 31, 2024
The four phases of automated evals for LLM-powered features.
I gave a talk version of this article at the first Infer meetup earlier this month. Let’s say you want to build an LLM-powered app. With a modern model and common-sense prompting, it’s easy to get a demo going with reasonable results. Of course, before going live, you test various...
9 min read →
October 2, 2024
Next week, we’ll be kicking off a new speaker series in Vancouver called Infer. The goal of the meetup is to bring together folks who are doing great AI engineering work, so we can learn from one another.
The format will be familiar to folks who have attended my previous meetups: two speakers, often one of whom will be visiting from out of town, with time to chat afterward. Events will happen roughly every two months, whenever we have compelling topics lined up.
If you’re building LLM-powered apps in Vancouver, you can subscribe to our event on Luma. There are still a few spots open for our first “beta” event on October 9th, and we’ll be hosting another during NeurIPS in December.
There’s something electric about getting smart people who are working in a rapidly-changing field in a room together. I recommend it.
August 16, 2024
A wild startup appears.
Last month, I started full-time on a new startup. It’s early days, but we’re having a lot of fun. A startup, fundamentally, is a search for a repeatable, scalable business model. You rapidly try things, run experiments, learn, and iterate your theories about how to build a useful product that...
2 min read →
July 31, 2024
If – and when – GPT-5 might eat your lunch
Lately I’ve been working with a lot of teams and founders that are building products on top of LLMs. It’s a lot of fun! To be an AI product engineer today is to constantly ask new questions that impact how you build products. Questions like: “Is there a way we...
5 min read →
May 31, 2024
A path to continued model improvement.
I often see a misconception when people try to reason about the capability of LLMs, and in particular how much future improvement to expect. It’s frequently said that LLMs are “trained on the internet,” and so they’ll always be bad at producing content that is rare on the web....
5 min read →
January 10, 2024
A curious design constraint signals an ambitious future.
This morning, OpenAI launched the GPT Store: a simple way to browse and distribute customized versions of ChatGPT. GPTs – awkwardly named to solidify OpenAI’s claim to the trademark “GPT” – consist of a custom ChatGPT prompt, an icon, and optionally some reference data or hookups to external APIs. In...
5 min read →
June 30, 2023
Techniques for building products on LLMs today.
Modern instruction-tuned language models, or LLMs, are the latest tool in software engineers’ toolboxes. Joining classics like databases, networking, hypertext, and async web applications, we now have a new enabling technology that seems wickedly powerful, but whose best applications aren’t yet clear. ChatGPT lets you poke at those possibilities. You...
11 min read →
March 15, 2023
A wild large-context LLM appears.
One month ago, I wrote about the limits of 4K-token AI models, and the wild capabilities and costs that large-context language models may one day have. Today, OpenAI not only debuted GPT-4 with a doubled 8K-token limit, but demoed and began trials of a version that supports...
2 min read →
February 16, 2023
The problem and opportunity of language model context.
It has been a wild week in AI. By now, we’re getting used to the plot twist that rather than the cold Spock-like AIs of science fiction, large language models tend to be charismatic fabulists with a tenuous understanding of facts. Into that environment, last week Microsoft launched a Bing...
13 min read →