chapter three

3 Workflow: Build your own Codebase Knowledge Builder

 

This chapter covers

  • Building a Codebase Knowledge Builder
  • Filtering large repos with smart crawl
  • Applying overview-then-zoom-in
  • Chaining sequential context across calls
  • Swapping instruction files to change the output

In chapter 2, you pasted nanoGPT into a chat window, asked layered questions, and built a learning note. It worked for two reasons: nanoGPT is only 19 files, and you had good instincts about what to follow up on. You noticed vocab_size = 50304 was interesting, and you knew to ask about gradient accumulation. But those instincts came from you. For a codebase you know nothing about, you wouldn't even know what to ask, and at 500 files, you can't just paste everything and hope for the best.

What you need is something systematic: a tool that reads the whole codebase, identifies the core concepts, maps how they depend on each other, and explains them in the right order. Not a conversation where you hope to stumble on the right question, but a pipeline that asks the right questions every time. That's what you're building in this chapter: a Codebase Knowledge Builder. No framework, no dependencies beyond call_llm() and a YAML parser. By the end, you'll have three techniques that transfer to every chapter in this book: structured output from LLMs, the overview-then-zoom-in decomposition, and instruction files that encode domain knowledge.
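To make the shape of that pipeline concrete before the details arrive, here is a minimal sketch under stated assumptions. The function name build_tutorial, the prompt wording, and the choice to pass call_llm in as a parameter are all hypothetical, invented for illustration. The sketch also reads concepts one per line instead of parsing YAML, purely to stay dependency-free, and it feeds full earlier chapters forward, whereas the real pipeline may pass something more compact. Treat it as a rough outline of overview, relationships, and sequential chapter writing, not as the chapter's actual implementation.

```python
from typing import Callable, Dict, List


def build_tutorial(
    files: Dict[str, str],
    call_llm: Callable[[str], str],
    instructions: str,
) -> List[str]:
    """Hypothetical skeleton: overview, relationships, then one chapter per concept."""
    # Flatten the (already filtered) files into one listing for the prompts.
    listing = "\n".join(f"--- {path} ---\n{text}" for path, text in files.items())

    # Overview: a single call that names the core concepts, one per line.
    # (Assumption: plain lines here; the chapter itself uses structured YAML output.)
    overview = call_llm(
        "List the core concepts in this codebase, one per line.\n\n" + listing
    )
    concepts = [line.strip() for line in overview.splitlines() if line.strip()]

    # Relationships: a single call that describes how the concepts depend on each other.
    relationships = call_llm(
        "Describe how these concepts depend on each other:\n"
        + "\n".join(concepts)
        + "\n\n"
        + listing
    )

    # Zoom in: one call per concept. Each prompt carries the chapters written
    # so far, so later chapters can build on earlier ones.
    chapters: List[str] = []
    for concept in concepts:
        previous = "\n\n".join(chapters) if chapters else "(none yet)"
        chapter = call_llm(
            f"{instructions}\n\n"
            f"Write a tutorial chapter about: {concept}\n\n"
            f"Relationships:\n{relationships}\n\n"
            f"Chapters so far:\n{previous}\n\n"
            f"Code:\n{listing}"
        )
        chapters.append(chapter)
    return chapters
```

Because the instructions arrive as a plain string, changing what the pipeline produces means changing that string, not the code around it. Passing call_llm as an argument is a convenience of this sketch: it lets you try the flow with a stub function before spending money on real calls.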

3.1 Turn any codebase into a tutorial for $2

3.2 What the LLM sees determines everything

3.2.1 Most repos fit. The interesting question is what to do when they don't.

3.2.2 Prune the obvious, then let the LLM pick the rest

3.2.3 The LLM needs guidance to pick well

3.3 Analyze: The overview-then-zoom-in pattern

3.3.1 Step 1: Get the overview (1 LLM call)

3.3.2 Step 2: Map relationships (1 LLM call)

3.3.3 The pattern that scales

3.4 Write chapters: Sequential context and the instructions file

3.4.1 Sequential context: Each chapter knows what came before

3.4.2 The instructions file: Where the real engineering lives

3.4.3 Putting it all together

3.5 Swap the instructions, change the output

3.5.1 Same codebase, different lenses

3.5.2 Different questions for different layers

3.6 Summary