These notes collect small observations on chain-of-thought reasoning and search.
Basic View
Let a model generate a sequence of intermediate reasoning states:
\[
z_{1:T} = (z_1, z_2, \ldots, z_T)
\]
The final answer can be written as:
\[
y = f_\theta(x, z_{1:T})
\]
This perspective makes it natural to compare CoT generation with search, refinement, or denoising over latent reasoning trajectories.
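If the steps are sampled one at a time, a natural way to read the pair of equations above is as an autoregressive factorization. The distribution \(p_\theta\) below is an assumption added for illustration; the notes only posit the deterministic map \(f_\theta\):
\[
p_\theta(z_{1:T}, y \mid x) = \left[ \prod_{t=1}^{T} p_\theta(z_t \mid x, z_{1:t-1}) \right] p_\theta(y \mid x, z_{1:T})
\]
Search, refinement, and denoising then correspond to different ways of exploring or editing draws of \(z_{1:T}\).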
Minimal Pseudocode
def reason(model, question, steps=4):
    # Accumulate the intermediate reasoning states z_1, ..., z_T.
    state = []
    for _ in range(steps):
        # Each new step conditions on the question and all prior steps.
        state.append(model.next_step(question, state))
    # The answer is a function of the question and the full trajectory.
    return model.answer(question, state)
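To connect this loop to search: a minimal extension samples several trajectories and keeps the answer a verifier scores highest. This is a sketch under assumed interfaces; verifier.score(question, answer) is hypothetical and mirrors the duck-typed model above.

def reason_best_of_n(model, verifier, question, n=8, steps=4):
    # Sample n independent trajectories (diversity) and pay n
    # verifier calls (verification cost); keep the best-scoring answer.
    best_answer, best_score = None, float("-inf")
    for _ in range(n):
        answer = reason(model, question, steps=steps)
        score = verifier.score(question, answer)  # hypothetical interface
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

Increasing n raises the chance of sampling a good trajectory but scales verification cost linearly, which is the tradeoff the last reading question asks about.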
Reading Questions
- How should incorrect intermediate states be detected?
- Can refinement be trained without full step-level supervision? (See the sketch after this list for what refinement means here.)
- What is the right tradeoff between sampling diversity and verification cost?
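As a point of reference for the second question, here is one inference-time refinement loop that consumes no step-level labels, only the model's own critique. Both model.critique and model.revise are hypothetical interfaces; this illustrates the refinement operation itself, not how to train it.

def refine(model, question, state, rounds=2):
    # Iteratively critique and revise the whole trajectory; no
    # step-level supervision is used, only self-generated feedback.
    for _ in range(rounds):
        critique = model.critique(question, state)       # hypothetical
        state = model.revise(question, state, critique)  # hypothetical
    return state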