These notes collect small observations on chain-of-thought reasoning and search.
Basic View
Let a model generate a sequence of intermediate reasoning states:
\[
z_{1:T} = (z_1, z_2, \ldots, z_T)
\]
The final answer can be written as:
\[
y = f_\theta(x, z_{1:T})
\]
This perspective makes it natural to compare CoT generation with search, refinement, or denoising over latent reasoning trajectories.
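If the steps are sampled one at a time, a natural way to read the pair of equations above is as an autoregressive factorization. The distribution \(p_\theta\) below is an assumption added for illustration; the notes only posit the deterministic map \(f_\theta\):
\[
p_\theta(z_{1:T}, y \mid x) = \left[ \prod_{t=1}^{T} p_\theta(z_t \mid x, z_{1:t-1}) \right] p_\theta(y \mid x, z_{1:T})
\]
Search, refinement, and denoising then correspond to different ways of exploring or editing draws of \(z_{1:T}\).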
Minimal Pseudocode
def reason(model, question, steps=4):
    # Accumulate the intermediate reasoning states z_1, ..., z_T.
    state = []
    for _ in range(steps):
        # Each new step conditions on the question and all prior steps.
        state.append(model.next_step(question, state))
    # The answer is a function of the question and the full trajectory.
    return model.answer(question, state)
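To connect this loop to search: a minimal extension samples several trajectories and keeps the answer a verifier scores highest. This is a sketch under assumed interfaces; verifier.score(question, answer) is hypothetical and mirrors the duck-typed model above.

def reason_best_of_n(model, verifier, question, n=8, steps=4):
    # Sample n independent trajectories (diversity) and pay n
    # verifier calls (verification cost); keep the best-scoring answer.
    best_answer, best_score = None, float("-inf")
    for _ in range(n):
        answer = reason(model, question, steps=steps)
        score = verifier.score(question, answer)  # hypothetical interface
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

Increasing n raises the chance of sampling a good trajectory but scales verification cost linearly, which is the tradeoff the last reading question asks about.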
Reading Questions
- How should incorrect intermediate states be detected?
- Can refinement be trained without full step-level supervision? (See the sketch after this list for what refinement means here.)
- What is the right tradeoff between sampling diversity and verification cost?
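As a point of reference for the second question, here is one inference-time refinement loop that consumes no step-level labels, only the model's own critique. Both model.critique and model.revise are hypothetical interfaces; this illustrates the refinement operation itself, not how to train it.

def refine(model, question, state, rounds=2):
    # Iteratively critique and revise the whole trajectory; no
    # step-level supervision is used, only self-generated feedback.
    for _ in range(rounds):
        critique = model.critique(question, state)       # hypothetical
        state = model.revise(question, state, critique)  # hypothetical
    return state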