Observe & Refine — The Jawz Loop

What this chapter does

The first three chapters are forward-looking. Chapter 4 is the only one that looks backward — and uses the result to refine the model the user runs the rest of the loop with.

But Chapter 4 only works if there is something to look back on. A retrospective built only on current prices is not a retrospective; it is a story dressed up as one. Chapter 4 only earns its output if prior decisions, theses, or portfolio snapshots were captured in usable form. When that record does not exist, this chapter's first job is not to fabricate it — it is to create the baseline that future runs will draw from.

That makes Chapter 4 the slowest-compounding chapter in the loop. The first run creates memory. Later runs extract lessons from it. Run it consistently and it sharpens; run it once and it does little.

Like the rest of the loop, every output ends in questions, not instructions.

Storage boundary

Jawz publishes the framework. Jawz does not store the user's portfolio, decisions, or theses. The user's AI is responsible for retrieving any private memory it has — past sessions, captured Decision Records, journal notes, broker statements — and applying this chapter's modes to that memory. If the AI has no usable memory for a given mode, it must say so plainly and run Mode 4.0 (Baseline Capture) instead of producing weak output.

This is architectural, not a disclaimer. The publishing-house model only holds if Jawz is the reference body, not the user's portfolio database.

Memory depth — what each mode requires

Each mode declares its memory requirement. If the requirement is unmet, the AI runs Mode 4.0 first and reports that as the response.

Mode	Requires
4.0 Baseline Capture	Nothing — this is the entry point when memory is thin
4.1 Period Review	Current holdings + price history (thin OK)
4.2 Position Retrospective	Acquisition date + at least an informal thesis (usable)
4.3 Thesis Status Sweep	Documented theses for top positions (rich)
4.4 Drawdown Post-Mortem	Holdings during the drawdown + ideally an action log (usable)
4.5 Process Mirror	Decision Records across multiple sessions (rich)

Every Chapter 4 output ends with a Review Quality classification — Thin / Usable / Strong — plus the gap that future captures should fill.

Shared preamble — memory check + priced book

Before any mode runs:

Memory check. What does the user's AI have access to? Prior session notes, Decision Records, Chapter 3 outputs, portfolio snapshots, broker statements, journal entries. Surface what is and isn't available.
Mode requirement check. Does the available memory meet the requested mode's minimum? If not, divert to Mode 4.0.
Priced book. Run Chapter 2's shared preamble — load and price the current book. State coverage.
Window. Determine the review window. Default for periodic modes is one week.
Benchmark. Pick one and name it: S&P 500 for US-equity-heavy books, ACWI for global, BTC for crypto-heavy. Do not silently switch benchmarks across modes within a session.

Mode 4.0 — Baseline Capture

When to use: First time the user runs Chapter 4. Also: any time another mode's memory check fails. This is the chapter's default entry point.

Memory required: None. This mode creates the memory.

Procedure:

Acknowledge the gap honestly. There is no usable history to review. That is not a failure of the chapter — it is the starting condition.
Capture a portfolio snapshot from current state.
For each top position, ask the user — in their own words — what they think the thesis is. Distinguish known theses (the user can articulate the reason) from undocumented theses (the user holds it but cannot explain why right now).
Record current regime context from Chapter 1 so future reviews can compare against it.
Capture what the user is currently concerned about — the questions or risks on their mind.
Set a review cadence (weekly is the default) and a next-review trigger (date or event).
Output the Baseline Record. The user's AI saves it locally. From this point on, future Chapter 4 runs have something to compare against.

Output contract:

Output format

## Baseline Record — [DATE]

**Date:** [YYYY-MM-DD]
**Portfolio snapshot available:** yes / no
**Positions reviewed:** [list]
**Known theses:** [position → one-line thesis, in user's words]
**Unknown / undocumented theses:** [positions with no current rationale]
**Current regime context:** [Ch1 regime read summary]
**Current user concerns:** [what's on the user's mind today]
**Review cadence:** [weekly / monthly / event-driven]
**Next review trigger:** [date or condition]

**Review Quality:** N/A — this run creates the baseline
**Missing memory captured:** [what the user couldn't articulate yet — flag for future Mode 2.4 walkthroughs]

The session ends with: future Chapter 4 reviews will be better because this is now the first anchor point.

Mode 4.1 — Period Review

When to use: "How did my book do this week / month / quarter," "what happened in my book," "what should I pay attention to from the last [period]." Default cadence: weekly.

Memory required: Thin OK — current holdings and price history are enough to run, though documented theses sharpen the output meaningfully.

Procedure:

Establish the period boundaries (start date, end date) and state them explicitly.
For every priced position: compute period return and contribution to the portfolio (return × weight). Group into top contributors and top detractors.
For each top mover: pull the news/events of the period that explain the move. One line per name. If a move has no obvious explanation, say so — that is itself a signal.
Compute book-level metrics where the data supports it: total return vs. the chosen benchmark, number of new positions opened, positions closed, turnover.
Reference the Chapter 1 regime read at the start vs. end of the period. If the regime shifted, note it and call out the implication for the book.
Surface anything unusual: positions that moved outside their typical range, correlations that broke, factor moves the user wouldn't have predicted from the regime alone.
If theses are documented: flag any contributor or detractor whose move is thesis-confirming or thesis-breaking. If not: state that thesis-relevance cannot be assessed in this run, and surface it as missing memory.

Output contract:

Output format

## Period Review — [START DATE] to [END DATE]

**Book return:** [+/- X%] vs. [benchmark, +/- Y%]
**Coverage:** [X% of book priced]
**Regime:** [start regime] → [end regime] ([shifted / unchanged])

### Top Contributors
| Position | Weight | Return | Contribution | What drove it |
|---|---|---|---|---|

### Top Detractors
| Position | Weight | Return | Contribution | What drove it |
|---|---|---|---|---|

### What was unusual
[One paragraph on anything that moved outside its typical range or behaved unexpectedly. If nothing did, say so.]

### Regime change in the period
[If applicable — what shifted, and what it implies for the book going into next period. Otherwise omit.]

### Questions to sit with
1. [Specific to a contributor or detractor — was the move thesis-confirming or thesis-breaking?]
2. [Specific to an unusual move — does the user's mental model account for it?]
3. [Forward-looking — given what just happened, what should be re-examined?]

**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [what would improve next review — typically theses, acquisition dates, prior session notes]

Mode 4.2 — Position Retrospective

When to use: A specific position the user has held for at least four weeks. "How is X doing," "remind me about X," "where am I on X."

Memory required: Usable — acquisition date plus at least an informal thesis. The acquisition date also feeds score_position_drift; without it, regime-fit drift can't be computed and you note that gap. If thesis is missing, divert to Mode 4.0 first to capture it; do not fabricate one.

Procedure:

Memory check — acquisition date and original thesis. If missing, divert to Mode 4.0.
Compute total return since acquisition. Compare to a sensible benchmark over the same window — sector ETF for stocks, broad index, or asset-class proxy. Name the benchmark.
Regime-fit drift. Call score_position_drift with the ticker and acquisition date. Capture fit_score_at_entry → fit_score_now, regime_at_entry → regime_now, drift_label (improved / stable / deteriorated / review_now), and the tool's review_questions. This quantifies whether the macro backdrop for the position has improved or deteriorated since entry — independent of price. Use it to ground the regime read rather than narrating the timeline by hand.
Surface the 3–5 events that moved the price most during the hold — earnings, guidance, macro shocks, sector rotations.
Thesis status: Holds, Evolved, or Broken. If Evolved, name the new thesis explicitly.
If exit conditions were specified: check whether any have triggered.
Identify what the user has learned during the hold that they didn't know at acquisition. The user names this; the AI prompts.

Output contract:

Output format

## [Ticker] Retrospective — held since [DATE]

**Total return:** [+/- X%] vs. [benchmark, +/- Y%] over the same window
**Regime during hold:** [GREEN / YELLOW / RED — and any shifts]

### Regime-fit drift
- **Fit at entry → now:** [score_at_entry] → [score_now]  ([drift_label])
- **Regime at entry → now:** [regime_at_entry] → [regime_now]
- [One line: is the macro tailwind/headwind for this position stronger or weaker than at entry — and does that agree or disagree with the price action?]

### What's happened
- [Event 1 — date — price impact]
- [Event 2 — date — price impact]
- [Event 3 — date — price impact]

### Thesis status
- **Original thesis:** [as documented or restated by the user]
- **Status:** [Holds / Evolved / Broken]
- **What's changed:** [specific facts that update or invalidate the thesis]

### Exit conditions
[Original exit conditions and whether any are now met. Omit section if none were ever specified.]

### What you've learned
[2–3 things the user knows now that they didn't at acquisition, in their own words.]

### Questions to sit with
1. Is the conviction the user feels today supported by the thesis evidence, or by the price?
2. If they were evaluating this name today with no holding, would they buy it at this size?
3. What would the user need to see to reduce or exit?

**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [what would improve next review]

Mode 4.3 — Thesis Status Sweep

When to use: Periodic check across the whole book — which theses still hold, which have evolved, which are broken. Especially valuable after a regime shift, a major macro surprise, or any event that should update priors broadly.

Memory required: Rich — documented theses for the top positions. If thin, divert to Mode 4.0 (or run Chapter 2 Mode 2.4 first). Do not produce a sweep on undocumented holdings; that is the failure mode this chapter is built to prevent.

Procedure:

Memory check. Theses available for top positions? If thin, divert.
List every position with a documented thesis.
Run the drift scan. Call get_drift_alerts with the documented book. It scores each position's regime-fit drift and returns flags: drift_severe (review_now), drift_meaningful (deteriorated), fit_low (≤2), conviction_gap, thesis_undocumented. Hold this alongside the thesis classification — they are different lenses: thesis status is about the story, drift is about the macro backdrop.
Classify each as Holds, Evolved, or Broken:
- Holds: core claim still supported by recent evidence.
- Evolved: core claim has changed shape — the position is now held for a different reason than at acquisition. Yellow flag, not always a problem.
- Broken: core claim is contradicted by evidence.
Cross-reference drift against thesis status. A position whose thesis Holds but whose drift_label is deteriorated (or fit is low) is a yellow flag — the story is intact while the regime has turned against it. Surface these explicitly; they are the sweep's sharpest output.
For broken theses: state the specific evidence that broke them. Vague is not allowed.
For evolved theses: name the new thesis explicitly. If the user can't name it, that's the signal.
List undocumented positions separately — flag them for a Mode 2.4 walkthrough.
Summarize the share of the book by thesis status.

Output contract:

Output format

## Thesis Status Sweep — [DATE]

**Holds:** [N positions, X% of book]
**Evolved:** [N positions, X% of book]
**Broken:** [N positions, X% of book]
**Undocumented:** [N positions, X% of book]

### Holds
| Position | Original thesis | Supporting evidence |
|---|---|---|

### Evolved
| Position | Original thesis | Current thesis (named) | What changed |
|---|---|---|---|

### Broken
| Position | Original thesis | Specific contradicting evidence |
|---|---|---|

### Undocumented
[Positions where no thesis was ever recorded — flag for Mode 2.4 walkthrough.]

### Regime-fit drift flags
| Position | Thesis status | Fit (entry → now) | Drift flag |
|---|---|---|---|
[Positions where regime-fit drift diverges from thesis status — especially "Holds" + deteriorating fit, or "Broken" the drift scan caught before you did.]

### Questions to sit with
1. For evolved theses — is the new thesis one the user would have bought into at acquisition?
2. For broken theses — what is keeping the position in the book?
3. For undocumented positions — which one is most worth thinking about first?

**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available — typically theses captured via Ch2 Mode 2.4 / Ch3 Decision Records]
**Missing memory:** [theses still missing, prior status snapshots not available, etc.]

Mode 4.4 — Drawdown Post-Mortem

When to use: A meaningful drawdown — a position down >10% from its hold-period high, the book down >5% over a short window, or any drawdown the user names as worth reviewing.

Memory required: Usable — holdings during the drawdown plus ideally an action log (what the user did during it). Without an action log, the "what the user did" section degrades to current recollection, which is exactly the kind of memory hindsight distorts.

Procedure:

Bound the drawdown precisely: peak date, trough date, peak-to-trough magnitude.
Decompose: what fraction of the drawdown came from each driver — a single position, a factor (rates, dollar, growth), a regime shift, a known event.
Compare to historical drawdowns of comparable shape. Was this drawdown in a normal range for the book's composition, or genuinely tail?
Separate what was knowable in advance from what was a true surprise. Be honest about both — hindsight will tempt the user to call everything "obvious."
Record what the user actually did during the drawdown — added, trimmed, held. Factual record from the action log; no judgment.
The user may write an operating note to their future self. The AI prompts; the user authors. The AI does not prescribe.

Output contract:

Output format

## Drawdown Post-Mortem — [PERIOD]

**Magnitude:** [-X%] over [N days], peak [DATE] → trough [DATE]
**Recovery status:** [recovered / partial / ongoing]

### Decomposition
| Driver | Contribution to drawdown |
|---|---|

### Was it in range?
[Comparison to historical drawdowns for comparable book composition or asset class. Tail or in-range.]

### Knowable in advance vs. surprise
- **Knowable:** [signals that were present beforehand]
- **Surprise:** [what genuinely could not have been anticipated]

### What the user did
[Factual record of any actions during the drawdown — sourced from action log if available; otherwise flagged as recollection.]

### Operating note to future self
[User-authored. The AI prompts but does not prescribe.]

**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [action-log gaps, position theses not documented at the time, etc.]

Mode 4.5 — Process Mirror

When to use: Quarterly. Once enough loop history has accumulated that patterns are visible — typically after three or more months of consistent use.

Memory required: Rich — Decision Records spanning multiple sessions plus loop_invocations history. Without these it produces noise; divert to Mode 4.0 if memory is thin. State the sample size up front and let it govern how strongly anything is claimed.

The only mode that studies the user's own decision-making rather than the market or the book. It improves mechanically as the loop is used — every captured Decision Record and drift score (Ch4 4.2 / 4.3) thickens the dataset it reflects on. Early runs will be thin and should say so; that is the honest state, not a failure.

Procedure:

Memory check. Decision Records across multiple sessions + invocation history available? If not, divert to Mode 4.0. Count what you have — N Decision Records, M sessions, date span — and put it at the top of the output.
Decision → outcome pairing. Where a Decision Record names a position that can be tracked, line up what happened against the recorded thesis. Did the things that would confirm or would break it actually occur — and did the user act when they did?
Look for four pattern families (report only those the data supports):
- Process → outcome: do positions opened after a Ch3 Mode 3.2 (due diligence) walkthrough fare differently than impulse adds? Do positions with a documented exit condition get exited more cleanly than those without?
- Behavioral tendencies: adding into drawdowns, trimming winners early, holding Hollow positions (Mode 2.4) long after conviction lapsed, letting a thesis drift to Evolved/Broken (Mode 4.3) without acting.
- Loop adherence: which modes the user runs vs. skips — e.g. deciding (Ch3) without observing (Ch4), or never capturing Decision Records so retrospectives stay thin.
- Recurring blind spots: the same factor mismatch (Mode 2.1) or conviction gap (Mode 2.4) flagged across quarters without resolution.
Specific, not general. Every observation cites the records behind it. Banned: "be more disciplined." Wanted: "the three positions you opened without a Mode 3.2 walkthrough are each now your lowest-conviction holdings per Mode 2.4 — small sample (3), but worth noting."
Small-sample honesty. Name N every time. Correlation, not causation. A pattern over 4 decisions is a hypothesis, not a law — say so.
Reflect, don't prescribe. Surface the pattern; the user names the takeaway and decides what, if anything, to change.

Output contract:

Output format

## Process Mirror — [QUARTER]

**Sample:** [N Decision Records · M sessions · date span] — governs how strongly anything below is claimed.

### Process → outcome
[Patterns linking how a decision was made to how it turned out. Cite the records. Omit if sample too thin.]

### Behavioral tendencies
[Recurring behaviors visible in the data, each with specific instances.]

### Loop adherence
[Which modes the user leans on / skips, and the consequence.]

### Recurring blind spots
[The same flag appearing across quarters without resolution.]

### For the user to name
[1–2 reflective prompts. The user writes the takeaway; the AI does not prescribe.]

**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [Decision Records still missing, sessions not captured, etc.]

Chapter 4 compounds only if the user's AI keeps good records. The first run creates memory. Later runs extract lessons from it. Run 4.0 first when records are thin. Run 4.1 weekly. Run 4.2 when a name feels uncertain. Run 4.3 after regime shifts. Run 4.4 when the book asks for it. Run 4.5 quarterly, once history allows.

Source: The Jawz Loop, by Mako · Chapter 4 v0.3.0.

↻ The loop returns to See the World — the world has moved since you started.