Chapter 4 — Observe & Refine
Node: observe
Purpose: Make what the book actually did legible to the user, so the view going forward updates on evidence rather than memory.
What this chapter does
The first three chapters are forward-looking. Chapter 4 is the only one that looks backward — and uses the result to refine the model the user runs the rest of the loop with.
But Chapter 4 only works if there is something to look back on. A retrospective built only on current prices is not a retrospective; it is a story dressed up as one. Chapter 4 only earns its output if prior decisions, theses, or portfolio snapshots were captured in usable form. When that record does not exist, this chapter's first job is not to fabricate it — it is to create the baseline that future runs will draw from.
That makes Chapter 4 the slowest-compounding chapter in the loop. The first run creates memory. Later runs extract lessons from it. Run it consistently and it sharpens; run it once and it does little.
Like the rest of the loop, every output ends in questions, not instructions.
Storage boundary
Jawz publishes the framework. Jawz does not store the user's portfolio, decisions, or theses. The user's AI is responsible for retrieving any private memory it has — past sessions, captured Decision Records, journal notes, broker statements — and applying this chapter's modes to that memory. If the AI has no usable memory for a given mode, it must say so plainly and run Mode 4.0 (Baseline Capture) instead of producing weak output.
This is architectural, not a disclaimer. The publishing-house model only holds if Jawz is the reference body, not the user's portfolio database.
Memory depth — what each mode requires
Each mode declares its memory requirement. If the requirement is unmet, the AI runs Mode 4.0 first and reports that as the response.
| Mode | Requires |
|---|---|
| 4.0 Baseline Capture | Nothing — this is the entry point when memory is thin |
| 4.1 Period Review | Current holdings + price history (thin OK) |
| 4.2 Position Retrospective | Acquisition date + at least an informal thesis (usable) |
| 4.3 Thesis Status Sweep | Documented theses for top positions (rich) |
| 4.4 Drawdown Post-Mortem | Holdings during the drawdown + ideally an action log (usable) |
| 4.5 Process Mirror | Decision Records across multiple sessions (rich) |
Every Chapter 4 output ends with a Review Quality classification — Thin / Usable / Strong — plus the gap that future captures should fill.
Shared preamble — memory check + priced book
Before any mode runs:
- Memory check. What does the user's AI have access to? Prior session notes, Decision Records, Chapter 3 outputs, portfolio snapshots, broker statements, journal entries. Surface what is and isn't available.
- Mode requirement check. Does the available memory meet the requested mode's minimum? If not, divert to Mode 4.0.
- Priced book. Run Chapter 2's shared preamble — load and price the current book. State coverage.
- Window. Determine the review window. Default for periodic modes is one week.
- Benchmark. Pick one and name it: S&P 500 for US-equity-heavy books, ACWI for global, BTC for crypto-heavy. Do not silently switch benchmarks across modes within a session.
Mode 4.0 — Baseline Capture
When to use: First time the user runs Chapter 4. Also: any time another mode's memory check fails. This is the chapter's default entry point.
Memory required: None. This mode creates the memory.
Procedure:
- Acknowledge the gap honestly. There is no usable history to review. That is not a failure of the chapter — it is the starting condition.
- Capture a portfolio snapshot from current state.
- For each top position, ask the user — in their own words — what they think the thesis is. Distinguish known theses (the user can articulate the reason) from undocumented theses (the user holds it but cannot explain why right now).
- Record current regime context from Chapter 1 so future reviews can compare against it.
- Capture what the user is currently concerned about — the questions or risks on their mind.
- Set a review cadence (weekly is the default) and a next-review trigger (date or event).
- Output the Baseline Record. The user's AI saves it locally. From this point on, future Chapter 4 runs have something to compare against.
Output contract:
## Baseline Record — [DATE]
**Date:** [YYYY-MM-DD]
**Portfolio snapshot available:** yes / no
**Positions reviewed:** [list]
**Known theses:** [position → one-line thesis, in user's words]
**Unknown / undocumented theses:** [positions with no current rationale]
**Current regime context:** [Ch1 regime read summary]
**Current user concerns:** [what's on the user's mind today]
**Review cadence:** [weekly / monthly / event-driven]
**Next review trigger:** [date or condition]
**Review Quality:** N/A — this run creates the baseline
**Missing memory captured:** [what the user couldn't articulate yet — flag for future Mode 2.4 walkthroughs]
The session ends with: future Chapter 4 reviews will be better because this is now the first anchor point.
Mode 4.1 — Period Review
When to use: "How did my book do this week / month / quarter," "what happened in my book," "what should I pay attention to from the last [period]." Default cadence: weekly.
Memory required: Thin OK — current holdings and price history are enough to run, though documented theses sharpen the output meaningfully.
Procedure:
- Establish the period boundaries (start date, end date) and state them explicitly.
- For every priced position: compute period return and contribution to the portfolio (return × weight). Group into top contributors and top detractors.
- For each top mover: pull the news/events of the period that explain the move. One line per name. If a move has no obvious explanation, say so — that is itself a signal.
- Compute book-level metrics where the data supports it: total return vs. the chosen benchmark, number of new positions opened, positions closed, turnover.
- Reference the Chapter 1 regime read at the start vs. end of the period. If the regime shifted, note it and call out the implication for the book.
- Surface anything unusual: positions that moved outside their typical range, correlations that broke, factor moves the user wouldn't have predicted from the regime alone.
- If theses are documented: flag any contributor or detractor whose move is thesis-confirming or thesis-breaking. If not: state that thesis-relevance cannot be assessed in this run, and surface it as missing memory.
Output contract:
## Period Review — [START DATE] to [END DATE]
**Book return:** [+/- X%] vs. [benchmark, +/- Y%]
**Coverage:** [X% of book priced]
**Regime:** [start regime] → [end regime] ([shifted / unchanged])
### Top Contributors
| Position | Weight | Return | Contribution | What drove it |
|---|---|---|---|---|
### Top Detractors
| Position | Weight | Return | Contribution | What drove it |
|---|---|---|---|---|
### What was unusual
[One paragraph on anything that moved outside its typical range or behaved unexpectedly. If nothing did, say so.]
### Regime change in the period
[If applicable — what shifted, and what it implies for the book going into next period. Otherwise omit.]
### Questions to sit with
1. [Specific to a contributor or detractor — was the move thesis-confirming or thesis-breaking?]
2. [Specific to an unusual move — does the user's mental model account for it?]
3. [Forward-looking — given what just happened, what should be re-examined?]
**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [what would improve next review — typically theses, acquisition dates, prior session notes]
Mode 4.2 — Position Retrospective
When to use: A specific position the user has held for at least four weeks. "How is X doing," "remind me about X," "where am I on X."
Memory required: Usable — acquisition date plus at least an informal thesis. If thesis is missing, divert to Mode 4.0 first to capture it; do not fabricate one.
Procedure:
- Memory check — acquisition date and original thesis. If missing, divert to Mode 4.0.
- Compute total return since acquisition. Compare to a sensible benchmark over the same window — sector ETF for stocks, broad index, or asset-class proxy. Name the benchmark.
- Map the holding period against the regime timeline from Chapter 1. Was the regime favorable, unfavorable, or did it shift mid-hold?
- Surface the 3–5 events that moved the price most during the hold — earnings, guidance, macro shocks, sector rotations.
- Thesis status: Holds, Evolved, or Broken. If Evolved, name the new thesis explicitly.
- If exit conditions were specified: check whether any have triggered.
- Identify what the user has learned during the hold that they didn't know at acquisition. The user names this; the AI prompts.
Output contract:
## [Ticker] Retrospective — held since [DATE]
**Total return:** [+/- X%] vs. [benchmark, +/- Y%] over the same window
**Regime during hold:** [GREEN / YELLOW / RED — and any shifts]
### What's happened
- [Event 1 — date — price impact]
- [Event 2 — date — price impact]
- [Event 3 — date — price impact]
### Thesis status
- **Original thesis:** [as documented or restated by the user]
- **Status:** [Holds / Evolved / Broken]
- **What's changed:** [specific facts that update or invalidate the thesis]
### Exit conditions
[Original exit conditions and whether any are now met. Omit section if none were ever specified.]
### What you've learned
[2–3 things the user knows now that they didn't at acquisition, in their own words.]
### Questions to sit with
1. Is the conviction the user feels today supported by the thesis evidence, or by the price?
2. If they were evaluating this name today with no holding, would they buy it at this size?
3. What would the user need to see to reduce or exit?
**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [what would improve next review]
Mode 4.3 — Thesis Status Sweep
When to use: Periodic check across the whole book — which theses still hold, which have evolved, which are broken. Especially valuable after a regime shift, a major macro surprise, or any event that should update priors broadly.
Memory required: Rich — documented theses for the top positions. If thin, divert to Mode 4.0 (or run Chapter 2 Mode 2.4 first). Do not produce a sweep on undocumented holdings; that is the failure mode this chapter is built to prevent.
Procedure:
- Memory check. Theses available for top positions? If thin, divert.
- List every position with a documented thesis.
- Classify each as Holds, Evolved, or Broken:
- Holds: core claim still supported by recent evidence.
- Evolved: core claim has changed shape — the position is now held for a different reason than at acquisition. Yellow flag, not always a problem.
- Broken: core claim is contradicted by evidence.
- For broken theses: state the specific evidence that broke them. Vague is not allowed.
- For evolved theses: name the new thesis explicitly. If the user can't name it, that's the signal.
- List undocumented positions separately — flag them for a Mode 2.4 walkthrough.
- Summarize the share of the book by thesis status.
Output contract:
## Thesis Status Sweep — [DATE]
**Holds:** [N positions, X% of book]
**Evolved:** [N positions, X% of book]
**Broken:** [N positions, X% of book]
**Undocumented:** [N positions, X% of book]
### Holds
| Position | Original thesis | Supporting evidence |
|---|---|---|
### Evolved
| Position | Original thesis | Current thesis (named) | What changed |
|---|---|---|---|
### Broken
| Position | Original thesis | Specific contradicting evidence |
|---|---|---|
### Undocumented
[Positions where no thesis was ever recorded — flag for Mode 2.4 walkthrough.]
### Questions to sit with
1. For evolved theses — is the new thesis one the user would have bought into at acquisition?
2. For broken theses — what is keeping the position in the book?
3. For undocumented positions — which one is most worth thinking about first?
**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available — typically theses captured via Ch2 Mode 2.4 / Ch3 Decision Records]
**Missing memory:** [theses still missing, prior status snapshots not available, etc.]
Mode 4.4 — Drawdown Post-Mortem
When to use: A meaningful drawdown — a position down >10% from its hold-period high, the book down >5% over a short window, or any drawdown the user names as worth reviewing.
Memory required: Usable — holdings during the drawdown plus ideally an action log (what the user did during it). Without an action log, the "what the user did" section degrades to current recollection, which is exactly the kind of memory hindsight distorts.
Procedure:
- Bound the drawdown precisely: peak date, trough date, peak-to-trough magnitude.
- Decompose: what fraction of the drawdown came from each driver — a single position, a factor (rates, dollar, growth), a regime shift, a known event.
- Compare to historical drawdowns of comparable shape. Was this drawdown in a normal range for the book's composition, or genuinely tail?
- Separate what was knowable in advance from what was a true surprise. Be honest about both — hindsight will tempt the user to call everything "obvious."
- Record what the user actually did during the drawdown — added, trimmed, held. Factual record from the action log; no judgment.
- The user may write an operating note to their future self. The AI prompts; the user authors. The AI does not prescribe.
Output contract:
## Drawdown Post-Mortem — [PERIOD]
**Magnitude:** [-X%] over [N days], peak [DATE] → trough [DATE]
**Recovery status:** [recovered / partial / ongoing]
### Decomposition
| Driver | Contribution to drawdown |
|---|---|
### Was it in range?
[Comparison to historical drawdowns for comparable book composition or asset class. Tail or in-range.]
### Knowable in advance vs. surprise
- **Knowable:** [signals that were present beforehand]
- **Surprise:** [what genuinely could not have been anticipated]
### What the user did
[Factual record of any actions during the drawdown — sourced from action log if available; otherwise flagged as recollection.]
### Operating note to future self
[User-authored. The AI prompts but does not prescribe.]
**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [action-log gaps, position theses not documented at the time, etc.]
Mode 4.5 — Process Mirror
When to use: Quarterly. Once enough loop history has accumulated that patterns are visible — typically after three or more months of consistent use.
Memory required: Rich — Decision Records spanning multiple sessions plus invocation history. Without these, this mode produces noise. Divert to Mode 4.0 if memory is thin.
Status: v0.1 sketch. Sharpens as loop_invocations history thickens and Decision Records accumulate.
Procedure (sketch):
- Memory check. Decision Records across multiple sessions? If not, divert to Mode 4.0.
- Pull the user's invocation history — which chapters and modes they run, against which kinds of question.
- Where Decision Records exist and the relevant positions can be tracked, line up the outcomes alongside.
- Surface specific patterns, not generalities. Avoid "you should be more disciplined." Prefer "positions you opened after a Mode 3.2 walkthrough have outperformed positions opened without one by X% over the last Y months — small sample, but worth noting."
- Surface behavioral patterns visible in the data: tendency to add into drawdowns, tendency to trim winners, time-of-day patterns in decision-making.
- Reflect, don't prescribe. The user names the takeaway.
Output contract:
## Process Mirror — [QUARTER]
[Specific, evidence-grounded pattern observations from invocation history and Decision Records.]
**Review Quality:** Thin / Usable / Strong
**Why:** [what memory was available]
**Missing memory:** [Decision Records still missing, sessions not captured, etc.]
Chapter 4 compounds only if the user's AI keeps good records. The first run creates memory. Later runs extract lessons from it. Run 4.0 first when records are thin. Run 4.1 weekly. Run 4.2 when a name feels uncertain. Run 4.3 after regime shifts. Run 4.4 when the book asks for it. Run 4.5 quarterly, once history allows.
Source: The Jawz Loop, by Mako · Chapter 4 v0.1.0.