Category: Productivity & Office · No API key required

probabilistic-thinking

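Apply probabilistic and Bayesian thinking whenever the user needs to reason under uncertainty, compare risks, prioritize among options, update beliefs on new evidence, or decide with incomplete information. Trigger phrases include "What are the odds?", "How likely is this?", "Should I worry about X?", "Which risk is bigger?", "What does this data change?", "Is this signal or noise?", "What's the probability?", and "How confident are we?", as well as any decision that rests on incomplete or ambiguous evidence. Also trigger when someone treats an uncertain outcome as certain, or uses probability language loosely ("likely", "unlikely", "very likely") without quantifying it. Do not ignore uncertainty.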

Author: jakexiaohub (GitHub)

Probabilistic & Bayesian Thinking

Core principle: Probabilistic thinking replaces vague confidence with calibrated estimates. Bayesian thinking updates those estimates as evidence arrives — neither clinging to priors nor overreacting to new data.


Core Concepts

Probability as Degree of Belief

"Will probably work" → 60%? 90%? Forcing a number exposes vague confidence and creates a baseline for updating.

Base Rates

Find the base rate before estimating a specific event — how often does this event type occur in a reference class?

"Will this feature succeed?" → What % of similar features in similar products succeeded?

Ignoring base rates (the base rate fallacy) is one of the most common reasoning errors.

Bayesian Updating

Update proportionally — not by ignoring priors, not by overwriting them.

Posterior odds = Prior odds × Likelihood ratio
  • Prior: belief before evidence
  • Likelihood ratio: P(evidence | hypothesis true) ÷ P(evidence | hypothesis false)
  • Posterior: belief after evidence
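As a minimal sketch of a proportional update, the posterior can be computed with Bayes' rule from a prior and the two likelihoods. The function name `bayes_update` and the example numbers (a 30% prior, with evidence three times as likely when the hypothesis is true) are illustrative assumptions, not values from real data.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    for name, p in (("prior", prior), ("p_e_given_h", p_e_given_h),
                    ("p_e_given_not_h", p_e_given_not_h)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    # Total probability of seeing the evidence under either hypothesis.
    evidence = prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return prior * p_e_given_h / evidence


# Hypothetical numbers: 30% prior; the evidence shows up 60% of the time when
# the hypothesis is true and 20% of the time when it is false (ratio of 3).
posterior = bayes_update(prior=0.30, p_e_given_h=0.60, p_e_given_not_h=0.20)
print(f"{posterior:.0%}")  # 56%
```

When both likelihoods are equal, the posterior equals the prior, which is a quick way to check whether a piece of evidence is informative at all.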

Expected Value

EV = Σ (Probability × Value), summed over every possible outcome

A 10% chance of +€100 (EV = €10) beats a 90% chance of +€5 (EV = €4.50), assuming the remaining outcome in each case pays nothing.
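The comparison above can be sketched as a sum of probability-weighted payoffs. The helper `expected_value` is a hypothetical name, and the sketch assumes the unstated outcome of each bet pays 0.

```python
def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Sum probability * value over a complete set of mutually exclusive outcomes."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total_p}")
    return sum(p * v for p, v in outcomes)


# Assumption: in both bets, the remaining probability mass pays 0.
long_shot = expected_value([(0.10, 100.0), (0.90, 0.0)])
safe_bet = expected_value([(0.90, 5.0), (0.10, 0.0)])
print(long_shot > safe_bet)  # True
```

Requiring the probabilities to sum to 1 forces the failure case to be written down, so the downside cannot be silently left out of the estimate.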

Confidence Intervals

Point estimates are usually wrong. Ranges are honest.

  • "4 weeks" → "3–7 weeks (80% confidence)"
  • Wide intervals on uncertain things = calibration, not weakness.

Output Format

Probability Estimates

| Claim | Prior | Evidence | Updated | Confidence |
|-------|-------|----------|---------|------------|
| "Feature will succeed" | 30% (base rate) | Strong user signal | 55% | Medium |
| "Will ship on time" | 40% (historical) | Experienced team | 50% | Low |

Base Rate Check

  • Reference class for this situation?
  • Historical base rate for this outcome?
  • How does this case differ from base rate (and does that justify adjustment)?

Bayesian Update

  • Prior: belief before
  • New evidence: what we now know
  • Likelihood ratio: more consistent with hypothesis true or false?
  • Posterior: belief now
  • Update size: did evidence move the needle? (Strong evidence → large; weak → small.)

Expected Value Comparison

| Option | Probability | Value if succeeds | Value if fails | EV |
|--------|------------|------------------|----------------|----|
| A | 70% | +€50k | -€10k | +€32k |
| B | 30% | +€200k | -€20k | +€46k |
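A rough sketch of how the EV column can be derived, using the two rows from the table. The `Option` class and its `break_even_p` method are hypothetical names added for illustration. The break-even probability is not part of the original format; it is included only to show how far a probability estimate could be off before the decision flips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    p_success: float      # probability of the success outcome
    value_success: float  # payoff if it succeeds (thousands of euros)
    value_failure: float  # payoff if it fails (thousands of euros)

    def ev(self) -> float:
        """Probability-weighted payoff across both outcomes."""
        return (self.p_success * self.value_success
                + (1.0 - self.p_success) * self.value_failure)

    def break_even_p(self) -> float:
        """Success probability at which EV is zero (assumes success pays more than failure)."""
        return -self.value_failure / (self.value_success - self.value_failure)


options = [
    Option("A", 0.70, 50.0, -10.0),
    Option("B", 0.30, 200.0, -20.0),
]
# Rank by EV, highest first.
for opt in sorted(options, key=Option.ev, reverse=True):
    print(f"{opt.name}: EV = {opt.ev():+.0f}k, break-even p = {opt.break_even_p():.0%}")
```

Option B has the higher EV, but it still fails 70% of the time, so the ranking says nothing about whether a €20k loss is affordable.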

Confidence Ranges

  • Optimistic (10th pct): [value]
  • Expected (50th pct): [value]
  • Pessimistic (90th pct): [value]
  • Black swan: [tail scenario]

Probability Hygiene Flags

  • Probabilities treated as certainties (0%/100%)? Almost nothing is certain.
  • Base rate ignored for the specific case?
  • Overreaction to latest evidence (recency bias), or under-adjustment from the first number heard (anchoring)?
  • Conjunction fallacy? (P(A and B) ≤ P(A): a more specific claim can never be more probable)
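To illustrate the conjunction rule, here is a small sketch of how quickly a chain of individually likely steps becomes unlikely as a whole. It assumes the steps are independent, which real plans often are not, and the five 90% figures are made up for the example.

```python
from math import prod


def joint_probability(step_probabilities: list[float]) -> float:
    """P(all steps succeed), assuming the steps are independent."""
    return prod(step_probabilities)


# Hypothetical plan with five steps, each judged 90% likely on its own.
steps = [0.9] * 5
p_all = joint_probability(steps)
print(f"{p_all:.0%}")  # 59%

# A conjunction is never more likely than its least likely part.
assert p_all <= min(steps)
```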

Calibration Heuristics

Fermi Estimation — break unknowns into estimable parts:

  • "How many users?" → market size × awareness % × conversion % × retention %
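The decomposition above can be sketched as a product of factors, each carried as a (low, best guess, high) triple. All numbers below are hypothetical placeholders, and `fermi_estimate` is an illustrative name.

```python
from math import prod

# Hypothetical (low, best guess, high) inputs; none come from real data.
factors = {
    "market_size": (500_000, 1_000_000, 2_000_000),
    "awareness":   (0.05, 0.10, 0.20),
    "conversion":  (0.01, 0.02, 0.05),
    "retention":   (0.20, 0.40, 0.60),
}


def fermi_estimate(
    factors: dict[str, tuple[float, float, float]]
) -> tuple[float, float, float]:
    """Multiply the factors column by column: all lows, all best guesses, all highs."""
    low, mid, high = (prod(f[i] for f in factors.values()) for i in range(3))
    return low, mid, high


low, mid, high = fermi_estimate(factors)
print(f"users: ~{mid:,.0f} (range {low:,.0f} to {high:,.0f})")
```

Multiplying all the lows together, and all the highs together, gives a deliberately wide bracket. If the factors vary independently, the true 80% range is narrower than this.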

Reference Class Forecasting — historical data from similar projects:

  • "This feature type took 4–8 weeks for 80% of teams in our class"
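A range like this can be read directly off historical data. The sketch below uses the standard library's `statistics.quantiles`. The list of past durations is invented for illustration and stands in for whatever records the reference class actually provides.

```python
from statistics import quantiles

# Hypothetical durations (weeks) of past projects in the same reference class.
past_durations_weeks = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12]

# quantiles(..., n=10) returns the 9 cut points between deciles.
deciles = quantiles(past_durations_weeks, n=10, method="inclusive")
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

# The span from the 10th to the 90th percentile covers about 80% of past cases.
print(f"80% range: {p10:.1f} to {p90:.1f} weeks, median {p50:.1f}")
```

With only a dozen data points, the tail percentiles are themselves uncertain, so the range is best treated as a starting point for the outside view and not as a precise forecast.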

Outside View vs. Inside View:

  • Inside: "We're special, we'll beat the average"
  • Outside: "What does the data say for projects like this?"
  • Default outside. Adjust only with specific, strong evidence.

Pre-commit to what would change your mind:

  • "If we see X, I'll move probability from 60% to below 30%"
  • Prevents post-hoc rationalization.

Thinking Triggers

  • "What's the base rate?"
  • "Are we treating 70% like certainty?"
  • "What's the EV of each option, not just the upside?"
  • "How much should this evidence actually move our belief?"
  • "What would change our mind significantly?"
  • "Are we in the reference class we think we're in?"
  • "What's the downside, and are we weighting it correctly?"

Example Applications

  • "Should we build this?" → % of similar features that drove retention? Cost if it fails?
  • "A/B test showed a lift" → Sample size sufficient? Prior for this change type?
  • "We'll ship in 2 weeks" → Historical distribution? 80th percentile?
  • "Agent failed once. Bug?" → Base rate of one-off failures? What evidence would confirm a systematic fault?