Scoring Methodology
🚧 The scoring methodology is actively evolving. Weights, thresholds, and formula details will change as research findings are validated and promoted. Check the Scoring Changelog for a versioned record of what has changed. 🚧
This document describes every step of the pipeline from raw source code to the ranked output Hotspots produces. Each stage builds on the previous one.
Pipeline overview

    Source Code
        ↓
    Raw Metrics (CC, ND, FO, NS, LOC)
        ↓
    Risk Components (log-scaled, bounded transforms)
        ↓
    Local Risk Score (LRS) + Risk Band
        ↓
    Pattern Classification (Tier 1: structural · Tier 2: enriched)
        ↓
    [optional enrichment: call graph, git churn, touch counts]
        ↓
    Activity Risk Score (LRS + activity modifiers)
        ↓
    Driver Label (primary dimension diagnosis)
        ↓
    Quadrant Assignment (2-D: complexity × activity)
        ↓
    Ranked Output

Stages above the enrichment line run on source code alone and are always computed. Stages below require a git repository.
Step 1: Collect raw metrics
Hotspots parses each source file and visits every function, extracting four structural measurements plus a line count:
CC (Cyclomatic Complexity): The number of independent decision paths through the function. Every if, else if, loop, case, catch, &&, ||, and ternary adds one path. A function with no branches has CC 1.
ND (Nesting Depth): The maximum depth of nested control structures, that is, how many layers of if/loop/try are present at the deepest point.
FO (Fan-Out): The number of distinct functions called from within this function.
NS (Non-Structured Exits): The count of early returns, throws, breaks, and continues inside the function body, excluding the final tail return.
LOC (Lines of Code): Physical line count. Used only for pattern detection (see Step 4), not for the risk score itself.
See Metrics Reference for exact counting rules per language.
Step 2: Transform to risk components
Raw metric values are passed through bounded, monotonic transforms before being combined:

    R_cc = min(log2(CC + 1), 6.0)   # logarithmic, capped at 6
    R_nd = min(ND, 8.0)             # linear, capped at 8
    R_fo = min(log2(FO + 1), 6.0)   # logarithmic, capped at 6
    R_ns = min(NS, 6.0)             # linear, capped at 6

Logarithmic scaling for CC and FO gives more weight to early growth than to increases at already-high values: the marginal risk of going from CC 1 to CC 4 is larger than going from CC 40 to CC 44. Fan-out follows the same reasoning.
Linear scaling for ND and NS reflects that each additional nesting level or exit point contributes more uniformly to complexity in practice.
Caps prevent a single extreme metric from dominating the score. Each dimension is bounded independently so the combined score reflects overall structural complexity.
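The transforms above can be sketched in a few lines of Python. This is an illustrative reading of the formulas, not Hotspots' actual implementation; the function name `risk_components` and the dictionary keys are assumptions made for the example.

```python
import math


def risk_components(cc: int, nd: int, fo: int, ns: int) -> dict[str, float]:
    """Apply the Step 2 transforms to raw metric values (hypothetical helper)."""
    return {
        "R_cc": min(math.log2(cc + 1), 6.0),  # logarithmic, capped at 6
        "R_nd": min(float(nd), 8.0),          # linear, capped at 8
        "R_fo": min(math.log2(fo + 1), 6.0),  # logarithmic, capped at 6
        "R_ns": min(float(ns), 6.0),          # linear, capped at 6
    }


# A trivial function: one path, no nesting, no calls, no early exits.
trivial = risk_components(cc=1, nd=0, fo=0, ns=0)

# An extreme function: every component saturates at its cap.
extreme = risk_components(cc=500, nd=12, fo=200, ns=9)

# Early growth counts for more than growth at high values.
early_step = math.log2(4 + 1) - math.log2(1 + 1)    # CC 1 -> 4
late_step = math.log2(44 + 1) - math.log2(40 + 1)   # CC 40 -> 44
```

Here `early_step` is roughly ten times `late_step`, which is the effect the logarithmic scaling is meant to produce.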
Step 3: Compute the Local Risk Score (LRS)
The four risk components are combined into a single score using a weighted sum:

    LRS = 1.0 × R_cc + 0.8 × R_nd + 0.6 × R_fo + 0.7 × R_ns

Weight rationale:
- CC (1.0): highest weight; control-flow complexity is the primary correlate of defect density and testing difficulty.
- ND (0.8): nesting depth captures a dimension of complexity that CC alone can miss; a function can have moderate CC but still be hard to follow due to deep nesting.
- NS (0.7): non-structured exits increase the number of implicit exit conditions and make postconditions harder to reason about.
- FO (0.6): fan-out represents external coupling rather than internal complexity; weighted lower because some degree of fan-out is expected in most functions.
LRS is always ≥ 1.0. The minimum of exactly 1.0 belongs to a trivial single-path function with no nesting, calls, or early exits (R_cc = log2(2) = 1, all other components 0). The theoretical maximum is 20.2, reached when all four components are at their caps: 1.0×6 + 0.8×8 + 0.6×6 + 0.7×6.
Risk bands:
| Band | LRS range | Meaning |
|---|---|---|
| Critical | ≥ 9.0 | High structural risk |
| High | 6.0–8.9 | Elevated structural risk |
| Moderate | 3.0–5.9 | Moderate structural risk |
| Low | < 3.0 | Low structural risk |
See LRS Specification for the complete formula derivation, worked examples, and precision notes.
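As a worked example, the weighted sum and band lookup might look like the following in Python. The function names are hypothetical, and the band boundaries are treated as half-open intervals (a score of 8.95 is assumed to fall in High); the LRS Specification's precision notes are the authority on rounding.

```python
import math


def local_risk_score(cc: int, nd: int, fo: int, ns: int) -> float:
    """Combine the four capped risk components using the documented weights."""
    r_cc = min(math.log2(cc + 1), 6.0)
    r_nd = min(float(nd), 8.0)
    r_fo = min(math.log2(fo + 1), 6.0)
    r_ns = min(float(ns), 6.0)
    return 1.0 * r_cc + 0.8 * r_nd + 0.6 * r_fo + 0.7 * r_ns


def risk_band(lrs: float) -> str:
    """Map an LRS value to a band (boundary handling is an assumption)."""
    if lrs >= 9.0:
        return "Critical"
    if lrs >= 6.0:
        return "High"
    if lrs >= 3.0:
        return "Moderate"
    return "Low"


# CC 7, ND 2, FO 3, NS 1:
#   R_cc = log2(8) = 3, R_nd = 2, R_fo = log2(4) = 2, R_ns = 1
#   LRS  = 1.0*3 + 0.8*2 + 0.6*2 + 0.7*1 = 6.5
example = local_risk_score(cc=7, nd=2, fo=3, ns=1)
example_band = risk_band(example)  # "High"
```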
Step 4: Classify patterns
Patterns are named labels that identify specific structural combinations. They complement LRS by describing what kind of issue a function has, not just its overall score. A function can match multiple patterns simultaneously.
Tier 1: structural (source code only)
Detected from raw metrics alone; always computed:
| Pattern | Trigger | Description |
|---|---|---|
| complex_branching | CC ≥ 10 and ND ≥ 4 | High branching combined with deep nesting |
| deeply_nested | ND ≥ 5 | Maximum nesting depth at or above threshold |
| exit_heavy | NS ≥ 5 | High number of non-structured exits |
| god_function | LOC ≥ 60 and FO ≥ 10 | Long function with high fan-out |
| long_function | LOC ≥ 80 | High physical line count |
Tier 2: enriched (call graph + git data)
Require git history and the call graph; computed only when that data is available:
| Pattern | Trigger | Description |
|---|---|---|
| churn_magnet | file churn ≥ 200 lines and CC ≥ 8 | High complexity combined with high change volume |
| cyclic_hub | SCC size ≥ 2 and fan-in ≥ 6 | Part of a dependency cycle with many callers |
| hub_function | fan-in ≥ 10 and CC ≥ 8 | High fan-in with high complexity |
| middle_man | fan-in ≥ 8 and FO ≥ 8 and CC ≤ 4 | High fan-in and fan-out with low internal complexity |
| neighbor_risk | neighbor churn ≥ 400 and FO ≥ 8 | High fan-out into frequently changing functions |
| shotgun_target | fan-in ≥ 8 and file churn ≥ 150 | Many callers in a frequently changed file |
| stale_complex | CC ≥ 10 and LOC ≥ 60 and days since change ≥ 180 | High complexity with no recent changes |
Derived pattern: volatile_god fires only when both god_function and churn_magnet are true.
All thresholds are configurable in .hotspotsrc.json. See Configuration.
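A minimal sketch of how threshold-based classification could work, using the default triggers from the tables above. It covers the Tier 1 patterns plus churn_magnet and the derived volatile_god. The `FunctionMetrics` type and `classify` function are hypothetical, and the thresholds are hard-coded here rather than read from .hotspotsrc.json, whose key names are not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionMetrics:
    cc: int
    nd: int
    fo: int
    ns: int
    loc: int
    file_churn: Optional[int] = None  # None when git data is unavailable


def classify(m: FunctionMetrics) -> list[str]:
    """Return every pattern whose trigger the function satisfies."""
    patterns = []
    # Tier 1: structural, always computed.
    if m.cc >= 10 and m.nd >= 4:
        patterns.append("complex_branching")
    if m.nd >= 5:
        patterns.append("deeply_nested")
    if m.ns >= 5:
        patterns.append("exit_heavy")
    if m.loc >= 60 and m.fo >= 10:
        patterns.append("god_function")
    if m.loc >= 80:
        patterns.append("long_function")
    # Tier 2 (one example): only evaluated when churn data exists.
    if m.file_churn is not None and m.file_churn >= 200 and m.cc >= 8:
        patterns.append("churn_magnet")
    # Derived pattern: requires both god_function and churn_magnet.
    if "god_function" in patterns and "churn_magnet" in patterns:
        patterns.append("volatile_god")
    return patterns


without_git = classify(FunctionMetrics(cc=12, nd=5, fo=11, ns=1, loc=90))
with_git = classify(FunctionMetrics(cc=12, nd=5, fo=11, ns=1, loc=90, file_churn=250))
```

The same function matches four structural patterns without git data and gains churn_magnet and volatile_god once churn is known, which illustrates why patterns are reported as a list rather than a single label.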
Step 5: Compute the Activity Risk Score
When git history is available, Hotspots extends LRS with activity signals:

    Activity Risk = LRS
        + (lines_added + lines_deleted) / 100 × 0.5
        + min(touch_count_30d / 10, 5.0) × 0.3
        + max(0, 5.0 − days_since_change / 7) × 0.2
        + min(fan_in / 5, 10.0) × 0.4
        + (scc_size, if in cycle, else 0) × 0.3
        + min(dependency_depth / 3, 5.0) × 0.1
        + neighbor_churn / 500 × 0.2

Each modifier is non-negative, so Activity Risk is always ≥ LRS. When no git data is available, Activity Risk equals LRS.
| Signal | Weight | What it captures |
|---|---|---|
| Churn (lines added + deleted) | 0.5 | Volume of recent change |
| Fan-in (call-graph callers) | 0.4 | Number of functions that depend on this one |
| Touch count (30-day commits) | 0.3 | Frequency of recent modification |
| SCC membership | 0.3 | Presence in a dependency cycle |
| Recency (days since last change) | 0.2 | How recently the function was last modified |
| Neighbor churn | 0.2 | Change volume in called functions |
| Dependency depth | 0.1 | Depth in the call graph from entry points |
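The Activity Risk formula can be written out term by term as in the sketch below. Two details are assumptions, not documented behavior: a function counts as "in a cycle" when its SCC size is at least 2, and a missing `days_since_change` (no git data) contributes a recency term of 0 so that the result falls back to LRS. The function name and parameter names are illustrative.

```python
from typing import Optional


def activity_risk(
    lrs: float,
    lines_added: int = 0,
    lines_deleted: int = 0,
    touch_count_30d: int = 0,
    days_since_change: Optional[float] = None,  # None means no git data
    fan_in: int = 0,
    scc_size: int = 0,
    dependency_depth: int = 0,
    neighbor_churn: int = 0,
) -> float:
    churn = (lines_added + lines_deleted) / 100 * 0.5
    touches = min(touch_count_30d / 10, 5.0) * 0.3
    # Recency decays to zero after five weeks; skipped without git data.
    recency = 0.0
    if days_since_change is not None:
        recency = max(0.0, 5.0 - days_since_change / 7) * 0.2
    callers = min(fan_in / 5, 10.0) * 0.4
    # Assumption: SCC size >= 2 means the function is part of a cycle.
    cycle = (scc_size if scc_size >= 2 else 0) * 0.3
    depth = min(dependency_depth / 3, 5.0) * 0.1
    neighbors = neighbor_churn / 500 * 0.2
    return lrs + churn + touches + recency + callers + cycle + depth + neighbors


# 6.5 + 1.0 + 0.12 + 0.6 + 0.8 + 0.9 + 0.2 + 0.1 = 10.22
score = activity_risk(
    lrs=6.5, lines_added=120, lines_deleted=80, touch_count_30d=4,
    days_since_change=14, fan_in=10, scc_size=3,
    dependency_depth=6, neighbor_churn=250,
)
```

Note that the churn and neighbor-churn terms have no cap in the formula, so a very large rewrite can dominate the activity contribution.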
Step 6: Assign driver labels
Every function gets a single driver label identifying which dimension contributes most to its risk. Labels are assigned using population-relative percentile thresholds, computed independently per dimension across all functions in the current scope.
The label is assigned by checking dimensions in the following priority order:
| Label | Condition | Interpretation |
|---|---|---|
| cyclic_dep | Function is part of a dependency cycle | Risk is primarily structural: a cycle in the call graph |
| high_complexity | CC above P75 | Cyclomatic complexity is the dominant dimension |
| deep_nesting | ND above P75 | Nesting depth is the dominant dimension |
| high_fanout_churning | FO above P75 and touches above P50 | High fan-out combined with active change |
| high_fanin_complex | Fan-in above P75 and CC above P50 | High caller count combined with elevated complexity |
| high_churn_low_cc | Touches above P75 and CC below P25 | High activity relative to structural complexity |
| composite | No single dimension clearly dominates | Multiple dimensions are elevated |
Because percentiles are codebase-relative, the absolute metric value that triggers a label varies across repos.
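The priority order above amounts to a first-match chain of checks. The sketch below assumes a nearest-rank percentile; the method Hotspots actually uses (for example, interpolation) is not specified here. The helper names and dictionary keys are hypothetical.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def population_thresholds(population: list[dict]) -> dict[str, float]:
    """Compute each cut point independently across all functions in scope."""
    def col(key: str) -> list[float]:
        return [fn[key] for fn in population]

    return {
        "cc_p25": percentile(col("cc"), 25),
        "cc_p50": percentile(col("cc"), 50),
        "cc_p75": percentile(col("cc"), 75),
        "nd_p75": percentile(col("nd"), 75),
        "fo_p75": percentile(col("fo"), 75),
        "fan_in_p75": percentile(col("fan_in"), 75),
        "touches_p50": percentile(col("touches"), 50),
        "touches_p75": percentile(col("touches"), 75),
    }


def driver_label(fn: dict, t: dict[str, float]) -> str:
    """Return the first label whose condition matches, in priority order."""
    if fn.get("in_cycle", False):
        return "cyclic_dep"
    if fn["cc"] > t["cc_p75"]:
        return "high_complexity"
    if fn["nd"] > t["nd_p75"]:
        return "deep_nesting"
    if fn["fo"] > t["fo_p75"] and fn["touches"] > t["touches_p50"]:
        return "high_fanout_churning"
    if fn["fan_in"] > t["fan_in_p75"] and fn["cc"] > t["cc_p50"]:
        return "high_fanin_complex"
    if fn["touches"] > t["touches_p75"] and fn["cc"] < t["cc_p25"]:
        return "high_churn_low_cc"
    return "composite"
```

Because the checks run in order, a function in a dependency cycle is labeled cyclic_dep even if its CC is also above P75.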
Step 7: Assign quadrants
Every function is placed in one of four quadrants by combining its risk band with its activity level:
| | Low activity | High activity |
|---|---|---|
| High or Critical band | debt | fire |
| Low or Moderate band | ok | watch |
Activity is considered high if either of the following is true:
- 30-day touch count is above the population median, or
- Function was changed within the last 30 days
| Quadrant | Signal | Typical action |
|---|---|---|
| fire | High complexity and high activity | Prioritize for review or refactoring |
| debt | High complexity, low activity | Schedule for future refactoring |
| watch | Low complexity, high activity | Monitor for complexity increases |
| ok | Low complexity, low activity | No immediate action indicated |
Note: a high Activity Risk score does not by itself place a function in fire. Quadrant is determined by band (from LRS) and activity independently. Always check quadrant alongside touches_30d for context.
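The quadrant rule can be expressed as two independent boolean tests, as in this sketch. The function name and parameters are hypothetical, and "within the last 30 days" is read as `days_since_change <= 30`, which is an assumption about the boundary.

```python
import statistics
from typing import Optional


def quadrant(
    band: str,
    touches_30d: int,
    days_since_change: Optional[float],
    median_touches: float,
) -> str:
    """Combine the LRS band with activity; Activity Risk is not consulted."""
    high_complexity = band in ("High", "Critical")
    recently_changed = days_since_change is not None and days_since_change <= 30
    high_activity = touches_30d > median_touches or recently_changed
    if high_complexity:
        return "fire" if high_activity else "debt"
    return "watch" if high_activity else "ok"


# The median is taken over the population currently in scope.
population_touches = [0, 0, 1, 2, 5]
median_touches = statistics.median(population_touches)  # 1

hot = quadrant("Critical", touches_30d=5, days_since_change=2, median_touches=median_touches)
stale = quadrant("High", touches_30d=0, days_since_change=200, median_touches=median_touches)
```

In this example `hot` is "fire" and `stale` is "debt", even though both functions have high structural risk.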
Step 8: Rank output
Default ranking (no trained ranker):
- LRS descending
- File path ascending (tiebreak)
- Line number ascending (tiebreak)
- Function name ascending (tiebreak)
With the --mode snapshot triage view:
Functions are grouped by quadrant (fire → debt → watch → ok), then sorted by Activity Risk descending within each group.
With a trained ranker (hotspots train):
Functions are re-scored using a RandomForest model trained on the repo's bug-fix history and sorted by predicted probability descending. LRS, band, and quadrant remain in the output.
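The two heuristic orderings correspond to multi-key sorts. The sketch below shows one way to express them; the record fields (`lrs`, `file`, `line`, `name`, `quadrant`, `activity_risk`) are hypothetical names for this example and may not match Hotspots' output schema. The trained-ranker ordering is not covered.

```python
QUADRANT_ORDER = {"fire": 0, "debt": 1, "watch": 2, "ok": 3}


def default_rank(functions: list[dict]) -> list[dict]:
    """LRS descending, then file path, line number, and name ascending."""
    return sorted(
        functions,
        key=lambda f: (-f["lrs"], f["file"], f["line"], f["name"]),
    )


def triage_rank(functions: list[dict]) -> list[dict]:
    """Group by quadrant, then Activity Risk descending within each group."""
    return sorted(
        functions,
        key=lambda f: (QUADRANT_ORDER[f["quadrant"]], -f["activity_risk"]),
    )


functions = [
    {"name": "b", "file": "src/a.py", "line": 40, "lrs": 7.2, "activity_risk": 7.9, "quadrant": "debt"},
    {"name": "a", "file": "src/a.py", "line": 10, "lrs": 7.2, "activity_risk": 9.5, "quadrant": "fire"},
    {"name": "c", "file": "src/b.py", "line": 5, "lrs": 9.4, "activity_risk": 9.6, "quadrant": "debt"},
    {"name": "d", "file": "src/c.py", "line": 1, "lrs": 2.1, "activity_risk": 4.0, "quadrant": "watch"},
]

default_order = [f["name"] for f in default_rank(functions)]
triage_order = [f["name"] for f in triage_rank(functions)]
```

Functions `a` and `b` tie on LRS and file path, so the line number decides their default order; in the triage view `a` moves to the top because it is the only function in fire.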
File risk score
In addition to per-function scoring, Hotspots computes a per-file score for the file-risk view:

    File Risk Score = max_cc × 0.4
        + avg_cc × 0.3
        + log2(function_count + 1) × 0.2
        + min(file_churn / 100, 10.0) × 0.1

The score weights the highest-complexity function most heavily, incorporates the average complexity distribution, accounts for file size by function count, and includes recent change volume. Files are ranked descending by this score.
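A short sketch of the file-level formula, taking the CC of each function in the file as input. The function name is hypothetical, and treating a file with no functions as having zero complexity terms is an assumption, not documented behavior.

```python
import math


def file_risk_score(function_ccs: list[int], file_churn: float = 0) -> float:
    """Combine max CC, average CC, function count, and capped file churn."""
    count = len(function_ccs)
    # Assumption: a file with no functions contributes no complexity.
    max_cc = max(function_ccs) if count else 0
    avg_cc = sum(function_ccs) / count if count else 0.0
    return (
        max_cc * 0.4
        + avg_cc * 0.3
        + math.log2(count + 1) * 0.2
        + min(file_churn / 100, 10.0) * 0.1
    )


# Three functions with CC 2, 4, 12 and 300 lines of churn:
#   12*0.4 + 6*0.3 + log2(4)*0.2 + min(3, 10)*0.1 = 4.8 + 1.8 + 0.4 + 0.3 = 7.3
score = file_risk_score([2, 4, 12], file_churn=300)
```

Unlike LRS, the CC terms here are neither log-scaled nor capped, so a single very complex function can raise the file score substantially.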
Version history
Every change to a formula, weight, threshold, or ranking rule is recorded in the Scoring Changelog.
Coming soon: ranker scoring
The trained ranker layer is currently in active development. Once complete, the ranker will:
- Assign a rank_score (predicted probability that this function appears in a future bug-fix commit)
- Surface functions that are statistically over-represented in past defects, even when LRS is moderate
- Blend structural risk with historical signal rather than treating them as separate steps
The heuristic pipeline above will remain the default for repos without training data.