
Scoring Methodology

🚧 The scoring methodology is actively evolving. Weights, thresholds, and formula details will change as research findings are validated and promoted. Check the Scoring Changelog for a versioned record of what has changed. 🚧

This document describes every step of the pipeline from raw source code to the ranked output Hotspots produces. Each stage builds on the previous one.


Pipeline overview

Source Code
  ↓
Raw Metrics  (CC, ND, FO, NS, LOC)
  ↓
Risk Components  (log-scaled, bounded transforms)
  ↓
Local Risk Score  (LRS) + Risk Band
  ↓
Pattern Classification  (Tier 1: structural · Tier 2: enriched)
  ↓
[optional enrichment: call graph, git churn, touch counts]
  ↓
Activity Risk Score  (LRS + activity modifiers)
  ↓
Driver Label  (primary dimension diagnosis)
  ↓
Quadrant Assignment  (2-D: complexity × activity)
  ↓
Ranked Output

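Stages above the enrichment line run on source code alone and are always computed, with one exception: Tier 2 patterns are classified only when enrichment data is available. Stages below the line require a git repository.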


Step 1: Collect raw metrics

Hotspots parses each source file and visits every function, extracting four structural measurements plus a line count:

CC (Cyclomatic Complexity): The number of independent decision paths through the function. Every if, else if, loop, case, catch, &&, ||, and ternary adds one path. A function with no branches has CC 1.

ND (Nesting Depth): The maximum depth of nested control structures, that is, how many layers of if/loop/try enclose the deepest point.

FO (Fan-Out): The number of distinct functions called from within this function.

NS (Non-Structured Exits): The count of early returns, throws, breaks, and continues inside the function body, excluding the final tail return.

LOC (Lines of Code): Physical line count. Used only for pattern detection (see Step 4), not for the risk score itself.

See Metrics Reference for exact counting rules per language.


Step 2: Transform to risk components

Raw metric values are passed through bounded, monotonic transforms before being combined:

R_cc = min(log2(CC + 1), 6.0)    # logarithmic, capped at 6
R_nd = min(ND, 8.0)              # linear, capped at 8
R_fo = min(log2(FO + 1), 6.0)    # logarithmic, capped at 6
R_ns = min(NS, 6.0)              # linear, capped at 6

Logarithmic scaling for CC and FO gives more weight to early growth than to increases at already-high values: going from CC 1 to CC 4 raises R_cc by about 1.32, while going from CC 40 to CC 44 raises it by only about 0.13. Fan-out follows the same reasoning.

Linear scaling for ND and NS reflects that each additional nesting level or exit point contributes more uniformly to complexity in practice.

Caps prevent a single extreme metric from dominating the score. Each dimension is bounded independently so the combined score reflects overall structural complexity.
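The four transforms can be sketched in a few lines of Python. The function name risk_components and the dictionary keys are illustrative choices, not part of Hotspots' API; the formulas and caps are the ones listed above.

```python
import math


def risk_components(cc: int, nd: int, fo: int, ns: int) -> dict[str, float]:
    """Apply the bounded, monotonic transforms to the raw metrics."""
    return {
        "R_cc": min(math.log2(cc + 1), 6.0),  # logarithmic, capped at 6
        "R_nd": min(float(nd), 8.0),          # linear, capped at 8
        "R_fo": min(math.log2(fo + 1), 6.0),  # logarithmic, capped at 6
        "R_ns": min(float(ns), 6.0),          # linear, capped at 6
    }


# A trivial function: one path, no nesting, no calls, no early exits.
print(risk_components(cc=1, nd=0, fo=0, ns=0))
# {'R_cc': 1.0, 'R_nd': 0.0, 'R_fo': 0.0, 'R_ns': 0.0}
```

Because of the log2(x + 1) form, the logarithmic components reach their cap of 6 once CC or FO reaches 63.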


Step 3: Compute the Local Risk Score (LRS)

The four risk components are combined into a single score using a weighted sum:

LRS = 1.0 × R_cc  +  0.8 × R_nd  +  0.6 × R_fo  +  0.7 × R_ns

Weight rationale:

  • CC (1.0): highest weight; control-flow complexity is the primary correlate of defect density and testing difficulty.
  • ND (0.8): nesting depth captures a dimension of complexity that CC alone can miss; a function can have moderate CC but still be hard to follow due to deep nesting.
  • NS (0.7): non-structured exits increase the number of implicit exit conditions and make postconditions harder to reason about.
  • FO (0.6): fan-out represents external coupling rather than internal complexity; weighted lower because some degree of fan-out is expected in most functions.

LRS is always ≥ 1.0: a trivial single-path function with no nesting, calls, or exits has CC 1, so R_cc = log2(2) = 1.0 and the other three components are 0. The theoretical maximum is 20.2, reached when all four components are at their caps (1.0×6 + 0.8×8 + 0.6×6 + 0.7×6).
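As a worked example, the sketch below combines the Step 2 transforms with the weights above. The function name local_risk_score is hypothetical; the weights and caps are taken from this page.

```python
import math


def local_risk_score(cc: int, nd: int, fo: int, ns: int) -> float:
    """Weighted sum of the four capped risk components."""
    r_cc = min(math.log2(cc + 1), 6.0)
    r_nd = min(float(nd), 8.0)
    r_fo = min(math.log2(fo + 1), 6.0)
    r_ns = min(float(ns), 6.0)
    return 1.0 * r_cc + 0.8 * r_nd + 0.6 * r_fo + 0.7 * r_ns


# CC 7, ND 3, FO 3, NS 2  ->  3.0 + 2.4 + 1.2 + 1.4
print(round(local_risk_score(cc=7, nd=3, fo=3, ns=2), 2))  # 8.0
```

With these inputs R_cc = log2(8) = 3 and R_fo = log2(4) = 2, so nesting and early exits together contribute as much to the score (3.8) as branching and fan-out do (4.2).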

Risk bands:

| Band | LRS range | Meaning |
|---|---|---|
| Critical | ≥ 9.0 | High structural risk |
| High | 6.0–8.9 | Elevated structural risk |
| Moderate | 3.0–5.9 | Moderate structural risk |
| Low | < 3.0 | Low structural risk |
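The band lookup can be sketched as a chain of threshold checks. One assumption is made here and should be checked against the LRS Specification: the ranges are treated as half-open intervals, so a score such as 8.95 falls in High rather than in a gap between 8.9 and 9.0. The function name risk_band is illustrative.

```python
def risk_band(lrs: float) -> str:
    """Map an LRS value to its band (assumes half-open ranges)."""
    if lrs >= 9.0:
        return "Critical"
    if lrs >= 6.0:
        return "High"
    if lrs >= 3.0:
        return "Moderate"
    return "Low"


print(risk_band(8.0))  # High
```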

See LRS Specification for the complete formula derivation, worked examples, and precision notes.


Step 4: Classify patterns

Patterns are named labels that identify specific structural combinations. They complement LRS by describing what kind of issue a function has, not just its overall score. A function can match multiple patterns simultaneously.

Tier 1: structural (source code only)

Detected from raw metrics alone; always computed:

| Pattern | Trigger | Description |
|---|---|---|
| complex_branching | CC ≥ 10 and ND ≥ 4 | High branching combined with deep nesting |
| deeply_nested | ND ≥ 5 | Maximum nesting depth at or above threshold |
| exit_heavy | NS ≥ 5 | High number of non-structured exits |
| god_function | LOC ≥ 60 and FO ≥ 10 | Long function with high fan-out |
| long_function | LOC ≥ 80 | High physical line count |
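The Tier 1 triggers are independent checks, so a function can collect several labels. A minimal sketch using the default thresholds from the table (the function name tier1_patterns is hypothetical, and the thresholds are hard-coded here rather than read from configuration):

```python
def tier1_patterns(cc: int, nd: int, fo: int, ns: int, loc: int) -> list[str]:
    """Return every Tier 1 pattern whose trigger matches the raw metrics."""
    checks = [
        ("complex_branching", cc >= 10 and nd >= 4),
        ("deeply_nested", nd >= 5),
        ("exit_heavy", ns >= 5),
        ("god_function", loc >= 60 and fo >= 10),
        ("long_function", loc >= 80),
    ]
    return [name for name, matched in checks if matched]


print(tier1_patterns(cc=12, nd=5, fo=11, ns=1, loc=90))
# ['complex_branching', 'deeply_nested', 'god_function', 'long_function']
```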

Tier 2: enriched (call graph + git data)

These patterns require git history and the call graph, and are computed only when that data is available:

| Pattern | Trigger | Description |
|---|---|---|
| churn_magnet | file churn ≥ 200 lines and CC ≥ 8 | High complexity combined with high change volume |
| cyclic_hub | SCC size ≥ 2 and fan-in ≥ 6 | Part of a dependency cycle with many callers |
| hub_function | fan-in ≥ 10 and CC ≥ 8 | High fan-in with high complexity |
| middle_man | fan-in ≥ 8 and FO ≥ 8 and CC ≤ 4 | High fan-in and fan-out with low internal complexity |
| neighbor_risk | neighbor churn ≥ 400 and FO ≥ 8 | High fan-out into frequently changing functions |
| shotgun_target | fan-in ≥ 8 and file churn ≥ 150 | Many callers in a frequently changed file |
| stale_complex | CC ≥ 10 and LOC ≥ 60 and days since change ≥ 180 | High complexity with no recent changes |

Derived pattern: volatile_god fires only when both god_function and churn_magnet are true.
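The Tier 2 triggers and the derived volatile_god pattern can be sketched the same way as Tier 1. The Enriched record and its field names are assumptions made for this example; only the thresholds come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Enriched:
    """Hypothetical per-function record holding the enrichment inputs."""
    cc: int
    loc: int
    fo: int
    fan_in: int
    scc_size: int
    file_churn: int
    neighbor_churn: int
    days_since_change: int


def tier2_patterns(f: Enriched) -> list[str]:
    """Return matching Tier 2 patterns, plus volatile_god when it applies."""
    checks = [
        ("churn_magnet", f.file_churn >= 200 and f.cc >= 8),
        ("cyclic_hub", f.scc_size >= 2 and f.fan_in >= 6),
        ("hub_function", f.fan_in >= 10 and f.cc >= 8),
        ("middle_man", f.fan_in >= 8 and f.fo >= 8 and f.cc <= 4),
        ("neighbor_risk", f.neighbor_churn >= 400 and f.fo >= 8),
        ("shotgun_target", f.fan_in >= 8 and f.file_churn >= 150),
        ("stale_complex",
         f.cc >= 10 and f.loc >= 60 and f.days_since_change >= 180),
    ]
    found = [name for name, matched in checks if matched]
    # volatile_god is derived: god_function (Tier 1) and churn_magnet together
    is_god_function = f.loc >= 60 and f.fo >= 10
    if is_god_function and "churn_magnet" in found:
        found.append("volatile_god")
    return found
```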

All thresholds are configurable in .hotspotsrc.json. See Configuration.


Step 5: Compute the Activity Risk Score

When git history is available, Hotspots extends LRS with activity signals:

Activity Risk = LRS
             + (lines_added + lines_deleted) / 100 × 0.5
             + min(touch_count_30d / 10, 5.0) × 0.3
             + max(0, 5.0 − days_since_change / 7) × 0.2
             + min(fan_in / 5, 10.0) × 0.4
             + (scc_size, if in cycle, else 0) × 0.3
             + min(dependency_depth / 3, 5.0) × 0.1
             + neighbor_churn / 500 × 0.2

Each modifier is non-negative, so Activity Risk is always ≥ LRS. When no git data is available, Activity Risk equals LRS.
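A sketch of the formula with one worked input follows. Two assumptions are made and flagged in the comments: a function counts as "in cycle" when its SCC size is at least 2 (mirroring the cyclic_hub trigger), and a missing days_since_change contributes nothing, so that the score falls back to LRS when no git data exists. The function and parameter names are illustrative.

```python
from typing import Optional


def activity_risk(
    lrs: float,
    lines_added: int = 0,
    lines_deleted: int = 0,
    touch_count_30d: int = 0,
    days_since_change: Optional[float] = None,
    fan_in: int = 0,
    scc_size: int = 0,
    dependency_depth: int = 0,
    neighbor_churn: int = 0,
) -> float:
    """LRS plus the seven non-negative activity modifiers."""
    churn = (lines_added + lines_deleted) / 100 * 0.5
    touches = min(touch_count_30d / 10, 5.0) * 0.3
    # Assumption: no recency bonus when the last-change date is unknown.
    recency = 0.0
    if days_since_change is not None:
        recency = max(0.0, 5.0 - days_since_change / 7) * 0.2
    callers = min(fan_in / 5, 10.0) * 0.4
    # Assumption: "in cycle" means the SCC holds at least two functions.
    cycle = (scc_size if scc_size >= 2 else 0) * 0.3
    depth = min(dependency_depth / 3, 5.0) * 0.1
    neighbors = neighbor_churn / 500 * 0.2
    return lrs + churn + touches + recency + callers + cycle + depth + neighbors


score = activity_risk(
    lrs=8.0, lines_added=120, lines_deleted=80, touch_count_30d=4,
    days_since_change=7, fan_in=5, dependency_depth=3, neighbor_churn=250,
)
# 8.0 + 1.0 + 0.12 + 0.8 + 0.4 + 0.0 + 0.1 + 0.1
print(round(score, 2))  # 10.52
```

Note that the churn and neighbor-churn terms are the only ones without a cap, so a very large rewrite can outweigh every other modifier.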

| Signal | Weight | What it captures |
|---|---|---|
| Churn (lines added + deleted) | 0.5 | Volume of recent change |
| Fan-in (call-graph callers) | 0.4 | Number of functions that depend on this one |
| Touch count (30-day commits) | 0.3 | Frequency of recent modification |
| SCC membership | 0.3 | Presence in a dependency cycle |
| Recency (days since last change) | 0.2 | How recently the function was last modified |
| Neighbor churn | 0.2 | Change volume in called functions |
| Dependency depth | 0.1 | Depth in the call graph from entry points |

Step 6: Assign driver labels

Every function gets a single driver label identifying which dimension contributes most to its risk. Labels are assigned using population-relative percentile thresholds, computed independently per dimension across all functions in the current scope.

The label is assigned by checking dimensions in the following priority order:

| Label | Condition | Interpretation |
|---|---|---|
| cyclic_dep | Function is part of a dependency cycle | Risk is primarily structural: a cycle in the call graph |
| high_complexity | CC above P75 | Cyclomatic complexity is the dominant dimension |
| deep_nesting | ND above P75 | Nesting depth is the dominant dimension |
| high_fanout_churning | FO above P75 and touches above P50 | High fan-out combined with active change |
| high_fanin_complex | Fan-in above P75 and CC above P50 | High caller count combined with elevated complexity |
| high_churn_low_cc | Touches above P75 and CC below P25 | High activity relative to structural complexity |
| composite | No single dimension clearly dominates | Multiple dimensions are elevated |

Because percentiles are codebase-relative, the absolute metric value that triggers a label varies across repos.
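The priority-ordered check can be sketched as a chain of early returns over per-dimension quartiles. How Hotspots computes its percentiles is not stated on this page, so the sketch assumes the inclusive method of Python's statistics.quantiles; the dictionary keys and function names are likewise illustrative.

```python
from statistics import quantiles

DIMENSIONS = ("cc", "nd", "fo", "fan_in", "touches")


def population_quartiles(functions: list[dict]) -> dict[str, list[float]]:
    """Return {dimension: [P25, P50, P75]} across all functions in scope."""
    return {
        dim: quantiles([f[dim] for f in functions], n=4, method="inclusive")
        for dim in DIMENSIONS
    }


def driver_label(f: dict, q: dict[str, list[float]]) -> str:
    """Return the first label whose condition holds, in priority order."""
    p25, p50, p75 = 0, 1, 2  # indexes into each quartile list
    if f["in_cycle"]:
        return "cyclic_dep"
    if f["cc"] > q["cc"][p75]:
        return "high_complexity"
    if f["nd"] > q["nd"][p75]:
        return "deep_nesting"
    if f["fo"] > q["fo"][p75] and f["touches"] > q["touches"][p50]:
        return "high_fanout_churning"
    if f["fan_in"] > q["fan_in"][p75] and f["cc"] > q["cc"][p50]:
        return "high_fanin_complex"
    if f["touches"] > q["touches"][p75] and f["cc"] < q["cc"][p25]:
        return "high_churn_low_cc"
    return "composite"
```

Because the checks short-circuit, a function in a dependency cycle is always labeled cyclic_dep, even when its CC is also above P75.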


Step 7: Assign quadrants

Every function is placed in one of four quadrants by combining its risk band with its activity level:

| | Low activity | High activity |
|---|---|---|
| High or Critical band | debt | fire |
| Low or Moderate band | ok | watch |

Activity is considered high if either of the following is true:

  • 30-day touch count is above the population median, or
  • Function was changed within the last 30 days

| Quadrant | Signal | Typical action |
|---|---|---|
| fire | High complexity and high activity | Prioritize for review or refactoring |
| debt | High complexity, low activity | Schedule for future refactoring |
| watch | Low complexity, high activity | Monitor for complexity increases |
| ok | Low complexity, low activity | No immediate action indicated |

Note: a high Activity Risk score does not by itself place a function in fire. Quadrant is determined by band (from LRS) and activity independently. Always check quadrant alongside touches_30d for context.
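The quadrant rule reduces to two booleans. In this sketch the function name and parameters are hypothetical, and "within the last 30 days" is assumed to include day 30.

```python
def quadrant(band: str, touches_30d: int, median_touches: float,
             days_since_change: float) -> str:
    """Combine the LRS band with the activity test to pick a quadrant."""
    high_complexity = band in ("High", "Critical")
    # Either condition alone is enough to count as high activity.
    high_activity = (touches_30d > median_touches
                     or days_since_change <= 30)
    if high_complexity:
        return "fire" if high_activity else "debt"
    return "watch" if high_activity else "ok"


print(quadrant("Critical", touches_30d=5, median_touches=2,
               days_since_change=100))  # fire
```

Activity Risk does not appear among the inputs, which is why a high Activity Risk score alone cannot move a function into fire.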


Step 8: Rank output

Default ranking (no trained ranker):

  1. LRS descending
  2. File path ascending (tiebreak)
  3. Line number ascending (tiebreak)
  4. Function name ascending (tiebreak)
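The default ordering maps directly onto a composite sort key. The record fields (lrs, file, line, name) are illustrative names, not Hotspots' output schema.

```python
def default_rank(functions: list[dict]) -> list[dict]:
    """Sort by LRS descending, then file path, line number, and name."""
    return sorted(
        functions,
        key=lambda f: (-f["lrs"], f["file"], f["line"], f["name"]),
    )


rows = [
    {"lrs": 6.5, "file": "src/b.ts", "line": 10, "name": "parse"},
    {"lrs": 9.1, "file": "src/z.ts", "line": 3, "name": "run"},
    {"lrs": 6.5, "file": "src/a.ts", "line": 40, "name": "load"},
]
print([r["name"] for r in default_rank(rows)])  # ['run', 'load', 'parse']
```

The tiebreaks make the ordering deterministic: two functions with equal LRS always appear in the same relative position across runs.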

With the --mode snapshot triage view:

Functions are grouped by quadrant (fire → debt → watch → ok), then sorted by Activity Risk descending within each group.
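The triage ordering can be sketched as a two-part sort key: quadrant position first, then Activity Risk descending. The field names quadrant and activity_risk are assumptions made for this example.

```python
QUADRANT_ORDER = {"fire": 0, "debt": 1, "watch": 2, "ok": 3}


def triage_rank(functions: list[dict]) -> list[dict]:
    """Group by quadrant in triage order, highest Activity Risk first."""
    return sorted(
        functions,
        key=lambda f: (QUADRANT_ORDER[f["quadrant"]], -f["activity_risk"]),
    )


rows = [
    {"name": "a", "quadrant": "watch", "activity_risk": 12.0},
    {"name": "b", "quadrant": "fire", "activity_risk": 9.5},
    {"name": "c", "quadrant": "fire", "activity_risk": 11.0},
    {"name": "d", "quadrant": "debt", "activity_risk": 10.0},
]
print([r["name"] for r in triage_rank(rows)])  # ['c', 'b', 'd', 'a']
```

In the example, function a has the highest Activity Risk overall but still sorts last, because quadrant takes precedence over score.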

With a trained ranker (hotspots train):

Functions are re-scored using a RandomForest model trained on the repo's bug-fix history and sorted by predicted probability descending. LRS, band, and quadrant remain in the output.


File risk score

In addition to per-function scoring, Hotspots computes a per-file score for the file-risk view:

File Risk Score = max_cc × 0.4
               + avg_cc × 0.3
               + log2(function_count + 1) × 0.2
               + min(file_churn / 100, 10.0) × 0.1

The score weights the highest-complexity function most heavily, incorporates the average complexity distribution, accounts for file size by function count, and includes recent change volume. Files are ranked descending by this score.
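A worked example of the file-level formula follows. The function name file_risk_score is hypothetical, and returning 0.0 for a file with no functions is an assumption, since this page does not define that case.

```python
import math


def file_risk_score(function_ccs: list[int], file_churn: float = 0.0) -> float:
    """Score a file from its functions' CC values and its churn."""
    if not function_ccs:
        return 0.0  # assumption: a file with no functions scores zero
    max_cc = max(function_ccs)
    avg_cc = sum(function_ccs) / len(function_ccs)
    return (
        max_cc * 0.4
        + avg_cc * 0.3
        + math.log2(len(function_ccs) + 1) * 0.2
        + min(file_churn / 100, 10.0) * 0.1
    )


# Three functions with CC 2, 4, 12 and 300 changed lines:
# 12 * 0.4 + 6 * 0.3 + log2(4) * 0.2 + 3 * 0.1 = 4.8 + 1.8 + 0.4 + 0.3
print(round(file_risk_score([2, 4, 12], file_churn=300), 2))  # 7.3
```

Unlike LRS, the CC terms here are neither log-scaled nor capped, so a single very complex function can dominate a file's score.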


Version history

Every change to a formula, weight, threshold, or ranking rule is recorded in the Scoring Changelog.


Coming soon: ranker scoring

The trained ranker layer is currently in active development. Once complete, the ranker will:

  • Assign a rank_score (predicted probability this function appears in a future bug-fix commit)
  • Surface functions that are statistically over-represented in past defects, even when LRS is moderate
  • Blend structural risk with historical signal rather than treating them as separate steps

The heuristic pipeline above will remain the default for repos without training data.

Released under the MIT License. · hotspots.dev