
Conversation

@pthmas
Contributor

@pthmas pthmas commented Jan 16, 2026

Overview

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent (see the sketch after this list).
  • Integrates pruning into block production:
    • After a block is committed, at configurable intervals, the executor computes a prune target and calls store.PruneBlocks synchronously.
  • Wires pruning into the go-header store:
    • Sets the go-header pruning window to PruningKeepRecent * BlockTime so header/data retention matches ev-node’s window.
  • Adds pruning for EVM execution metadata:
    • EVMStore exposes PruneExecMeta to delete per-height ExecMeta entries up to a target height.
    • The EVM executor implements an optional ExecMetaPruner interface, and the block executor calls it from the same pruning hook.
    • Non-EVM (e.g. ABCI) executors don’t implement this interface, so they are unaffected.
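To make the mechanism above concrete, here is a minimal sketch of the store-side loop. The key prefixes and the last-pruned marker come from this description; the kvBatch interface, the hashAt helper, and the function shape are illustrative stand-ins, not the PR's actual code.

    package prune

    import (
        "context"
        "fmt"
    )

    // kvBatch is a hypothetical stand-in for the store's underlying batch API.
    type kvBatch interface {
        Delete(ctx context.Context, key string) error
        Put(ctx context.Context, key string, value []byte) error
        Commit(ctx context.Context) error
    }

    // lastPrunedKey is the progress marker mentioned in the Overview.
    const lastPrunedKey = "last-pruned-block-height"

    // pruneBlocks deletes per-height block data up to targetHeight and records
    // progress, so repeated calls are monotonic and idempotent. State (/s/...),
    // the current height (/t), and global metadata stay untouched.
    func pruneBlocks(ctx context.Context, b kvBatch, lastPruned, targetHeight uint64, hashAt func(uint64) string) error {
        if targetHeight <= lastPruned {
            return nil // nothing new to prune; progress only moves forward
        }
        for h := lastPruned + 1; h <= targetHeight; h++ {
            if err := ctx.Err(); err != nil {
                return fmt.Errorf("pruning cancelled: %w", err)
            }
            keys := []string{
                fmt.Sprintf("/h/%d", h),         // header
                fmt.Sprintf("/d/%d", h),         // data
                fmt.Sprintf("/c/%d", h),         // signature/commit
                fmt.Sprintf("/i/%s", hashAt(h)), // hash index for the pruned height (hashAt is hypothetical)
            }
            for _, k := range keys {
                if err := b.Delete(ctx, k); err != nil {
                    return fmt.Errorf("delete %s: %w", k, err)
                }
            }
        }
        // Record progress so the next run resumes after targetHeight.
        if err := b.Put(ctx, lastPrunedKey, []byte(fmt.Sprintf("%d", targetHeight))); err != nil {
            return err
        }
        return b.Commit(ctx)
    }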

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
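As a rough sketch of how these settings gate the hook during block production: the PruneBlocks and PruneExecMeta method names come from this PR, while the config struct, maybePrune, and the executor wiring are illustrative assumptions.

    package prune

    import "context"

    type pruningConfig struct {
        Enabled    bool
        KeepRecent uint64
        Interval   uint64
    }

    type blockPruner interface {
        PruneBlocks(ctx context.Context, height uint64) error
    }

    // ExecMetaPruner mirrors the optional interface described in the Overview;
    // executors that don't implement it (e.g. ABCI) are simply skipped.
    type ExecMetaPruner interface {
        PruneExecMeta(ctx context.Context, height uint64) error
    }

    func maybePrune(ctx context.Context, cfg pruningConfig, height uint64, st blockPruner, exec any, logErr func(msg string, err error)) {
        if !cfg.Enabled || cfg.KeepRecent == 0 || cfg.Interval == 0 {
            return // pruning disabled or trivially configured: keep everything
        }
        if height%cfg.Interval != 0 || height <= cfg.KeepRecent {
            return // not a pruning height, or nothing old enough to prune yet
        }
        target := height - cfg.KeepRecent
        // Best effort: failures are logged, the block is never failed.
        if err := st.PruneBlocks(ctx, target); err != nil {
            logErr("failed to prune old block data", err)
        }
        if p, ok := exec.(ExecMetaPruner); ok {
            if err := p.PruneExecMeta(ctx, target); err != nil {
                logErr("failed to prune exec metadata", err)
            }
        }
    }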

Design trade-offs

  • Runs in the critical path:
    Pruning is called synchronously in produceBlock. On pruning heights, block production will incur extra latency proportional to the amount of data pruned and underlying storage performance.

  • Best-effort pruning:
    If pruning fails, we log the error but do not fail the block. This avoids halting the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes ev-node’s own store, the go-header store, and EVM ExecMeta. Pruning of ABCI state (ev-abci) is out of scope and will be added in a follow-up once ev-abci exposes its own pruning API.

@github-actions
Contributor

github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Jan 28, 2026, 10:03 AM

@claude
Contributor

claude bot commented Jan 16, 2026

Claude finished @pthmas's task in 2m 7s.


Comprehensive Code Review Complete

I've completed a thorough review of the block pruning implementation. Below is my analysis organized by the categories you requested.


1. Code Quality & Architecture

Strengths

  • Clean separation of concerns: Pruning logic is well-isolated in PruneBlocks (pkg/store/store.go:290-381) and PruneExecMeta (execution/evm/store.go:148-194)
  • Idiomatic Go: Proper use of context, error wrapping with fmt.Errorf, and consistent naming conventions
  • Modular design: Optional ExecMetaPruner interface allows EVM-specific pruning without coupling to non-EVM executors
  • Good interface design: Pruner interface (pkg/store/types.go:85-94) is well-documented with clear semantics

⚠️ Potential Issues

  1. Performance - Critical Path Concern (block/internal/executing/executor.go:549-575)

    • Pruning runs synchronously during block production: if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil
    • On pruning intervals, block latency will spike proportionally to the amount of data pruned
    • Impact: With default config (keep 10k blocks, prune every 1000 blocks), each pruning cycle deletes 1000 blocks worth of data
    • Recommendation: Consider async pruning with a background goroutine (see the sketch after this list), or add metrics/logging to track pruning duration
  2. Configuration Validation Edge Case (pkg/config/config.go:389-401)

    • When PruningEnabled=true but PruningKeepRecent=0, error says "must be > 0 when pruning is enabled"
    • This is correct, but the comment at line 269 says "(0 = keep all)" which contradicts the validation
    • Fix: Update comment to clarify that pruning_enabled=false is the way to keep all blocks, not keep_recent=0
  3. go-header Pruning Window Calculation (pkg/sync/sync_service.go:259-268)

    pruningWindow := syncService.conf.Node.BlockTime.Duration * time.Duration(syncService.conf.Node.PruningKeepRecent)
    • This multiplies BlockTime * PruningKeepRecent to approximate a time window
    • Issue: Assumes constant block time, which may not hold during network issues or lazy mode
    • Recommendation: Document this assumption or consider tracking actual wall-clock time per height
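For the async-pruning recommendation in item 1 above, a minimal sketch of handing the prune target to a background worker; the channel, function, and logger names are assumptions, not the existing executor API.

    package prune

    import "context"

    type pruner interface {
        PruneBlocks(ctx context.Context, height uint64) error
    }

    // startPruneWorker moves pruning off the block-production path: produceBlock
    // only sends a target height, and the worker prunes in the background.
    func startPruneWorker(ctx context.Context, st pruner, logErr func(error)) chan<- uint64 {
        targets := make(chan uint64, 1) // buffer one target so sends can be non-blocking
        go func() {
            for {
                select {
                case <-ctx.Done():
                    return
                case target := <-targets:
                    if err := st.PruneBlocks(ctx, target); err != nil {
                        logErr(err) // still best-effort; never blocks production
                    }
                }
            }
        }()
        return targets
    }

    // In produceBlock, a non-blocking send keeps the critical path unaffected:
    //
    //	select {
    //	case pruneTargets <- target:
    //	default: // a prune is already queued; try again at the next interval
    //	}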

2. Error Handling & Resilience

Strengths

  • Best-effort pruning: Errors are logged but don't halt block production (executor.go:560-561, 570-571)
  • Graceful degradation: Missing headers during pruning are skipped (store.go:325-331)
  • Clear error messages: Context is provided with height info in all error paths
  • Monotonic progress: LastPrunedBlockHeightKey ensures pruning can resume after crashes

⚠️ Potential Issues

  1. Silent Disk Space Growth (block/internal/executing/executor.go:559-561)

    if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
        e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
    }
    • Pruning failures are only logged, not surfaced to operators via metrics
    • Impact: If pruning fails repeatedly (e.g., permission issues, disk full), disk will fill without alerting
    • Recommendation: Add a Prometheus counter for pruning failures and track the last successful prune height (see the sketch after this list)
  2. Context Cancellation Handling (pkg/store/store.go:322-369)

    • The pruning loop doesn't check ctx.Err() during iteration
    • Impact: On shutdown, pruning may run to completion instead of cancelling promptly
    • Fix: Add periodic context checks:
    for h := lastPruned + 1; h <= height; h++ {
        if h%100 == 0 { // Check every 100 heights
            if err := ctx.Err(); err != nil {
                return fmt.Errorf("pruning cancelled: %w", err)
            }
        }
        // ... existing pruning logic
    }
  3. Batch Commit Failure Recovery (pkg/store/store.go:376-378)

    • If batch.Commit() fails after deleting entries, the LastPrunedBlockHeightKey is not updated
    • Impact: Next pruning run will re-attempt the same range, which is safe but inefficient
    • Severity: Low (idempotency handles this correctly)
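For the metrics recommendation in item 1 above, a sketch using the standard Prometheus Go client; the metric names and the registration point are assumptions.

    package prune

    import "github.com/prometheus/client_golang/prometheus"

    var (
        pruningFailures = prometheus.NewCounter(prometheus.CounterOpts{
            Name: "evnode_pruning_failures_total",
            Help: "Number of failed pruning attempts.",
        })
        lastPrunedHeight = prometheus.NewGauge(prometheus.GaugeOpts{
            Name: "evnode_last_successful_prune_height",
            Help: "Highest block height successfully pruned.",
        })
    )

    func init() {
        prometheus.MustRegister(pruningFailures, lastPrunedHeight)
    }

    // Around the existing PruneBlocks call in the executor:
    //
    //	if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
    //		pruningFailures.Inc()
    //		e.logger.Error().Err(err).Msg("failed to prune old block data")
    //	} else {
    //		lastPrunedHeight.Set(float64(targetHeight))
    //	}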

3. Security

Strengths

  • No external input is used in pruning decisions (height is derived from config + current height)
  • No privilege escalation vectors
  • DA metadata pruning prevents information leakage from pruned blocks

⚠️ Potential Issues

  1. No Rate Limiting on Pruning Interval (pkg/config/config.go:392-393)
    if c.Node.PruningInterval == 0 {
        return fmt.Errorf("pruning_interval must be >= 1 when pruning is enabled")
    }
    • Issue: PruningInterval=1 is allowed, causing pruning on every block
    • Impact: DoS risk - operator misconfiguration could severely degrade performance
    • Recommendation: Add minimum interval check (e.g., >= 10) or document performance implications clearly
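A sketch of the suggested minimum-interval guard; the threshold of 10 and the standalone validatePruning helper are illustrative, and the real check would live in the existing config validation.

    package config

    import "fmt"

    // minPruningInterval is an illustrative floor; the exact value is a judgment call.
    const minPruningInterval = 10

    func validatePruning(enabled bool, keepRecent, interval uint64) error {
        if !enabled {
            return nil // disabled pruning keeps everything; other fields are ignored
        }
        if keepRecent == 0 {
            return fmt.Errorf("pruning_keep_recent must be > 0 when pruning is enabled")
        }
        if interval < minPruningInterval {
            return fmt.Errorf("pruning_interval must be >= %d when pruning is enabled", minPruningInterval)
        }
        return nil
    }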

4. Performance & Resource Efficiency

⚠️ Critical Concerns

  1. Synchronous Pruning in Block Production (executor.go:553-575)

    • Estimated impact: Deleting 1000 blocks (headers + data + signatures + indices + DA metadata) in a single batch
    • Latency Spike: On a slow disk, this could add 100-500ms to block production
    • Risk: If pruning takes longer than BlockTime, blocks will be delayed
    • Recommendations:
      • Short-term: Add timing metrics to measure actual impact, e.g. e.metrics.OperationDuration["pruning"].Observe(duration) (a hedged sketch follows this list)
      • Long-term: Move pruning to background goroutine with separate error handling
  2. Memory Usage (pkg/store/store.go:322-369)

    • Each deleted key requires a batch operation entry in memory
    • Pruning 1000 blocks = ~5000 delete operations (header, data, sig, index, 2x DA metadata per height)
    • Estimate: ~500KB batch buffer for default config
    • Severity: Low (acceptable memory overhead)
  3. Disk I/O Amplification (execution/evm/store.go:173-180)

    • EVM ExecMeta pruning iterates sequentially: for h := lastPruned + 1; h <= height; h++
    • Each height requires a separate key deletion
    • Impact: With default config, 1000 individual delete operations
    • Optimization: Could batch deletes using a range scan instead of individual key deletions (if datastore supports it)
  4. go-header Store Pruning (pkg/sync/sync_service.go:263-268)

    • Sets pruning window to PruningKeepRecent * BlockTime
    • This is an approximation - headers may be pruned slightly earlier/later than ev-node blocks
    • Recommendation: Document this timing discrepancy in CLAUDE.md or code comments
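For the short-term timing recommendation in item 1 above, a sketch that measures each pruning cycle with a Prometheus histogram; the metric name and the timedPrune wrapper are assumptions.

    package prune

    import (
        "context"
        "time"

        "github.com/prometheus/client_golang/prometheus"
    )

    var pruningDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "evnode_pruning_duration_seconds",
        Help:    "Wall-clock time spent in one pruning cycle.",
        Buckets: prometheus.DefBuckets,
    })

    func init() { prometheus.MustRegister(pruningDuration) }

    // timedPrune wraps the existing store.PruneBlocks call and records its duration.
    func timedPrune(ctx context.Context, prune func(context.Context, uint64) error, target uint64) error {
        start := time.Now()
        err := prune(ctx, target)
        pruningDuration.Observe(time.Since(start).Seconds())
        return err
    }

    // Usage in the executor: err := timedPrune(e.ctx, e.store.PruneBlocks, targetHeight)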

5. Testing & Reliability

Strengths

  • Comprehensive store tests: TestPruneBlocks_RemovesOldBlockDataOnly validates correctness (pkg/store/store_test.go:737-814)
  • Edge cases covered: Empty store, repeated calls, partial failures
  • EVM store tests: TestPruneExecMeta_PrunesUpToTargetHeight and TestPruneExecMeta_TracksLastPrunedHeight (execution/evm/store_test.go:20-100)

⚠️ Gaps

  1. Missing Integration Test

    • No end-to-end test verifying pruning actually runs during block production
    • Recommendation: Add test that:
      • Produces blocks past PruningInterval
      • Verifies old blocks are deleted
      • Confirms recent blocks remain accessible
  2. No Performance Tests

    • No benchmark for pruning 1000+ blocks
    • Recommendation: Add benchmark (a fuller sketch follows this list):
    func BenchmarkPruneBlocks(b *testing.B) {
        // Setup: Create 10k blocks
        // Benchmark: Prune 1k blocks repeatedly
    }
  3. Missing Error Simulation

    • Tests don't simulate batch commit failures during pruning
    • Recommendation: Add test with mock datastore that fails commit
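Expanding the benchmark outline in item 2 above into a runnable shape; setupStoreWithBlocks is a hypothetical seeding helper (the real setup in pkg/store/store_test.go would replace it), and the 10k/1k figures mirror the default-config numbers discussed earlier.

    package store_test

    import (
        "context"
        "testing"
    )

    func BenchmarkPruneBlocks(b *testing.B) {
        ctx := context.Background()
        for i := 0; i < b.N; i++ {
            b.StopTimer()
            st := setupStoreWithBlocks(b, 10_000) // hypothetical helper: seed 10k blocks
            b.StartTimer()
            if err := st.PruneBlocks(ctx, 1_000); err != nil { // prune the oldest 1k
                b.Fatal(err)
            }
        }
    }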

6. Documentation & Developer Experience

Strengths

  • Clear PR description with design tradeoffs
  • Well-commented pruning functions with behavior documentation
  • Config fields have helpful YAML comments (pkg/config/config.go:268-270)

⚠️ Improvements Needed

  1. Update CLAUDE.md (per review comments)

    • Add section on pruning configuration and tradeoffs
    • Document performance impact of pruning intervals
    • Example:
    ## Pruning Configuration
    
    Height-based pruning removes old block data to manage disk usage:
    - `pruning_enabled`: Enable/disable pruning (default: false)
    - `pruning_keep_recent`: Blocks to retain (default: 0 = archive mode)
    - `pruning_interval`: Prune every N blocks (default: 0)
    
    **Performance Impact**: Pruning runs synchronously during block production.
    Expect latency spikes every N blocks proportional to the amount pruned.
  2. Missing Operational Guidance

    • No documentation on monitoring pruning health
    • Recommendation: Document:
      • Expected disk savings
      • How to verify pruning is working (check last-pruned-block-height metadata)
      • What to do if pruning fails repeatedly
  3. Flag Help Text (pkg/config/config.go:464-466)

    • Good descriptions, but could clarify performance implications:
    cmd.Flags().Uint64(FlagPrefixEvnode+"node.pruning_interval", def.Node.PruningInterval, 
        "run pruning every N blocks (must be >= 1 when pruning is enabled); larger values reduce performance impact but prune less frequently")

Summary & Recommendations

Priority Fixes

  1. 🔴 High Priority

    • Add pruning failure metrics to prevent silent disk space growth
    • Consider async pruning or add performance monitoring
    • Fix config comment contradiction (keep_recent=0 vs archive mode)
  2. 🟡 Medium Priority

    • Add minimum PruningInterval validation (prevent interval=1 DoS)
    • Add context cancellation checks in pruning loops
    • Document go-header pruning window approximation
  3. 🟢 Nice to Have

    • Add integration test for end-to-end pruning
    • Add pruning performance benchmarks
    • Update CLAUDE.md with pruning operational guidance

Overall Assessment

The implementation is well-structured with good error handling and idempotency. The main concerns are:

  • Performance: Synchronous pruning in critical path (acceptable for now, but needs monitoring)
  • Observability: Silent failures could lead to disk space issues
  • Documentation: Operational guidance would help users configure pruning safely

The code is ready to merge once the high-priority fixes are addressed.


Member

@julienrbrt julienrbrt Jan 16, 2026


Given we have a lot of different stores and keys, are we planning to prune those as well:

(and eventually ev-abci store)

It will be inconsistent if we don't allow height pruning of all those stores (at once).

Contributor Author


I've updated this PR to prune metadata, the go-header store, and the EVM store ✅
I'm happy to work on pruning ev-abci next, but I would probably do it in another PR to prevent this one from getting too big, if that's ok.

@pthmas pthmas changed the title Block Prunning feat: block Prunning Jan 26, 2026
@pthmas pthmas changed the title feat: block Prunning feat: block Pruning Jan 26, 2026
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 15:01
@pthmas pthmas force-pushed the pierrick/prunning branch from 7708393 to a5110b1 on January 28, 2026 10:03