Skip to content

Conversation

@chatton
Copy link
Contributor

@chatton chatton commented Jan 28, 2026

Overview

This PR adds a compatibility test that will deploy a specified image, execute some transactions, then deploy the PR image and verify that syncing still works correctly.

closes #3019

Add EVMCompatTestSuite to test syncing between different ev-node
versions. The test:
- Starts sequencer with version A (env: SEQUENCER_EVM_IMAGE_*)
- Submits transactions
- Starts full node with version B (env: FULLNODE_EVM_IMAGE_*)
- Verifies full node syncs and can query transactions
- Submits post-sync transactions to verify ongoing sync

Add make test-docker-compat target for running the test.
@claude
Copy link
Contributor

claude bot commented Jan 28, 2026

Claude finished @chatton's task in 2m 35s —— View job


Code Review Complete

I've performed a comprehensive review of this PR, which adds an excellent cross-version compatibility test suite. Overall, the implementation is well-structured and addresses an important testing gap. Below are my findings organized by category.


Summary

This PR successfully introduces a compatibility test that validates cross-version syncing between different ev-node-evm versions. The new EVMCompatTestSuite deploys a sequencer with one image version, executes transactions, then deploys a full node with the PR image to verify syncing works correctly. The refactoring of Docker E2E utilities improves code reusability.

Strengths:

  • Well-designed test architecture that mirrors real-world upgrade scenarios
  • Clean refactoring with reusable helper functions
  • Proper use of testify suite patterns
  • Good separation of concerns between setup, execution, and verification

Key Issues to Address:

  1. Test flakiness risk from fixed sleep (line 68)
  2. Missing cleanup and resource management
  3. Hardcoded values and TODO comments need resolution
  4. Security considerations for secrets in logs

1. Code Quality & Architecture

Positive Aspects

  • Modular design: The refactoring of RethSetupConfig and introduction of EVMSingleSetupConfig promotes code reuse (docker_test.go:260-268, docker_test.go:371-394)
  • Clear separation: Test phases are well-organized using subtests with descriptive names
  • Shared constants: EVM test constants centralized in docker_test.go:42-47 reduces duplication

Issues

Critical: Fixed sleep introduces test flakiness (compat_test.go:68)

// wait for blocks to be posted to DA before starting full node
time.Sleep(5 * time.Second)

This is a reliability concern. The 5-second sleep is arbitrary and doesn't guarantee DA propagation completed. Gemini's suggestion to wait for additional blocks is good, but I'd recommend also verifying DA availability.

Recommendation: Replace with a conditional wait that polls for DA height or sequencer block production:

s.T().Log("Waiting for DA propagation...")
lastTxHeight, err := s.sequencerClient.BlockNumber(ctx)
s.Require().NoError(err)
targetHeight := lastTxHeight + 3 // Allow a few blocks for DA propagation
s.Require().Eventually(func() bool {
    h, err := s.sequencerClient.BlockNumber(ctx)
    return err == nil && h >= targetHeight
}, 30*time.Second, time.Second, "sequencer did not produce blocks for DA propagation")

Fix this →


2. Error Handling & Resilience

Good Practices

  • Consistent use of s.Require().NoError(err) with descriptive error messages
  • Proper context propagation throughout tests
  • Eventually assertions with reasonable timeouts (120s for sync, 30s for tx inclusion)

Issues

Missing cleanup logic (compat_test.go:45-92)
The test doesn't have explicit cleanup for Docker containers if the test fails mid-execution. While testify suite may handle some of this, explicit cleanup is more robust.

Recommendation: Add a cleanup function or defer statements:

func (s *EVMCompatTestSuite) TearDownTest() {
    ctx := context.Background()
    if s.sequencerNode != nil {
        _ = s.sequencerNode.StopAndRemove(ctx)
    }
    if s.fullNode != nil {
        _ = s.fullNode.StopAndRemove(ctx)
    }
    // Clean up celestia and DA resources
}

Timeout values could be configurable (compat_test.go:208, compat_test.go:368)
Hard-coded timeouts (120s, 30s) may be insufficient on slower CI runners or local development machines.


3. Security Considerations

Issues

JWT Secret exposed in logs (docker_test.go:302)

setup.JWTSecret = rethNode.JWTSecretHex()

While this is test code, it sets a pattern. Ensure JWT secrets are never logged at higher log levels.

Private key as constant (docker_test.go:44)

evmTestPrivateKey = "cece4f25ac74deb1468965160c7185e07dff413f23fcadb611b05ca37ab0a52e"

Good that it's clearly marked for testing, but add a comment warning against production use:

// evmTestPrivateKey is a TEST-ONLY private key. NEVER use in production.
evmTestPrivateKey = "cece4f25ac74deb1468965160c7185e07dff413f23fcadb611b05ca37ab0a52e"

Passphrase hardcoded (compat_test.go:103, upgrade_test.go:104)

WithEVMSignerPassphrase("secret")

Consider using a test constant or environment variable for consistency.


4. Performance & Resource Efficiency

Issues

Concurrent container operations not optimized (compat_test.go:49-60)
The test runs Celestia setup and sequencer setup sequentially. Consider if these could be parallelized since they're independent (though this may add complexity).

Transaction submission is sequential (compat_test.go:191-202)
Submitting 50 transactions one-by-one with synchronous waits is slow. Consider batch submission:

// Submit all txs first
for i := range count {
    tx := evm.GetRandomTransaction(...)
    s.Require().NoError(s.sequencerClient.SendTransaction(ctx, tx))
    txHashes = append(txHashes, tx.Hash())
}
// Then wait for all
for _, txHash := range txHashes {
    s.WaitForTxIncluded(ctx, s.sequencerClient, txHash)
}

30-minute timeout may be excessive (scripts/test.mk:72)

-timeout=30m

Consider if 20 minutes would suffice, or make it configurable.


5. Testing Strategy & Reliability

Positive Aspects

  • Comprehensive verification: Tests check transaction receipts, block numbers, and ongoing sync
  • Real-world scenario: Mimics actual upgrade path users would follow
  • Good test coverage: Both pre-sync and post-sync transactions verified

Issues

TODO comment unresolved (.github/workflows/docker-tests.yml:68)

SEQUENCER_EVM_IMAGE_TAG: main # TODO: set this to the last released tag

This should be resolved before merging. Using main for compatibility tests defeats the purpose of testing against stable releases.

Recommendation: Set this to the last stable release tag (e.g., v1.0.0-rc.1 or similar). If no release exists yet, document why main is acceptable.

Fix this →

Another TODO in upgrade test (upgrade_test.go:23)

// TODO: upgrade from previous released version instead of main
baseEVMSingleVersion = "main"

Same issue - should use a released version tag.

Test could verify more sync properties (compat_test.go:212-226)
Currently only checks transaction receipts and block numbers match. Consider also verifying:

  • Block hashes match
  • State root consistency
  • Receipt roots match
  • Gas used matches

6. Documentation & Developer Experience

Good Practices

  • Test functions have descriptive names
  • Log messages provide helpful progress indicators
  • Struct fields are clearly named

Issues

Missing package-level documentation (compat_test.go:3)
Add a package comment explaining the test's purpose:

// Package docker_e2e contains end-to-end tests that verify ev-node behavior
// in Docker environments, including cross-version compatibility and upgrades.

Function documentation incomplete (compat_test.go:172-189)
Functions like getSequencerP2PAddress could benefit from comments explaining the P2P address format and why node key parsing is necessary.

Makefile target needs documentation (scripts/test.mk:69-73)
The new test-docker-compat target is added but not listed in the project's main documentation (CLAUDE.md or README).

Recommendation: Update CLAUDE.md to document:

### Testing

- `make test-docker-compat` - Runs cross-version compatibility tests

7. Go Idioms & Best Practices

Issues

Context without cancellation (compat_test.go:46)

ctx := context.Background()

For long-running tests, consider using context.WithTimeout or context.WithCancel to enable early termination.

Magic number for transaction count (compat_test.go:64, compat_test.go:86)

preSyncTxHashes = s.submitTransactions(ctx, 50)
postSyncTxHashes := s.submitTransactions(ctx, 5)

Consider using named constants:

const (
    preSyncTxCount = 50
    postSyncTxCount = 5
)

Error messages could be more specific (compat_test.go:215)

s.Require().NoError(err, "failed to query tx %d on sequencer: %s", i, txHash.Hex())

Good, but could include the actual error for debugging:

s.Require().NoError(err, "failed to query tx %d (%s) on sequencer: %v", i, txHash.Hex(), err)

8. Dependency Updates

The PR updates several dependencies:

  • github.com/fatih/color: v1.15.0 → v1.18.0
  • github.com/google/flatbuffers: v24.12.23 → v25.1.24
  • github.com/hashicorp/go-hclog: v1.6.2 → v1.6.3
  • github.com/celestiaorg/tastora: version bump
  • Go version: 1.25.0 → 1.25.6

These updates appear routine and low-risk. The version bumps are minor/patch updates.

Recommendation: Ensure these dependency updates were tested and don't introduce breaking changes. The passing CI suggests they're compatible.


9. Refactoring Quality

The refactoring in docker_test.go is well-executed:

Positive:

  • RethSetupConfig struct groups related configuration (docker_test.go:260-268)
  • SetupEVMSingle helper reduces duplication (docker_test.go:377-394)
  • Shared constants promote consistency (docker_test.go:42-47)

Minor suggestion:
The EVMSingleSetupConfig struct (docker_test.go:371-375) returns both Chain and Node, but only Node is used in most places. Consider if Chain is necessary in the return type.


10. Workflow Configuration

The new workflow job test-docker-compat (.github/workflows/docker-tests.yml:53-71) is well-structured:

  • Proper permissions scoping
  • Environment variables correctly passed
  • Consistent with existing jobs

Issue: The TODO on line 68 (mentioned earlier) needs resolution.


Summary of Priority Issues

Must Fix Before Merge:

  1. Test flakiness risk: Replace time.Sleep with conditional wait (compat_test.go:68)
  2. Unresolved TODOs: Set proper version tags instead of "main" (.github/workflows/docker-tests.yml:68, upgrade_test.go:23)

Should Fix:

  1. Missing cleanup: Add TearDown method for proper resource cleanup
  2. Security comment: Add warning to test private key constant
  3. Documentation: Update CLAUDE.md with new make target

Nice to Have:

  1. Performance: Consider batch transaction submission
  2. Test verification: Add more sync property checks (block hashes, state roots)
  3. Magic numbers: Use named constants for transaction counts

Conclusion

This is a valuable addition to the test suite that will help catch compatibility regressions. The code is generally well-written and follows good testing practices. Addressing the test flakiness issue and resolving the TODO comments would make this PR ready to merge.

Great work on the implementation! The test architecture is solid and the refactoring improves maintainability.


@github-actions
Copy link
Contributor

github-actions bot commented Jan 28, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedJan 28, 2026, 10:51 AM

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @chatton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing infrastructure by introducing a robust Docker E2E compatibility test. This new test ensures that different versions of the ev-node-evm can seamlessly synchronize, specifically verifying that a newer full node can correctly process and validate data from an older sequencer. The changes also include refactoring of test utilities to promote cleaner, more maintainable code and updating core dependencies to their latest versions, contributing to overall system stability and future compatibility.

Highlights

  • New Compatibility Test Suite: A new Docker E2E test suite, EVMCompatTestSuite, has been introduced to verify cross-version syncing capabilities of the ev-node-evm. This test deploys an older sequencer image, executes transactions, and then deploys the current PR image as a full node to ensure it can correctly sync and validate the historical transactions.
  • Docker E2E Test Utilities Refactoring: Existing Docker E2E test utilities have been refactored for improved reusability and clarity. This includes defining shared EVM test constants, renaming RethSetup to RethSetupConfig, and introducing EVMSingleSetupConfig and SetupEVMSingle helper functions to streamline node setup.
  • Dependency and Go Version Updates: The Go version has been updated from 1.25.0 to 1.25.6, and several Go module dependencies, including github.com/fatih/color, github.com/google/flatbuffers, github.com/hashicorp/go-hclog, and github.com/celestiaorg/tastora, have been updated to newer versions. New dependencies like github.com/libp2p/go-libp2p/core were also added.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/docker-tests.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable compatibility test suite to ensure syncing works across different node versions. The new EVMCompatTestSuite effectively simulates a real-world scenario where a new full node syncs from a sequencer running a different version. The changes also include beneficial refactoring of the existing test suite, such as centralizing test constants and creating reusable helper functions for setting up test nodes. My review includes one suggestion to improve the robustness of the new test by replacing a fixed sleep with a conditional wait.

Comment on lines +68 to +71
time.Sleep(5 * time.Second)

sequencerHeight, err := s.sequencerClient.BlockNumber(ctx)
s.Require().NoError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a fixed time.Sleep can introduce flakiness into tests, as the time required for an operation can vary between test runs. A more robust approach is to wait for a specific condition. Here, you're waiting for blocks to be posted to the DA. While directly querying the DA might be complex, you can achieve a more reliable wait by ensuring a few more blocks are produced by the sequencer. This provides a time window for DA propagation that is tied to block production rather than a fixed duration.

s.T().Log("Waiting for a few blocks to allow for DA propagation...")
	lastTxHeight, err := s.sequencerClient.BlockNumber(ctx)
	s.Require().NoError(err)
	const daPropagationBlocks = 3
	targetHeight := lastTxHeight + daPropagationBlocks
	s.Require().Eventually(func() bool {
		h, err := s.sequencerClient.BlockNumber(ctx)
		return err == nil && h >= targetHeight
	}, 15*time.Second, time.Second, "sequencer did not produce new blocks for DA propagation")

	sequencerHeight, err := s.sequencerClient.BlockNumber(ctx)
	s.Require().NoError(err)

@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.61%. Comparing base (4a19108) to head (22a4b0b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3023      +/-   ##
==========================================
+ Coverage   55.55%   55.61%   +0.06%     
==========================================
  Files         116      116              
  Lines       11477    11477              
==========================================
+ Hits         6376     6383       +7     
+ Misses       4401     4394       -7     
  Partials      700      700              
Flag Coverage Δ
combined 55.61% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[E2E] Sync Compatibility Test

2 participants