Fix SMT2 output determinism by using ordered maps #8830

tautschnig · 2026-01-23T12:57:41Z

This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines).

Root Cause:

Several key data structures in smt2_conv.h used std::unordered_map, which iterates in hash-dependent order. When source code formatting changes:

Line numbers change
Expression hashes change
Symbols are stored/iterated in different orders
SMT2 declarations appear in different orders

The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in #8820 this now yields: SMT files: 0 lines differ (where previously it was 8 lines differing).

Fixes: #8820

Each commit message has a non-empty body, explaining why the change was made.
n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines). Root Cause: ----------- Several key data structures in smt2_conv.h used std::unordered_map, which iterates in hash-dependent order. When source code formatting changes: 1. Line numbers change 2. Expression hashes change 3. Symbols are stored/iterated in different orders 4. SMT2 declarations appear in different orders The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in diffblue#8820 this now yields: `SMT files: 0 lines differ` (where previously it was 8 lines differing). Fixes: diffblue#8820

Copilot

Pull request overview

Improves determinism of generated SMT2 output (and thus solver performance stability) by making key generation steps order-stable and adding a regression test.

Changes:

Replaced several std::unordered_map members in the SMT2 converter with ordered maps and added determinism-oriented comments.
Made goto_symext::phi_function process SSA delta merges in a deterministic order.
Added a regression test asserting stable ordering of SMT2 declarations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
src/solvers/smt2/smt2_conv.h	Switches multiple `unordered_map` members to `map` to attempt stable ordering.
src/solvers/smt2/smt2_conv.cpp	Adds determinism-related commentary around footer emission and symbol discovery.
src/goto-symex/symex_goto.cpp	Sorts phi-merge processing by identifier to stabilize downstream equation / SMT2 output order.
regression/cbmc/deterministic-smt-output/test.desc	New regression asserting consistent declaration ordering (x before y).
regression/cbmc/deterministic-smt-output/main.c	Minimal program to trigger and validate deterministic SMT2 output ordering.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-23T13:11:01Z

src/solvers/smt2/smt2_conv.h

+  // Using std::map instead of std::unordered_map to ensure deterministic
+  // iteration order for SMT2 output. This prevents non-deterministic output
+  // when source code formatting changes (e.g., adding empty lines).
+  typedef std::map<irep_idt, identifiert> identifier_mapt;


This change intends deterministic SMT2 output ordering, but std::map<irep_idt, ...> does not sort by identifier text: irep_idt (dstringt) compares by internal table index (no), which can vary with string-interning insertion order (see src/util/dstring.h:126-128). To make ordering truly stable by identifier name, use a comparator based on irep_idt::compare / id2string (or store std::string keys).

Copilot · 2026-01-23T13:11:01Z

src/solvers/smt2/smt2_conv.h

+  /// smt2 formula. Using std::map for deterministic iteration order.
+  std::map<irep_idt, bool> set_values;


set_values is only used for lookups (e.g., set_values.find(...) in smt2_dec.cpp), not for iteration-based output. Switching from std::unordered_map to std::map therefore doesn’t improve SMT2 output determinism, but does change expected complexity from average O(1) to O(log n) on a potentially large map.

If determinism is not required here, consider keeping std::unordered_map (or otherwise justify the ordering requirement).

Copilot · 2026-01-23T13:11:01Z

src/solvers/smt2/smt2_conv.cpp

+/// for all symbols (variables) found. Symbols are added to identifier_map,
+/// which uses std::map to ensure deterministic iteration order.
+///
+/// Determinism: Since identifier_map is ordered by identifier name, symbols
+/// will always be declared in the same order in the SMT2 output, regardless
+/// of the order in which they are discovered during traversal. This ensures
+/// that semantically equivalent C code produces identical SMT2 output even
+/// if source formatting (e.g., empty lines) changes line numbers and hash codes.


The new doc comment is misleading:

identifier_map being a std::map<irep_idt, ...> does not imply ordering by identifier name (irep_idt/dstringt orders by internal table index; see src/util/dstring.h:126-128).

More importantly, find_symbols emits (declare-fun ...) immediately when a symbol is first inserted, so declaration order still depends on discovery / traversal order, not on iterating identifier_map.

Either adjust the documentation to match the actual behavior, or change the implementation to collect symbols first and then output declarations in a sorted order based on identifier text.

Suggested change

/// for all symbols (variables) found. Symbols are added to identifier_map,

/// which uses std::map to ensure deterministic iteration order.

///

/// Determinism: Since identifier_map is ordered by identifier name, symbols

/// will always be declared in the same order in the SMT2 output, regardless

/// of the order in which they are discovered during traversal. This ensures

/// that semantically equivalent C code produces identical SMT2 output even

/// if source formatting (e.g., empty lines) changes line numbers and hash codes.

/// for all symbols (variables) it encounters. When a symbol is seen for the

/// first time during traversal, it is inserted into identifier_map and its

/// SMT2 declaration is emitted immediately.

///

/// Note: identifier_map is a std::map keyed by irep_idt and is used to track

/// which symbols have already been declared. Its key ordering is defined by

/// irep_idt and is not guaranteed to correspond to lexicographic ordering of

/// identifier text, and declaration order in the SMT2 output follows the

/// expression traversal order rather than iteration over identifier_map.

Copilot · 2026-01-23T13:11:02Z

src/goto-symex/symex_goto.cpp

+  std::map<std::string, symex_renaming_levelt::delta_viewt::const_iterator>
+    ordered_names_to_merge;
+  for(auto it = delta_view.begin(); it != delta_view.end(); ++it)


std::map is used here, but this translation unit doesn’t include <map> (and none of the existing includes appear to). Please add #include <map> to avoid relying on transitive includes / potential build breaks.

Copilot · 2026-01-23T13:11:02Z

src/goto-symex/symex_goto.cpp

+  std::map<std::string, symex_renaming_levelt::delta_viewt::const_iterator>
+    ordered_names_to_merge;
+  for(auto it = delta_view.begin(); it != delta_view.end(); ++it)
  {
+    const ssa_exprt &ssa = it->m.first;
+    bool inserted =
+      ordered_names_to_merge.insert({id2string(ssa.get_identifier()), it})
+        .second;
+    CHECK_RETURN(inserted);
+  }


This change makes phi_function deterministic, but it also changes complexity from a single O(n) scan to building an ordered std::map (O(n log n)) plus per-entry std::string allocations (id2string(...) copied into the map key). Given this is likely a hot path, consider collecting iterators into a std::vector and sorting them with a comparator over identifier text to reduce allocations/overhead while preserving determinism.

Copilot · 2026-01-23T13:11:02Z

src/solvers/smt2/smt2_conv.h

+  // Using std::map for deterministic iteration order of converters
+  using converterst = std::map<irep_idt, std::function<void(const exprt &)>>;


irep_idt ordering in std::map is not lexicographic: dstringt::operator< compares the internal table index (no), which depends on string-interning insertion order (see src/util/dstring.h:126-128). If determinism is meant to be by identifier text (as suggested by the regression test / comments), this map should use a comparator based on irep_idt::compare / id2string rather than the default std::less<irep_idt>.

This affects determinism claims for converter ordering here (and similarly for the other new std::map<irep_idt, ...> members in this header).

Suggested change

// Using std::map for deterministic iteration order of converters

using converterst = std::map<irep_idt, std::function<void(const exprt &)>>;

// Comparator to ensure lexicographic ordering by identifier text

struct irep_id_lexicographic_less

{

bool operator()(const irep_idt &lhs, const irep_idt &rhs) const

{

return lhs.compare(rhs) < 0;

}

};

// Using std::map with a lexicographic comparator for deterministic

// iteration order of converters based on identifier text

using converterst =

std::map<irep_idt, std::function<void(const exprt &)>, irep_id_lexicographic_less>;

Copilot · 2026-01-23T13:11:02Z

src/solvers/smt2/smt2_conv.h

+  // Using std::map for deterministic order of bindings
+  std::map<irep_idt, irept> current_bindings;



Switching current_bindings to std::map<irep_idt, ...> does not guarantee lexicographic determinism because irep_idt (dstringt) orders by internal table index (no), not by string content (see src/util/dstring.h:126-128). If you need stable ordering by identifier text, use a comparator based on irep_idt::compare / id2string.

Suggested change

// Using std::map for deterministic order of bindings

std::map<irep_idt, irept> current_bindings;

struct irep_idt_lexicographic_less

{

bool operator()(const irep_idt &lhs, const irep_idt &rhs) const

{

return lhs.compare(rhs) < 0;

}

};

// Using std::map with lexicographic ordering of identifier text

std::map<irep_idt, irept, irep_idt_lexicographic_less> current_bindings;

rod-chapman · 2026-01-23T14:50:48Z

Test results on mlkem-native, running on M1 macOS, with MLKEM_K=4 and -j8

Before (release 6.8.0): 139 proofs in 17m13s real, 49m 12s user
After (this branch): 139 proofs in 18m39s real, 51m users.

rod-chapman · 2026-01-23T14:51:52Z

Test results on mlkem-native, running on r7g.16xlarge/Ubuntu, with MLKEM_K=4 and -j64

Before (release 6.8.0): 139 proofs in 6m20s real, 47m37s user
After (this branch): 139 proofs in 6m53s real, 49m33s user.

rod-chapman · 2026-01-23T18:46:49Z

Test results on mldsa-native, running on r7g.16xlarge/Ubuntu, with MLD_CONFIG_PARAMETER_SET=87 and -j64

Before (release 6.8.0): 173 proofs in 4m22s real, 51m38s user
After (this branch): 173 proofs in 5m49s real, 57m28s

so a little slower. For example, the proofs of sign_verify_internal slows from 143s to 307s.

With stability gained, we can experiment with that to see which ordering of SMT terms is the best.

I also hope that PR#8705 will result in a noticeable improvement in performance.

kroening · 2026-01-23T23:15:29Z

Rod: can you tell what makes that run slower? Is it the time to generate the SMT-LIB instance, or the time to solve it?

rod-chapman · 2026-01-25T09:21:24Z

For polyvec_add() which slowed a little after this change, the time to generate the SMT is 5 seconds. Time to prove it with Z3 is 520s.

Once we have stabilized SMT-generation in CBMC, we've got a rational basis to proceed - we can experiment with Z3 options and tactics, work out what the best ordering of terms is (for Z3 at least), and possible open bug reports on Z3. Onwards!

rod-chapman · 2026-01-25T09:34:16Z

That Z3 time of 520s is with the option rewrite.sort_bv_ac=true.

With default options, proof time is 44s on my laptop. There might be a few other cases where we can adjust Z3 options to get the overall performance back to what it was.

tautschnig requested review from kroening and martin-cs as code owners January 23, 2026 12:57

Copilot AI review requested due to automatic review settings January 23, 2026 12:57

tautschnig requested review from TGWDB and peterschrammel as code owners January 23, 2026 12:57

Copilot started reviewing on behalf of tautschnig January 23, 2026 12:58 View session

Copilot AI reviewed Jan 23, 2026

View reviewed changes

tautschnig self-assigned this Jan 23, 2026

		/// smt2 formula. Using std::map for deterministic iteration order.
		std::map<irep_idt, bool> set_values;

-/// for all symbols (variables) found. Symbols are added to identifier_map,
-/// which uses std::map to ensure deterministic iteration order.
-///
-/// Determinism: Since identifier_map is ordered by identifier name, symbols
-/// will always be declared in the same order in the SMT2 output, regardless
-/// of the order in which they are discovered during traversal. This ensures
-/// that semantically equivalent C code produces identical SMT2 output even
-/// if source formatting (e.g., empty lines) changes line numbers and hash codes.
+/// for all symbols (variables) it encounters. When a symbol is seen for the
+/// first time during traversal, it is inserted into identifier_map and its
+/// SMT2 declaration is emitted immediately.
+///
+/// Note: identifier_map is a std::map keyed by irep_idt and is used to track
+/// which symbols have already been declared. Its key ordering is defined by
+/// irep_idt and is not guaranteed to correspond to lexicographic ordering of
+/// identifier text, and declaration order in the SMT2 output follows the
+/// expression traversal order rather than iteration over identifier_map.

		// Using std::map for deterministic iteration order of converters
		using converterst = std::map<irep_idt, std::function<void(const exprt &)>>;

-  // Using std::map for deterministic iteration order of converters
-  using converterst = std::map<irep_idt, std::function<void(const exprt &)>>;
+  // Comparator to ensure lexicographic ordering by identifier text
+  struct irep_id_lexicographic_less
+  {
+    bool operator()(const irep_idt &lhs, const irep_idt &rhs) const
+    {
+      return lhs.compare(rhs) < 0;
+    }
+  };
+  // Using std::map with a lexicographic comparator for deterministic
+  // iteration order of converters based on identifier text
+  using converterst =
+    std::map<irep_idt, std::function<void(const exprt &)>, irep_id_lexicographic_less>;

		// Using std::map for deterministic order of bindings
		std::map<irep_idt, irept> current_bindings;

Fix SMT2 output determinism by using ordered maps #8830

Are you sure you want to change the base?

Fix SMT2 output determinism by using ordered maps #8830

Conversation

tautschnig commented Jan 23, 2026

Root Cause:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

Uh oh!

rod-chapman commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rod-chapman commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rod-chapman commented Jan 23, 2026

Uh oh!

kroening commented Jan 23, 2026

Uh oh!

rod-chapman commented Jan 25, 2026

Uh oh!

rod-chapman commented Jan 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rod-chapman commented Jan 23, 2026 •

edited

Loading

rod-chapman commented Jan 23, 2026 •

edited

Loading