
feat(vector): implement adaptive mode for IVF vector search#23650

Open
iamlinjunhong wants to merge 4 commits into matrixorigin:main from iamlinjunhong:m-23580

Conversation

@iamlinjunhong (Contributor) commented Feb 2, 2026

User description

This commit introduces an intelligent adaptive mode for IVF vector search that automatically selects the optimal execution strategy based on runtime statistics and query characteristics.

Key features:

  • Add 'mode=auto' option for automatic mode selection
  • Implement shouldUseForceMode() for small dataset optimization
  • Add calculateAutoModeOverFetchFactor() for dynamic over-fetching based on selectivity
  • Add calculateAdaptiveNprobe() for dynamic nprobe adjustment
  • Implement adaptive fallback: retry with 'pre' mode when 'post' returns empty results
  • Support exact PK filter in runtime filter for improved accuracy with small result sets
  • Add session variable 'enable_vector_auto_mode_by_default'

Implementation details:

  • Phase 1: mode=auto recognizes and routes to appropriate strategy
  • Phase 2: Small dataset detection triggers force mode (skip index)
  • Phase 3-4: Dynamic parameter adjustment based on filter selectivity
  • Phase 5: Adaptive retry mechanism via ErrVectorNeedRetryWithPreMode

Testing:

  • Add comprehensive unit tests for all new functions
  • Add integration tests for adaptive mode scenarios

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23580

What this PR does / why we need it:

feat(vector): implement adaptive mode for IVF vector search


PR Type

Enhancement, Tests


Description

Test Plan: Adaptive Mode for IVF Vector Search

1. Overview

The goal of this test plan is to verify the Adaptive Mode for IVF vector search. This feature introduces an intelligent execution strategy that automatically selects between Index-based search (mode=post), Filter-first search (mode=pre), and Brute-force search (mode=force) based on data statistics and runtime results.

2. Prerequisites

  • Session Variables:
    • set experimental_ivf_index = 1; (Required for IVF index)
    • set enable_vector_auto_mode_by_default = 1; (Optional, for testing default behavior)
  • Parameters:
    • Use lists=... during index creation to control cluster density.
    • Use set probe_limit = 1; to force low recall scenarios for testing the retry mechanism.

3. Test Scenarios

Phase 1: Syntax & Integration

Objective: Verify that mode=auto is recognized and behaves like a "smart" search.

  • Test Case 1.1: Run a vector search with WITH OPTION 'mode=auto'.
  • Test Case 1.2: Ensure it works correctly with WHERE filters and LIMIT.
  • Test Case 1.3: Compare results between mode=auto and mode=pre. They should be identical in terms of correctness.

Phase 2: Smart Selection (Small Dataset Optimization)

Objective: Verify that the system skips the index (uses mode=force) when the dataset is too small to benefit from a vector index.

  • Criteria: Usually triggered when Table Rows < LIMIT * 2.
  • Verification:
    • Insert ~10 rows. Create an index.
    • Run a query with mode=auto and LIMIT 10.
    • Hard-to-test part: Use EXPLAIN to check whether a FUNCTION_SCAN(ivf_search) node exists. If it does not, the system correctly selected mode=force to avoid index overhead.
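The rows-versus-limit heuristic above can be sketched as follows. The function name `shouldUseForceMode` and the `Table Rows < LIMIT * 2` threshold come from the PR description; this standalone version is an illustrative assumption, not the actual implementation in `apply_indices_ivfflat.go`:

```go
package main

import "fmt"

// Illustrative sketch of the small-dataset check: when the table holds
// fewer than limit*2 rows, scanning everything is cheaper than probing
// the IVF index, so the planner falls back to mode=force.
func shouldUseForceMode(tableRows, limit int64) bool {
	return tableRows < limit*2
}

func main() {
	fmt.Println(shouldUseForceMode(10, 10))     // true: tiny table, skip index
	fmt.Println(shouldUseForceMode(100000, 10)) // false: index pays off
}
```

With this threshold, the ~10-row table in the verification step above always triggers force mode for `LIMIT 10`, which is why the `FUNCTION_SCAN(ivf_search)` node should be absent from the plan.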

Phase 3 & 4: Dynamic Parameter Adjustment

Objective: Verify that the system adjusts internal parameters (nprobe and over-fetch factor) when filters are highly selective.

  • Criteria: When a filter is estimated to return very few rows (high selectivity), mode=auto increases nprobe to improve recall.
  • Verification:
    • Create a dataset where 99% of data has category=A and 1% has category=B.
    • Search with WHERE category=B.
    • Hard-to-test part: The exact parameter change is internal, so verify by correctness: ensure that mode=auto returns the correct nearest neighbor (compare with mode=force), whereas a non-adaptive mode=post might return empty or wrong results due to low recall under selective filters.
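One plausible shape for the nprobe amplification described here: scale the base nprobe inversely with the filter's selectivity, bounded by the number of clusters. This is a hypothetical sketch; the real `calculateAdaptiveNprobe()` may weigh statistics differently:

```go
package main

import "fmt"

// Hypothetical sketch of selectivity-driven nprobe amplification: with a
// highly selective filter (few surviving rows per cluster), probing more
// clusters compensates for the recall lost to post-filtering.
func adaptiveNprobe(baseNprobe int64, selectivity float64, totalLists int64) int64 {
	if selectivity <= 0 || selectivity >= 1 {
		return baseNprobe // no stats, or a non-selective filter: keep as-is
	}
	nprobe := int64(float64(baseNprobe) / selectivity)
	if nprobe > totalLists {
		nprobe = totalLists // cannot probe more clusters than exist
	}
	if nprobe < baseNprobe {
		nprobe = baseNprobe
	}
	return nprobe
}

func main() {
	fmt.Println(adaptiveNprobe(1, 0.01, 64)) // 64: 1% filter, capped at totalLists
	fmt.Println(adaptiveNprobe(4, 0.5, 64))  // 8: probe twice as many clusters
}
```

In the 99%/1% `category` scenario above, the 1% branch would drive nprobe up sharply, which is what lets mode=auto keep recall where plain mode=post loses results.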

Phase 5: Adaptive Fallback (Retry Mechanism)

Objective: Verify that if mode=post returns fewer results than requested, the system automatically retries with mode=pre.

  • Setup:
    • Create a table where the nearest neighbors for a query are all filtered out by a WHERE clause, and they reside in different clusters.
    • Set probe_limit = 1 (search only 1 cluster).
  • Verification:
    • mode=post: Should return empty (because it only checked the nearest cluster and found no matches after filtering).
    • mode=auto: Should return the correct results.
    • Internal Logic: The query fails with a retry error, rewrites the AST to mode=pre, and reruns. The user should see results without manual intervention.
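The fail-and-retry control flow can be sketched with a sentinel error standing in for `moerr.ErrVectorNeedRetryWithPreMode`. This mirrors the Phase 5 description only; the actual rewrite/recompile logic lives in `compile2.go`:

```go
package main

import (
	"errors"
	"fmt"
)

// errNeedRetryWithPre stands in for moerr.ErrVectorNeedRetryWithPreMode.
var errNeedRetryWithPre = errors.New("vector search needs retry with mode=pre")

// runQuery simulates one execution: post-filter mode searched only the
// nearest cluster and everything was filtered out, so it signals the
// adaptive fallback; pre-filter mode scans candidates first and succeeds.
func runQuery(mode string) ([]int, error) {
	if mode == "post" {
		return nil, errNeedRetryWithPre
	}
	return []int{42}, nil
}

// runAdaptive mirrors the retry loop: on the sentinel error, rewrite
// mode=auto -> mode=pre (here: just pass "pre"), recompile, and rerun.
func runAdaptive() ([]int, error) {
	rows, err := runQuery("post")
	if errors.Is(err, errNeedRetryWithPre) {
		return runQuery("pre")
	}
	return rows, err
}

func main() {
	rows, err := runAdaptive()
	fmt.Println(rows, err) // [42] <nil>
}
```

This is why, in the verification step above, mode=post alone returns empty while mode=auto transparently returns the correct rows.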

Phase 6: Session Default Behavior

Objective: Verify the new session variable enable_vector_auto_mode_by_default.

  • Test Case 6.1: Set enable_vector_auto_mode_by_default = 1. Run a search without an explicit WITH OPTION clause. It should behave like mode=auto.
  • Test Case 6.2: Explicitly setting mode=pre or mode=post should override the session default.

4. Troubleshooting & Observations

  • How to tell if a Retry happened?
    • Currently, retries are invisible to the user except for a slight latency increase.
    • One can verify by comparing mode=post (fails) vs mode=auto (succeeds) on the same dataset.
  • Why is Phase 3/4 hard to test?
    • These are performance/recall optimizations. In a standard SQL test, we can only verify if the results are "better" (higher recall) than standard post mode, or just correct.
  • Stats dependency: These optimizations rely on ANALYZE TABLE. Ensure statistics are updated if the system doesn't trigger force mode as expected.

5. Summary Table for Testers

| Phase | Feature | Verification Method | Expected Result |
|-------|---------|---------------------|-----------------|
| 1 | mode=auto Syntax | SQL Execution | Query returns successfully |
| 2 | Force Mode (Small) | EXPLAIN | No ivf_search in plan |
| 3/4 | Dynamic Params | Result Correctness | High recall even with selective filters |
| 5 | Fallback Retry | Compare with mode=post | auto succeeds where post returns empty |
| 6 | Session Variable | SQL Execution | Default behavior matches auto mode |

Diagram Walkthrough

flowchart LR
  Query["Query with mode=auto"]
  Resolve["resolveVectorSearchMode()"]
  SmallData["shouldUseForceMode()"]
  Force["Force Mode<br/>Full Table Scan"]
  Dynamic["calculateAdaptiveNprobe()<br/>calculateAutoModeOverFetchFactor()"]
  Execute["Execute with<br/>optimized params"]
  Output["Output Operator<br/>tracks rowCount"]
  Empty{"rowCount == 0<br/>& IsAdaptive?"}
  Retry["Retry with<br/>mode=pre"]
  Result["Final Results"]
  
  Query --> Resolve
  Resolve --> SmallData
  SmallData -->|Small Dataset| Force
  SmallData -->|Large Dataset| Dynamic
  Force --> Execute
  Dynamic --> Execute
  Execute --> Output
  Output --> Empty
  Empty -->|Yes| Retry
  Empty -->|No| Result
  Retry --> Result

File Walkthrough

Relevant files
Tests
6 files
apply_indices_ivfflat_test.go
Add unit tests for adaptive vector search functions           

pkg/sql/plan/apply_indices_ivfflat_test.go

  • Add comprehensive unit tests for calculateAdaptiveNprobe() function
    with various selectivity scenarios
  • Add integration tests for prepareIvfIndexContext() with adaptive
    nprobe logic in auto mode
  • Add unit tests for shouldUseForceMode() to validate small dataset
    detection
  • Add unit tests for resolveVectorSearchMode() covering all mode
    resolution scenarios
+487/-0 
compile2_test.go
Add tests for auto mode rewrite and adaptive detection     

pkg/sql/compile/compile2_test.go

  • Add tests for rewriteAutoModeToPre() function covering Select,
    ExplainStmt, Insert, Replace statements
  • Add tests for forceModePre() function to validate forced pre-mode
    setting
  • Add tests for isAdaptiveVectorSearch() to detect auto mode in query
    plans
  • Test case-insensitive mode matching and nested subquery handling
+440/-0 
apply_indices_test.go
Add tests for auto mode over-fetch factor calculation       

pkg/sql/plan/apply_indices_test.go

  • Add tests for calculateAutoModeOverFetchFactor() with various
    selectivity and limit combinations
  • Test selectivity-based compensation factor calculation and
    MaxOverFetchFactor capping
  • Verify auto mode factor is always >= base factor
  • Add benchmark for over-fetch factor calculation
+231/-0 
ivf_search_test.go
Update tests for runtime filter refactoring                           

pkg/sql/colexec/table_function/ivf_search_test.go

  • Rename test function from TestWaitBloomFilterForTableFunction() to
    TestWaitRuntimeFilterForTableFunction()
  • Update test cases to work with new ivfRuntimeFilter struct
  • Add test case for IN runtime filter (exact PK list) handling
  • Verify both bloomFilter and exactPkFilter fields in returned struct
+32/-22 
vector_ivf_retry.result
Add integration tests for adaptive vector search                 

test/distributed/cases/vector/vector_ivf_retry.result

  • Add comprehensive integration test cases for adaptive vector search
    retry mechanism
  • Test Phase 1: basic auto mode with filters
  • Test Phase 2: small dataset detection triggering force mode
  • Test Phase 3-4: dynamic nprobe adjustment with selectivity
  • Test Phase 5: adaptive retry fallback from post to pre mode
  • Test Phase 6: implicit auto mode via session variable
+153/-0 
vector_ivf_retry.sql
Add comprehensive adaptive mode IVF vector search tests   

test/distributed/cases/vector/vector_ivf_retry.sql

  • Comprehensive test suite for adaptive mode IVF vector search with 6
    phases covering mode=auto syntax, smart mode selection, dynamic
    parameter adjustment, and adaptive fallback mechanisms
  • Phase 1 tests verify mode=auto syntax acceptance and basic execution
    with filters
  • Phase 2 tests validate automatic force mode selection for small
    datasets with rare filter values
  • Phase 3-4 tests cover selectivity-based over-fetch and nprobe
    adjustment for high/low selectivity scenarios
  • Phase 5 tests validate adaptive fallback retry mechanism when post
    mode returns insufficient results, automatically retrying with pre
    mode
  • Phase 6 tests verify session variable
    enable_vector_auto_mode_by_default for enabling auto mode globally
  • Edge case tests cover no matching rows, limit exceeding available
    rows, and queries without filter conditions
+255/-0 
Enhancement
14 files
apply_indices_ivfflat.go
Implement adaptive mode for IVF vector search                       

pkg/sql/plan/apply_indices_ivfflat.go

  • Add shouldUseForceMode() to detect small datasets and trigger full
    table scan optimization
  • Add resolveVectorSearchMode() to determine optimal execution strategy
    (pre/post/force/auto)
  • Add calculateAdaptiveNprobe() to dynamically adjust nprobe based on
    filter selectivity
  • Implement Phase 1-4 of adaptive mode: mode selection, force mode
    detection, nprobe amplification
  • Add isAutoMode and initialStrategy fields to ivfIndexContext struct
+245/-13
ivf_search.go
Support exact PK filter in runtime filter for IVF search 

pkg/sql/colexec/table_function/ivf_search.go

  • Rename waitBloomFilterForTableFunction() to
    waitRuntimeFilterForTableFunction() and extend to handle both
    BloomFilter and exact PK filters
  • Add ivfRuntimeFilter struct to encapsulate both filter types
  • Implement buildExactPkFilter() to convert IN runtime filter vectors to
    SQL literals
  • Add comprehensive type conversion helpers (appendVectorSQLLiteral(),
    appendHex(), etc.) for all supported data types
  • Update ivfSearchState to store both bloomFilter and exactPkFilter
+228/-13
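A hypothetical version of an `appendHex`-style helper: binary column values are rendered as `X'...'` hex literals so they can be embedded in the generated IN clause without quoting pitfalls. The real helper in `ivf_search.go` may differ in name and signature:

```go
package main

import "fmt"

const hexDigits = "0123456789ABCDEF"

// appendHexLiteral appends a MySQL-style hex literal (X'DEAD') for the
// given bytes; hex encoding sidesteps any escaping issues that raw
// binary data would cause inside a SQL string literal.
func appendHexLiteral(buf []byte, b []byte) []byte {
	buf = append(buf, 'X', '\'')
	for _, c := range b {
		buf = append(buf, hexDigits[c>>4], hexDigits[c&0x0f])
	}
	return append(buf, '\'')
}

func main() {
	fmt.Println(string(appendHexLiteral(nil, []byte{0xDE, 0xAD}))) // X'DEAD'
}
```

Emitting hex instead of quoted raw bytes is also what the bot's BIT-type suggestion below argues for.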
compile2.go
Implement adaptive retry mechanism with AST rewriting       

pkg/sql/compile/compile2.go

  • Add retry logic for ErrVectorNeedRetryWithPreMode error in Run()
    method
  • Implement rewriteAutoModeToPre() to recursively rewrite 'mode=auto' to
    'mode=pre' in AST
  • Implement forceModePre() to force pre-mode when implicit auto mode is
    enabled
  • Add helper functions to traverse and rewrite nested SELECT statements
    and table expressions
  • Update prepareRetry() to sync plan after retry and update analyzer
    query reference
  • Update handleQueryPlanAnalyze() to use final (retry) scopes for
    explain analyze output
+217/-3 
output.go
Implement adaptive fallback mechanism in output operator 

pkg/sql/colexec/output/output.go

  • Add rowCount tracking in container to count output rows
  • Implement adaptive vector search fallback: return
    ErrVectorNeedRetryWithPreMode when IsAdaptive=true and rowCount==0
  • Add retry trigger at multiple exit points (nil batch, block step end,
    execution stop)
  • Only trigger retry on zero results to avoid merging partial results
    with retry results
+25/-0   
search.go
Add exact PK filter path to IVF search                                     

pkg/vectorindex/ivfflat/search.go

  • Add support for exact PK filter path: when sqlproc.ExactPkFilter is
    set, skip centroid finding and use direct IN clause
  • Refactor SQL generation to handle both centroid-based and exact PK
    filter paths
  • Exact PK filter path omits ORDER BY and LIMIT for flexibility in
    result merging
+47/-27 
compile.go
Add adaptive vector search detection and plan retrieval   

pkg/sql/compile/compile.go

  • Add GetPlan() method to retrieve current plan after retry
  • Update canRetry() to recognize ErrVectorNeedRetryWithPreMode as
    retryable error
  • Add isAdaptiveVectorSearch() method to detect auto mode in query plans
  • Update compileSteps() to set IsAdaptive flag on Output operator when
    adaptive vector search is detected
+36/-2   
apply_indices.go
Add auto mode over-fetch factor calculation                           

pkg/sql/plan/apply_indices.go

  • Add MaxOverFetchFactor constant (100.0) to cap over-fetch multiplier
  • Implement calculateAutoModeOverFetchFactor() using selectivity-based
    compensation: max(baseFactor, 1/selectivity)
  • Apply capping at MaxOverFetchFactor to prevent excessive memory usage
+37/-0   
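The `max(baseFactor, 1/selectivity)` formula with the `MaxOverFetchFactor` cap, as summarized above, can be sketched directly. Constant and formula come from this PR summary; the zero/negative-selectivity handling is an assumption:

```go
package main

import "fmt"

// maxOverFetchFactor caps the multiplier to bound memory usage,
// matching the MaxOverFetchFactor constant (100.0) described above.
const maxOverFetchFactor = 100.0

// autoModeOverFetchFactor compensates for selective filters: if only a
// fraction `selectivity` of candidates survives filtering, fetch about
// 1/selectivity times more candidates from the index.
func autoModeOverFetchFactor(baseFactor, selectivity float64) float64 {
	factor := baseFactor
	if selectivity > 0 && 1/selectivity > factor {
		factor = 1 / selectivity
	}
	if factor > maxOverFetchFactor {
		factor = maxOverFetchFactor
	}
	return factor
}

func main() {
	fmt.Println(autoModeOverFetchFactor(2.0, 0.5))   // 2: base factor already suffices
	fmt.Println(autoModeOverFetchFactor(2.0, 0.125)) // 8: fetch 1/0.125 = 8x
	fmt.Println(autoModeOverFetchFactor(2.0, 0.005)) // 100: hits the cap
}
```

Note the invariant the unit tests above check: the auto-mode factor is always >= the base factor.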
build.go
Add exact PK filter threshold for hash build                         

pkg/sql/colexec/hashbuild/build.go

  • Add exactPkFilterThreshold constant (100) to switch from BloomFilter
    to exact IN list for small PK sets
  • Implement logic to send exact IN runtime filter when row count <=
    threshold
  • Preserve BloomFilter path for larger sets to maintain performance
+31/-3   
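The hash-build-side decision described here reduces to a threshold check: small primary-key sets are shipped as an exact IN list (no false positives), larger ones as a Bloom filter. The threshold value (100) is from the PR summary; the kind type here is a simplified stand-in for the runtime filter message types:

```go
package main

import "fmt"

// exactPkFilterThreshold mirrors the constant described above: at or
// below this row count, an exact IN list beats a Bloom filter.
const exactPkFilterThreshold = 100

type runtimeFilterKind int

const (
	filterIn    runtimeFilterKind = iota // exact IN list, no false positives
	filterBloom                          // probabilistic, cheap for large sets
)

func chooseRuntimeFilter(rowCount int) runtimeFilterKind {
	if rowCount <= exactPkFilterThreshold {
		return filterIn
	}
	return filterBloom
}

func main() {
	fmt.Println(chooseRuntimeFilter(5) == filterIn)        // true
	fmt.Println(chooseRuntimeFilter(10000) == filterBloom) // true
}
```

The exact IN list is what enables the "improved accuracy with small result sets" claimed in the PR description: the IVF search side can then query entries with a direct IN clause instead of a lossy Bloom filter probe.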
types.go
Add adaptive vector search fields to output operator         

pkg/sql/colexec/output/types.go

  • Add rowCount field to container struct to track output row count
  • Add IsAdaptive field to Output struct to enable adaptive vector search
    fallback
  • Add WithAdaptive() method to set adaptive mode flag
+24/-0   
deepcopy.go
Add RankOption deep copy support                                                 

pkg/sql/plan/deepcopy.go

  • Add DeepCopyRankOption() function to deep copy RankOption struct
  • Update DeepCopyNode() to include RankOption in node copying
+10/-0   
query_builder.go
Support auto mode in rank option parsing                                 

pkg/sql/plan/query_builder.go

  • Update parseRankOption() to accept "auto" mode in addition to "pre",
    "post", and "force"
  • Update validation error message to include "auto" as valid mode option
+3/-2     
mysql_cmd_executor.go
Sync plan after execution for retry support                           

pkg/frontend/mysql_cmd_executor.go

  • Sync latest plan after Run() execution to capture plan changes from
    retry
  • Update TxnComputationWrapper.plan to point to final plan after
    potential retries
+5/-0     
sqlexec.go
Add exact PK filter field to SQL process                                 

pkg/vectorindex/sqlexec/sqlexec.go

  • Add ExactPkFilter field to SqlProcess struct to carry exact PK filter
    list
  • Field contains comma-separated SQL literals for direct IN clause usage
+3/-0     
computation_wrapper.go
Sync plan after compilation for retry support                       

pkg/frontend/computation_wrapper.go

  • Sync latest plan after Run() execution in TxnComputationWrapper.Run()
  • Ensures plan reflects any changes from retry mechanism
+2/-0     
Formatting
1 files
plan.pb.go
Update protobuf generated code comments                                   

pkg/pb/plan/plan.pb.go

  • Add comment markers for protobuf oneof types in Expr,
    TableDef_DefType, TransationControl, Plan, DataControl, and
    DataDefinition messages
  • These are auto-generated code comments for clarity
+7/-0     
Error handling
2 files
error.go
Add vector search retry error code                                             

pkg/common/moerr/error.go

  • Add new error code ErrVectorNeedRetryWithPreMode (22301) in Group 15:
    Vector Search
  • Add error message mapping for the new error code
+6/-0     
error_no_ctx.go
Add no-context error constructor for vector retry               

pkg/common/moerr/error_no_ctx.go

  • Add NewVectorNeedRetryWithPreModeNoCtx() function to create error
    without context
+4/-0     
Configuration changes
1 files
variables.go
Add session variable for auto mode default                             

pkg/frontend/variables.go

  • Add new session variable enable_vector_auto_mode_by_default with
    default value 0
  • Variable controls implicit auto mode activation when not explicitly
    specified in query
+8/-0     
Documentation
1 files
plan.proto
Update protobuf documentation for RankOption                         

proto/plan.proto

  • Update RankOption.mode field comment to include "force" and "auto"
    modes
+1/-1     


qodo-code-review bot commented Feb 2, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
SQL injection

Description: Dynamic SQL is constructed via fmt.Sprintf using runtime-provided values (notably
sqlproc.ExactPkFilter and types.ArrayToString(query)) without parameterization, which can
enable SQL injection/query manipulation if these values ever become attacker-controlled or
if escaping is incomplete for the target SQL dialect.
search.go [145-190]

Referred Code
if sqlproc != nil && sqlproc.ExactPkFilter != "" {
	sql = fmt.Sprintf(
		"SELECT `%s`, %s(`%s`, '%s') as vec_dist FROM `%s`.`%s` WHERE `%s` = %d AND `%s` IN (%s)",
		catalog.SystemSI_IVFFLAT_TblCol_Entries_pk,
		metric.MetricTypeToDistFuncName[metric.MetricType(idxcfg.Ivfflat.Metric)],
		catalog.SystemSI_IVFFLAT_TblCol_Entries_entry,
		types.ArrayToString(query),
		tblcfg.DbName, tblcfg.EntriesTable,
		catalog.SystemSI_IVFFLAT_TblCol_Entries_version,
		idx.Version,
		catalog.SystemSI_IVFFLAT_TblCol_Entries_pk,
		sqlproc.ExactPkFilter,
	)
} else {
	distfn, err = metric.ResolveDistanceFn[T](metric.MetricType(idxcfg.Ivfflat.Metric))
	if err != nil {
		return
	}

	centroidsIDs, err = idx.findCentroids(sqlproc, query, distfn, idxcfg, rt.Probe, nthread)
	if err != nil {


 ... (clipped 25 lines)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed


🔴
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
SQL injection risk: The new exact-PK-path builds SQL via fmt.Sprintf(... IN (%s) ...) using
sqlproc.ExactPkFilter (a string of SQL literals) rather than parameterization, which can
enable SQL injection if any literal escaping is incomplete or if untrusted values can
reach the runtime filter.

Referred Code
if sqlproc != nil && sqlproc.ExactPkFilter != "" {
	sql = fmt.Sprintf(
		"SELECT `%s`, %s(`%s`, '%s') as vec_dist FROM `%s`.`%s` WHERE `%s` = %d AND `%s` IN (%s)",
		catalog.SystemSI_IVFFLAT_TblCol_Entries_pk,
		metric.MetricTypeToDistFuncName[metric.MetricType(idxcfg.Ivfflat.Metric)],
		catalog.SystemSI_IVFFLAT_TblCol_Entries_entry,
		types.ArrayToString(query),
		tblcfg.DbName, tblcfg.EntriesTable,
		catalog.SystemSI_IVFFLAT_TblCol_Entries_version,
		idx.Version,
		catalog.SystemSI_IVFFLAT_TblCol_Entries_pk,
		sqlproc.ExactPkFilter,
	)


Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing error context: New error returns (e.g., vec.UnmarshalBinary(m.Data) and literal-building failures)
propagate raw errors without adding actionable context like the runtime filter tag/type or
PK column/type, which can make production debugging harder.

Referred Code
	switch m.Typ {
	case message.RuntimeFilter_BLOOMFILTER:
		// runtime bloomfilter uses common/bloomfilter encoding; pass through bytes directly here
		return &ivfRuntimeFilter{bloomFilter: m.Data}, nil
	case message.RuntimeFilter_IN:
		vec := vector.NewVec(types.T_any.ToType())
		if err := vec.UnmarshalBinary(m.Data); err != nil {
			return nil, err
		}
		defer vec.Free(proc.Mp())

		exactPkFilter, err := buildExactPkFilter(proc.Ctx, vec)
		if err != nil {
			return nil, err
		}
		if exactPkFilter == "" {
			return nil, nil
		}
		return &ivfRuntimeFilter{exactPkFilter: exactPkFilter}, nil
	}
}


 ... (clipped 154 lines)


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Potentially user-facing detail: The new moerr.NewInternalErrorf(ctx, "ivf_search: unsupported pk type %d",
typ.Oid) may be returned to clients and exposes internal type identifiers unless higher
layers consistently sanitize internal errors.

Referred Code
default:
	return nil, moerr.NewInternalErrorf(ctx, "ivf_search: unsupported pk type %d", typ.Oid)
}


Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-code-review bot commented Feb 2, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Security
Fix unsafe BIT type serialization

Use hex encoding for BIT type serialization to prevent potential SQL injection
vulnerabilities and syntax errors.

pkg/sql/colexec/table_function/ivf_search.go [223-231]

 	case types.T_bit:
 		value := vector.GetFixedAtWithTypeCheck[uint64](vec, row)
 		bitLength := typ.Width
 		byteLength := (bitLength + 7) / 8
 		b := types.EncodeUint64(&value)[:byteLength]
 		slices.Reverse(b)
-		buf = append(buf, '\'')
-		buf = append(buf, b...)
-		buf = append(buf, '\'')
+		buf = appendHex(buf, b)
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a potential SQL injection vulnerability caused by improper escaping of BIT type values, and the proposed fix using appendHex is the correct way to handle binary data in SQL literals.

High
High-level
Consider a less complex retry alternative

The current adaptive fallback mechanism uses a complex full query retry
involving error handling and AST rewriting. Consider a simpler alternative where
the vector search operator internally handles the retry logic, such as fetching
more candidates, to avoid the overhead of a complete query restart.

Examples:

pkg/sql/compile/compile2.go [294-312]
		forcePreMode := moerr.IsMoErrCode(err, moerr.ErrVectorNeedRetryWithPreMode)
		if forcePreMode {
			updated := rewriteAutoModeToPre(c.stmt)
			if !updated {
				// If no explicit 'auto' was rewritten, but we got a retry request,
				// it means it was implicit auto mode (from session variable).
				// We force the AST to 'pre' mode to rebuild the plan correctly.
				forceModePre(c.stmt)
			}
			// Force rebuild of physical plan for explain analyze after rewrite.

 ... (clipped 9 lines)
pkg/sql/colexec/output/output.go [64-75]
			// Adaptive vector search fallback: trigger retry with pre-filter mode
			// when post-filter mode returns empty results.
			//
			// Condition: IsAdaptive=true AND rowCount=0 (no results at all)
			//
			// We only trigger on rowCount == 0 (not rowCount < limit) because:
			// 1. Partial results cannot be merged with retry results (would cause duplicates)
			// 2. Empty results strongly indicate that post-filter mode failed to find
			//    any matching rows, likely due to high filter selectivity
			if output.IsAdaptive && output.ctr.rowCount == 0 {

 ... (clipped 2 lines)

Solution Walkthrough:

Before:

// In pkg/sql/colexec/output/output.go
func (output *Output) Call(proc *process.Process) (vm.CallResult, error) {
  // ...
  if result.Batch == nil {
    // If post-filter mode returns no results at all...
    if output.IsAdaptive && output.ctr.rowCount == 0 {
      // ...return a special error to trigger a full query retry.
      return result, moerr.NewVectorNeedRetryWithPreModeNoCtx()
    }
    // ...
  }
  // ...
}

// In pkg/sql/compile/compile2.go
func (c *Compile) Run(...) (*util.RunResult, error) {
  // ...
  err = c.run()
  if moerr.IsMoErrCode(err, moerr.ErrVectorNeedRetryWithPreMode) {
    rewriteAutoModeToPre(c.stmt) // Rewrite AST
    runC, err = c.prepareRetry(true) // Re-compile
    // ... re-execute runC
  }
  // ...
}

After:

// The suggestion implies moving the logic inside the vector search operator.
// This would avoid the error-based retry loop in the compiler.

// In pkg/sql/colexec/table_function/ivf_search.go (conceptual)
func runIvfSearchVector(...) {
  // 1. Initial 'post' mode search
  keys, distances, err = veccache.Cache.Search(sqlProc, ...)

  // Post-filter the results internally
  filteredKeys, filteredDistances = applyFilters(keys, distances)

  // 2. If results are empty, trigger internal fallback
  if len(filteredKeys) == 0 {
    // Switch to 'pre' mode logic internally
    // This would involve generating a new SQL with filters pushed down
    // and re-executing the search without a full query re-compile.
    keys, distances, err = veccache.Cache.SearchWithPreFilter(...)
  }

  // Return final results
}
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies that the full query retry mechanism is a major source of complexity, and it accurately describes how it works by catching an error and recompiling. This is a valid, high-impact architectural critique.

Medium
Possible issue
Prevent panic from nil stats

Add a nil check for vecCtx.scanNode.Stats before accessing its Selectivity field
in a logging statement to prevent a potential panic.

pkg/sql/plan/apply_indices_ivfflat.go [282-295]

 	// Phase 4: Dynamic nprobe amplification for auto mode
 	// Only applied if mode is "post" (pushdown disabled) and totalLists is available
 	if isAutoMode && mode == "post" && totalLists > 0 {
 		oldNProbe := nProbe
 		nProbe = builder.calculateAdaptiveNprobe(
 			nProbe,
 			vecCtx.scanNode.Stats,
 			totalLists,
 		)
 		if nProbe != oldNProbe {
+			selectivity := -1.0
+			if vecCtx.scanNode.Stats != nil {
+				selectivity = vecCtx.scanNode.Stats.Selectivity
+			}
 			logutil.Infof("Auto mode: adjusted nprobe from %d to %d (selectivity: %.4f)",
-				oldNProbe, nProbe, vecCtx.scanNode.Stats.Selectivity)
+				oldNProbe, nProbe, selectivity)
 		}
 	}
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a nil pointer dereference that would cause a panic when logging, which is a critical bug. The proposed fix correctly guards against this scenario.

Medium
Fix stale plan usage during retry

Move the plan assignment runC.pn = c.pn to after the if defChanged block in
prepareRetry to ensure the new compile object uses the potentially rebuilt plan.

pkg/sql/compile/compile2.go [528-590]

 func (c *Compile) prepareRetry(defChanged bool) (*Compile, error) {
 	v2.TxnStatementRetryCounter.Inc()
 	// XXX we need to get a new transaction context from the transaction.
 	// and we need to be very careful about the context.
 	c.proc.SetTxnOperator(c.proc.GetTxnOperator().GetLatestTxnOperator())
 	runC := newCompile(c.addr, c.db, c.sql, c.uid, c.tenant, c.proc.GetTxnOperator(), c.proc, c.stmt, c.isInternal, c.cnList)
 	runC.anal = c.anal
 	runC.proc.SetAnalyze(c.anal)
 	runC.affectRows.Store(c.affectRows.Load())
 	runC.isFirst = false
 	runC.retryTimes = c.retryTimes
 	runC.disableRetry = c.disableRetry
 	runC.isPrepare = c.isPrepare
-	runC.pn = c.pn
+
 	topContext := c.proc.GetTopContext()
 	if defChanged {
 		pn, e := plan.BuildPlan(topContext, c.stmt, c.isInternal)
 		if e != nil {
 			return nil, e
 		}
 		c.pn = pn
 		// Update c.anal.qry to point to the new plan's Query
 		// This ensures fillPlanNodeAnalyzeInfo uses the correct nodes
 		if qry, ok := pn.Plan.(*plan.Plan_Query); ok && c.anal != nil {
 			c.anal.qry = qry.Query
 		}
 	}
+	runC.pn = c.pn
 	if e = runC.Compile(topContext, c.pn, c.fill); e != nil {
 		return nil, e
 	}
 	return runC, nil
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a logical error where a retry operation would use a stale execution plan, leading to incorrect behavior. This is a significant correctness issue.

Medium
Add missing by rank clause

Add the by rank clause to the Phase 6 test queries to ensure they correctly test
the vector search adaptive mode and do not fall back to a non-vector-optimized
plan.

test/distributed/cases/vector/vector_ivf_retry.sql [244-252]

 -- Test 6.1: Default auto mode (should trigger fallback/retry internally)
 -- Even though no mode is specified in SQL, it SHOULD return 999 because auto mode is default
-select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1;
+select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank;
 
 -- Test 6.2: Override session default with explicit mode
 -- Explicit 'post' should return empty despite session default being 'auto'
 select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank with option 'mode=post';
 
 set enable_vector_auto_mode_by_default = 0;
 -- Test 6.3: Back to default (post)
-select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1;
+select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank;

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly points out that the by rank clause is likely necessary to trigger the intended vector search execution path, making the test more accurate and valid.

Medium
Run ANALYZE on the table

Uncomment the analyze table t_phase2_small; statement to provide the optimizer
with up-to-date statistics, ensuring the correct IVF search mode is selected in
Phase 2 tests.

test/distributed/cases/vector/vector_ivf_retry.sql [60]

--- analyze table t_phase2_small;
+analyze table t_phase2_small;
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that running ANALYZE is crucial for the optimizer to have the statistics needed to test the smart mode selection feature, making the test more robust and reliable.

Medium
General
Handle context cancellation in filter loop

Add a check for context cancellation within the
waitRuntimeFilterForTableFunction loop to prevent it from blocking indefinitely
if no filter arrives.

pkg/sql/colexec/table_function/ivf_search.go [140-188]

 func waitRuntimeFilterForTableFunction(tf *TableFunction, proc *process.Process) (*ivfRuntimeFilter, error) {
 	if len(tf.RuntimeFilterSpecs) == 0 {
 		return nil, nil
 	}
 
+	mb := proc.GetMessageBoard()
 	for {
-		m := messageBoard.Receive(tf.RuntimeFilterSpecs[0].Tag)
+		select {
+		case <-proc.Ctx.Done():
+			return nil, proc.Ctx.Err()
+		default:
+		}
+		m := mb.Receive(tf.RuntimeFilterSpecs[0].Tag)
 		if m == nil {
 			continue
 		}
 		switch m.Typ {
 		case message.RuntimeFilter_BLOOMFILTER:
 			return &ivfRuntimeFilter{bloomFilter: m.Data}, nil
 		case message.RuntimeFilter_IN:
 			// ... build exactPkFilter ...
 			return &ivfRuntimeFilter{exactPkFilter: exactPkFilter}, nil
 		}
 	}
-
-	return nil, nil
 }


Suggestion importance[1-10]: 7


Why: The suggestion correctly points out that the loop can block indefinitely. Adding context cancellation handling is a good practice for robustness and preventing resource leaks.

Medium
Guard totalLists before bounding

In calculateAdaptiveNprobe, add a check to ensure totalLists is positive before
using it to cap the adaptiveNprobe value.

pkg/sql/plan/apply_indices_ivfflat.go [180-186]

 func (builder *QueryBuilder) calculateAdaptiveNprobe(baseNprobe int64, stats *plan.Stats, totalLists int64) int64 {
 	// ...
 	adaptiveNprobe := int64(math.Ceil(float64(baseNprobe) * compensation))
 	adaptiveNprobe = max(adaptiveNprobe, baseNprobe)
-	adaptiveNprobe = min(adaptiveNprobe, totalLists)
+	if totalLists > 0 {
+		adaptiveNprobe = min(adaptiveNprobe, totalLists)
+	}
 
 	return adaptiveNprobe
 }


Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that totalLists can be negative, which would lead to incorrect behavior. The proposed guard ensures the logic is robust by only applying the upper bound when it's valid.

Medium
Ensure deterministic test query results

Add a secondary sort criterion, such as the primary key id, to the ORDER BY
clause in test queries to ensure deterministic results when vectors have equal
distances.

test/distributed/cases/vector/vector_ivf_retry.sql [28-38]

 -- Test 1.1: mode=auto syntax is accepted
 -- Expectation: Returns closest vector to [0,0,0]
-select id from t_phase1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank with option 'mode=auto';
+select id from t_phase1 order by l2_distance(vec, '[0,0,0]'), id limit 1 by rank with option 'mode=auto';
 
 -- Test 1.2: mode=auto with filter
 -- Expectation: Returns id 1 or 2 (category=1, closest to [0,0,0])
-select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank with option 'mode=auto';
+select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]'), id limit 1 by rank with option 'mode=auto';
 
 -- Test 1.3: Compare auto with explicit modes - results should be equivalent
 -- mode=pre (guaranteed correct)
-select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]') limit 2 by rank with option 'mode=pre';
+select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]'), id limit 2 by rank with option 'mode=pre';
 -- mode=auto should return same results (may use different path internally)
-select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]') limit 2 by rank with option 'mode=auto';
+select id from t_phase1 where category = 1 order by l2_distance(vec, '[0,0,0]'), id limit 2 by rank with option 'mode=auto';


Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies a source of potential test flakiness due to non-deterministic ordering and proposes a valid fix, which improves test reliability.

Low
Reset session variables after tests

Reset session variables like probe_limit to their default values after the Phase
6 tests to ensure test isolation and prevent side effects on other tests.

test/distributed/cases/vector/vector_ivf_retry.sql [226-252]

 -- =============================================================================
 -- Phase 6: Session Variable enable_vector_auto_mode_by_default
 -- =============================================================================
 
 drop table if exists t_phase6;
 create table t_phase6(id int primary key, vec vecf32(3), filter_col int);
 insert into t_phase6 values (1, '[1,0,0]', 0);
 insert into t_phase6 values (2, '[0,1,0]', 0);
 insert into t_phase6 values (3, '[0,0,1]', 0);
 insert into t_phase6 values (999, '[10,10,10]', 1);
 create index idx_phase6 using ivfflat on t_phase6(vec) lists=2 op_type 'vector_l2_ops';
 
 set experimental_ivf_index = 1;
 set probe_limit = 1;
 set enable_vector_auto_mode_by_default = 1;
 
 -- Test 6.1: Default auto mode (should trigger fallback/retry internally)
 -- Even though no mode is specified in SQL, it SHOULD return 999 because auto mode is default
 select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1;
 
 -- Test 6.2: Override session default with explicit mode
 -- Explicit 'post' should return empty despite session default being 'auto'
 select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1 by rank with option 'mode=post';
 
 set enable_vector_auto_mode_by_default = 0;
 -- Test 6.3: Back to default (post)
 select id from t_phase6 where filter_col = 1 order by l2_distance(vec, '[0,0,0]') limit 1;
 
+-- Reset session variables to avoid affecting other tests
+set probe_limit = 10;
+
Suggestion importance[1-10]: 4


Why: The suggestion proposes a good practice for test isolation by resetting session variables, which improves maintainability, although its immediate impact is low as this is the last test in the file.

Low

Labels: Review effort 4/5 · size/XXL (denotes a PR that changes 2000+ lines)

9 participants