[QDP] File and streaming Quantum Data Loader #1010

Merged
guan404ming merged 2 commits into apache:main from rich7420:file-streaming-api
Feb 7, 2026

@rich7420 (Contributor) commented Feb 3, 2026

Purpose of PR

Adds file-backed and streaming Parquet data sources to the Quantum Data Loader. Users can call .source_file(path) or .source_file(path, streaming=True) and iterate in batches. PyO3 loader bindings are restored so QuantumDataLoader works with synthetic, file, and streaming sources.

  • qdp-core: DataSource::InMemory and DataSource::Streaming; new_from_file (full read, supports .parquet/.arrow/.npy/.pt/.pb etc.) and new_from_file_streaming (Parquet, chunked read). Shared path_extension_lower, take_batch_from_source, named constants, first-chunk buffer reuse. Basis encoding fix: state vector allocated as Float64 to match kernel, then converted to engine precision.
  • qdp-python (Rust): PyQuantumLoader, create_synthetic_loader, create_file_loader, create_streaming_file_loader (Linux only). batch_limit=None → usize::MAX; path from str or Path; file/streaming build in py.detach().
  • loader.py: source_file(path, streaming=False), _create_iterator(). Streaming requires .parquet.
  • Tests: New loader tests (mutual exclusion, batch count, extension, streaming). DLPack and bindings tests updated for current error messages.
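
To make the chunked-read and extension-check behaviour above more concrete, here is a minimal Python sketch. It is not the qdp-core or loader.py implementation: `path_extension_lower` borrows its name from the PR, while `check_streaming_path` and `iter_batches` are hypothetical helpers, and plain lists stand in for Parquet chunks. Dropping a trailing partial batch is an assumption of this sketch, not something the PR states.

```python
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Union


def path_extension_lower(path: Union[str, Path]) -> str:
    """Return the lower-cased file extension without the dot ('' if none)."""
    return Path(path).suffix.lower().lstrip(".")


def check_streaming_path(path: Union[str, Path]) -> None:
    """Hypothetical check: streaming mode accepts only .parquet files."""
    if path_extension_lower(path) != "parquet":
        raise ValueError(f"streaming requires a .parquet file, got: {path}")


def iter_batches(
    chunks: Iterable[List[float]],
    batch_size: int,
    batch_limit: Optional[int] = None,
) -> Iterator[List[float]]:
    """Regroup variable-sized chunks into fixed-size batches.

    batch_limit=None means "no limit", analogous to mapping None to usize::MAX.
    A trailing partial batch is dropped (an assumption of this sketch).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    limit = sys.maxsize if batch_limit is None else batch_limit
    buffer: List[float] = []
    emitted = 0
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full batches as the buffered rows allow.
        while len(buffer) >= batch_size and emitted < limit:
            yield buffer[:batch_size]
            del buffer[:batch_size]
            emitted += 1
        if emitted >= limit:
            return
```

Calling `list(iter_batches([[1, 2, 3], [4, 5], [6, 7, 8, 9]], batch_size=4))` regroups three uneven chunks into two batches of four, and passing `batch_limit=1` stops after the first.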

Related Issues or PRs

Related to #969

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@400Ping (Member) commented Feb 4, 2026

Please resolve conflicts

@guan404ming (Member) left a comment

Overall looks nice, left one question

Member

I think the _source_type string and the three boolean flags track the same information. Maybe we could keep only the booleans, since this would eliminate ~10 lines of validation logic in _create_iterator().
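
A rough sketch of what a booleans-only approach could look like is given below. It is an illustration, not the code in loader.py: the class `SourceConfig`, the flag names, and the `source_synthetic` and `describe` methods are all hypothetical. The source kind is derived from the flags on demand, so no separate string has to be kept in sync.

```python
from typing import Optional


class SourceConfig:
    """Hypothetical builder that tracks the chosen source with booleans only."""

    def __init__(self) -> None:
        self._synthetic = False
        self._file = False
        self._streaming = False
        self._path: Optional[str] = None

    def source_synthetic(self) -> "SourceConfig":
        if self._file:
            raise ValueError("synthetic and file sources are mutually exclusive")
        self._synthetic = True
        return self

    def source_file(self, path: str, streaming: bool = False) -> "SourceConfig":
        if self._synthetic:
            raise ValueError("synthetic and file sources are mutually exclusive")
        if streaming and not path.lower().endswith(".parquet"):
            raise ValueError("streaming requires a .parquet file")
        self._file = True
        self._streaming = streaming
        self._path = path
        return self

    def describe(self) -> str:
        """Derive the source kind from the flags instead of storing a string."""
        if self._streaming:
            return "streaming"
        if self._file:
            return "file"
        if self._synthetic:
            return "synthetic"
        return "unset"
```

Because the checks run when a source is selected, an iterator factory built on top of this would only need to branch on the flags.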

Contributor Author

nice catch!

@guan404ming (Member)

Overall looks nice! Please help resolve conflict and I think we are good to merge~

@400Ping (Member) commented Feb 6, 2026

@rich7420 Please solve conflicts

@400Ping (Member) commented Feb 6, 2026

Overall LGTM

@guan404ming merged commit 7ee09af into apache:main on Feb 7, 2026
6 checks passed
@guan404ming (Member)

Thanks for the update!

@rich7420 (Contributor Author) commented Feb 8, 2026

thanks for the review!
@guan404ming, @400Ping
