-
Notifications
You must be signed in to change notification settings - Fork 12
feat: Add Opt-in Formula Reading Support #137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
A human addendum to the AI's diff summary ^: This lib is much faster than openpyxl but remained unusable for my use-case due to:
This PR should be a big step forward for broader usability and more use-cases. The next thing I'd hit would be a unified Cheers! |
|
@dimastbk bumping this |
| m.add_function(wrap_pyfunction!(load_workbook, m)?)?; | ||
| m.add_class::<CalamineWorkbook>()?; | ||
| m.add_class::<CalamineSheet>()?; | ||
| m.add_class::<CalamineFormulaIterator>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add CalamineCellIterator here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove, please
|
This would be awesome! |
|
This is something we urgently need, let me know if we can help! |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@python/python_calamine/_python_calamine.pyi`:
- Around line 133-137: The type stub for class CalamineWorkbook is missing the
read_formulas property; add a read-only boolean attribute declaration to the
CalamineWorkbook class (e.g., add "read_formulas: bool") so the stub matches the
Rust #[pyo3(get)]-exposed property and satisfies the test that checks
test_formulas_read_flag_exposed.
🧹 Nitpick comments (1)
src/types/sheet.rs (1)
330-356: Consider explicit dimension checks for edge cases.The width/height calculation uses tuple comparison (
data_start <= data_end), which performs lexicographic comparison. While this should work correctly for typical calamine ranges where both row and column of start are ≤ end, consider making the intent explicit:Suggested change for clarity
fn from_range_with_data_bounds( range: Arc<Range<String>>, data_start: (u32, u32), data_end: (u32, u32), ) -> CalamineFormulaIterator { - let width = if data_start <= data_end { + let width = if data_start.1 <= data_end.1 { (data_end.1 - data_start.1 + 1) as usize } else { 0 }; - let height = if data_start <= data_end { + let height = if data_start.0 <= data_end.0 { (data_end.0 - data_start.0 + 1) as usize } else { 0 };
| @typing.final | ||
| class CalamineWorkbook(contextlib.AbstractContextManager): | ||
| path: str | None | ||
| sheet_names: list[str] | ||
| sheets_metadata: list[SheetMetadata] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing read_formulas property in type stub.
The CalamineWorkbook class exposes a read_formulas property (via #[pyo3(get)] in Rust), which is tested in test_formulas_read_flag_exposed, but it's not declared in the type stub.
Proposed fix
`@typing.final`
class CalamineWorkbook(contextlib.AbstractContextManager):
path: str | None
sheet_names: list[str]
sheets_metadata: list[SheetMetadata]
+ read_formulas: bool
`@classmethod`
def from_object(🤖 Prompt for AI Agents
In `@python/python_calamine/_python_calamine.pyi` around lines 133 - 137, The type
stub for class CalamineWorkbook is missing the read_formulas property; add a
read-only boolean attribute declaration to the CalamineWorkbook class (e.g., add
"read_formulas: bool") so the stub matches the Rust #[pyo3(get)]-exposed
property and satisfies the test that checks test_formulas_read_flag_exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/types/sheet.rs`:
- Around line 230-245: iter_formulas currently uses unwrap_or((0,0)) for range
bounds which causes a 1×1 empty row for empty ranges and drops leading empty
rows when start.0 > 0; fix by computing bounds like iter_rows: let data_start =
self.range.start().unwrap_or((0,0)); if self.range.is_empty() set data_end to a
value that yields zero height (e.g. make data_end row/col < data_start so
height/width compute to 0), otherwise use self.range.end().unwrap(); keep
data_start unchanged so leading empty rows (when start.0 > 0) are preserved,
then call CalamineFormulaIterator::from_range_with_data_bounds with those
adjusted data_start/data_end values. Ensure you use self.range.is_empty(),
self.range.start(), self.range.end(), iter_formulas, and
CalamineFormulaIterator::from_range_with_data_bounds to locate and implement the
change.
| fn iter_formulas(&self) -> PyResult<CalamineFormulaIterator> { | ||
| match &self.formula_range { | ||
| Some(formula_range) => { | ||
| let data_start = self.range.start().unwrap_or((0, 0)); | ||
| let data_end = self.range.end().unwrap_or((0, 0)); | ||
| Ok(CalamineFormulaIterator::from_range_with_data_bounds( | ||
| Arc::clone(formula_range), | ||
| data_start, | ||
| data_end | ||
| )) | ||
| }, | ||
| None => Err(pyo3::exceptions::PyValueError::new_err( | ||
| "Formula iteration is disabled. Use read_formulas=True when creating the workbook to enable formula access." | ||
| )), | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formula iterator row count can diverge from iter_rows for non‑zero starts or empty sheets.
iter_rows() emits leading empty rows when start.0 > 0, but CalamineFormulaIterator’s height is computed from data_end - data_start + 1, so it skips those rows. For empty ranges, unwrap_or((0,0)) produces a 1×1 empty row. This violates the “same dimensions” contract in those cases.
🐛 Suggested fix to align formula dimensions with iter_rows and empty ranges
- let data_start = self.range.start().unwrap_or((0, 0));
- let data_end = self.range.end().unwrap_or((0, 0));
+ let data_start = self.range.start();
+ let data_end = self.range.end();
Ok(CalamineFormulaIterator::from_range_with_data_bounds(
Arc::clone(formula_range),
data_start,
data_end
))- fn from_range_with_data_bounds(
- range: Arc<Range<String>>,
- data_start: (u32, u32),
- data_end: (u32, u32),
- ) -> CalamineFormulaIterator {
- let width = if data_start <= data_end {
- (data_end.1 - data_start.1 + 1) as usize
- } else {
- 0
- };
-
- let height = if data_start <= data_end {
- (data_end.0 - data_start.0 + 1) as usize
- } else {
- 0
- };
-
- CalamineFormulaIterator {
- position: 0,
- start: data_start,
- range,
- width,
- height,
- }
- }
+ fn from_range_with_data_bounds(
+ range: Arc<Range<String>>,
+ data_start: Option<(u32, u32)>,
+ data_end: Option<(u32, u32)>,
+ ) -> CalamineFormulaIterator {
+ let (start, width, height) = match (data_start, data_end) {
+ (Some(start), Some(end)) if start <= end => {
+ let width = (end.1 - start.1 + 1) as usize;
+ // include leading empty rows to match iter_rows
+ let height = (end.0 + 1) as usize;
+ (start, width, height)
+ }
+ _ => ((0, 0), 0, 0),
+ };
+
+ CalamineFormulaIterator {
+ position: 0,
+ start,
+ range,
+ width,
+ height,
+ }
+ }- let current_row = slf.start.0 + slf.position;
- slf.position += 1;
+ let current_row = slf.position;
+ slf.position += 1;
// Create the result row with proper width, filled with empty strings
let mut result_row = vec!["".to_string(); slf.width];
- // Fill in formulas for this row by checking each column position
- for (col_idx, result_cell) in result_row.iter_mut().enumerate() {
- let current_col = slf.start.1 + col_idx as u32;
- if let Some(formula) = slf.range.get_value((current_row, current_col)) {
- if !formula.is_empty() {
- *result_cell = formula.clone();
- }
- }
- }
+ // Fill in formulas only once we are within the data start row
+ if current_row >= slf.start.0 {
+ for (col_idx, result_cell) in result_row.iter_mut().enumerate() {
+ let current_col = slf.start.1 + col_idx as u32;
+ if let Some(formula) = slf.range.get_value((current_row, current_col)) {
+ if !formula.is_empty() {
+ *result_cell = formula.clone();
+ }
+ }
+ }
+ }Also applies to: 331-387
🤖 Prompt for AI Agents
In `@src/types/sheet.rs` around lines 230 - 245, iter_formulas currently uses
unwrap_or((0,0)) for range bounds which causes a 1×1 empty row for empty ranges
and drops leading empty rows when start.0 > 0; fix by computing bounds like
iter_rows: let data_start = self.range.start().unwrap_or((0,0)); if
self.range.is_empty() set data_end to a value that yields zero height (e.g. make
data_end row/col < data_start so height/width compute to 0), otherwise use
self.range.end().unwrap(); keep data_start unchanged so leading empty rows (when
start.0 > 0) are preserved, then call
CalamineFormulaIterator::from_range_with_data_bounds with those adjusted
data_start/data_end values. Ensure you use self.range.is_empty(),
self.range.start(), self.range.end(), iter_formulas, and
CalamineFormulaIterator::from_range_with_data_bounds to locate and implement the
change.
Overview
This PR introduces optional formula reading capabilities to python-calamine, allowing users to access the underlying formulas in spreadsheet cells rather than just their calculated values.
Motivation
Previously, python-calamine only provided access to cell values (the calculated results of formulas). Users had no way to access the actual formula expressions, which is essential for:
Implementation
Core Changes
Opt-in Design: Formula reading is disabled by default (
read_formulas=False) to maintain backward compatibility and avoid performance overhead for users who don't need formulas.New API Parameter: Added
read_formulasparameter to all workbook creation methods:CalamineWorkbook.from_object(path_or_filelike, read_formulas=False)CalamineWorkbook.from_path(path, read_formulas=False)CalamineWorkbook.from_filelike(filelike, read_formulas=False)load_workbook(path_or_filelike, read_formulas=False)Formula Iterator: Added
CalamineSheet.iter_formulas()method that returns aCalamineFormulaIteratorwith consistent dimensions matching the data iterator.Performance Optimization: Implemented on-demand coordinate mapping instead of pre-allocating expanded ranges, ensuring minimal memory overhead.
Technical Details
get_value()lookups rather than expanding formula ranges upfrontCalamineCellIteratorandCalamineFormulaIteratorexposeposition,start,width, andheightproperties for coordinate recoveryFile Format Support
Supports formula reading across all major spreadsheet formats:
Formula syntax varies by format (e.g., ODS uses
of:=SUM([.A1:.B1])vs Excel'sSUM(A1:B1)).API Examples
Backward Compatibility
read_formulas=False)Testing
Comprehensive test suite covering:
Performance Impact
Summary by CodeRabbit
Release Notes
read_formulasparameter to workbook creation methods (load_workbook,from_path,from_object,from_filelike) to enable formula extraction from spreadsheetsiter_formulas()method to iterate through formulas in sheetsread_formulasat workbook creation; attempting to read formulas when disabled raises an error✏️ Tip: You can customize this high-level summary in your review settings.