57 new preprocessor step for whitelisting of CSV columns #58
- exact match of the header string - other columns will be deleted.
- as fall-back, if CSV header is not exactly the same as the string in the rule
    })?;

    if let Some(c) = table.columns.iter_mut().find(|col| {
        col.header.as_deref().unwrap_or_default() == name
i would create a temp variable for col.header.as_deref().unwrap_or_default()
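A minimal sketch of what this suggestion could look like. `Column` here is a hypothetical stand-in with only a `header` field, and `find_column_mut` is an illustrative helper name, not a function from the PR:

```rust
/// Hypothetical stand-in for the crate's column type; only `header` matters here.
struct Column {
    header: Option<String>,
}

/// Returns the first column whose header equals `name` exactly.
fn find_column_mut<'a>(columns: &'a mut [Column], name: &str) -> Option<&'a mut Column> {
    columns.iter_mut().find(|col| {
        // Named binding instead of comparing the whole expression inline.
        // A column without a header compares as the empty string.
        let header = col.header.as_deref().unwrap_or_default();
        header == name
    })
}
```

The behaviour is unchanged; the binding only gives the header expression a name, which also makes it easier to reuse if a second comparison (for example a glob match) is added later.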
        ))
    })?;

    if let Some(c) = table.columns.iter_mut().find(|col| {
shouldn't we use filter() here? find will only catch 1 result, and we want more?
extending the unit test might also be a good idea
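To illustrate the difference raised here, a small sketch with hypothetical stand-in types (`Column`, `Table`) and an assumed function body: `find` yields at most one column, `filter` yields every match, and `Vec::retain` is one way to keep all matching columns in place. Only the name `keep_columns_matching_any_names` comes from the PR; the helper names and bodies are assumptions.

```rust
/// Hypothetical stand-ins for the crate's types.
struct Column {
    header: Option<String>,
}

struct Table {
    columns: Vec<Column>,
}

/// `find` stops at the first hit, so a duplicated header yields one column.
fn first_column_named<'a>(table: &'a Table, name: &str) -> Option<&'a Column> {
    table
        .columns
        .iter()
        .find(|col| col.header.as_deref().unwrap_or_default() == name)
}

/// `filter` visits every column, so duplicated headers are all returned.
fn columns_named<'a>(table: &'a Table, name: &str) -> Vec<&'a Column> {
    table
        .columns
        .iter()
        .filter(|col| col.header.as_deref().unwrap_or_default() == name)
        .collect()
}

/// Keeps every column whose header equals any of `names`; drops the rest.
fn keep_columns_matching_any_names(table: &mut Table, names: &[String]) {
    table.columns.retain(|col| {
        let header = col.header.as_deref().unwrap_or_default();
        names.iter().any(|name| name.as_str() == header)
    });
}
```

A unit test with a duplicated header (for example two columns named "a") would distinguish the two behaviours.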
use crate::csv::{Column, Delimiters, Error};
use std::fs::File;

macro_rules! string_vec {
feels like this is overkill :D ... might use vec!["".to_owned()] or change the arguments in keep_columns_matching_any_names to use Vec<&str>
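One possible variation on this suggestion, sketched under assumptions: making the name list generic over `AsRef<str>` lets tests pass string literals directly, without a helper macro, while owned `String` lists still work. `keep_headers_matching_any` is a hypothetical function operating on plain header strings, not the PR's actual signature.

```rust
/// Keeps only the headers that equal one of `names`.
/// Generic over `AsRef<str>` so both `&[&str]` and `&[String]` are accepted.
fn keep_headers_matching_any<S: AsRef<str>>(headers: &mut Vec<String>, names: &[S]) {
    headers.retain(|header| names.iter().any(|name| name.as_ref() == header.as_str()));
}
```

In a test this can be called as `keep_headers_matching_any(&mut headers, &["a", "c"])`, or with a `Vec<String>` read from the config.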
src/csv/preprocessing.rs
Outdated
fn keep_columns_matching_any_names(
    table: &mut Table,
    names: &Vec<String>,
use names: &[String]
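A short sketch of why the slice parameter is the more flexible choice: a `&Vec<String>` still coerces to `&[String]` at the call site, and arrays or sub-slices are accepted as well. `any_name_matches` is a hypothetical helper used only for illustration.

```rust
/// Returns true if `header` equals any entry of `names`.
/// Taking a slice means the caller is not forced to own a `Vec`.
fn any_name_matches(names: &[String], header: &str) -> bool {
    names.iter().any(|name| name.as_str() == header)
}
```

Existing callers that pass `&vec` keep compiling unchanged, which is why clippy's `ptr_arg` lint recommends this form.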
- fixed wrong error string - clippy -> use slice instead of specific Container
Introduce new Keywords in the havocompare config file:
DeleteColumnByName: "" (without the 'G') already exists as a preprocessor step. It deletes the first column whose header exactly matches the given string, from both the actual and the nominal CSV files. In the later comparison, such columns are treated as if they had never been in actual.csv or nominal.csv (or both).
KeepColumnsByName
KeepColumnsByName has a list of strings that are compared exactly to the extracted table headers. Only headers that exactly match at least one entry of that list are kept, e.g.
Applied to a CSV like
would delete all matches to any of the strings in the config:
Globbing variants (New)
Use the suffix 'G' on the preprocessor step; then '*', '**' and '?' can be used as wildcards in the list of pattern strings.
Preprocessor steps that support globbing:
How globbing with
Note: Square brackets, like '[um]', also denote a wildcard that matches 'u' OR 'm'. That's why an exact match is also tried, so that the glob variant always matches (and therefore compares) everything the exact variant would compare, and more.
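The exact-match fallback from the note can be sketched as follows. The matcher is hand-written for illustration only and is not the glob implementation havocompare uses; it handles just `*`, `?` and simple `[abc]` classes (no ranges, negation or `**`). Both function names are hypothetical.

```rust
/// Minimal glob matcher, written only for this illustration.
/// Supports `*` (any run of characters), `?` (any single character)
/// and `[abc]` (any one of the listed characters).
fn glob_match(pattern: &[char], text: &[char]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((&'*', rest)) => (0..=text.len()).any(|i| glob_match(rest, &text[i..])),
        Some((&'?', rest)) => !text.is_empty() && glob_match(rest, &text[1..]),
        Some((&'[', rest)) => {
            // The first character after '[' always belongs to the class,
            // which is what makes "[[]" and "[]]" match literal brackets.
            match rest.iter().skip(1).position(|&c| c == ']') {
                Some(rel) => {
                    let close = rel + 1;
                    let class = &rest[..close];
                    match text.split_first() {
                        Some((t, text_rest)) => {
                            class.contains(t) && glob_match(&rest[close + 1..], text_rest)
                        }
                        None => false,
                    }
                }
                // Unterminated class: treat '[' as a literal character.
                None => text.first() == Some(&'[') && glob_match(rest, &text[1..]),
            }
        }
        Some((p, rest)) => text.first() == Some(p) && glob_match(rest, &text[1..]),
    }
}

/// Tries the exact comparison first, then falls back to the glob match,
/// so a header containing brackets still matches a pattern equal to itself.
fn matches_name(pattern: &str, header: &str) -> bool {
    if pattern == header {
        return true;
    }
    let p: Vec<char> = pattern.chars().collect();
    let h: Vec<char> = header.chars().collect();
    glob_match(&p, &h)
}
```

With this sketch, the pattern "Center x [mm]" matches the header "Center x [mm]" only through the exact comparison, because as a glob the trailing "[mm]" is a character class that matches a single 'm'.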
matches in CSV:
KeepColumnsByNameG: ... - "Center ? [mm]" however would not match "Center x [mm]", because the match is not char-by-char: the pattern "Center ? [mm]" matches only "Center <any character> m". The glob "Center [xyz] [[]mm[]]" would do what's intended (exact unit, spaces, and x, y, or z at the right place), but may be too complicated to read.
closes #57