@BBertram-hex (Collaborator) commented Jan 28, 2026

Introduces new keywords in the havocompare config file:

rules:
  CSV:
    Preprocessing:
      • KeepColumnsByName
      • KeepColumnsByNameG
      • DeleteColumnByNameG

DeleteColumnByName: "" (without the 'G') already exists as a preprocessor step. It deletes the first column whose header exactly matches the given string from both the actual and the nominal CSV file. In the later comparison, such columns are treated as if they had never been in actual.csv or nominal.csv (or both).

KeepColumnsByName

KeepColumnsByName takes a list of strings that are compared exactly against the extracted table headers. Only columns whose header exactly matches at least one string from that list are kept, e.g.

        - KeepColumnsByName:
            - "Center x [mm]"
            - "Center y [mm]"

Applied to a CSV like

Center x [mm] | Center y [mm] | Center z [mm] | any other string | Center x [mm] | Center x
1 | 2 | 3 | 4 | 5 | 6
....

would keep only the columns whose header exactly matches one of the strings in the config and delete all others:

Center x [mm] | Center y [mm] | DELETED | DELETED | Center x [mm] | DELETED
1 | 2 | DELETED | DELETED | 5 | DELETED
....
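
For illustration, the keep-by-exact-name behaviour of this example can be sketched as follows. This is a minimal stand-alone sketch and not the code from this PR: `keep_mask` is a hypothetical helper that works on plain header strings instead of havocompare's table type.

```rust
/// Hypothetical helper (not part of havocompare): returns one flag per header,
/// `true` if the column is kept, `false` if it would be marked as deleted.
fn keep_mask(headers: &[&str], names: &[&str]) -> Vec<bool> {
    headers
        .iter()
        // A column survives only if its header equals one of the configured names.
        .map(|header| names.contains(header))
        .collect()
}
```

Called with the six headers from the example and the two configured names, the mask keeps columns 1, 2 and 5. Both "Center x [mm]" columns are kept, while "Center x" is not, because the comparison is exact.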

Globbing variants (New)

With the suffix 'G' on the preprocessor step, '*', '**' and '?' can be used as wildcards in the list of pattern strings.
Preprocessor steps that support globbing:

  • KeepColumnsByNameG
  • DeleteColumnByNameG

How globbing works (described for KeepColumnsByNameG):

  • First, an exact match of the header is tried (like the non-globbing variant above).
    • If the strings are exactly equal, the column is kept.
  • If the string from the config does not match the header exactly, it is interpreted as a glob pattern.
    • If the glob matches, the column is kept.
  • If a column in the CSV matches no config string, neither exactly nor as a glob, the column is marked as deleted in both the nominal and the actual file (and thus ignored).
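
The exact-first, glob-as-fallback order described above can be sketched like this. Everything here is an assumption made for illustration: `glob_match` is a tiny hand-rolled matcher that stands in for whatever glob implementation the PR actually uses, and `header_matches` and `keep_mask_glob` are hypothetical names.

```rust
/// Stand-in glob matcher (assumption, not the PR's implementation):
/// `*` matches any run of characters, `?` exactly one character,
/// `[abc]` one character out of the set. Ranges, negation and `**`
/// are not handled specially in this sketch.
fn glob_match(pattern: &[char], text: &[char]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((&'*', rest)) => (0..=text.len()).any(|skip| glob_match(rest, &text[skip..])),
        Some((&'?', rest)) => !text.is_empty() && glob_match(rest, &text[1..]),
        // The character right after `[` is always a set member, so `[[]` and
        // `[]]` work; the closing bracket is searched from the second position.
        Some((&'[', rest)) => match rest.iter().skip(1).position(|&c| c == ']') {
            Some(offset) => {
                let close = offset + 1;
                let (set, after) = (&rest[..close], &rest[close + 1..]);
                text.first().map_or(false, |c| set.contains(c)) && glob_match(after, &text[1..])
            }
            None => false, // unterminated set: treated as "no match" here
        },
        Some((&literal, rest)) => text.first() == Some(&literal) && glob_match(rest, &text[1..]),
    }
}

/// Exact comparison first; the glob interpretation is only a fallback.
fn header_matches(pattern: &str, header: &str) -> bool {
    if pattern == header {
        return true;
    }
    let pattern: Vec<char> = pattern.chars().collect();
    let header: Vec<char> = header.chars().collect();
    glob_match(&pattern, &header)
}

/// One flag per header: `true` if any pattern matches, i.e. the column is kept.
fn keep_mask_glob(headers: &[&str], patterns: &[&str]) -> Vec<bool> {
    headers
        .iter()
        .map(|header| patterns.iter().any(|pattern| header_matches(pattern, header)))
        .collect()
}
```

Because of the exact check, a config string such as "Center x [mm]" still keeps the column "Center x [mm]", even though the same string read as a glob would only match "Center x m".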

Note: Square brackets also denote a wildcard: '[um]' matches a single 'u' OR 'm'. That is why an exact match is tried first, so that the glob variant always matches (and therefore compares) everything the exact variant would, and more.

        - KeepColumnsByNameG:
            - "Center *"

matches in CSV:

  • "Center x [mm]", "Center y [mm]",
  • but also a header that is literally "Center *" (exact match; the '*' is not treated as a wildcard there)

With KeepColumnsByNameG, the pattern "Center ? [mm]" would however not match "Center x [mm]": the strings are not equal char-by-char, and as a glob "Center ? [mm]" only matches "Center <any character> m". The glob "Center [xyz] [[]mm[]]" would do what is intended (exact unit, spaces, and x, y, or z in the right place), but may be too complicated to read.
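
The bracket idiom in that last glob ("[[]" for a literal '[' and "[]]" for a literal ']') can be generated instead of typed by hand. The helper below is a hypothetical sketch, not something this PR adds: `escape_glob` wraps every glob metacharacter in its own one-character set so that it matches literally.

```rust
/// Hypothetical helper (not part of this PR): makes a literal string safe to
/// embed in a glob by wrapping each metacharacter in a one-character set.
fn escape_glob(literal: &str) -> String {
    let mut escaped = String::with_capacity(literal.len());
    for c in literal.chars() {
        match c {
            // `[x]` matches exactly the character `x`, so this disables the wildcard.
            '*' | '?' | '[' | ']' => {
                escaped.push('[');
                escaped.push(c);
                escaped.push(']');
            }
            _ => escaped.push(c),
        }
    }
    escaped
}
```

With it, `format!("Center [xyz] {}", escape_glob("[mm]"))` yields the pattern "Center [xyz] [[]mm[]]" from the paragraph above.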

closes #57

- exact match of the header string
- other columns will be deleted.
- as fall-back, if CSV header is not exactly the same as the string in the rule
@BBertram-hex BBertram-hex self-assigned this Jan 28, 2026
})?;

if let Some(c) = table.columns.iter_mut().find(|col| {
    col.header.as_deref().unwrap_or_default() == name
Collaborator

I would create a temp variable for col.header.as_deref().unwrap_or_default()

))
})?;

if let Some(c) = table.columns.iter_mut().find(|col| {
Collaborator

Shouldn't we use filter() here? find() will only catch one result, and we want more?

Extending the unit test might also be a good idea.
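
To make the difference concrete, here is a small sketch of both variants. The `Column` struct is a simplified, hypothetical stand-in for the table column type in the diff (only a header and a deleted flag), and the function names are made up for this example.

```rust
/// Simplified stand-in for the column type in the diff (assumption).
struct Column {
    header: Option<String>,
    deleted: bool,
}

/// `find` stops at the first hit, so at most one column is marked.
/// Returns the number of columns marked as deleted.
fn delete_first_by_name(columns: &mut [Column], name: &str) -> usize {
    let first = columns
        .iter_mut()
        .find(|col| col.header.as_deref().unwrap_or_default() == name);
    match first {
        Some(col) => {
            col.deleted = true;
            1
        }
        None => 0,
    }
}

/// `filter` visits every hit, so all columns with that header are marked.
fn delete_all_by_name(columns: &mut [Column], name: &str) -> usize {
    let mut marked = 0;
    for col in columns.iter_mut().filter(|col| {
        // Bind the header once instead of repeating the accessor chain.
        let header = col.header.as_deref().unwrap_or_default();
        header == name
    }) {
        col.deleted = true;
        marked += 1;
    }
    marked
}
```

A unit test with a duplicated header, such as two "Center x [mm]" columns, tells the two variants apart: the first marks one column, the second marks both.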

use crate::csv::{Column, Delimiters, Error};
use std::fs::File;

macro_rules! string_vec {
Collaborator

Feels like this is overkill :D ... we might use vec!["".to_owned()] or change the arguments of keep_columns_matching_any_names to take Vec<&str>


fn keep_columns_matching_any_names(
    table: &mut Table,
    names: &Vec<String>,
Collaborator

use names: &[String]
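
As a sketch of what the slice parameter buys us: the functions below are hypothetical and much smaller than keep_columns_matching_any_names, but they show that a `&[String]` parameter still accepts a `&Vec<String>`, and that a generic `AsRef<str>` bound (one possible alternative, not what the PR does) would let tests pass plain string literals without a helper macro.

```rust
/// Hypothetical helper: a slice parameter accepts `&Vec<String>`, arrays
/// and sub-slices alike, which is what clippy's `ptr_arg` lint asks for.
fn any_name_matches(header: &str, names: &[String]) -> bool {
    names.iter().any(|name| name.as_str() == header)
}

/// Hypothetical generic variant: works for `&[&str]` and `&[String]`,
/// so test code does not need to build owned strings first.
fn any_name_matches_generic<S: AsRef<str>>(header: &str, names: &[S]) -> bool {
    names.iter().any(|name| name.as_ref() == header)
}
```

Existing call sites that pass `&names` with `names: Vec<String>` keep compiling after the change to `&[String]`, because the reference coerces to a slice.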

- fixed wrong error string
- clippy -> use slice instead of specific Container
Development

Successfully merging this pull request may close these issues.

Support whitelisting header names when preprocessing a CSV

3 participants