e2720pjk commented Jan 20, 2026

This PR splits out a feature from the fork branch referenced in #28 as an initial contribution 🙏
Reviews and suggestions are welcome, and I'm happy to make adjustments as needed.

TLDR

Adds support for respecting .gitignore patterns during repository analysis. Uses git check-ignore when git is available, with a fallback to the pathspec library.

Examples:

# Enable gitignore respect
codewiki generate --respect-gitignore

# Save as default setting
codewiki config set --respect-gitignore

Dive Deeper

Key Changes

repo_analyzer.py
  • Added _user_exclude_patterns to distinguish user-specified vs default excludes
  • Implemented three-tier priority logic for respect_gitignore:
    1. Git ignores (returns True) → exclude file
    2. Git doesn't ignore (returns False) → check user-specified excludes, then DEFAULT_IGNORE_PATTERNS
    3. Git unavailable (returns None) → fallback to .gitignore file via pathspec
  • Modified _should_include_file to use pathspec matching for include patterns
  • Bug fix: Ensure DEFAULT_IGNORE_PATTERNS are checked when git returns False (line 188)
analysis_service.py
  • Added None checks in analyze_local_repository and _analyze_structure
  • Returns empty structure on failure/exclusion to prevent crashes
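
The three-tier logic listed under repo_analyzer.py can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: should_exclude, matches_any, and gitignore_fallback are hypothetical names, fnmatch stands in for the real pattern matcher, the pattern list is only a subset, and checking user excludes and defaults after the fallback is an assumption based on the "always checked" note for DEFAULT_IGNORE_PATTERNS.

```python
from fnmatch import fnmatch
from typing import Callable, Optional, Sequence

# Illustrative subset only; the real DEFAULT_IGNORE_PATTERNS lives in the project.
DEFAULT_IGNORE_PATTERNS = [".git", "*.pyc", "__pycache__", "node_modules"]


def matches_any(path: str, patterns: Sequence[str]) -> bool:
    """True if the whole path, or any single component of it, matches a pattern."""
    parts = path.split("/")
    return any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )


def should_exclude(
    path: str,
    git_ignored: Optional[bool],
    user_excludes: Sequence[str],
    gitignore_fallback: Callable[[str], bool],
) -> bool:
    """Decide exclusion from a tri-state git answer (True / False / None)."""
    # Tier 1: git reports the path as ignored.
    if git_ignored is True:
        return True
    # Tier 3: git could not answer, so consult the .gitignore fallback matcher.
    if git_ignored is None and gitignore_fallback(path):
        return True
    # Tier 2 (assumed to apply after the fallback too): user excludes, then defaults.
    return matches_any(path, user_excludes) or matches_any(path, DEFAULT_IGNORE_PATTERNS)
```

Keeping the git answer as a tri-state value means "not ignored" (False) and "git could not tell" (None) stay distinguishable, so the fallback is consulted only in the second case.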

How It Works

  1. Git available: Uses git check-ignore command for full recursive accuracy
  2. Git unavailable: Falls back to pathspec library for root-only .gitignore matching
  3. Features: Handles nested .gitignore files, pattern negation (!pattern), and complex patterns
  4. DEFAULT_IGNORE_PATTERNS: Always checked for patterns like .git, *.pyc, __pycache__/, node_modules/, etc.
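
For reviewers who want to see how the git-available branch might map onto the True/False/None results, here is a minimal sketch built on the documented exit codes of git check-ignore (0 = ignored, 1 = not ignored, 128 = fatal error). The function name git_check_ignore and the 10-second timeout are assumptions for illustration, not the implementation in this PR.

```python
import subprocess
from pathlib import Path
from typing import Optional


def git_check_ignore(repo_root: Path, rel_path: str) -> Optional[bool]:
    """Return True if git ignores the path, False if not, None if git cannot answer."""
    try:
        result = subprocess.run(
            ["git", "check-ignore", "-q", "--", rel_path],
            cwd=repo_root,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # hypothetical limit so a hung git call cannot block analysis
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # git not installed, working directory unusable, or call timed out
    if result.returncode == 0:
        return True  # path is ignored
    if result.returncode == 1:
        return False  # path is not ignored
    return None  # any other code, e.g. 128 when repo_root is not a git repository
```

Returning None for every failure mode lets the caller treat "git unavailable" uniformly and move on to the pathspec fallback.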

Exclusion priority:

  1. Git ignore (if --respect-gitignore enabled and git says ignored) → exclude
  2. User-specified excludes (CLI --exclude) → exclude
  3. .gitignore file (if --respect-gitignore enabled) → exclude
  4. DEFAULT_IGNORE_PATTERNS (always checked) → exclude

Use cases:

  • Node.js projects: Excludes node_modules/, build/, .env, *.log
  • Python projects: Excludes __pycache__/, *.pyc, .venv/, *.egg-info/
  • Custom patterns: All patterns in your .gitignore file are respected
  • Default exclusions: Git internals (.git/), caches, logs, and other standard patterns always excluded

Reviewer Test Plan Suggestions

1. Check Documentation

You can review the changes in README.md to verify accuracy and clarity:

git show HEAD -- README.md

2. Verify Gitignore Behavior

You can run this script to verify the hybrid strategy and the priority logic:

python tests/test_gitignore_verification.py

The test script verifies:

  • Basic gitignore patterns (*.log)
  • Multiple log files excluded by DEFAULT_IGNORE_PATTERNS
  • Nested .gitignore files
  • CLI --exclude overrides Git tracking

Output:

(Screenshot, 2026-01-20 11:12:39 AM)

3. Runtime Check

You can verify the CLI configuration loads the flag correctly:

codewiki config show
# Should show: "Respect Gitignore: True" when set

codewiki generate --help | grep -i "gitignore"
# Should show the --respect-gitignore option

Output:

(Screenshots, 2026-01-20 11:13:28 AM and 11:14:10 AM)

- Implement hybrid .gitignore processing using git check-ignore with pathspec fallback
- Add --respect-gitignore CLI option to both config and generate commands
- Update configuration models to store gitignore preference persistently
- Enhance RepoAnalyzer with gitignore pattern matching and priority logic
- Add comprehensive test suite for gitignore verification including negation patterns
- Update documentation with detailed pattern behavior and processing logic
- Add pathspec dependency for robust gitignore pattern matching

The feature respects .gitignore patterns during file analysis while maintaining proper priority:
1. Git ignore patterns are checked first
2. User CLI exclude patterns override git tracking
3. Default ignore patterns are applied last
4. Include patterns filter the remaining files
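
The last step of that ordering, where include patterns filter whatever survived the exclusion checks, might look like the sketch below. The name select_files is hypothetical, fnmatch is used here in place of the pathspec matching the PR describes, and treating an empty include list as "keep everything" is an assumption.

```python
from fnmatch import fnmatch
from typing import Callable, Iterable, List, Sequence


def select_files(
    paths: Iterable[str],
    is_excluded: Callable[[str], bool],
    include_patterns: Sequence[str],
) -> List[str]:
    """Drop excluded paths first, then keep only those matching an include pattern."""
    kept = [path for path in paths if not is_excluded(path)]
    if not include_patterns:
        return kept  # assumption: no include patterns means keep all remaining files
    return [
        path for path in kept
        if any(fnmatch(path, pattern) for pattern in include_patterns)
    ]
```

Because exclusion runs first, an include pattern cannot bring back a file that an ignore rule already removed.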
e2720pjk force-pushed the feat/respect-gitignore branch from 3d79fef to 93a610e on January 20, 2026 03:32
…support

- Remove early return that prevented default ignore patterns from being checked when custom patterns were matched
- Remove support for negation patterns (patterns starting with !) as they were causing incorrect exclusion behavior
- Update tests to reflect corrected behavior where important.log is now properly matched by *.log pattern
- Allow default ignore patterns to be applied after custom pattern matching instead of short-circuiting

This change ensures that ignore patterns work as expected: matching no longer returns prematurely when negation patterns are encountered, so all applicable ignore patterns are evaluated.