A Python package for segmenting XML files through automatic text structure recognition using YOLO.
- 🎯 Automatic Segmentation - Automatically recognizes text structures
- 🔍 Multiple Backends - Supports YOLO and Kraken (for baseline and linemask detection)
- 📐 Baseline Processing - Intelligent baseline extraction
- 🛠️ XML Utilities - Comprehensive XML processing functions
- ✅ Validation - Robust input validation with Pydantic
# Clone repository
git clone https://github.com/The-Flow-Project/package-segmenter.git
cd package-segmenter
# Install
make install# With all dev dependencies
make install-dev
# Or with pip
pip install -e ".[dev]"uv pip install -e .from flow_segmenter import SegmenterYOLO
from flow_segmenter.config import SegmenterConfig
from lxml import etree
# Create configuration
config = SegmenterConfig(
model_names=["Riksarkivet/yolov9-regions-1", "Riksarkivet/yolov9-lines-within-regions-1"],
order_lines=True,
baselines=True,
)
# Initialize segmenter
segmenter = SegmenterYOLO(config)
# Process XML file
result = segmenter.segment(
image_path="path/to/image.jpg",
)
print(f"Segmentation completed: {result}")
print(etree.tostring(result, pretty_print=True, encoding="unicode"))Have a look at the Makefile for various development commands, like:
make testTests created with Claude Sonnet 4.5.
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Add and run tests
- Format code (
make format) - Commit changes (
git commit -m 'Add amazing feature') - Push branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ✨ Complete refactoring OOP structure
- 🔧 Improved configuration with Pydantic
- 🚀 Performance optimization with spatial indexing
- 🧪 Comprehensive test suite
- 📖 Extended documentation
- 🛠️ Makefile for development workflow
- 🎉 Initial release
- 🎯 Basic segmentation functionality
- 📐 Baseline extraction
- 🗺️ XML processing