Overview
This issue documents the cleanup strategy for tying a bow around the iSamples project (this round). Goal: simplify to an MVP, then refine.
Date: 2026-01-29
Scope: 3 repositories (isamplesorg-metadata, isamples-python/examples, isamplesorg.github.io)
The Stack at a Glance
┌─────────────────────────────────────────────────────────────────────┐
│ USER EXPERIENCE │
├──────────────────────────────┬──────────────────────────────────────┤
│ isamplesorg.github.io │ isamples-python (examples) │
│ (Browser - Zero Install) │ (Jupyter - Developer) │
│ • isamples_explorer.qmd │ • isamples_explorer.ipynb │
│ • parquet_cesium_*.qmd │ • geoparquet.ipynb │
│ • DuckDB-WASM + Cesium │ • DuckDB + Lonboard │
└──────────────────────────────┴──────────────────────────────────────┘
│
┌─────────▼─────────┐
│ DATA LAYER │
│ Cloudflare R2 │
│ Wide: 280MB │
│ Narrow: 850MB │
└─────────┬─────────┘
│
┌───────────────────────────────────▼─────────────────────────────────┐
│ METADATA STANDARD │
│ isamplesorg-metadata │
│ • 8 Entity Types (MaterialSampleRecord, SamplingEvent, etc.) │
│ • 14 Predicates (produced_by, has_material_category, etc.) │
│ • JSON Schema, JSON-LD, SKOS vocabularies │
└─────────────────────────────────────────────────────────────────────┘
MVP Definition: What to Keep
Tier 1: Essential (The Core Product)
| Component | Location | Purpose |
|---|---|---|
| Data on R2 | Cloudflare | Wide parquet (280MB) - single source of truth |
| Schema | isamplesorg-metadata | isamples_core.yaml + JSON Schema |
| Browser Explorer | isamplesorg.github.io | isamples_explorer.qmd - main discovery UX |
| 3D Globe | isamplesorg.github.io | parquet_cesium_isamples_wide.qmd |
| Jupyter Explorer | isamples-python | isamples_explorer.ipynb - developer entry point |
| Visualization Patterns | isamples-python | geoparquet.ipynb - Lonboard patterns |
Tier 2: Educational (Keep for Learning)
| Component | Location | Purpose |
|---|---|---|
| PQG Demo | isamples-python | pqg_demo.ipynb - property graph queries |
| Schema Comparison | isamples-python | schema_comparison.ipynb - narrow vs wide |
| SQL Deep Dive | isamplesorg.github.io | zenodo_isamples_analysis.qmd |
| Graph Documentation | isamplesorg-metadata | src/docs/UNDERSTANDING_THE_GRAPH.md |
Cleanup Actions
✅ Completed: isamples-python (PR #2 merged)
- Archived defunct API client → archive/defunct-api-client/
- Archived export parquet tools → archive/export-parquet-tools/
- Updated pyproject.toml to examples-only repo
- Updated README, CLAUDE.md, STATUS.md
🔲 TODO: isamples-python (remaining)
- Remove playwright/ directory (-15MB node_modules cruft)
- Clean .ipynb_checkpoints/ across examples
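The two remaining items above can be previewed before anything is deleted. The following is a minimal dry-run sketch, assuming a local checkout of isamples-python; `find_cruft_dirs` and `CRUFT_DIR_NAMES` are hypothetical names introduced here for illustration, and the function matches the directory names at any depth rather than only at the repo root.

```python
from pathlib import Path

# Directory names taken from the TODO items above (matched at any depth: an assumption).
CRUFT_DIR_NAMES = {"playwright", ".ipynb_checkpoints"}


def find_cruft_dirs(repo_root, names=CRUFT_DIR_NAMES):
    """Return directories under repo_root whose name is in `names`, sorted.

    Matching directories are reported but not descended into, and .git is skipped.
    Nothing is deleted; this only lists candidates.
    """
    found = []
    stack = [Path(repo_root)]
    while stack:
        current = stack.pop()
        for child in current.iterdir():
            # Only real directories are considered; symlinks are left alone.
            if not child.is_dir() or child.is_symlink():
                continue
            if child.name == ".git":
                continue
            if child.name in names:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)
```

Calling `find_cruft_dirs(".")` from the repo root lists the candidates; reviewing that list first keeps the actual removal a deliberate, separate step.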
🔲 TODO: isamplesorg.github.io
| Action | Impact | Effort |
|---|---|---|
| Delete assets/oc_isamples_pqg.parquet | -724MB | 5 min |
| Archive empty stubs (parquet.qmd, etc.) | Clarity | 10 min |
| Update tutorial index to highlight 3 core tutorials | UX | 15 min |
| Add cross-links to isamples-python | Discovery | 10 min |
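Before deleting assets/oc_isamples_pqg.parquet, it may be worth confirming that no other large binaries remain in the site repo. This is a rough sketch, assuming a local checkout; `find_large_files` is a hypothetical helper and any size threshold passed to it is an arbitrary choice, not an iSamples convention.

```python
from pathlib import Path


def find_large_files(repo_root, min_bytes):
    """Return (size_in_bytes, path) pairs for files of at least min_bytes, largest first."""
    root = Path(repo_root)
    results = []
    for path in root.rglob("*"):
        # Skip anything inside the .git directory of the checkout.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            if size >= min_bytes:
                results.append((size, path))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

For example, `find_large_files(".", 50 * 1024 * 1024)` would list every file of 50 MiB or more in the working tree.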
🔲 TODO: isamplesorg-metadata
| Action | Impact | Effort |
|---|---|---|
| Add "Related Repositories" section to README | Discovery | 10 min |
| Archive notes/vocabulary/ (moved to separate repo) | Clarity | 5 min |
| Archive examples/APItesting/ | Clarity | 5 min |
| Consolidate README + new PQG docs into "Getting Started" | Onboarding | 30 min |
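Whether each README has picked up the "Related Repositories" section can be checked mechanically. The sketch below assumes local checkouts of the repos and uses the heading text proposed in the Cross-Repo Linking section of this issue; `has_related_section` and `repos_missing_section` are hypothetical helper names.

```python
from pathlib import Path

# Heading text as proposed in the Cross-Repo Linking section of this issue.
SECTION_HEADING = "## Related iSamples Repositories"


def has_related_section(readme_text, heading=SECTION_HEADING):
    """True if some line of the README equals the heading, ignoring surrounding whitespace."""
    return any(line.strip() == heading for line in readme_text.splitlines())


def repos_missing_section(checkouts):
    """Return the checkout paths whose README.md is absent or lacks the section."""
    missing = []
    for checkout in checkouts:
        readme = Path(checkout) / "README.md"
        if not readme.is_file():
            missing.append(str(checkout))
        elif not has_related_section(readme.read_text(encoding="utf-8")):
            missing.append(str(checkout))
    return missing
```

An empty list from `repos_missing_section([...])` would indicate that every listed checkout carries the section.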
Cross-Repo Linking
Each repo should include this in README:
## Related iSamples Repositories
| Repo | Purpose | Start Here |
|------|---------|------------|
| [isamplesorg-metadata](https://github.com/isamplesorg/metadata) | Schema definition | `src/schemas/isamples_core.yaml` |
| [isamples-python](https://github.com/isamplesorg/examples) | Jupyter examples | `examples/basic/isamples_explorer.ipynb` |
| [isamplesorg.github.io](https://isamplesorg.github.io/) | Browser tutorials | `tutorials/isamples_explorer.qmd` |
| [vocabularies](https://github.com/isamplesorg/vocabularies) | SKOS terms | Material types, context categories |
Canonical Data URLs
All repos should reference:
# Wide format (primary) - 280MB, 20M rows
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"
# Narrow format (advanced) - 850MB, 106M rows
NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet"
The "Bow" Summary
What iSamples MVP delivers:
- A domain-agnostic metadata standard for material samples (metadata repo)
- 6.7M samples from 4 sources in efficient geoparquet format (R2)
- Zero-install browser exploration with Cesium + DuckDB-WASM (website)
- Developer-friendly Jupyter examples for custom analysis (python repo)
What makes it work:
- Single data source (R2 parquet) - no API dependency
- Consistent schema (8 types, 14 predicates) across all domains
- Two complementary UIs: browser (discovery) + Jupyter (analysis)
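The "no API dependency" point can be illustrated with a direct query against the wide parquet file. This is a minimal sketch, assuming the `duckdb` Python package is installed and the R2 URL listed under Canonical Data URLs is reachable; `count_rows_sql` and `count_rows` are hypothetical helpers, and the query avoids naming any columns because the schema is not spelled out in this issue.

```python
# Wide-format URL as given under Canonical Data URLs.
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"


def count_rows_sql(url):
    """Build a row-count query that reads the parquet file straight from its URL."""
    escaped = url.replace("'", "''")  # escape single quotes for the SQL string literal
    return f"SELECT COUNT(*) AS n FROM read_parquet('{escaped}')"


def count_rows(url=WIDE_URL):
    """Run the count with DuckDB; requires the duckdb package and network access."""
    import duckdb  # imported lazily so count_rows_sql works without DuckDB installed

    con = duckdb.connect()  # in-memory database, nothing written to disk
    con.execute("INSTALL httpfs")  # extension that lets DuckDB read over HTTP(S)
    con.execute("LOAD httpfs")
    return con.execute(count_rows_sql(url)).fetchone()[0]
```

Calling `count_rows()` should return the number of rows in the wide file without any iSamples service in between, which is the same access pattern the browser and Jupyter explorers rely on.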