iSamples MVP Cleanup & Simplification Strategy #49

@rdhyee

Overview

This issue documents the cleanup strategy for tying a bow around the current round of the iSamples project. Goal: simplify to an MVP first, then refine.

Date: 2026-01-29
Scope: 3 repositories (isamplesorg-metadata, isamples-python/examples, isamplesorg.github.io)


The Stack at a Glance

┌─────────────────────────────────────────────────────────────────────┐
│                        USER EXPERIENCE                               │
├──────────────────────────────┬──────────────────────────────────────┤
│  isamplesorg.github.io       │  isamples-python (examples)          │
│  (Browser - Zero Install)    │  (Jupyter - Developer)               │
│  • isamples_explorer.qmd     │  • isamples_explorer.ipynb           │
│  • parquet_cesium_*.qmd      │  • geoparquet.ipynb                  │
│  • DuckDB-WASM + Cesium      │  • DuckDB + Lonboard                 │
└──────────────────────────────┴──────────────────────────────────────┘
                                    │
                          ┌─────────▼─────────┐
                          │   DATA LAYER      │
                          │   Cloudflare R2   │
                          │   Wide: 280MB     │
                          │   Narrow: 850MB   │
                          └─────────┬─────────┘
                                    │
┌───────────────────────────────────▼─────────────────────────────────┐
│                     METADATA STANDARD                                │
│                   isamplesorg-metadata                               │
│  • 8 Entity Types (MaterialSampleRecord, SamplingEvent, etc.)       │
│  • 14 Predicates (produced_by, has_material_category, etc.)         │
│  • JSON Schema, JSON-LD, SKOS vocabularies                          │
└─────────────────────────────────────────────────────────────────────┘

MVP Definition: What to Keep

Tier 1: Essential (The Core Product)

| Component | Location | Purpose |
|-----------|----------|---------|
| Data on R2 | Cloudflare | Wide parquet (280MB) - single source of truth |
| Schema | isamplesorg-metadata | isamples_core.yaml + JSON Schema |
| Browser Explorer | isamplesorg.github.io | isamples_explorer.qmd - main discovery UX |
| 3D Globe | isamplesorg.github.io | parquet_cesium_isamples_wide.qmd |
| Jupyter Explorer | isamples-python | isamples_explorer.ipynb - developer entry point |
| Visualization Patterns | isamples-python | geoparquet.ipynb - Lonboard patterns |

Tier 2: Educational (Keep for Learning)

| Component | Location | Purpose |
|-----------|----------|---------|
| PQG Demo | isamples-python | pqg_demo.ipynb - property graph queries |
| Schema Comparison | isamples-python | schema_comparison.ipynb - narrow vs wide |
| SQL Deep Dive | isamplesorg.github.io | zenodo_isamples_analysis.qmd |
| Graph Documentation | isamplesorg-metadata | src/docs/UNDERSTANDING_THE_GRAPH.md |

Cleanup Actions

✅ Completed: isamples-python (PR #2 merged)

  • Archived defunct API client → archive/defunct-api-client/
  • Archived export parquet tools → archive/export-parquet-tools/
  • Updated pyproject.toml to examples-only repo
  • Updated README, CLAUDE.md, STATUS.md

🔲 TODO: isamples-python (remaining)

  • Remove playwright/ directory (-15MB node_modules cruft)
  • Clean .ipynb_checkpoints/ across examples
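
The checkpoint cleanup above could be scripted roughly as follows. This is a minimal sketch using only the standard library; the function names `find_checkpoint_dirs` and `clean_checkpoints` are hypothetical and not part of any iSamples tooling. It defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
from pathlib import Path
import shutil


def find_checkpoint_dirs(root):
    """Return every .ipynb_checkpoints directory under root, sorted by path."""
    return sorted(p for p in Path(root).rglob(".ipynb_checkpoints") if p.is_dir())


def clean_checkpoints(root, dry_run=True):
    """Remove checkpoint directories under root.

    With dry_run=True (the default) the directories are only reported.
    Returns the list of directories that were found.
    """
    found = find_checkpoint_dirs(root)
    for path in found:
        if dry_run:
            print(f"would remove {path}")
        else:
            # ignore_errors covers a nested checkpoint dir whose parent
            # was already removed earlier in the loop.
            shutil.rmtree(path, ignore_errors=True)
            print(f"removed {path}")
    return found
```

For example, `clean_checkpoints("examples")` lists the candidates, and `clean_checkpoints("examples", dry_run=False)` deletes them. Adding `.ipynb_checkpoints/` to `.gitignore` would likely keep them from reappearing in commits.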

🔲 TODO: isamplesorg.github.io

| Action | Impact | Effort |
|--------|--------|--------|
| Delete assets/oc_isamples_pqg.parquet | -724MB | 5 min |
| Archive empty stubs (parquet.qmd, etc.) | Clarity | 10 min |
| Update tutorial index to highlight 3 core tutorials | UX | 15 min |
| Add cross-links to isamples-python | Discovery | 10 min |
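
Before deleting the large parquet asset, it may help to confirm which files actually account for the working tree's size. The sketch below is one way to list them, using only the standard library; `large_files` and the 50 MB default threshold are illustrative assumptions, not existing project tooling.

```python
from pathlib import Path


def large_files(root, min_mb=50.0):
    """List files under root that are at least min_mb megabytes.

    Returns (relative_path, size_in_mb) tuples, largest first.
    Anything inside a .git directory is skipped.
    """
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        size_mb = path.stat().st_size / (1024 * 1024)
        if size_mb >= min_mb:
            hits.append((rel.as_posix(), round(size_mb, 1)))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Run from a checkout of isamplesorg.github.io, `large_files(".")` would be expected to surface `assets/oc_isamples_pqg.parquet` near the top if it is still present.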

🔲 TODO: isamplesorg-metadata

| Action | Impact | Effort |
|--------|--------|--------|
| Add "Related Repositories" section to README | Discovery | 10 min |
| Archive notes/vocabulary/ (moved to separate repo) | Clarity | 5 min |
| Archive examples/APItesting/ | Clarity | 5 min |
| Consolidate README + new PQG docs into "Getting Started" | Onboarding | 30 min |

Cross-Repo Linking

Each repo should include this section in its README:

## Related iSamples Repositories

| Repo | Purpose | Start Here |
|------|---------|------------|
| [isamplesorg-metadata](https://github.com/isamplesorg/metadata) | Schema definition | `src/schemas/isamples_core.yaml` |
| [isamples-python](https://github.com/isamplesorg/examples) | Jupyter examples | `examples/basic/isamples_explorer.ipynb` |
| [isamplesorg.github.io](https://isamplesorg.github.io/) | Browser tutorials | `tutorials/isamples_explorer.qmd` |
| [vocabularies](https://github.com/isamplesorg/vocabularies) | SKOS terms | Material types, context categories |
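
To track which repos have picked up the section, a small check along these lines could be run against local checkouts. It is a sketch under assumptions: the helper names are hypothetical, and it only looks for a markdown heading with the exact title used in the snippet above.

```python
import re
from pathlib import Path

# Matches a markdown heading of any level titled "Related iSamples Repositories".
HEADING = re.compile(r"^#{1,6}\s+Related iSamples Repositories\s*$", re.MULTILINE)


def has_related_section(readme_text):
    """Return True if the README text contains the cross-link heading."""
    return HEADING.search(readme_text) is not None


def readmes_missing_section(readme_paths):
    """Return the paths (as strings) of README files lacking the heading."""
    return [
        str(p)
        for p in readme_paths
        if not has_related_section(Path(p).read_text(encoding="utf-8"))
    ]
```

For instance, `readmes_missing_section(["metadata/README.md", "isamples-python/README.md"])` would return whichever of those files still need the section; the paths shown are placeholders for wherever the repos are checked out.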

Canonical Data URLs

All repos should reference:

# Wide format (primary) - 280MB, 20M rows
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"

# Narrow format (advanced) - 850MB, 106M rows  
NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet"

The "Bow" Summary

What iSamples MVP delivers:

  1. A domain-agnostic metadata standard for material samples (metadata repo)
  2. 6.7M samples from 4 sources in efficient GeoParquet format (R2)
  3. Zero-install browser exploration with Cesium + DuckDB-WASM (website)
  4. Developer-friendly Jupyter examples for custom analysis (python repo)

What makes it work:

  • Single data source (R2 parquet) - no API dependency
  • Consistent schema (8 types, 14 predicates) across all domains
  • Two complementary UIs: browser (discovery) + Jupyter (analysis)
