My Web Intelligence (MWI)

MWI is a reproducible research toolkit for digital methods in the social sciences and communication studies. It collects web corpora, qualifies and enriches them (NLP/LLM-assisted, auditable), and exports interpretable outputs (CSV/JSON/GEXF).

Start here (flagship)

Use this repository first:

mwi (local reproducible “desktop-lab”): https://github.com/MyWebIntelligence/mwi

Do not start with mywebapi unless you explicitly need a scalable backend.

Quickstart (get a first result fast)

Recommended: Docker Compose.

git clone https://github.com/MyWebIntelligence/mwi.git
cd mwi

# Choose one mode
./scripts/docker-compose-setup.sh basic   # minimal local setup
# ./scripts/docker-compose-setup.sh api   # API-oriented mode
# ./scripts/docker-compose-setup.sh llm   # ML/embeddings/LLM mode

# Sanity check (example command)
docker compose exec mwi python mywi.py land list

Full installation details: see the mwi repository (https://github.com/MyWebIntelligence/mwi).

What MWI does (workflow)

Collect → Qualify → Analyze → Export

Collect
Build a corpus from seed URLs and curated sources, keep crawl traces, and store pages with their metadata.
Qualify
Extract readable content, then enrich it with NLP and, optionally, LLM-based relevance gating.
Auditability is a design goal: raw traces are kept and decisions can be inspected.
Analyze
Produce socio-semantic structures: documents, expressions/entities, similarity links, networks.
Export
Generate outputs for analysis and visualization:

CSV / JSON
GEXF (Gephi)
structured datasets / reports
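
To make the four stages concrete, here is a minimal sketch of the Collect → Qualify → Analyze → Export flow on toy in-memory data. It is an illustration only, not MWI's implementation: the function names, the sample pages, the term-match gating rule, and the Jaccard similarity measure are all assumptions chosen for brevity.

```python
import csv
import io
import re
from itertools import combinations

# Hypothetical stand-ins for crawled pages and Land terms (not MWI data).
RAW_PAGES = [
    {"url": "https://example.org/a", "html": "<p>Climate policy debate in Europe</p>"},
    {"url": "https://example.org/b", "html": "<p>Recipe for apple pie</p>"},
    {"url": "https://example.org/c", "html": "<p>Europe climate policy news</p>"},
]
TERMS = {"climate", "policy"}


def collect(pages):
    """Collect: copy each raw page (this sketch does no network access)."""
    return [dict(page) for page in pages]


def qualify(pages, terms):
    """Qualify: strip tags, then keep only pages that mention at least one term."""
    kept = []
    for page in pages:
        text = re.sub(r"<[^>]+>", " ", page["html"]).strip()
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & terms:
            kept.append({"url": page["url"], "text": text, "words": words})
    return kept


def analyze(docs):
    """Analyze: link every pair of documents by Jaccard similarity of their words."""
    links = []
    for a, b in combinations(docs, 2):
        shared = a["words"] & b["words"]
        union = a["words"] | b["words"]
        links.append({
            "source": a["url"],
            "target": b["url"],
            "similarity": round(len(shared) / len(union), 2),
        })
    return links


def export_csv(links):
    """Export: serialize similarity links as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "target", "similarity"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(links)
    return buffer.getvalue()


csv_text = export_csv(analyze(qualify(collect(RAW_PAGES), TERMS)))
print(csv_text)
```

Each stage takes the previous stage's output and returns plain data, so intermediate results can be inspected at any point, in the spirit of the auditability goal described above.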

Key concept: “Land”

A Land is a research project container (topic) holding:

terms, seed URLs, crawls
extracted content + metadata
enrichment layers
exports

Think: one Land = one case study / one dataset / one pipeline run.
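
As a rough mental model, a Land can be pictured as a single object that groups everything belonging to one project. The dataclass below is a hypothetical sketch: the field and method names are assumptions for illustration and do not reflect MWI's actual database schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Land:
    """Illustrative project container; NOT MWI's real schema."""

    name: str
    terms: list[str] = field(default_factory=list)
    seed_urls: list[str] = field(default_factory=list)
    pages: dict[str, dict] = field(default_factory=dict)        # url -> content + metadata
    enrichments: dict[str, dict] = field(default_factory=dict)  # layer -> {url: value}
    exports: list[str] = field(default_factory=list)            # export file names

    def add_page(self, url, text, **metadata):
        """Store extracted content together with arbitrary metadata."""
        self.pages[url] = {"text": text, **metadata}

    def enrich(self, layer, url, value):
        """Attach a value for one page inside a named enrichment layer."""
        self.enrichments.setdefault(layer, {})[url] = value

    def summary(self):
        """Return simple counts describing the Land's contents."""
        return {
            "name": self.name,
            "terms": len(self.terms),
            "seed_urls": len(self.seed_urls),
            "pages": len(self.pages),
            "layers": sorted(self.enrichments),
            "exports": len(self.exports),
        }


# Hypothetical usage with made-up values.
land = Land(
    name="climate-debate",
    terms=["climate", "policy"],
    seed_urls=["https://example.org/a"],
)
land.add_page("https://example.org/a", "Climate policy debate", lang="en")
land.enrich("relevance", "https://example.org/a", 2)
land.exports.append("climate-debate.gexf")
print(land.summary())
```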

Repository map (what each repo is for)

Flagship (start here)

mwi
Local reproducible research tool (Python + SQLite + Docker Compose).
https://github.com/MyWebIntelligence/mwi

Components (use when relevant)

mwiR
R package for analysis and R-friendly workflows (bridge for R users).
https://github.com/MyWebIntelligence/mwiR
mywebapi
Scalable backend (FastAPI + PostgreSQL + Celery + Redis).
Note: this repository contains components “in transition” (API + legacy parts).
https://github.com/MyWebIntelligence/mywebapi

Architecture (high-level)

        ┌──────────────────────────┐
        │ mwi (flagship, local)     │
        │ CLI + reproducible setup  │
        └─────────────┬────────────┘
                      │
          SQLite DB + corpus files
                      │
        ┌─────────────┴────────────┐
        │                          │
  Exports (CSV/JSON/GEXF)     Optional scale-out
  for R / Gephi / notebooks   (mywebapi: Postgres/API/Celery)
        │                          │
      mwiR as bridge          external clients/pipelines
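
On the export side of the diagram, a GEXF file is plain XML, so it can be checked in a notebook before it is opened in Gephi. The sketch below uses only the Python standard library and an inline sample graph. The sample content and the `summarize_gexf` helper are assumptions for illustration; the attributes MWI actually writes to its GEXF exports may differ.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written GEXF document standing in for a real export.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="0" label="example.org/a"/>
      <node id="1" label="example.org/b"/>
      <node id="2" label="example.org/c"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1"/>
      <edge id="1" source="0" target="2"/>
    </edges>
  </graph>
</gexf>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across GEXF versions."""
    return tag.rsplit("}", 1)[-1]


def summarize_gexf(xml_text):
    """Count nodes and edges, and compute the out-degree of each source node."""
    root = ET.fromstring(xml_text)
    labels = {}
    out_degree = {}
    edge_count = 0
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            labels[element.get("id")] = element.get("label")
        elif name == "edge":
            edge_count += 1
            source = element.get("source")
            out_degree[source] = out_degree.get(source, 0) + 1
    return {
        "nodes": len(labels),
        "edges": edge_count,
        # Report out-degree by label, falling back to the raw id if unlabeled.
        "out_degree": {labels.get(n, n): d for n, d in out_degree.items()},
    }


summary = summarize_gexf(SAMPLE_GEXF)
print(summary)
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, because different GEXF versions declare different namespace URIs.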

Academic citation

Recommended practice (until stable releases are published for every repository):

Cite the relevant paper(s) (HAL/publications).
Cite the software using either:
- a GitHub Release tag (preferred), or
- a commit hash.
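
One way to record the exact software revision is sketched below. `current_commit` wraps the standard `git rev-parse HEAD` command, and `format_software_citation` builds a plain-text citation line. Both helper names and the citation layout are assumptions rather than a prescribed format, and the tag and access date in the usage example are placeholders, not a real release.

```python
import subprocess
from datetime import date


def current_commit(repo_path="."):
    """Return the checked-out commit hash of a local clone (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def format_software_citation(name, url, revision, accessed):
    """Build a one-line software citation from a release tag or commit hash."""
    return f"{name}, {revision}. {url} (accessed {accessed.isoformat()})."


# Placeholder revision and date; in a real clone, one could pass
# f"commit {current_commit('path/to/mwi')}" as the revision instead.
citation = format_software_citation(
    name="My Web Intelligence (mwi)",
    url="https://github.com/MyWebIntelligence/mwi",
    revision="release v0.1.0",
    accessed=date(2025, 1, 15),
)
print(citation)
```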

Recommended professionalization steps:

Add CITATION.cff to mwi and mwiR
Publish GitHub Releases (e.g., v0.1.0)
Archive releases to Zenodo (DOI)

Support / Contact

For research collaborations, deployments at scale, or reproducible case studies, open an issue on the flagship repository: https://github.com/MyWebIntelligence/mwi/issues

License

See each repository for licensing details (MIT where specified).
