CADS Research Visualization System
Semantic analytics pipeline that turns faculty publication data into interactive visual maps for the Texas State research community.
Snapshot
- Automated ingestion from OpenAlex + Supabase, clustering 2,400+ publications into thematic groups.
- UMAP projections and HDBSCAN clustering of publication embeddings surface collaboration hotspots and cross-department opportunities.
- Streamlit dashboard and monitoring suite give CADS leadership live health metrics on ingest jobs.
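
As a rough illustration of the ingestion step: OpenAlex returns abstracts as an inverted index (word to positions) in the `abstract_inverted_index` field of a work, so the text has to be rebuilt before it can be embedded. The sketch below assumes the pipeline embeds abstracts, which this summary does not confirm; `rebuild_abstract` and the sample record are hypothetical.

```python
from typing import Optional


def rebuild_abstract(inverted_index: Optional[dict[str, list[int]]]) -> str:
    """Turn an OpenAlex-style inverted index back into plain text."""
    if not inverted_index:
        return ""  # many works carry no abstract at all
    # Flatten {word: [positions]} into (position, word) pairs, then order by position.
    placed = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    return " ".join(word for _, word in sorted(placed))


# Hypothetical record shaped like an OpenAlex work (not real data).
sample_work = {
    "title": "Example title",
    "abstract_inverted_index": {"maps": [3], "We": [0], "research": [2, 5], "build": [1], "of": [4]},
}
abstract = rebuild_abstract(sample_work["abstract_inverted_index"])
print(abstract)
```

Returning an empty string for a missing index lets downstream stages decide whether to fall back to the title or skip the record.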
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Data Sources  │    │  Core Pipeline   │    │  Visualization  │
├─────────────────┤    ├──────────────────┤    ├─────────────────┤
│ • OpenAlex API  │───▶│ • Data Loader    │───▶│ • Web Dashboard │
│ • Supabase DB   │    │ • Embeddings     │    │ • Search System │
│ • CADS Faculty  │    │ • UMAP/HDBSCAN   │    │ • Interactive   │
│ • Research Data │    │ • Theme Gen      │    │   Visualizations│
└─────────────────┘    └──────────────────┘    └─────────────────┘
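
One way to picture how the Embeddings stage feeds the Search System is to rank stored publication vectors by cosine similarity to a query vector. This is a minimal sketch, not the project's actual search code: the 3-dimensional vectors, the titles, and the `top_matches` helper are made up for illustration, and a real deployment would more likely query the Supabase vector tables directly.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k titles whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda title: cosine(query, index[title]), reverse=True)
    return ranked[:k]


# Toy embeddings (illustrative only; real embeddings have hundreds of dimensions).
toy_index = {
    "Graph neural networks for traffic": [0.9, 0.1, 0.0],
    "River sediment transport": [0.0, 0.2, 0.9],
    "Deep learning for road safety": [0.8, 0.3, 0.1],
}
results = top_matches([1.0, 0.2, 0.0], toy_index)
print(results)
```

The same ranking function could serve profile matching if a faculty member's publications are averaged into a single vector, though that is an assumption about the design.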
Why It Matters
- Cuts hours of manual grant scouting by exposing real-time semantic search and profile matching.
- Provides reproducible analytics: CI runs tests on ingest scripts, and monitoring catches schema drift before faculty demos.
- NSF CAP award showcases the system as the backbone for expanding AI curriculum across campus.
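
The schema-drift monitoring mentioned above could look something like the following check, which compares the columns an ingest job expects with the columns it actually sees. The `EXPECTED_COLUMNS` mapping, the column names, and the `find_drift` helper are hypothetical; the real monitoring suite may work quite differently.

```python
# Hypothetical expected schema for a publications table (column name -> type).
EXPECTED_COLUMNS = {"id": "text", "title": "text", "year": "integer", "embedding": "vector"}


def find_drift(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Report columns that are missing, unexpected, or have changed type."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
        "retyped": sorted(c for c in shared if expected[c] != actual[c]),
    }


# Pretend the live table lost `year`, gained `doi`, and changed the type of `title`.
live_columns = {"id": "text", "title": "varchar", "embedding": "vector", "doi": "text"}
report = find_drift(EXPECTED_COLUMNS, live_columns)
print(report)
```

A CI job could fail whenever any of the three lists is non-empty, which would flag drift before a demo rather than during one.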
Stack & Operations
- Python ingestion workers (Poetry + Airflow-ready scripts) writing into Supabase vector tables.
- Visualization layer served from a hardened visuals/public bundle with CDNs for department-wide access.
- Documentation set spans pipeline playbooks, troubleshooting guides, and CI/CD runbooks so new CADS hires can onboard in a day.
