Commit 4cd3164

rdhyee and claude committed
Add cross-repo links and data source documentation
- Add Data Sources section with canonical R2 URLs
- Add Related Repositories table
- Part of MVP cleanup strategy (issue #49)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent ebe8420 commit 4cd3164

7 files changed, +2020 -0 lines changed

7 files changed

+2020
-0
lines changed

README.md

Lines changed: 20 additions & 0 deletions
@@ -51,3 +51,23 @@ The generated docs are placed under `models/generated/vocabularies`
After editing, push the sources to GitHub. The rendered pages are generated by the `Render using Quarto and push to GH-pages` GitHub action, which is currently triggered manually.

Update dependencies with `pip install -U <package name>`, then regenerate `requirements.txt` with `pip freeze > requirements.txt`.

## Data Sources

All tutorials query parquet files hosted on Cloudflare R2:

```javascript
// Wide format (recommended) - 280 MB, 20M rows
const WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet";

// Narrow format (advanced) - 850 MB, 106M rows
const NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet";
```
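
As a sketch of how one of these files might be exposed to SQL, the helper below builds a DuckDB `CREATE VIEW ... AS SELECT * FROM read_parquet(...)` statement for either format. It assumes a DuckDB build that can read HTTPS URLs (for example via the httpfs extension). The `PARQUET_URLS` map, the `createViewSql` name and its parameters are hypothetical; only the URLs come from this README.

```javascript
// Hypothetical helper; the URLs are the canonical R2 URLs listed above.
const PARQUET_URLS = {
  wide: "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet",
  narrow: "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet",
};

// Build a DuckDB statement that exposes one parquet file as a view.
function createViewSql(format = "wide", viewName = "nodes") {
  const url = PARQUET_URLS[format];
  if (url === undefined) {
    throw new Error(`Unknown format "${format}"; expected "wide" or "narrow"`);
  }
  // The view name is interpolated as an identifier, so only allow plain names.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(viewName)) {
    throw new Error(`Invalid view name "${viewName}"`);
  }
  // Double any single quotes so the URL stays a valid SQL string literal.
  const literal = url.replace(/'/g, "''");
  return `CREATE VIEW ${viewName} AS SELECT * FROM read_parquet('${literal}')`;
}
```

For example, `createViewSql("narrow")` returns a statement over `isamples_202512_narrow.parquet`; run it once, then query the view by name.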

## Related Repositories

| Repo | Purpose | Start Here |
|------|---------|------------|
| [isamplesorg-metadata](https://github.com/isamplesorg/metadata) | Schema definition (8 types, 14 predicates) | `src/schemas/isamples_core.yaml` |
| [isamples-python](https://github.com/isamplesorg/examples) | Jupyter examples (DuckDB + Lonboard) | `examples/basic/isamples_explorer.ipynb` |
| [vocabularies](https://github.com/isamplesorg/vocabularies) | SKOS vocabulary terms | Material types, context categories |

SESSION_NOTES.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# Session Continuity Notes - Oct 3, 2025

## Current Status: Phase 2 Complete ✅

We're in the middle of integrating Eric Kansa's 4 query functions from `oc_parquet_analysis_enhanced.ipynb` into `/tutorials/parquet_cesium.qmd`.

### Completed Work

**Phase 1: Documentation** (Commit d5d6690)
- ✅ Added comprehensive Path 1/Path 2 explanation with diagrams
- ✅ Added full relationship map (Agent, IdentifiedConcept paths)
- ✅ Added Eric's 4 query function analysis with summary table

**Phase 2: First Query Implementation** (Commit 3224eb1)
- ✅ Implemented `get_samples_at_geo_cord_location_via_sample_event()`
- ✅ Combines Path 1 + Path 2 with UNION
- ✅ Returns rich metadata: thumbnail_url, description, alternate_identifiers, site info
- ✅ Added reactive cell `selectedSamplesCombined` with loading state
- ✅ Added display section "Combined Samples at Location"
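
SQL `UNION` (unlike `UNION ALL`) drops duplicate rows, so a sample reachable through both Path 1 and Path 2 appears only once in the combined result. When checking results on the JavaScript side, a hypothetical helper with the same de-duplication might look like this. The `unionRows` name is illustrative, and the default whole-row JSON key assumes both inputs use the same column order.

```javascript
// Hypothetical helper mirroring SQL UNION (not UNION ALL): concatenate both
// inputs and drop exact duplicates. SQL UNION guarantees no particular order;
// this version keeps the first occurrence of each row.
function unionRows(pathOneRows, pathTwoRows, keyFn = (row) => JSON.stringify(row)) {
  const seen = new Set();
  const combined = [];
  for (const row of [...pathOneRows, ...pathTwoRows]) {
    const key = keyFn(row);
    if (!seen.has(key)) {
      seen.add(key);
      combined.push(row);
    }
  }
  return combined;
}
```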

### What's Next: Phase 3

**Remaining 2 queries from Eric's notebook** (cell 59):

1. **Agent Query** - `get_sample_data_agents_sample_pid(sample_pid)`
   - Shows who collected/registered samples
   - Path: MaterialSampleRecord → produced_by → SamplingEvent → {responsibility, registrant} → Agent
   - Returns: sample metadata + agent info (agent_pid, agent_name, predicate)
   - **Note**: Independent of Path 1/Path 2 (no geographic data needed)

2. **Keywords/Concepts Query** - `get_sample_types_and_keywords_via_sample_pid(sample_pid)`
   - Shows material types and classifications
   - Path: MaterialSampleRecord → {keywords, has_sample_object_type, has_material_category} → IdentifiedConcept
   - Returns: sample metadata + concept info (keyword_pid, keyword, predicate)
   - **Note**: Direct edges to concepts; bypasses SamplingEvent entirely!
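
Both queries are edge-following lookups that start at a sample. To illustrate the two paths (this is not the SQL from Eric's notebook), the sketch below walks them over a hypothetical in-memory edge list of `{ s, p, o }` triples (subject id, predicate, object id). The triple layout, `followPath`, and the step arrays are assumptions; only the predicate names come from the list above.

```javascript
// Hypothetical in-memory graph: edges are { s, p, o } triples.
// Each step lists the predicates allowed at that hop; the result holds the
// ids reached after the last hop plus the predicate used to reach them.
function followPath(edges, startId, steps) {
  let frontier = [{ id: startId, predicate: null }];
  for (const allowed of steps) {
    const next = [];
    for (const { id } of frontier) {
      for (const edge of edges) {
        if (edge.s === id && allowed.includes(edge.p)) {
          next.push({ id: edge.o, predicate: edge.p });
        }
      }
    }
    frontier = next;
  }
  return frontier;
}

// Agent query: sample -> produced_by -> SamplingEvent -> {responsibility, registrant} -> Agent
const agentSteps = [["produced_by"], ["responsibility", "registrant"]];

// Keywords/concepts query: one direct hop, no SamplingEvent involved
const conceptSteps = [["keywords", "has_sample_object_type", "has_material_category"]];
```

The last-hop predicate is kept because both queries return a `predicate` column that says which edge linked the sample to the agent or concept.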

### Key Technical Details

**Table alias difference**:
- Eric's Python notebook uses: `FROM pqg AS ...`
- Our JavaScript implementation uses: `FROM nodes ...`
- DuckDB view is created as: `CREATE VIEW nodes AS SELECT * FROM read_parquet('${parquet_path}')`

**Loading pattern to follow**:
```javascript
// Observable JS (OJS) cells for an {ojs} chunk; `mutable` is OJS syntax, not plain JavaScript.
async function get_FUNCTION_NAME(pid) {
  // No selection yet: skip the query
  if (pid === null || pid === "" || pid === "unset") return [];
  const q = `SQL QUERY HERE`;
  // "loading_ID" matches the id of the loading <div> in the display pattern
  const result = await loadData(q, [pid], "loading_ID", "key");
  return result ?? [];
}

mutable FUNCTIONLoading = false;

// Reactive cell: reruns whenever clickedPointId changes
selectedFUNCTION = {
  mutable FUNCTIONLoading = true;
  try {
    return await get_FUNCTION_NAME(clickedPointId);
  } finally {
    // Always clear the flag, even if the query throws
    mutable FUNCTIONLoading = false;
  }
}
```
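
Outside Observable, the same guard plus `try`/`finally` idea can be sketched in plain JavaScript. Here `withLoading`, `isUnset`, the `setLoading` callback, and the `query` function are hypothetical stand-ins for the `mutable` flag and `loadData`. The point is that `finally` clears the flag even when the query throws, so the page never stays stuck on "loading".

```javascript
// Hypothetical plain-JS version of the OJS loading pattern above.
// isUnset mirrors the guard in get_FUNCTION_NAME.
function isUnset(pid) {
  return pid === null || pid === "" || pid === "unset";
}

async function withLoading(setLoading, pid, query) {
  setLoading(true);
  try {
    if (isUnset(pid)) return [];
    const result = await query(pid);
    return result ?? [];
  } finally {
    // Runs after success, early return, or error.
    setLoading(false);
  }
}
```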

**Display pattern**:
````markdown
## Section Title

<div id="loading_ID" hidden>Loading message…</div>

Explanation text...

```{ojs}
//| echo: false
variable = selectedFUNCTION
FUNCTIONLoading ? md`(loading…)` : md`\`\`\`
${JSON.stringify(variable, null, 2)}
\`\`\`
`
```
````

### File Locations

**Main working file**: `/Users/raymondyee/C/src/iSamples/isamplesorg.github.io/tutorials/parquet_cesium.qmd`

**Reference notebook**: `/Users/raymondyee/C/src/iSamples/isamples-python/examples/basic/oc_parquet_analysis_enhanced.ipynb` (cell 59)

**Local parquet**: `http://localhost:4979/assets/oc_isamples_pqg.parquet` (691 MB, in `docs/assets/`)

**Branch**: `issue-13-parquet-duckdb`

### Quick Start Commands

```bash
# Navigate to project
cd /Users/raymondyee/C/src/iSamples/isamplesorg.github.io

# Check current branch
git status

# Start Quarto preview (if needed)
quarto preview

# View notebook for reference
code /Users/raymondyee/C/src/iSamples/isamples-python/examples/basic/oc_parquet_analysis_enhanced.ipynb
```

### Session Pickup Prompt

"Let's continue integrating Eric's queries into parquet_cesium.qmd. We completed Phase 2 (combined samples query). Next we need to add the agent query and keywords/concepts query. Should we start with the agent query?"

### Notes
- All existing queries (`get_samples_1`, `get_samples_2`, `get_samples_at_geo_cord_location_via_sample_event`) are working and preserved
- The pattern is established; we just need to adapt Eric's remaining 2 SQL queries to JavaScript
- Consider UI improvements after Phase 3 is complete (tables, clickable links, thumbnails)
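
For the "tables" idea, one option is to render query results as a Markdown table instead of raw JSON. A minimal sketch, assuming each result is an array of flat row objects sharing the same keys; `rowsToMarkdownTable` is a hypothetical name. Pipes are escaped and newlines flattened so cell values cannot break the table.

```javascript
// Hypothetical helper: render an array of flat row objects as a Markdown table.
// Column order follows the keys of the first row; null/undefined become empty cells.
function rowsToMarkdownTable(rows) {
  if (rows.length === 0) return "_No results_";
  const columns = Object.keys(rows[0]);
  const cell = (value) =>
    String(value ?? "").replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
  const header = `| ${columns.map(cell).join(" | ")} |`;
  const divider = `| ${columns.map(() => "---").join(" | ")} |`;
  const body = rows.map((row) => `| ${columns.map((c) => cell(row[c])).join(" | ")} |`);
  return [header, divider, ...body].join("\n");
}
```

In an `{ojs}` cell, the returned string could be interpolated into `md` in place of the `JSON.stringify` output.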
