feat: add collections validation and GFQL support#874
feat: add collections validation and GFQL support#874
Conversation
graphistry/PlotterBase.py
Outdated
| show_collections: Optional[bool] = None, | ||
| collections_global_node_color: Optional[str] = None, | ||
| collections_global_edge_color: Optional[str] = None, | ||
| encode: bool = True, |
There was a problem hiding this comment.
in the interest of simplifying, would it be difficult to detect pre-encoded strings?
There was a problem hiding this comment.
yeah this surprised me: the reason it's like this is bc the base REST api takes AABBCC instead of ~css colors like we started to do elsewhere i believe (js color(xyz) normalizes to rgba('...'), handling '#AABBCC', 'silver', etc)
so maybe fix is to ship better colors in REST, and then this upgrades to that?
There was a problem hiding this comment.
i think thats a great idea to ship better colours in REST will add that to the list
i was talking about the encode: bool = True, though, which is If True, JSON-minify and URL-encode collections. Use False for pre-encoded strings.
i was thinking we could just detect pre-encoded strings (which i assume are URL encoded and can be detected by attempting a parse). then we can remove this parameter
| if isinstance(collections, list): | ||
| return _coerce_collection_list(collections, validate_mode, warn) | ||
| if isinstance(collections, dict): | ||
| return _coerce_collection_list(collections, validate_mode, warn) |
There was a problem hiding this comment.
nitpick: would just get rid of the _coerce_collection_list function and put the contents in here, would get rid of checking is list is dict etc twice and make easier to read
| if validate_mode == 'autofix': | ||
| collection_type = str(collection_type) |
There was a problem hiding this comment.
if collection_type was not already a string, im not sure that just typecasting it is going to produce a valid type string given this string must be either set or intersection
There was a problem hiding this comment.
actually nevermind, just saw the if collection_type not in ('set', 'intersection'): below, im thinking maybe combine this check with the check below, if collection_type is not string its all but guaranteed to fail and hit continue in the section below
|
general comment, validate_mode = 'autofix' seems more likely to produce invalid output and/or result in a collection getting skipped than otherwise. to me its biggest value is typecasting stuff like id=1 to id="1" etc. not sure how much value there is in making it explicit, maybe we just have warn true/false and otherwise handle things silently? |
|
hmm, if they do something like that was mostly added b/c people keep having dirty data that fails arrow conversion and rather seeing it load vs fixing their data/cfg |
| return raw | ||
| return raw | ||
|
|
||
| return _normalize_ops_list(_extract_ops_value(gfql_ops)) |
There was a problem hiding this comment.
i feel this could be simplified by just feeding the GFQL into graphistry.compute.chain and seeing if it parses properly rather than setting up stuff here to determine if GFQL is valid or not
|
@lmeyerov regarding general comment above: im thinking maybe instead of
if we feel we need to type cast we might want to just do it anyway strict true or not |
|
the issue with strict true/false is that's closer to what we had before, and users were complaining that they just wanted it to 'work', hence autofix (coerce++). validate=true/false is closer to what you're thinking, while 'autofix' (coerce) is "it'll run, but may not be what you want, but you said wanted soemthing that runs" We therefore have leeway in what autofix does --- we just need to warn (if warn=auto/true), and do what. So the q is... what should it do wrt diff collections errors, if strong opinions about any params in any direction? My default intuition is probably:
|
|
agree with that, i think we do the 3 bullets listed if strict (or whatever we want to call it) is false and its almost what we are doing right now. theres only 1 point i would make about the current implementation
in summary its only the approach i think might need to change, not the intention EDIT: if we do the above it actually gets closer to a true autofix than what it was before, so i dont have an issue with the name anymore |
3cf9e9d to
41a33c4
Compare
aucahuasi
left a comment
There was a problem hiding this comment.
Thanks for this PR, I left some comments and I think we need to solve this feedback before merge:
https://github.com/graphistry/pygraphistry/pull/874/changes#r2778233409
- Remove dead code after return in _normalize_intersection_expr - Add empty sets list validation in _normalize_sets_list - Remove pointless type coercion (str() won't produce valid types) - Consolidate GFQL parsing: _normalize_gfql_ops calls _wrap_gfql_expr - Add cross-validation for intersection set ID references - Require id field for sets (server requires it, no auto-generation) - Use precise exception handling (TypeError, ValueError, GFQLValidationError) - Update tests to include required id fields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Review Comments Addressed (commits 23d1f00, d2bdf02, 47d87e8)HIGH Priority - Fixed
MEDIUM Priority - Fixed
LOW Priority - Fixed
Still Open / Deferred
Key Design Decisions
CI Status: All jobs green (manual workflow_dispatch run 21793451172) - python-lint-types 3.8-3.14 ✅, all test suites ✅ |
- Set collections require id field (no auto-generation) - Intersection cross-validation for set ID references - GFQL parsing consolidation with precise exception handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address review comment #2683393920 - removes unnecessary helper function. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CollectionsInput includes TypedDicts which need to be cast to Dict[str, Any]. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
9cc4506 to
47d87e8
Compare
Server requires IDs for all collection types (used as storage keys).
In autofix mode, generate `{type}_{idx}` IDs when missing instead of
dropping the collection. This makes simple use cases "just work" while
still warning about the missing ID.
- Sets get `set_0`, `set_1`, etc.
- Intersections get `intersection_0`, `intersection_1`, etc.
- Strict mode still rejects missing IDs
- Added test for auto-generation behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change from `set_0` to `set-0` for consistency with other ID patterns
in the codebase (e.g., kepler's `dataset-{uuid[:8]}`).
User-provided IDs can be any string - no validation beyond type check.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add normalize_gfql_to_wire() as canonical GFQL→wire implementation - Simplify collections.py (removed 36 lines of duplicate logic) - Simplify validate_collections.py to call compute/ast - Clean import graph: models → compute → helpers/validation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aucahuasi
left a comment
There was a problem hiding this comment.
Thanks for fixing the previous issue, here are some others that caught my attention:
https://github.com/graphistry/pygraphistry/pull/874/changes#r2789010065
https://github.com/graphistry/pygraphistry/pull/874/changes#r2789125932
…intersections) - Allow intersections to reference other intersections (backend supports this) - Detect self-references (intersection referencing itself) - Detect cycles in intersection dependencies (A->B->A) - Add tests for DAG validation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
AST class-level from_json methods (ASTLet, ASTRemoteGraph, ASTRef, ASTCall) use bare `assert` for required field checks. These raise AssertionError which was not caught by the validation boundary, causing raw crashes instead of proper validation errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Thanks for the thorough reviews! All comments have been addressed: Consolidation & Architecture:
Validation Improvements:
Code Quality:
CI is green, ready for merge. |
Summary
graphistry/collections.pyand introduce typed models ingraphistry/models/collections.pyKey Features
g.collections(...)API for defining subsets via GFQL expressions with priority-based visual encodingsgraphistry.collection_set(...)andgraphistry.collection_intersection(...)showCollections,collectionsGlobalNodeColor, andcollectionsGlobalEdgeColorURL paramsValidation Behavior
ValueErroron invalid inputidfield (server requirement) - missing IDs are warned/dropped, not auto-generated_wrap_gfql_expras canonical implementation with precise exception handlingTesting
Review Comments Addressed
See comment for detailed breakdown of all review comments and their resolutions.