Skip to content

CLI: add stdin support for instance validation #614

Open
Vaibhav701161 wants to merge 7 commits intosourcemeta:mainfrom
Vaibhav701161:feat/stdin-validate
Open

CLI: add stdin support for instance validation #614
Vaibhav701161 wants to merge 7 commits intosourcemeta:mainfrom
Vaibhav701161:feat/stdin-validate

Conversation

@Vaibhav701161
Copy link

Adds stdin support for instance input to enable real-time editor diagnostics.
Fixes(#537)

  • Centralized stdin parsing in src/input.h (JSON-first, YAML fallback).
  • validate accepts instances from stdin only; schemas must be file paths.
  • Stdin inputs resolve relative $ref against the current working directory.
  • Includes integration test test/validate/pass_stdin.sh.
  • Unblocks sourcemeta/studio#61; addresses sourcemeta/jsonschema#537.

Refs: sourcemeta/studio#61, #537

*Build and test:
Screenshot 2026-01-15 145159

@Vaibhav701161 Vaibhav701161 force-pushed the feat/stdin-validate branch 2 times, most recently from ddb8312 to 9a040d7 Compare January 15, 2026 10:07
@Vaibhav701161
Copy link
Author

Hi @jviotti , I’ve opened this PR and would appreciate a review when you have time. Thanks!

@jviotti
Copy link
Member

jviotti commented Jan 19, 2026

Hey @Vaibhav701161 , sorry for the delay. It has been a hectic week. I'll try to get into it today

@jviotti
Copy link
Member

jviotti commented Jan 19, 2026

I like the idea but I think there are some things to iron out:

  • If we accept standard input, we should do so in ALL commands, otherwise it gets weird
  • Some commands require a schema as input but for others, passing nothing means "the current directory", which introduces some ambiguity. For example, jsonschema fmt formats all schemas in the current directory, while jsonschema inspect <schema.json> requires a single schema argument
  • Furthermore, some commands require 2 schemas, like validate

I think for consistency, we can require the - to signify standard input in every command. However, for commands take potentially take directories as input, we should transparently implement this logic on src/input.h. So it happens out of the box.

The other problem here will be line numbers. We should make sure all relevant commands handle standard input while still reporting line/column information and potentially some "special" file path like <stdin>?

@Vaibhav701161
Copy link
Author

Vaibhav701161 commented Jan 20, 2026

Hey @jviotti , thanks a lot for the feedback.

I’ve pushed an updated iteration that tries to address the concerns you mentioned:

  • - is now the explicit opt-in for stdin.
  • The stdin logic is centralized in src/input.h, so it’s not handled ad-hoc in individual commands.
  • Ambiguous cases (schema via stdin, multiple stdin inputs) are explicitly rejected.
  • Line/column information is preserved, and stdin is handled as a special input (covered via tests).

@Vaibhav701161
Copy link
Author

Hi @jviotti , just a quick follow-up on this.
Whenever you get a chance, I’d appreciate a review to confirm whether this direction looks good or if there’s anything you’d like adjusted before moving forward. Thanks


struct InputJSON {
std::filesystem::path first;
std::filesystem::path resolution_base;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just a display name, maybe make a std::string? Otherwise <stdin> is not really a path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still needs to be addressed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this test into multiple tests. If you see the other tests in the project, we have specific conventions and we always test a single thing per file. Plus you also want to add test for every other applicable command.

Finally, you didn't register this test in test/CMakeLists.txt, so it won't run at all

copy, sourcemeta::core::schema_walker, custom_resolver,
get_lint_callback(errors_array, entry, output_json), dialect,
sourcemeta::jsonschema::default_id(entry.first),
sourcemeta::core::URI::from_path(entry.resolution_base)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still definitely want to use sourcemeta::jsonschema::default_id here, to concentrate that logic in a single place. Maybe make such function take entry directly, so that inside we can decide to use entry.resolution_base or whatever?

custom_resolver,
get_lint_callback(errors_array, entry, output_json), dialect,
sourcemeta::jsonschema::default_id(entry.first),
sourcemeta::core::URI::from_path(entry.resolution_base)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

src/input.h Outdated
// TODO: Print a verbose message for what is getting parsed
auto parsed{read_file(canonical)};
result.push_back({std::move(canonical), std::move(parsed.document),
auto canonical_copy{canonical};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this deletion was intended?

src/input.h Outdated
}
}
if (stdin_count > 1) {
throw std::runtime_error{"Standard input (-) can only be specified once"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw std::runtime_error{"Standard input (-) can only be specified once"};
throw std::runtime_error{"The standard input position argument `-` can only be specified once"};

src/input.h Outdated
std::size_t stdin_count{0};
for (const auto &arg : arguments) {
if (arg == "-") {
stdin_count++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think counting all occurrences makes sense if you only want up to two. Maybe check that if you already got at least 2, then its an error and you can break?

set +o errexit
cat "$TMP/schema.json" > "$TMP/schema_stdin.json"

OUTPUT=$("$1" validate - "$TMP/schema.json" < "$TMP/schema_stdin.json" 2>&1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the other shell scripts for the conventions on how we test stuff. Some obvious ones:

  • We never capture into a variable. We pipe into a file and then diff it
  • We never enable set +o errexit
  • We never test more than one thing at once

Also your test is not registered in test/CMakeLists.txt so it won't run at all. But overall, split it so that there are multiple tests testing each one thing

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 12 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/command_fmt.cc">

<violation number="1" location="src/command_fmt.cc:25">
P2: Use a structured error type from `error.h` instead of `std::runtime_error` for stdin not supported. This enables proper machine-readable JSON output when `--json` is passed, following the pattern of other error types like `YAMLInputError`.

(Based on your team's feedback about avoiding dynamically constructed exception messages and reusing centralized error wrapper types.) [FEEDBACK_USED]</violation>
</file>

<file name="src/input.h">

<violation number="1" location="src/input.h:151">
P2: Missing YAML fallback for stdin input. The PR description states "JSON-first, YAML fallback" but `read_from_stdin()` only parses JSON. This is inconsistent with file handling which falls back to YAML if JSON parsing fails. Note: implementing YAML fallback would require first buffering stdin (since it can only be read once), then attempting JSON parse, then falling back to YAML on failure.</violation>
</file>

<file name="test/validate/fail_stdin_multiple.sh">

<violation number="1" location="test/validate/fail_stdin_multiple.sh:25">
P2: Add a JSON error assertion for this failure case so the structured error output is validated alongside the text output.

(Based on your team's feedback about adding JSON variants for failure test cases.) [FEEDBACK_USED]</violation>
</file>

<file name="test/validate/fail_stdin_schema.sh">

<violation number="1" location="test/validate/fail_stdin_schema.sh:22">
P2: Add a JSON error assertion for this fail_* test so structured error output remains covered.

(Based on your team's feedback about adding JSON variants for failure test cases.) [FEEDBACK_USED]</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

error: The standard input position argument `-` can only be specified once
EOF

diff "$TMP/stderr.txt" "$TMP/expected.txt"
Copy link

@cubic-dev-ai cubic-dev-ai bot Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Add a JSON error assertion for this failure case so the structured error output is validated alongside the text output.

(Based on your team's feedback about adding JSON variants for failure test cases.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/validate/fail_stdin_multiple.sh, line 25:

<comment>Add a JSON error assertion for this failure case so the structured error output is validated alongside the text output.

(Based on your team's feedback about adding JSON variants for failure test cases.) </comment>

<file context>
@@ -0,0 +1,25 @@
+error: The standard input position argument `-` can only be specified once
+EOF
+
+diff "$TMP/stderr.txt" "$TMP/expected.txt"
</file context>
Fix with Cubic

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vaibhav701161 This is a valid one. Needs fixing!

error: Reading the schema from standard input is not supported
EOF

diff "$TMP/stderr.txt" "$TMP/expected.txt"
Copy link

@cubic-dev-ai cubic-dev-ai bot Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Add a JSON error assertion for this fail_* test so structured error output remains covered.

(Based on your team's feedback about adding JSON variants for failure test cases.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/validate/fail_stdin_schema.sh, line 22:

<comment>Add a JSON error assertion for this fail_* test so structured error output remains covered.

(Based on your team's feedback about adding JSON variants for failure test cases.) </comment>

<file context>
@@ -0,0 +1,22 @@
+error: Reading the schema from standard input is not supported
+EOF
+
+diff "$TMP/stderr.txt" "$TMP/expected.txt"
</file context>
Fix with Cubic

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@Vaibhav701161 Vaibhav701161 requested a review from jviotti February 6, 2026 11:57
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a failing case too?


if (schema_path == "-") {
throw std::runtime_error{
"Reading the schema from standard input is not supported"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? Sounds like a valid use case, unless there is another reason why it wouldn't work that I'm not thinking about?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far we only have stdin tests for the validate command. But we should be testing stdin behaviour for every command that supports it. Otherwise it could be totally broken in other commands and we are not knowing about it

for (const auto &entry : for_each_json(options)) {
if (entry.from_stdin) {
throw std::runtime_error{
"This command does not support reading from standard input"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

@jviotti
Copy link
Member

jviotti commented Feb 12, 2026

I left some more comments. It's shaping up really well!

@jviotti
Copy link
Member

jviotti commented Feb 12, 2026

@Vaibhav701161 The machinery that allows for - detection is already very solid. Something I suggest is sending another PR based on this one that does all the refactoring while not adding tests and not actually surfacing the - logic yet. That's legit refactoring that already looks great and we can merge, and then we can rebase this one with the changes and just focus on the - new logic.

I guess that would be most of the changes you made except the tests, and just never setting entry.from_stdin to true and never checking for two or more - so nothing is actually affected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants