50 commits
0934a91
Initial commit
hpdekoning Mar 22, 2025
d7bbfa0
Added HTML sources
hpdekoning Mar 22, 2025
440e4da
Added early ...ebnf.txt results
hpdekoning Mar 22, 2025
18160c4
Added initial version ebnf_extractor.py
hpdekoning Mar 22, 2025
e5da22d
Added initial results
hpdekoning Mar 22, 2025
ef257bd
feat: Added initial support for GBNF images via HTML <img> elements
hpdekoning Jun 8, 2025
9896b47
fix: Renamed project name and added short project description in READ…
hpdekoning Jun 8, 2025
3f1a5e3
chore: Renamed main script to bnf_grammar_extractor.py and deleted un…
hpdekoning Jun 8, 2025
e54fc4f
chore: Rearranged folder structure to prepare for integration into Sy…
hpdekoning Jun 21, 2025
80d87de
chore: Removed obsolete test files
hpdekoning Jul 27, 2025
55cb17f
doc: Updated README documentation and moved to .adoc format
hpdekoning Jul 27, 2025
5cfdef0
chore: Removed obsolete files
hpdekoning Jul 27, 2025
a4e6e90
feat: Added Lark grammar definitions for KerML and SysML textual and …
hpdekoning Jul 27, 2025
cc1a500
feat: Added HTML exports of KerML and SysML r2025-04 specifications (…
hpdekoning Jul 27, 2025
60a170e
feat: Created CSS styles for HTML renderings of textual and graphical…
hpdekoning Jul 27, 2025
1c93e41
fix: Intermediate update of bnf_grammar_extractor.py that produces Ke…
hpdekoning Jul 27, 2025
d30dcf1
chore: Added local .gitignore for build of Python executable
hpdekoning Sep 21, 2025
6939f1f
fix: Rearranged KerML and SysML HTML sources
hpdekoning Sep 21, 2025
3ea3825
feat: Added CSS styles for exported BNF files in HTML format
hpdekoning Sep 21, 2025
8e7e707
chore: Copied SVG files for Graphical BNF from GSWG repo into images …
hpdekoning Sep 21, 2025
7b76279
chore: Cleanup
hpdekoning Sep 21, 2025
defd9f4
fix: Updated textual and graphical BNF grammars
hpdekoning Sep 21, 2025
9639c1d
fix: Intermediate improvements of bnf_grammar_extractor.py
hpdekoning Sep 21, 2025
4ad3f88
feat: Intermediate set of textual and graphical BNF exports
hpdekoning Sep 21, 2025
9989bdc
chore: Removed input and output grammar files
hpdekoning Nov 19, 2025
83c8807
fix: Refactoring of bnf_grammar_extractor and documentation
hpdekoning Nov 19, 2025
618b88c
chore: Added basics to build self-standing executable
hpdekoning Nov 19, 2025
b9c88be
feat: Added basic textual BNF grammar parser tool for file-based gram…
hpdekoning Nov 19, 2025
adf5eb4
chore: Added tests folder with spec sources and generated grammars
hpdekoning Nov 20, 2025
b82b103
chore: Added images folder with SVG files for all SysML graphical BNF…
hpdekoning Nov 20, 2025
39b646f
chore: Updated .gitignore to exclude generated Python files
hpdekoning Nov 20, 2025
3cd731c
feat: Refactoring naming lark grammars and plain text BNF kinds to ke…
hpdekoning Nov 23, 2025
9f83cee
fix: Documentation of lark grammar definitions
hpdekoning Nov 26, 2025
41b1a89
fix: Cleanup and refactorings of extractor code
hpdekoning Nov 26, 2025
e801d1d
fix: Update of kerml_sysml_bnf_parser for .kebnf and .kgbnf file exte…
hpdekoning Nov 26, 2025
c5ec2dd
chore: New intermediate set of output files in tests/
hpdekoning Nov 26, 2025
a379fc3
feat: Prepared bnf_grammar_extractor for processing marked_up text fi…
hpdekoning Nov 28, 2025
7fad94a
fix: Updated corrected .kebng and .kgbnf files for marked_up text fil…
hpdekoning Nov 28, 2025
1a88e2d
fix: Corrected check for expected subtag
hpdekoning Nov 28, 2025
f448fd0
fix: Updated the notes nested lists transformation for HTML and TXT, …
hpdekoning Dec 2, 2025
3ff1ded
chore: Updated generated and corrected grammars in tests/
hpdekoning Dec 2, 2025
c6a801b
doc: Complete update of the README documentation
hpdekoning Dec 4, 2025
9178ff0
fix: Refactoring and major upgrade of the bnf_grammar_processor and b…
hpdekoning Dec 4, 2025
2c783d8
fix: Corrected graphical BNF file succession-compartment.svg
hpdekoning Dec 4, 2025
8744d58
chore: Added backup log files and removed leftover built files
hpdekoning Dec 4, 2025
303d43c
chore: Moved lark grammar definition filed to package bnf_grammar
hpdekoning Dec 4, 2025
19f700a
feat: Added first unittest, for test_render_nested_lists
hpdekoning Dec 4, 2025
842675a
fix: Updated complete set of input and output grammar files
hpdekoning Dec 4, 2025
773f73e
fix: Updated log files
hpdekoning Dec 4, 2025
245dc39
feat: Added set of cloned log files with diagnostics on corrected gra…
hpdekoning Dec 4, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -35,3 +35,5 @@ dependency-reduced-pom.xml
# MacOS Finder
.DS_Store

# Generated Python files
*.pyc
3 changes: 3 additions & 0 deletions tool-support/bnf_grammar_tools/.gitignore
@@ -0,0 +1,3 @@
build/
dist/
scratch/
190 changes: 190 additions & 0 deletions tool-support/bnf_grammar_tools/README.adoc
@@ -0,0 +1,190 @@
= bnf_grammar_tools

Tools that process KerML and SysML2 concrete language grammars from their respective specifications, check them for correctness, and generate two kinds of grammar listings: (1) machine-readable plain text BNF files and (2) human-readable hyperlinked BNF files in HTML format.

== Usage of the Tools

=== Obtain a Complete KerML or SysML Specification in HTML Source Format

Instructions on how to export a selected version of the specification from View Editor to a self-standing HTML file.

> Note: These instructions were tested with View Editor version 4 in the Firefox browser v130.0.1 on Windows 11. In other browsers the menu or key to inspect the source code may be different.

1. Open the selected spec in View Editor.
2. Load the full document by clicking the "Full Document" icon (image:VE-full-document-icon.png[]) at the top of the left panel. This may take several minutes.
3. Click the print icon (image:VE-print-icon.png[]), immediately to the left of the EXPORT icon at the top of the document panel.
4. Wait for a new browser tab to appear with the complete HTML document and a popup print dialog. Be patient: this may again take several minutes.
5. Cancel the popup print dialog.
6. In the complete HTML document tab, open a Developer / Inspector panel via menu *More tools > Web Developer Tools* or by pressing `Ctrl+Shift+I`. (Note: a direct *Save page as ...* or `Ctrl+S` does not work, as it saves a script.)
7. In the Inspector tab of the Developer panel, right-click the top level `<html ...>` element, and select *Copy > Outer HTML* from the context menu.
8. Open a new, appropriately named `.html` file in a text editor, paste the contents and save.
9. Close the complete HTML document tab.
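
Since a direct *Save page as ...* stores a script rather than the rendered document, it can be worth checking the saved file before processing it. The following is a minimal sketch using only the Python standard library. The function name `looks_like_exported_html` and the heuristic itself (an `<html>` root plus a `<body>` element) are assumptions for illustration and are not part of the tools.

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts start tags by name while the document is fed in."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def looks_like_exported_html(text: str) -> bool:
    """Heuristic (an assumption): the text has an <html> root and a <body> element."""
    counter = _TagCounter()
    counter.feed(text)
    counter.close()
    return counter.counts.get("html", 0) >= 1 and counter.counts.get("body", 0) >= 1
```

Read the saved `.html` file with `encoding="utf-8"` and pass its text to the function. A `False` result suggests that the file should be exported again.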

=== Install the Python Environment

Ensure that Python version 3.8 or higher is installed on your machine. The most convenient way is to use the https://www.jetbrains.com/pycharm/[PyCharm] tool. Create a dedicated `conda` or `venv` development environment and activate it. The best tool to install a Python base environment is https://github.com/conda-forge/miniforge[miniforge].

After installation check the active Python version, e.g.:

[source,shell]
----
$ python --version
Python 3.12.12
----

Also ensure that the latest versions of the following packages are installed. You can use `pip`, `conda` or `mamba`.

* Package https://pypi.org/project/beautifulsoup4/[beautifulsoup4] is used to parse the HTML input file.
* Package https://pypi.org/project/lark/[lark] is used to parse and verify the extracted BNF source.
* Package https://pypi.org/project/pytest/[pytest] is used to run unit tests.

For example, install them with the following commands:

[source,shell]
----
$ pip install beautifulsoup4
$ pip install lark
$ pip install pytest
----
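
To confirm that the environment is complete, a small check like the one sketched here can be run in the activated environment. It uses only the standard library. The helper name `missing_modules` is an assumption for illustration. Note that the import names differ from the PyPI names in one case: `beautifulsoup4` is imported as `bs4`.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the three packages listed above
    required = ["bs4", "lark", "pytest"]
    if sys.version_info < (3, 8):
        print("Python 3.8 or higher is required")
    for name in missing_modules(required):
        print(f"Missing module: {name}")
```

If nothing is printed, the Python version and all three packages are in place.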

=== Run the bnf_grammar_processor

The usage information for the `bnf_grammar_processor` is shown below. It is obtained with the `-h` or `--help` option.
Go to the `tool-support/bnf_grammar_tools` directory and run `python .\bnf_grammar\bnf_grammar_processor.py -h`.

[source,shell]
----
usage: python bnf_grammar_processor [-h] [-i [INPUT_DIR]] [-o [OUTPUT_DIR]] SOURCE_DATA

Extract or parse textual and/or graphical grammars from given KerML or SysML specifications and generate plain text and html BNF grammar files.

positional arguments:
  SOURCE_DATA           JSON file defining source data - see examples under
                        the tests directory

options:
  -h, --help            show this help message and exit
  -i [INPUT_DIR], --input-dir [INPUT_DIR]
                        input directory path
  -o [OUTPUT_DIR], --output-dir [OUTPUT_DIR]
                        output directory path

The processor supports two main capabilities:
1) Extract the KerML or SysML grammar(s)
from provided raw .html file(s) exported from KerML and SysML specifications in the View Editor tool,
and then validate the extracted grammars, report possible errors, and generate these outputs:
- .json dumps of the processed intermediate data model(s),
- .kebnf and/or .kgbnf plain text files,
- -marked_up.kebnf and/or -marked_up.kgbnf marked up text files, that can be used as a basis for corrected grammars,
- .html files with hyperlinked, human-readable versions of the grammars.
2) Validate corrected -marked_up.kebnf or -marked_up.kgbnf input files
and generate the same files as under 1), but now for the corrected grammars.

Option 2) is selected when the input filename(s) end in '-marked_up.kebnf' or '-marked_up.kgbnf'; otherwise option 1) is used.
Both options will produce a log file named 'bnf_grammar_processor.log' in the working directory.
In SOURCE_DATA the input files should be given in reverse dependency order, i.e., first KerML textual, then SysML textual, then SysML graphical notation.
Via diff'ing of the extracted and corrected .kebnf and/or .kgbnf files a list of corrections to be fed into the OMG issue trackers can be compiled.
----
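
The choice between the two options depends only on the input filename suffix. The sketch below illustrates that rule for a single filename. It is not taken from the processor's source: the name `option_for` and the constant `MARKED_UP_SUFFIXES` are hypothetical.

```python
# Suffixes that mark a corrected, marked-up grammar file (from the usage text above)
MARKED_UP_SUFFIXES = ("-marked_up.kebnf", "-marked_up.kgbnf")


def option_for(input_filename: str) -> int:
    """Return 2 (validate marked-up grammar) or 1 (extract from HTML) for a filename."""
    # str.endswith accepts a tuple and matches if any of the suffixes applies
    if input_filename.endswith(MARKED_UP_SUFFIXES):
        return 2
    return 1
```

Under this rule a plain `KerML-textual-bnf.kebnf` file maps to option 1), because only the `-marked_up` variants select option 2).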

Note. The file extensions `.kebnf` and `.kgbnf` are inspired by the `.kpar` extension for the KerML archive files.

The BNF grammars are defined in the format of the https://github.com/lark-parser/lark[Lark] parsing toolkit for Python. The definitions are in:

* `bnf_grammar/kebnf_textual_grammar.lark`, and,
* `bnf_grammar/kgbnf_graphical_grammar.lark`.

Inside the `bnf_grammar_processor`, Lark is used to check each production individually. Some additional heuristic validation is also performed to permit processing of incorrect grammar or note fragments. All diagnostics are reported in the `bnf_grammar_processor.log` file. For the graphical grammar this includes a mapping table between the existing (PNG) images in the specs from the View Editor source and the new SVG images in the `images` subdirectory.

.Example command line arguments
For the time being, example input and output directories with `SOURCE_DATA` files can be found under the `tests` folder.

For option 1):

- `INPUT_DIR` = `tests/KerML_and_SysML_spec_sources`
- `OUTPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `SOURCE_DATA` = `source_specs.json` (in `INPUT_DIR`)

The style information for the generated HTML outputs resides in `tests/KerML_and_SysML_grammars/bnf_styles.css`.

For option 2):

- `INPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `OUTPUT_DIR` = `tests/KerML_and_SysML_grammars`
- `SOURCE_DATA` = `source_marked_ups.json` (in `INPUT_DIR`)
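
The two argument sets above can be turned into command lines as sketched here. The helper `build_command` is hypothetical. The script path is written with forward slashes and assumes that the command is started from the `tool-support/bnf_grammar_tools` directory. Only the documented `-i` and `-o` options are used.

```python
def build_command(input_dir: str, output_dir: str, source_data: str) -> list:
    """Assemble the argument list for one run of bnf_grammar_processor."""
    return [
        "python",
        "bnf_grammar/bnf_grammar_processor.py",
        "-i", input_dir,
        "-o", output_dir,
        source_data,
    ]


# Option 1): extract the grammars from the exported specification HTML files
extract_cmd = build_command(
    "tests/KerML_and_SysML_spec_sources",
    "tests/KerML_and_SysML_grammars",
    "source_specs.json",
)

# Option 2): validate the corrected marked-up grammar files
validate_cmd = build_command(
    "tests/KerML_and_SysML_grammars",
    "tests/KerML_and_SysML_grammars",
    "source_marked_ups.json",
)
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting issues with directory names.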

.Example generated outputs for option 1) (extract)
The `bnf_grammar_processor` produces the following outputs (see directory `tests/KerML_and_SysML_grammars`):

[cols="2,3"]
|===
| `KerML-textual-bnf-elements.json` | dump of the processed intermediate data model(s)
| `KerML-textual-bnf.kebnf` | generated plain text KerML textual grammar file
| `KerML-textual-bnf-marked_up.kebnf` | generated editable marked up KerML textual grammar file
| `KerML-textual-bnf.html` | generated browsable, hyperlinked HTML KerML textual grammar file
| `SysML-textual-bnf-elements.json` | dump of the processed intermediate data model(s)
| `SysML-textual-bnf.kebnf` | generated plain text SysML textual grammar file
| `SysML-textual-bnf-marked_up.kebnf` | generated editable marked up SysML textual grammar file
| `SysML-textual-bnf.html` | generated browsable, hyperlinked HTML SysML textual grammar file
| `SysML-graphical-bnf-elements.json` | dump of the processed intermediate data model(s)
| `SysML-graphical-bnf.kgbnf` | generated plain text SysML graphical grammar file
| `SysML-graphical-bnf-marked_up.kgbnf` | generated editable marked up SysML graphical grammar file
| `SysML-graphical-bnf.html` | generated browsable, hyperlinked HTML SysML graphical grammar file (See Note)
|===

Note. The SVG images for the graphical BNF productions reside in `tests/KerML_and_SysML_grammars/images`. They are copied from the source in the https://github.com/Systems-Modeling/Graphical-Specification-WG/tree/main/src/Graphical-BNF/_svg[Graphical-Specification-WG GitHub repo].

Each run of the `bnf_grammar_processor` produces a log on the console and in file `bnf_grammar_processor.log`. The log of the previous run is saved in `bnf_grammar_processor.log.backup`, which can be used to detect differences between runs.
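
The backup behaviour can be sketched as follows. The pattern mirrors what `bnf_file_parser.py` does for its own log file. Whether `bnf_grammar_processor` uses exactly the same code is an assumption, and the helper name `open_log_with_backup` is hypothetical.

```python
import logging
import os
import shutil


def open_log_with_backup(log_file_name: str) -> logging.FileHandler:
    """Copy an existing log to '<name>.backup', then open a fresh log file handler."""
    if os.path.exists(log_file_name):
        # copy2 also preserves the timestamps of the previous run's log
        shutil.copy2(log_file_name, log_file_name + ".backup")
    # mode="w" truncates the log, so each run starts with an empty file
    return logging.FileHandler(log_file_name, mode="w", encoding="utf-8")
```

Because the backup is taken before the log is truncated, only one previous run is kept. Copy the `.backup` file elsewhere if a longer history is needed.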

=== Correct the Extracted Grammar Files and Reprocess with the bnf_grammar_processor

If there are errors in the grammar files, the following workflow can be used to apply bulk corrections.

. Copy `KerML-textual-bnf-marked_up.kebnf` to `KerML-textual-bnf-corrected-marked_up.kebnf`
. Copy `SysML-textual-bnf-marked_up.kebnf` to `SysML-textual-bnf-corrected-marked_up.kebnf`
. Copy `SysML-graphical-bnf-marked_up.kgbnf` to `SysML-graphical-bnf-corrected-marked_up.kgbnf`
. Check the errors in the log files, and modify the `...-corrected-marked_up.k*bnf` files in a text editor to correct the errors.
. After every few corrections, run the `bnf_grammar_processor` with the option 2) arguments. This validates the corrected `.kebnf` and `.kgbnf` files and generates the set of files described in the table below, similar to option 1).
. Repeat steps 4 and 5 until satisfied.
. Diff each pair of original (`...-bnf-marked_up.k*bnf`) and corrected (`...-bnf-corrected-marked_up.k*bnf`) files to systematically compile the changes to be raised as OMG issues.
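
The diff in the last step can be produced with any diff tool. As one possibility, here is a minimal sketch based on Python's standard `difflib` module. The function name `grammar_diff` is hypothetical, and the sample lines in the usage note are placeholders rather than real grammar productions.

```python
import difflib


def grammar_diff(original_text: str, corrected_text: str,
                 original_name: str, corrected_name: str) -> list:
    """Return the unified diff lines between an extracted and a corrected grammar."""
    return list(difflib.unified_diff(
        original_text.splitlines(),
        corrected_text.splitlines(),
        fromfile=original_name,
        tofile=corrected_name,
        lineterm="",  # input lines have no line endings, so emit none either
    ))
```

For example, `grammar_diff("first line\nsecond line\n", "first line\nchanged line\n", "a.kebnf", "b.kebnf")` contains the entries `-second line` and `+changed line`. Two identical texts yield an empty list.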

.Example generated files for option 2)
[cols="2,3"]
|===
| `KerML-textual-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `KerML-textual-bnf-corrected.kebnf` | generated corrected plain text KerML textual grammar file
| `KerML-textual-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML KerML textual grammar file
| `SysML-textual-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `SysML-textual-bnf-corrected.kebnf` | generated corrected plain text SysML textual grammar file
| `SysML-textual-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML SysML textual grammar file
| `SysML-graphical-bnf-corrected-elements.json` | dump of the corrected intermediate data model(s)
| `SysML-graphical-bnf-corrected.kgbnf` | generated corrected plain text SysML graphical grammar file
| `SysML-graphical-bnf-corrected.html` | generated corrected browsable, hyperlinked HTML SysML graphical grammar file
|===
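
The file names in the table follow a regular pattern relative to the corrected marked-up input file. The sketch below derives the expected names from that pattern, for example to check that a run produced everything. The pattern is inferred from the table only, and the function name `expected_outputs` is hypothetical.

```python
def expected_outputs(marked_up_filename: str) -> list:
    """Derive the output file names listed in the table from a marked-up input name."""
    for ext in (".kebnf", ".kgbnf"):
        marker = "-marked_up" + ext
        if marked_up_filename.endswith(marker):
            # Strip the '-marked_up.<ext>' tail to obtain the common stem
            stem = marked_up_filename[: -len(marker)]
            return [stem + "-elements.json", stem + ext, stem + ".html"]
    raise ValueError(f"Not a marked-up grammar file: {marked_up_filename}")
```

Calling it with `SysML-graphical-bnf-corrected-marked_up.kgbnf` returns the three `SysML-graphical-bnf-corrected...` names from the table.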

=== Use the bnf_file_parser for Final Checks

As a final check the `bnf_file_parser` can be used to validate complete, corrected BNF grammar files.

The usage information for the `bnf_file_parser` is shown below. It is obtained with the `-h` or `--help` option.
Go to the `tool-support/bnf_grammar_tools` directory and run `python .\bnf_grammar\bnf_file_parser.py -h`.

[source,shell]
----
usage: bnf_file_parser [-h] BNF_PATH

Parse KerML or SysML grammar files in textual or graphical BNF format.

positional arguments:
  BNF_PATH    Path to plain text BNF file with extension .kebnf or .kgbnf

options:
  -h, --help  show this help message and exit
----

Run `bnf_file_parser` on the following files:

* `KerML-textual-bnf-corrected.kebnf`
* `SysML-textual-bnf-corrected.kebnf`
* `SysML-graphical-bnf-corrected.kgbnf`

The console and the log file `bnf_file_parser.log` will list any errors still present. If the parse is completely successful, a dump of the resulting abstract syntax tree (in Lark's pretty-print format) is listed instead.
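
The parser picks its Lark grammar definition from the file extension alone. The lookup can be summarised as in the sketch below. It restates the mapping in `bnf_file_parser.py` as a table. The names `GRAMMAR_BY_EXTENSION` and `grammar_for` are hypothetical and do not occur in the tool.

```python
import os

# Extension-to-grammar mapping as used by bnf_file_parser.py
GRAMMAR_BY_EXTENSION = {
    ".kebnf": "kebnf_textual_grammar.lark",
    ".kgbnf": "kgbnf_graphical_grammar.lark",
}


def grammar_for(bnf_path: str):
    """Return the Lark grammar file name for a BNF file, or None if unsupported."""
    _, ext = os.path.splitext(bnf_path)
    return GRAMMAR_BY_EXTENSION.get(ext)
```

A `None` result corresponds to the case in which the tool logs an unrecognized file extension and terminates.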
Binary file added tool-support/bnf_grammar_tools/VE-print-icon.png
Empty file.
Empty file.
Empty file.
Empty file.
110 changes: 110 additions & 0 deletions tool-support/bnf_grammar_tools/bnf_grammar/bnf_file_parser.py
@@ -0,0 +1,110 @@
#!python

"""
bnf_file_parser is a command line tool that parses a KerML or SysML plain text grammar file.

The supported file formats are:
- .kebnf for a KerML or SysML textual notation grammar
- .kgbnf for a SysML graphical notation grammar

Its usage is described below in the main() function.

@author: Hans Peter de Koning (DEKonsult)

Requirements:

This tool requires installation of the following packages:
- lark (See https://pypi.org/project/lark)

"""

import sys
import os
import shutil
import argparse
from datetime import datetime, timezone
from typing import Optional
from lark import Lark, UnexpectedInput

# Create logger for debug, info, warning, error, critical messages
import logging
LOGGER = logging.getLogger()


class BnfParser:
    def __init__(self) -> None:
        # ISO 8601 UTC timestamp string, set when parse() starts
        self.start_timestamp: Optional[str] = None
        self.bnf_filepath: Optional[str] = None
        self.parser: Optional[Lark] = None

    def parse(self, bnf_filepath: str) -> None:
        self.start_timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")
        self.bnf_filepath = bnf_filepath

        LOGGER.info(f"Started parsing {self.bnf_filepath} at {self.start_timestamp}")

        # Select the Lark grammar definition from the input file extension
        _, ext = os.path.splitext(bnf_filepath)
        if ext == ".kebnf":
            grammar_file = "kebnf_textual_grammar.lark"
        elif ext == ".kgbnf":
            grammar_file = "kgbnf_graphical_grammar.lark"
        else:
            LOGGER.critical(f"Unrecognized file extension for BNF_PATH {bnf_filepath}, terminating ...")
            sys.exit(1)

        self.parser = Lark.open(grammar_file, rel_to=__file__, parser="lalr")

        # The context manager closes the file even if reading fails
        with open(bnf_filepath, "r", encoding="utf-8") as bnf_file:
            bnf_input = bnf_file.read()

        try:
            parse_tree = self.parser.parse(bnf_input)
        except UnexpectedInput as e:
            LOGGER.error(f"Parse error in {self.bnf_filepath}:\n{e}")
        else:
            LOGGER.info("Parse completed successfully")
            LOGGER.info(f"The resulting (AST) parse tree is:\n\n{parse_tree.pretty()}")


def main() -> None:
    # Initialize logging
    LOGGER.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)-8s: %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.set_name("console")
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)
    LOGGER.addHandler(console_handler)

    log_file_name = "bnf_file_parser.log"
    if os.path.exists(log_file_name):
        # Create backup copy of the log-file to inspect differences between runs
        shutil.copy2(log_file_name, log_file_name + ".backup")

    file_handler = logging.FileHandler(log_file_name, mode="w", encoding="utf-8")
    file_handler.set_name("logfile")
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)
    LOGGER.addHandler(file_handler)

    LOGGER.debug(f"bnf_file_parser started in {os.getcwd()}")

    # Parse command line
    parser = argparse.ArgumentParser(
        prog="bnf_file_parser",
        allow_abbrev=False,
        description="Parse KerML or SysML grammar files in textual or graphical BNF format.")
    parser.add_argument("bnf_path", metavar="BNF_PATH", type=str, help="Path to plain text BNF file with extension .kebnf or .kgbnf")
    args = parser.parse_args()
    LOGGER.debug(f"args={args}")

    # Run the parser
    bnf_parser = BnfParser()
    bnf_parser.parse(args.bnf_path)


if __name__ == "__main__":
    main()