Closed
21 changes: 20 additions & 1 deletion src/gitingest/notebook_utils.py
@@ -6,7 +6,7 @@
from typing import Any


def process_notebook(file: Path) -> str:
def process_notebook(file: Path, parse_notebook_output: bool = True) -> str:
"""
Process a Jupyter notebook file and return an executable Python script as a string.

@@ -45,6 +45,7 @@ def process_notebook(file: Path) -> str:
notebook = worksheets[0]

result = []
cell_count=0

for cell in notebook["cells"]:
cell_type = cell.get("cell_type")
@@ -61,6 +62,24 @@ def process_notebook(file: Path) -> str:
if cell_type in ("markdown", "raw"):
str_ = f'"""\n{str_}\n"""'

# Extract Output from cell
if parse_notebook_output and (("outputs" in cell) and (cell["outputs"] != [])):
sample_output=""
for output in cell["outputs"]:
if output["output_type"] == "stream" and output["text"] != []:
sample_output += "".join(output["text"]) + "\n"
elif (output["output_type"] in ["execute_result","display_data"]) and ("data" in output) and ("text/plain" in output["data"]):
sample_output += "".join(output["data"]["text/plain"]) + "\n"
elif (output["output_type"]=="error" and ("evalue" in output) ):
sample_output += f"{output.get('ename', 'Error')} : " + "".join(output["evalue"]) + "\n"
str_ += f'\n# Output:\n"""{sample_output}"""\n'

# Add Cell Info
cell_count+=1
str_ = f"# Cell {cell_count} ; Type : ({cell_type})\n" + str_
Comment on lines +65 to +79
Contributor

I have already implemented this quite neatly – will make the PR in a moment.

Author

> I have already implemented this quite neatly – will make the PR in a moment.

No problem, I will close this PR.

Contributor

Sorry, should have been clearer.




result.append(str_)

return "\n\n".join(result)
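
For readers following the output-extraction hunk above, the same idea can be sketched as a standalone helper. This is only an illustration under assumptions: the function name `extract_cell_output` is hypothetical and not part of gitingest, and the cell is assumed to be a plain dict in the nbformat v4 layout (an `outputs` list whose entries carry `output_type` plus `text`, `data`, or `ename`/`evalue`).

```python
from typing import Any


def extract_cell_output(cell: dict[str, Any]) -> str:
    """Collect the plain-text outputs of one notebook cell (hypothetical helper)."""
    parts: list[str] = []
    for output in cell.get("outputs", []):
        output_type = output.get("output_type")
        if output_type == "stream":
            text = output.get("text", "")
        elif output_type in ("execute_result", "display_data"):
            text = output.get("data", {}).get("text/plain", "")
        elif output_type == "error":
            # Single quotes inside the f-string keep this valid before Python 3.12.
            text = f"{output.get('ename', 'Error')} : {output.get('evalue', '')}"
        else:
            continue
        # Multi-line strings may be stored either as a str or as a list of str.
        if isinstance(text, list):
            text = "".join(text)
        if text:
            parts.append(text)
    return "\n".join(parts)
```

Using `.get()` with defaults means a cell without outputs, or an output without a `text/plain` entry, yields an empty string instead of raising `KeyError`.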
6 changes: 4 additions & 2 deletions src/gitingest/query_ingestion.py
@@ -140,7 +140,7 @@ def _is_text_file(file_path: Path) -> bool:
return False


def _read_file_content(file_path: Path) -> str:
def _read_file_content(file_path: Path, parse_notebook_output: bool = True) -> str:
Contributor

I don't see the point of passing the parse_notebook_output argument to the _read_file_content function if it is never used in the codebase. Let's hold off on this until we implement support for excluding the notebook output.

"""
Read the content of a file.

@@ -152,6 +152,8 @@ def _read_file_content(file_path: Path) -> str:
----------
file_path : Path
The path to the file to read.
parse_notebook_output : bool
Whether to parse the output of the notebook cells.

Returns
-------
@@ -160,7 +162,7 @@ def _read_file_content(file_path: Path) -> str:
"""
try:
if file_path.suffix == ".ipynb":
return process_notebook(file_path)
return process_notebook(file_path, parse_notebook_output)

with open(file_path, encoding="utf-8", errors="ignore") as f:
return f.read()
1 change: 1 addition & 0 deletions src/gitingest/query_parser.py
@@ -21,6 +21,7 @@
"bitbucket.org",
"gitea.com",
"codeberg.org",
"gitingest.com"
Contributor

What is the need for this?

Author

> What is the need for this?

#126

Contributor

Ah, I see. I don't see why people would do this, but it makes sense to cover it. Could you make a separate PR that adds gitingest.com, and that also adds it and the missing gitea.com and codeberg.org to the test cases?

Member

Yup, I noticed that some people did that, so why not :)

Contributor

@IsNoobgrammer Will you make a separate PR for adding gitingest.com, as well as adding it and the missing gitea.com and codeberg.org to the test cases, or should I go ahead and do it?

Contributor

@IsNoobgrammer I went ahead and created a PR: #134.

]
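
As a rough illustration of what a host allow-list like the one in this hunk is for, here is a minimal sketch. It is not gitingest's `_parse_repo_source`; the names `KNOWN_HOSTS` and `is_known_host` are hypothetical, and the `github.com` and `gitlab.com` entries are assumed from the test cases because the start of the real list is elided in the diff.

```python
from urllib.parse import urlparse

# Assumed allow-list: the last four entries appear in the hunk above,
# the first two are inferred from the test URLs and may differ in the real file.
KNOWN_HOSTS = [
    "github.com",
    "gitlab.com",
    "bitbucket.org",
    "gitea.com",
    "codeberg.org",
    "gitingest.com",
]


def is_known_host(url: str) -> bool:
    """Return True if the URL's host is in the allow-list (hypothetical helper)."""
    host = urlparse(url).hostname  # urlparse lowercases the hostname
    return host is not None and host in KNOWN_HOSTS
```

A check of this shape matches hosts exactly, so `codeberg.org` passes while a look-alike such as `codeberg.com` does not.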


5 changes: 5 additions & 0 deletions tests/query_parser/test_query_parser.py
@@ -17,6 +17,9 @@ async def test_parse_url_valid_https() -> None:
"https://github.com/user/repo",
"https://gitlab.com/user/repo",
"https://bitbucket.org/user/repo",
"https://gitea.com/user/repo",
"https://codeberg.org/user/repo",
Comment on lines +20 to +21
Contributor

This is good! Please make a separate PR for this.

"https://gitingest.com/user/repo",
]
for url in test_cases:
result = await _parse_repo_source(url)
@@ -34,6 +37,8 @@ async def test_parse_url_valid_http() -> None:
"http://github.com/user/repo",
"http://gitlab.com/user/repo",
"http://bitbucket.org/user/repo",
"http://gitingest.com/user/repo",
"http://gitea.com/user/repo",
Contributor

This is good! Please make a separate PR for this (as seen in my other comment). You can include codeberg.org here as well.

]
for url in test_cases:
result = await _parse_repo_source(url)