Skip to content

Conversation

@IsNoobgrammer
Copy link

@IsNoobgrammer IsNoobgrammer commented Jan 13, 2025

This PR works on issue 126 and issue 119

Where earlier URL consisting gitingest.com/user/repo were not parsed

  • Added gitingest.com to known domain
  • Updated test-cases

Earlier Jupyter Notebook's only source was parsed

  • Added output for various categories to make it pack more context
  • Added metadate before every cell to make it more LLM-friendly

Example

# Cell 0 ; Type : Markdown
"""
md-text
"""

...

# Cell 5 ; Type : code

print("hello_world")
print(XoR)

# Output
"""hello_world
NameError: XoR is not defined
"""

@cyclotruc
@filipchristiansen (needs to update test-cases for notebook)

@filipchristiansen
Copy link
Contributor

@IsNoobgrammer Why do you want to make gitingest.com/user/repo parsable?



def _read_file_content(file_path: Path) -> str:
def _read_file_content(file_path: Path , parse_notebook_output: bool = True) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point of passing on the parse_notebook_output argument to the _read_file_content function if it is not ever used in the codebase. Instead, wait with this until we implement support for not including the notebook output.

"bitbucket.org",
"gitea.com",
"codeberg.org",
"gitingest.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for this?

#126

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Don't see why people would do this, but makes sense to cover it. Could you make a separate PR for adding gitingest.com, as well as adding it and the missing gitea.com and codeberg.org to the test cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I noticed that some people did that so why not :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IsNoobgrammer Will you make a separate PR for adding gitingest.com, as well as adding it and the missing gitea.com and codeberg.org to the test cases, or should I go ahead and do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IsNoobgrammer I went ahead and created a PR: #134.

Comment on lines +65 to +79
# Extract Output from cell
if parse_notebook_output and (("outputs" in cell) and (cell["outputs"] != [])):
sample_output=""
for output in cell["outputs"]:
if output["output_type"] == "stream" and output["text"] != []:
sample_output += "".join(output["text"]) + "\n"
elif (output["output_type"] in ["execute_result","display_data"]) and ("data" in output) and ("text/plain" in output["data"]):
sample_output += "".join(output["data"]["text/plain"]) + "\n"
elif (output["output_type"]=="error" and ("evalue" in output) ):
sample_output += f"{output.get("ename","Error")} : " + "".join(output["evalue"]) + "\n"
str_ += f'\n# Output:\n"""{sample_output}"""\n'

# Add Cell Info
cell_count+=1
str_ = f"# Cell {cell_count} ; Type : ({cell_type})\n" + str_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already implemented this quite neatly – will make the PR in a moment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already implemented this quite neatly – will make the PR in a moment.

No problem , I will close this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should have been clearer.

Comment on lines +20 to +21
"https://gitea.com/user/repo",
"https://codeberg.com/user/repo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good! Please make a separate PR for this.

"http://gitlab.com/user/repo",
"http://bitbucket.org/user/repo",
"https://gitingest.com/user/repo",
"http://gitea.com/user/repo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good! Please make a separate PR for this (as seen in my other comment). You can include codeberg.com here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants