feat: add support for improved handling of jupyter notebooks #105
cyclotruc left a comment
So I tested this with:
https://raw.githubusercontent.com/cyclotruc/test/refs/heads/main/Exploration%20of%20Airline%20On-Time%20Performance.ipynb and got:
  File "/workspaces/gitingest/src/gitingest/notebook_utils.py", line 30, in process_notebook
    for cell in notebook["cells"]:
                ~~~~~~~~^^^^^^^^^
KeyError: 'cells'
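One possible cause of this `KeyError` (an assumption, not confirmed anywhere in this thread) is a notebook saved in an older nbformat version, where cells are nested under a top-level `worksheets` list instead of a top-level `cells` key. A minimal sketch of a tolerant lookup follows; the helper name `get_cells` is hypothetical and not part of the PR.

```python
import json
from typing import Any


def get_cells(notebook: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the notebook's cells, tolerating the older worksheet layout.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    # nbformat 4 stores cells at the top level.
    if "cells" in notebook:
        return notebook["cells"]
    # nbformat 3 and earlier nest cells inside a list of worksheets.
    cells: list[dict[str, Any]] = []
    for worksheet in notebook.get("worksheets", []):
        cells.extend(worksheet.get("cells", []))
    return cells


# Two tiny hand-written notebooks, one per layout.
old_style = json.loads(
    '{"nbformat": 3, "worksheets": [{"cells": [{"cell_type": "code", "input": ["x = 1"]}]}]}'
)
new_style = json.loads(
    '{"nbformat": 4, "cells": [{"cell_type": "markdown", "source": ["# Title"]}]}'
)

print(len(get_cells(old_style)), len(get_cells(new_style)))
```

Note that older code cells keep their text under `input` rather than `source`, so a real fix would also need to handle that difference.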
The use of
@cyclotruc Tests added for the
@filipchristiansen I liked your PR. Could you create a new PR so that the cell number and cell type are commented above the source and, if present, cells[output][-1]['text'] is commented below? We could also make the result always start with "### Jupyter-Notebook". If you are busy, just say so and I will create a PR in that case.
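To make the requested layout concrete, here is a rough sketch of how one cell could be rendered with its number and type commented above the source and its output commented below. The function name `annotate_cell` and the exact comment wording are assumptions for illustration, not something agreed in this thread.

```python
def annotate_cell(index: int, cell_type: str, source: str, output_text: str = "") -> str:
    """Render one cell with a header comment above and its output commented below.

    Hypothetical formatter; the header and output labels are illustrative only.
    """
    lines = [f"# Cell {index} ({cell_type})", source]
    if output_text:
        lines.append("# Output:")
        # Prefix every output line so the result stays valid Python.
        lines.extend(f"#   {line}" for line in output_text.splitlines())
    return "\n".join(lines)


print(annotate_cell(1, "code", "print(1 + 1)", "2\n"))
```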
What do you mean by For the second point, may I ask your use case for this? You would still identify that it is a notebook based on the
What do you (@cyclotruc) say about the suggestion to start each notebook with
1st part

else-if it ran and has output then

    "outputs":[{"name":"stderr","output_type":"stream","text":"/usr/local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n\n from .autonotebook import tqdm as notebook_tqdm\n"}]

    for nth-cell in cell:
        outputs = nth-cell.get("outputs", "")
        if outputs:
            output = outputs[-1]["text"]  # [-1] always gets the last-run output
            del outputs

2nd part
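The pseudocode above could be written roughly as follows. This is only a sketch with a hypothetical helper name, `last_output_text`. It also guards against two cases the pseudocode does not cover: outputs that carry no `text` field (for example `execute_result` outputs, which use `data`), and `text` stored as a list of lines rather than a single string.

```python
from typing import Any


def last_output_text(cell: dict[str, Any]) -> str:
    """Return the text of the cell's last output, or "" if it has none.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    outputs = cell.get("outputs") or []
    if not outputs:
        return ""
    # Only stream outputs are guaranteed a "text" field; fall back to "".
    text = outputs[-1].get("text", "")
    # The notebook format allows multi-line strings to be stored as a list.
    if isinstance(text, list):
        text = "".join(text)
    return text


stream_cell = {
    "cell_type": "code",
    "outputs": [{"name": "stderr", "output_type": "stream", "text": ["warning\n", "details\n"]}],
}
result_cell = {
    "cell_type": "code",
    "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}],
}

print(repr(last_output_text(stream_cell)))
print(repr(last_output_text(result_cell)))
```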
This PR introduces the `process_notebook` function to process `.ipynb` files and return them as Python scripts, converting markdown and raw cells into multi-line string literals. It also renames the function `ingest_from_query` to `run_ingest_query` in `ingest_from_query.py` to avoid naming conflicts with the module, ensuring clearer code organization.

Changes include:

- `process_notebook` function to handle Jupyter notebooks.
- `ingest_from_query` function renamed to `run_ingest_query` to avoid naming conflicts with the module.
- `_read_file_content` to invoke `process_notebook` for `.ipynb` files.
- `test_notebook_utils.py` for the notebook processing logic.
- `test_ingest.py` to verify that `.ipynb` files trigger `process_notebook`.

These changes integrate Jupyter notebook processing into the file ingestion workflow, while also improving code clarity and test coverage.
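As an illustration of the conversion described above, here is a minimal sketch that turns notebook JSON into a Python script, keeping code cells as-is and wrapping markdown and raw cells in triple-quoted string literals. It is not the PR's actual `process_notebook` implementation. The name `notebook_to_script`, the blank-line separator, and the escaping strategy are all assumptions.

```python
import json


def notebook_to_script(raw: str) -> str:
    """Convert notebook JSON text into a Python script (illustrative sketch)."""
    notebook = json.loads(raw)
    parts: list[str] = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # Cell sources may be stored as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            parts.append(source)
        else:
            # Markdown and raw cells become multi-line string literals.
            # Escape backslashes and triple quotes so the literal stays valid.
            escaped = source.replace("\\", "\\\\").replace('"""', '\\"\\"\\"')
            parts.append(f'"""\n{escaped}\n"""')
    return "\n\n".join(parts) + "\n"


example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title\n", "text"]},
            {"cell_type": "code", "source": "x = 1"},
        ]
    }
)
script = notebook_to_script(example)
print(script)
```

Because non-code cells become plain string expressions, the resulting script should still compile as ordinary Python.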