Skip to content

Conversation

@filipchristiansen
Copy link
Contributor

This PR introduces an optional parameter (include_output) to the process_notebook function, enabling control over whether code cell outputs are included in the generated Python script.

Key Changes

  1. InvalidNotebookError: Added a new custom exception to better handle invalid or non-decodable notebook files.
  2. process_notebook enhancements:
    • Accepts a new boolean parameter (include_output) defaulting to True.
    • Extracts cell outputs and appends them as commented blocks (# Output: ...) in the generated script if include_output=True.
    • Refactored internal logic into helper functions (_process_cell, _extract_output) for better readability.
  3. Exception handling:
    • InvalidNotebookError is now caught alongside OSError when reading notebook files in query_ingestion.py.
  4. Unit tests:
    • Added coverage for the new behavior in test_notebook_utils.py (including tests for both enabling and disabling outputs).

Why This Change?

Previously, the script generation process did not account for cell outputs, which can be important for understanding context in code cells or for debugging. By making this feature optional, users can choose to omit outputs in large or potentially sensitive notebooks.

Example

A single-cell notebook with:

import matplotlib.pyplot as plt
print('my_data')
my_data = [1, 2, 3, 4, 5]
plt.plot(my_data)
my_data

will produce:

# Jupyter notebook converted to Python script.

import matplotlib.pyplot as plt
print('my_data')
my_data = [1, 2, 3, 4, 5]
plt.plot(my_data)
my_data
# Output:
#   my_data

#   [1, 2, 3, 4, 5]
#   <Figure size 640x480 with 1 Axes>

when include_output is True (default).

@filipchristiansen
Copy link
Contributor Author

This addresses #119

Copy link
Member

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this and seems very good thank you, the notebooks are now spot on!

@cyclotruc cyclotruc merged commit 6039114 into main Jan 13, 2025
8 checks passed
@cyclotruc cyclotruc deleted the feat/optional-notebook-outputs branch January 13, 2025 20:51
@IsNoobgrammer
Copy link

IsNoobgrammer commented Jan 14, 2025

Thanks @filipchristiansen !!
Great Work

This PR was merged way faster lol , I would recommend that we try add cell metadata
We discussed it earlier #119 #127

# Cell:  cellNumber ; Type: cellType

cellData

# Output
"""
Output also inside doc string 
""" 

Although this is great work too !!

According to the diff , we haven't added functionality to query-parsing for include output , we can create a issue for that and include this feature request in that issue

@filipchristiansen
Copy link
Contributor Author

Thanks @filipchristiansen !! Great Work

This PR was merged way faster lol , I would recommend that we try add cell metadata We discussed it earlier #119 #127

# Cell:  cellNumber ; Type: cellType

cellData

# Output
"""
Output also inside doc string 
""" 

Although this is great work too !!

According to the diff , we haven't added functionality to query-parsing for include output , we can create a issue for that and include this feature request in that issue

@cyclotruc

Could you please specify exactly what metadata you want included and why?

Example

  • What is the motivation for including cell numbers?
    • The cell numbers will correspond to the correct cell numbers if and only if empty cells are included.
      • Do we really want to include empty cells?
  • What is the motivation for including the cell_type?
    • If we transform markdown cells to multi-line block comments, do we really need to specify which cells are code and which are markdown?

Is there any other metadata you want included?

@IsNoobgrammer
Copy link

Thanks @filipchristiansen !! Great Work
This PR was merged way faster lol , I would recommend that we try add cell metadata We discussed it earlier #119 #127

# Cell:  cellNumber ; Type: cellType

cellData

# Output
"""
Output also inside doc string 
""" 

Although this is great work too !!
According to the diff , we haven't added functionality to query-parsing for include output , we can create a issue for that and include this feature request in that issue

@cyclotruc

Could you please specify exactly what metadata you want included and why?

Example

  • What is the motivation for including cell numbers?

    • The cell numbers will correspond to the correct cell numbers if and only if empty cells are included.

      • Do we really want to include empty cells?
  • What is the motivation for including the cell_type?

    • If we transform markdown cells to multi-line block comments, do we really need to specify which cells are code and which are markdown?

Is there any other metadata you want included?

I don't intent on sounding rude , but suppose we have a notebook

cell 1 we have some markdown

**Welcome to Gitingest**

consider empty or non parse-able cells here

cell 4

"""
This Doc-string is just for showcase
"""

Class Hi:
  def init(self):
       print("hi")

Then its py-script according to current merge would be

"""
**Welcome to Gitingest**
"""


"""
This Doc-string is just for showcase
"""

Class Hi:
  def init(self):
       print("hi")

# Output
# hi

what I suggest is

# Cell 1 ; Type: Markdown

"""
**Welcome to Gitingest**
"""

# Cell 2 ; Type: Code

"""
This Doc-string is just for showcase
"""

Class Hi:
  def init(self):
       print("hi")

# Output
"""
hi    # output should also be enclosed in docstring , It looks more neat and is genrally used for multi-line comment
"""      

Then I would leave it up to you and @cyclotruc to decide which is more suited and would further increase UX of GitIngest

As for regard to include other metadata Not in my opinion

Also as for Cell number , Increment only when you parse it , if its empty or un parse-able don't increment cellNumber

@filipchristiansen
Copy link
Contributor Author

@IsNoobgrammer

I meant to tag you and not @cyclotruc in my previous comment.

I don't intent on sounding rude

What do you mean? Why would it be rude to suggest a feature? :)

As for regard to include other metadata Not in my opinion

I agree.

According to the diff, we haven't added functionality to query-parsing for include output, we can create a issue for that and include this feature request in that issue.

I still do not understand what you mean by "query-parsing for include output"? The whole purpose of this PR was to include output, and it has now been merged. Please explain what you mean.

You still did not mention the motivation for including cell numbers and cell types. I am not saying we should not include it—I am just wondering whether it is motivated.

FOLKS-Tech pushed a commit to FOLKS-Tech/gitingest that referenced this pull request Sep 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants