Skip to content

Add batch denoising CLI and helpers#3

Open
davior wants to merge 2 commits intomainfrom
codex/implement-multi-pass-spectral-gating
Open

Add batch denoising CLI and helpers#3
davior wants to merge 2 commits intomainfrom
codex/implement-multi-pass-spectral-gating

Conversation

@davior
Copy link
Owner

@davior davior commented Nov 29, 2025

Summary

  • add a batch-processing helper and CLI for running the denoise chain over files or folders
  • document how to run the CLI and use the helper functions programmatically
  • export the new helpers through the audio package entrypoint

Testing

  • not run (librosa dependency not installed in environment)

Codex Task

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +31 to +33
for file_path in sorted(source.rglob(glob)):
if file_path.is_file():
results.append((file_path, file_path.relative_to(source).parent))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent overwriting when multiple input roots share filenames

_gather_audio_files strips the top-level source directory (file_path.relative_to(source).parent) and process_batch later writes outputs using only this relative_parent, so files in different input directories with the same relative path (e.g., dirA/clip.wav and dirB/clip.wav) both map to the same output path and the latter silently overwrites the former. This only surfaces when users pass multiple input roots, but it defeats the documented promise of preserving paths inside the output directory.

Useful? React with 👍 / 👎.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4 to +5
import librosa
import numpy as np

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add librosa dependency to environment

The new denoising helpers import librosa, but requirements.txt was not updated (still only numpy/soundfile/scipy/pyloudnorm), so installing dependencies as documented and then running python -m audio.pipeline or importing audio.denoise will immediately fail with ModuleNotFoundError: No module named 'librosa'. The CLI described in the README therefore cannot run in the documented environment until librosa is added to the dependency list.

Useful? React with 👍 / 👎.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex review

Comment on lines +29 to +33
for source in sources:
if source.is_dir():
for file_path in sorted(source.rglob(glob)):
if file_path.is_file():
results.append((file_path, file_path.relative_to(source).parent))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid overwriting outputs from multiple input roots

When processing multiple input directories, _gather_audio_files strips the source root (relative_to(source).parent), and process_batch writes each file to output_dir/<relative_parent>/<stem>_denoised_<preset>.wav. If two different input roots contain the same relative file path (e.g., dirA/foo.wav and dirB/foo.wav), both outputs resolve to output_dir/foo_denoised_*.wav, so the later file silently overwrites the earlier one. The CLI claims to handle multiple directories, so outputs should be namespaced per source root or at least detect collisions to avoid data loss.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4 to +5
import librosa
import numpy as np

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Declare librosa dependency used by CLI

The new denoising pipeline imports librosa but the project requirements were not updated to include it, so installing via the documented requirements.txt leaves the CLI and denoise helpers failing immediately with ModuleNotFoundError before any processing occurs. Please add librosa to the dependency list so users can run the new batch tooling.

Useful? React with 👍 / 👎.

Comment on lines +30 to +33
if source.is_dir():
for file_path in sorted(source.rglob(glob)):
if file_path.is_file():
results.append((file_path, file_path.relative_to(source).parent))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid overwriting outputs when multiple source dirs share names

When multiple input directories are provided, _gather_audio_files tracks only the path relative to each directory and process_batch reuses that relative path under a single output_dir. If two sources contain files with the same relative path (e.g., dir1/sub/a.wav and dir2/sub/a.wav), the second pass will overwrite the first output silently in the same location. Consider namespacing outputs by the source root or detecting collisions to avoid data loss in multi-directory runs.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant