A collection of simple, self-contained tools in two flavors:
- Scripts: Python CLI tools runnable with uv run https://tools.ricardodecal.com/python/foo.py, which spawns a self-contained, ephemeral uv environment.
- Pages: Single-file HTML tools that run entirely in your browser at https://tools.ricardodecal.com/html/foo.html.
This is an experiment in low-stakes vibe coding. The code lives in crypdick/tools.
Inspired by Simon Willison's tools collection.
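For readers wondering how uv run can build a throwaway environment for a single file: uv reads inline script metadata (PEP 723), a comment block at the top of the script that lists the Python version and dependencies to install before running. Below is a minimal sketch of such a script. The file name foo.py and the greet function are hypothetical, and whether the scripts in this collection declare their dependencies this way is an assumption.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Hypothetical foo.py: a single-file tool carrying its own metadata."""
import sys


def greet(name: str) -> str:
    """Return the greeting this toy tool prints."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Use the first CLI argument if given, otherwise a default.
    name = sys.argv[1] if len(sys.argv) > 1 else "world"
    print(greet(name))
```

Any third-party packages the script imports would go in the dependencies list; it is left empty here so the sketch runs with the standard library alone.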
Data Processing (7 scripts, 2 pages)
convert_arrow_to_parquet_streaming.py python
Output of uv run https://tools.ricardodecal.com/python/convert_arrow_to_parquet_streaming.py --help:
Usage: convert_arrow_to_parquet_streaming.py [OPTIONS]
Convert Arrow shards to Parquet.
- Discovers all .arrow files under a given source directory
- Converts each file to Parquet
- Uses streaming in order to keep memory bounded and convert files larger than
available RAM
- Handles both Arrow IPC File and Stream formats (tries file, falls back to
stream)
Notes:
- Use --preserve-subdirs to mirror the input directory tree under the output
dir.
- Use --overwrite to re-create files; otherwise existing outputs are skipped.
Arguments:
SOURCE_DIR: Directory containing .arrow files.
OUTPUT_DIR: Directory to write .parquet files.
Examples:
uv run https://tools.ricardodecal.com/python/convert_arrow_to_parquet_streaming.py --source-dir
./arrow_data --output-dir ./parquet_data
uv run https://tools.ricardodecal.com/python/convert_arrow_to_parquet_streaming.py --source-dir
./arrow_data --output-dir ./parquet_data --preserve-subdirs
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ *  --source-dir        DIRECTORY  Directory containing .arrow files          │
│                                   [required]                                 │
│    --output-dir        DIRECTORY  Directory to write .parquet files          │
│                                   [default: parq_convert]                    │
│    --overwrite                    Overwrite existing parquet files           │
│    --preserve-subdirs             Preserve input subdirectory                │
│                                   structure inside output dir                │
│    --help                         Show this message and exit.                │
╰──────────────────────────────────────────────────────────────────────────────╯
count_parquet_rows.py python
Output of uv run https://tools.ricardodecal.com/python/count_parquet_rows.py --help:
Usage: count_parquet_rows.py [OPTIONS] DATASET_PATH
Count the number of rows in a parquet file/dataset without reading data into
memory.
Works by reading just the metadata headers. Supports:
- Single parquet files
- Directories of parquet shards
- Hive-style partitioned datasets
- Local paths and S3 URIs
Arguments:
DATASET_PATH: Local file path or S3 URI to the parquet dataset.
Examples:
# Local file
uv run https://tools.ricardodecal.com/python/count_parquet_rows.py
./data.parquet
# Directory of shards
uv run https://tools.ricardodecal.com/python/count_parquet_rows.py
./data_dir/
# S3 URI
uv run https://tools.ricardodecal.com/python/count_parquet_rows.py
s3://my-bucket/data.parquet
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    dataset_path      TEXT  Local file path or S3 URI. [required]           │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
dedup_dirs.py python
Output of uv run https://tools.ricardodecal.com/python/dedup_dirs.py --help:
Usage: dedup_dirs.py [OPTIONS] OLD_DIR NEW_DIR
Find duplicate files between OLD_DIR and NEW_DIR, optionally deleting from
OLD_DIR.
Compares files by path and content. For large files (>10MB), uses sampling
for speed. For smaller files, compares MD5 hashes. Runs in parallel for
performance on large directory trees.
Without --delete, runs in dry-run mode showing what would be deleted.
With --delete, removes identical files from OLD_DIR and cleans up empty
directories.
Arguments:
OLD_DIR: Source directory to deduplicate (files deleted from here).
NEW_DIR: Reference directory to compare against (untouched).
Examples:
# Dry run - see what would be deleted
uv run https://tools.ricardodecal.com/python/dedup_dirs.py ~/old-backup
~/new-backup
# Actually delete duplicates
uv run https://tools.ricardodecal.com/python/dedup_dirs.py ~/old-backup
~/new-backup --delete
# Use more workers for faster processing
uv run https://tools.ricardodecal.com/python/dedup_dirs.py ~/old ~/new
--delete -w 16
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    old_dir      PATH  Source directory to deduplicate (files deleted from  │
│                         here).                                               │
│                         [required]                                           │
│ *    new_dir      PATH  Reference directory to compare against (untouched).  │
│                         [required]                                           │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --delete                Actually delete identical files (default is dry      │
│                         run).                                                │
│ --workers  -w  INTEGER  Number of parallel workers. [default: 8]             │
│ --help                  Show this message and exit.                          │
╰──────────────────────────────────────────────────────────────────────────────╯
download_video.py python
Output of uv run https://tools.ricardodecal.com/python/download_video.py --help:
Usage: download_video.py [OPTIONS] URL
Download a video from a supported platform (Twitter/X, YouTube, etc.).
Uses yt-dlp to download videos from a wide variety of websites.
Twitter "GIFs" are actually MP4 videos, which this tool can also download.
Arguments:
URL: The URL of the video page (e.g., Twitter post, YouTube video).
Examples:
uv run https://tools.ricardodecal.com/python/download_video.py
https://x.com/SemiAnalysis_/status/1990449859321888935
uv run https://tools.ricardodecal.com/python/download_video.py
https://www.youtube.com/watch?v=dQw4w9WgXcQ --output my_video.mp4
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    url      TEXT  The URL of the video page. [required]                    │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --output  -o  TEXT  Output filepath (file or directory). Defaults to         │
│                     'Title [ID].mp4' in current directory.                   │
│ --help              Show this message and exit.                              │
╰──────────────────────────────────────────────────────────────────────────────╯
html_to_text.html page
https://tools.ricardodecal.com/html/html_to_text.html
Paste HTML and extract readable plain text. Runs entirely in your browser.
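As a rough illustration of what extracting readable text from pasted HTML involves, here is a minimal Python sketch built only on the standard library's html.parser. It is not the page's actual code, which runs in the browser and is not shown here. The names TextExtractor and html_to_text, and the choice of which tags to skip or treat as line breaks, are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are never readable text.
SKIP_TAGS = {"script", "style", "noscript"}
# Tags that should start a new line in the output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect visible text, inserting line breaks around block tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Ignore anything inside <script>, <style>, or <noscript>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace and drop empty lines.
        raw_lines = "".join(self._chunks).splitlines()
        lines = (" ".join(line.split()) for line in raw_lines)
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A sketch like this keeps every visible string on the page, including menus and footers; removing that boilerplate is a separate step.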
html_to_text.py python
Output of uv run https://tools.ricardodecal.com/python/html_to_text.py --help:
Usage: html_to_text.py [OPTIONS] URL
Fetch a webpage and convert its readable content to plain text.
Uses Readability-style boilerplate removal by default, with conservative
fallbacks.
Then uses inscriptis to render HTML to text while preserving basic structure.
Automatically adds https:// if no scheme is provided.
Arguments:
URL: The webpage URL to fetch (e.g., example.com or https://example.com).
Examples:
uv run https://tools.ricardodecal.com/python/html_to_text.py example.com
uv run https://tools.ricardodecal.com/python/html_to_text.py
https://news.ycombinator.com --timeout 30
uv run https://tools.ricardodecal.com/python/html_to_text.py
wikipedia.org/wiki/Python --raw
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    url      TEXT  The webpage URL to fetch. [required]                     │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --timeout    -t  INTEGER                      Request timeout in seconds.    │
│                                               [default: 15]                  │
│ --mode       -m  [auto|readability|heuristi   Extraction mode: auto          │
│                  c|full]                      (default), readability         │
│                                               (boilerplate removal),         │
│                                               heuristic (semantic tags),     │
│                                               or full (whole page).          │
│                                               [default: auto]                │
│ --min-chars      INTEGER RANGE [x>=0]         Minimum extracted text         │
│                                               length to accept before        │
│                                               falling back (auto mode).      │
│                                               [default: 200]                 │
│ --min-ratio      FLOAT RANGE [0.0<=x<=1.0]    Minimum extracted/full text    │
│                                               length ratio to accept         │
│                                               before falling back (auto      │
│                                               mode).                         │
│                                               [default: 0.2]                 │
│ --raw                                         Skip whitespace cleanup        │
│                                               (preserve original             │
│                                               formatting).                   │
│ --help                                        Show this message and exit.    │
╰──────────────────────────────────────────────────────────────────────────────╯
strip_pdf_metadata.html page
https://tools.ricardodecal.com/html/strip_pdf_metadata.html
Remove author, title, timestamps, and other metadata from PDF files. Runs entirely in your browser; files never leave your device.
strip_pdf_metadata.py python
Output of uv run https://tools.ricardodecal.com/python/strip_pdf_metadata.py --help:
Usage: strip_pdf_metadata.py [OPTIONS] INPUT_FILE [OUTPUT_FILE]
Strip metadata from a PDF file.
If OUTPUT_FILE is not provided, writes to 'stripped_<INPUT_FILE>'.
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    input_file   FILE           Input PDF file. [required]                  │
│      output_file  [OUTPUT_FILE]  Output PDF file. Defaults to                │
│                                  'stripped_<INPUT_FILE>'.                    │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
yt_transcript.py python
Output of uv run https://tools.ricardodecal.com/python/yt_transcript.py --help:
Usage: yt_transcript.py [OPTIONS] URL [OUTPUT_FILE]
Download transcripts from a YouTube URL (video or playlist) to a single file.
Arguments:
URL: YouTube video or playlist URL.
OUTPUT_FILE: Path to save the transcript text. Defaults to transcript.txt.
Examples:
uv run https://tools.ricardodecal.com/python/yt_transcript.py "https://youtu.be/..."
uv run https://tools.ricardodecal.com/python/yt_transcript.py "https://youtube.com/playlist?list=..."
out.txt
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    url          TEXT           YouTube video or playlist URL.              │
│                                  [required]                                  │
│      output_file  [OUTPUT_FILE]  Path to save the transcript text.           │
│                                  Defaults to transcript.txt.                 │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --lang  -l  TEXT  Language codes to prefer (e.g. -l en -l fr)                │
│ --help            Show this message and exit.                                │
╰──────────────────────────────────────────────────────────────────────────────╯
Development (2 scripts)
burn_iso.py python
Output of uv run https://tools.ricardodecal.com/python/burn_iso.py --help:
Usage: burn_iso.py [OPTIONS] ISO_FILE
Burn an ISO file to a USB drive with safety checks and progress monitoring.
This tool will verify the ISO file, check the target device, unmount any
mounted partitions, and burn the ISO using dd with progress reporting.
DANGER: This will completely erase all data on the target USB device!
Arguments:
ISO_FILE: Path to the ISO file to burn.
Examples:
uv run https://tools.ricardodecal.com/python/burn_iso.py --list
uv run https://tools.ricardodecal.com/python/burn_iso.py ubuntu.iso
--device /dev/sdb --dry-run
uv run https://tools.ricardodecal.com/python/burn_iso.py
~/Downloads/debian.iso -d /dev/sdc
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    iso_file      FILE  Path to the ISO file to burn. [required]            │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --device   -d  TEXT  USB device path (e.g., /dev/sdb). Use --list to see     │
│                      available devices.                                      │
│ --list               List available block devices and exit.                  │
│ --dry-run  -n        Show what would be done without actually doing it.      │
│ --force    -f        Skip confirmation prompts.                              │
│ --help               Show this message and exit.                             │
╰──────────────────────────────────────────────────────────────────────────────╯
ipynb_to_py_sphinx.py python
Output of uv run https://tools.ricardodecal.com/python/ipynb_to_py_sphinx.py --help:
Usage: ipynb_to_py_sphinx.py [OPTIONS] NOTEBOOK
Convert a Jupyter notebook to a Sphinx Gallery Python script.
This tool converts a .ipynb file to a .py file formatted for Sphinx Gallery.
It converts Markdown cells to RST (using pypandoc) and comments them out,
while preserving code cells. It also handles magic commands by commenting them
out.
Based on: https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe
Arguments:
NOTEBOOK: The path to the input Jupyter notebook (.ipynb).
Examples:
uv run https://tools.ricardodecal.com/python/ipynb_to_py_sphinx.py
notebook.ipynb
uv run https://tools.ricardodecal.com/python/ipynb_to_py_sphinx.py
notebook.ipynb --output my_gallery_script.py
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    notebook      PATH  The path to the input Jupyter notebook (.ipynb).    │
│                          [required]                                          │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --output  -o  PATH  Output Python file path. Defaults to notebook name       │
│                     with .py extension.                                      │
│ --help              Show this message and exit.                              │
╰──────────────────────────────────────────────────────────────────────────────╯
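The conversion described in the ipynb_to_py_sphinx.py help text can be sketched roughly as follows. This is a simplified illustration, not the tool's code: it skips the Markdown-to-RST step that the tool performs with pypandoc and simply comments the Markdown out as-is. The function names are hypothetical, and treating every line that starts with % or ! as a magic command is an assumption. Sphinx Gallery also typically expects the script to open with a docstring containing an RST title, which this sketch does not generate.

```python
import json


def comment_out(text: str) -> str:
    """Prefix every line with '# ' so the text becomes a comment block."""
    return "\n".join(f"# {line}".rstrip() for line in text.splitlines())


def convert_code(source: str) -> str:
    """Keep code as-is, but comment out IPython magics and shell escapes."""
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith(("%", "!")):
            out.append(f"# {line}")
        else:
            out.append(line)
    return "\n".join(out)


def notebook_to_script(notebook_json: str) -> str:
    """Turn notebook JSON text into a '# %%'-delimited Python script."""
    nb = json.loads(notebook_json)
    blocks = []
    for cell in nb.get("cells", []):
        # nbformat stores "source" either as one string or a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            # '# %%' starts a text block; the commented lines follow it.
            blocks.append("# %%\n" + comment_out(source))
        elif cell.get("cell_type") == "code":
            blocks.append(convert_code(source))
    return "\n\n".join(blocks) + "\n"
```

To try it on a real file, read the notebook with pathlib.Path("notebook.ipynb").read_text() and write the returned string to a .py file.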