Commit 368c12b

Refactor project structure, enhance logic, update configurations, and improve code quality (coderamp-labs#85)
* Refactor project structure, enhance logic, update configurations, and improve code quality

Refactoring and Logic Improvements
- Refactored the `_scan_directory` function in `src/gitingest/ingest_from_query.py` by extracting loop logic into the new `_process_item` function, and further separating functionality into `_process_symlink` and `_process_file`
- Replaced multiple return statements with error raising and catching, introducing custom exceptions (`MaxFilesReachedError`, `MaxFileSizeReachedError`, `AlreadyVisitedError`) in the `_process_item` and `_scan_directory` functions
- Enhanced the logic in the `process_query` function in `src/process_query.py` for better flow and maintainability
- Improved the logic in `_generate_token_string` in `src/gitingest/ingest_from_query.py`
- Refined the `download_ingest` function in `src/routers/download.py` for better clarity and functionality

Exception Handling Enhancements
- Replaced broad `Exception` handling with specific `OSError` in the `_read_file_content` function in `src/gitingest/ingest_from_query.py`
- Refined exception handling throughout the codebase, including removing redundant try-except-raise blocks, e.g., in `clone_repo` function in `src/gitingest/clone.py`
- Added custom exceptions to `src/gitingest/exceptions.py`: `MaxFilesReachedError`, `MaxFileSizeReachedError`, and `AlreadyVisitedError`
- Included explicit re-raising of exceptions in various functions for improved error propagation

Test Suite Refactoring
- Cleaned up and reorganized test files:
  - Moved tests from `src/gitingest/tests/` to `tests/`
  - Consolidated fixtures from `tests/test_ingest.py` into `tests/conftest.py`
  - Removed redundant content from `tests/conftest.py`
- Migrated configuration from `pytest.ini` to `pyproject.toml`, deleted `pytest.ini`, and updated `.dockerignore`

Documentation Improvements
- Added `darglint` for enforcing `numpy` docstring style in `.pre-commit-config.yaml` for `src/` files
- Updated docstrings throughout the codebase, including adding module docstrings where needed
- Updated `README.md`:
  - Added "GitHub stars" badge
  - Moved the "Discord" badge to its own line
- Replaced occurrences of "Gitingest" with "GitIngest" for consistency and clarity

Linting and Code Quality
- Integrated `pylint` into `.pre-commit-config.yaml` for both `src/` and `tests/` directories
- Created `tests/.pylintrc` for linting configuration specific to test files

Code Clean-up
- Removed the redundant `src/__init__.py` file

Naming Conventions and Code Style
- Renamed `logSliderToSize` to `log_slider_to_size` in `src/server_utils.py` for consistency with Python's naming conventions
- Added explicit encoding specification in multiple instances of `open` throughout the code
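The two file-reading changes described above (catching `OSError` instead of a broad `Exception`, and passing an explicit encoding to `open`) can be sketched as follows. This is a minimal illustration, not the body of `_read_file_content`, which this excerpt does not show; the function name `read_file_content`, the `errors="ignore"` setting, and the error string are assumptions.

```python
def read_file_content(file_path: str) -> str:
    """Return the text of a file, or an error marker if it cannot be read.

    Hypothetical helper for illustration; the real function may differ.
    """
    try:
        # An explicit encoding avoids depending on the platform default.
        # errors="ignore" (an assumption here) drops undecodable bytes
        # instead of raising UnicodeDecodeError.
        with open(file_path, encoding="utf-8", errors="ignore") as f:
            return f.read()
    except OSError as exc:
        # OSError covers missing files, permission problems, and similar
        # I/O failures; unrelated bugs are no longer silently swallowed.
        return f"Error reading file: {exc}"
```

Narrowing the handler means a programming error inside the `try` block (for example a `TypeError`) propagates instead of being reported as a read failure.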
1 parent 6e32a03 commit 368c12b

30 files changed: +614 -311 lines changed

.dockerignore

Lines changed: 0 additions & 1 deletion
@@ -37,5 +37,4 @@ docs/
 tests/
 *.md
 LICENSE
-pytest.ini
 setup.py

.pre-commit-config.yaml

Lines changed: 48 additions & 0 deletions
@@ -83,3 +83,51 @@ repos:
       - id: markdownlint
         description: "Lint markdown files."
         args: ["--disable=line-length"]
+
+  - repo: https://github.com/terrencepreilly/darglint
+    rev: v1.8.1
+    hooks:
+      - id: darglint
+        name: darglint for source
+        args: [--docstring-style=numpy]
+        files: ^src/
+
+  - repo: https://github.com/pycqa/pylint
+    rev: v3.3.3
+    hooks:
+      - id: pylint
+        name: pylint for source
+        files: ^src/
+        additional_dependencies:
+          [
+            click,
+            fastapi-analytics,
+            pytest-asyncio,
+            python-dotenv,
+            slowapi,
+            starlette,
+            tiktoken,
+            uvicorn,
+          ]
+      - id: pylint
+        name: pylint for tests
+        files: ^tests/
+        args:
+          - --rcfile=tests/.pylintrc
+        additional_dependencies:
+          [
+            click,
+            fastapi-analytics,
+            pytest,
+            pytest-asyncio,
+            python-dotenv,
+            slowapi,
+            starlette,
+            tiktoken,
+            uvicorn,
+          ]
+
+  - repo: meta
+    hooks:
+      - id: check-hooks-apply
+      - id: check-useless-excludes
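For readers unfamiliar with the `--docstring-style=numpy` setting, the sketch below shows the docstring layout that the darglint hook is configured to check. The function `kilobytes_to_bytes` is hypothetical and not part of this commit; it only illustrates the `Parameters`, `Returns`, and `Raises` sections that darglint compares against the function's signature and body.

```python
def kilobytes_to_bytes(value: str) -> int:
    """
    Convert a size given in kilobytes to bytes.

    Parameters
    ----------
    value : str
        The size in kilobytes, as a decimal string.

    Returns
    -------
    int
        The size in bytes.

    Raises
    ------
    ValueError
        If `value` is not a valid integer or is negative.
    """
    size_kb = int(value)  # int() itself raises ValueError on bad input
    if size_kb < 0:
        raise ValueError("Size must be non-negative")
    return size_kb * 1024
```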

README.md

Lines changed: 2 additions & 0 deletions
@@ -4,9 +4,11 @@
 
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/cyclotruc/gitingest/blob/main/LICENSE)
 [![PyPI version](https://badge.fury.io/py/gitingest.svg)](https://badge.fury.io/py/gitingest)
+[![GitHub stars](https://img.shields.io/github/stars/cyclotruc/gitingest?style=social.svg)](https://github.com/cyclotruc/gitingest)
 [![Downloads](https://pepy.tech/badge/gitingest)](https://pepy.tech/project/gitingest)
 [![GitHub issues](https://img.shields.io/github/issues/cyclotruc/gitingest)](https://github.com/cyclotruc/gitingest/issues)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
 [![Discord](https://dcbadge.limes.pink/api/server/https://discord.com/invite/zerRaGK9EC)](https://discord.com/invite/zerRaGK9EC)
 
 Turn any Git repository into a prompt-friendly text ingest for LLMs.

pyproject.toml

Lines changed: 63 additions & 0 deletions
@@ -1,6 +1,60 @@
+[project]
+name = "gitingest"
+version = "0.1.2"
+description="CLI tool to analyze and create text dumps of codebases for LLMs"
+readme = {file = "README.md", content-type = "text/markdown" }
+requires-python = ">= 3.10"
+dependencies = [
+    "click>=8.0.0",
+    "fastapi-analytics",
+    "fastapi[standard]",
+    "python-dotenv",
+    "slowapi",
+    "starlette",
+    "tiktoken",
+    "uvicorn",
+]
+license = {file = "LICENSE"}
+authors = [{name = "Romain Courtois", email = "romain@coderamp.io"}]
+classifiers=[
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+]
+
+[project.scripts]
+gitingest = "gitingest.cli:main"
+
+[project.urls]
+homepage = "https://gitingest.com"
+github = "https://github.com/cyclotruc/gitingest"
+
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools]
+packages = {find = {where = ["src"]}}
+include-package-data = true
+
+# Linting configuration
 [tool.pylint.format]
 max-line-length = 119
 
+[tool.pylint.'MESSAGES CONTROL']
+disable = [
+    "too-many-arguments",
+    "too-many-positional-arguments",
+    "too-many-locals",
+    "too-few-public-methods",
+    "broad-exception-caught",
+    "duplicate-code",
+]
+
 [tool.pycln]
 all = true
 
@@ -14,3 +68,12 @@ filter_files = true
 
 [tool.black]
 line-length = 119
+
+# Test configuration
+[tool.pytest.ini_options]
+pythonpath = ["src"]
+testpaths = ["tests/"]
+python_files = "test_*.py"
+asyncio_mode = "auto"
+python_classes = "Test*"
+python_functions = "test_*"

pytest.ini

Lines changed: 0 additions & 8 deletions
This file was deleted.

src/config.py

Lines changed: 3 additions & 1 deletion
@@ -1,9 +1,11 @@
+""" Configuration file for the project. """
+
 MAX_DISPLAY_SIZE: int = 300_000
 TMP_BASE_PATH: str = "/tmp/gitingest"
 DELETE_REPO_AFTER: int = 60 * 60  # In seconds
 
 EXAMPLE_REPOS: list[dict[str, str]] = [
-    {"name": "Gitingest", "url": "https://github.com/cyclotruc/gitingest"},
+    {"name": "GitIngest", "url": "https://github.com/cyclotruc/gitingest"},
     {"name": "FastAPI", "url": "https://github.com/tiangolo/fastapi"},
     {"name": "Flask", "url": "https://github.com/pallets/flask"},
     {"name": "Tldraw", "url": "https://github.com/tldraw/tldraw"},

src/gitingest/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+""" gitingest: A package for ingesting data from git repositories. """
+
 from gitingest.clone import clone_repo
 from gitingest.ingest import ingest
 from gitingest.ingest_from_query import ingest_from_query

src/gitingest/cli.py

Lines changed: 5 additions & 1 deletion
@@ -1,3 +1,7 @@
+""" Command-line interface for the GitIngest package. """
+
+# pylint: disable=no-value-for-parameter
+
 import click
 
 from gitingest.ingest import ingest
@@ -40,7 +44,7 @@ def main(
 
     Raises
     ------
-    click.Abort
+    Abort
         If there is an error during the execution of the command, this exception is raised to abort the process.
     """
     try:

src/gitingest/clone.py

Lines changed: 17 additions & 24 deletions
@@ -1,7 +1,8 @@
+""" This module contains functions for cloning a Git repository to a local path. """
+
 import asyncio
 from dataclasses import dataclass
 
-from gitingest.exceptions import AsyncTimeoutError
 from gitingest.utils import async_timeout
 
 CLONE_TIMEOUT: int = 20
@@ -59,11 +60,7 @@ async def clone_repo(config: CloneConfig) -> tuple[bytes, bytes]:
     Raises
     ------
     ValueError
-        If the repository does not exist or if required query parameters are missing.
-    RuntimeError
-        If any git command fails during execution.
-    AsyncTimeoutError
-        If the cloning process exceeds the specified timeout.
+        If the 'url' or 'local_path' parameters are missing, or if the repository is not found.
     """
     # Extract and validate query parameters
     url: str = config.url
@@ -81,29 +78,25 @@ async def clone_repo(config: CloneConfig) -> tuple[bytes, bytes]:
     if not await _check_repo_exists(url):
         raise ValueError("Repository not found, make sure it is public")
 
-    try:
-        if commit:
-            # Scenario 1: Clone and checkout a specific commit
-            # Clone the repository without depth to ensure full history for checkout
-            clone_cmd = ["git", "clone", "--single-branch", url, local_path]
-            await _run_git_command(*clone_cmd)
-
-            # Checkout the specific commit
-            checkout_cmd = ["git", "-C", local_path, "checkout", commit]
-            return await _run_git_command(*checkout_cmd)
+    if commit:
+        # Scenario 1: Clone and checkout a specific commit
+        # Clone the repository without depth to ensure full history for checkout
+        clone_cmd = ["git", "clone", "--single-branch", url, local_path]
+        await _run_git_command(*clone_cmd)
 
-        if branch and branch.lower() not in ("main", "master"):
+        # Checkout the specific commit
+        checkout_cmd = ["git", "-C", local_path, "checkout", commit]
+        return await _run_git_command(*checkout_cmd)
 
-            # Scenario 2: Clone a specific branch with shallow depth
-            clone_cmd = ["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]
-            return await _run_git_command(*clone_cmd)
+    if branch and branch.lower() not in ("main", "master"):
 
-        # Scenario 3: Clone the default branch with shallow depth
-        clone_cmd = ["git", "clone", "--depth=1", "--single-branch", url, local_path]
+        # Scenario 2: Clone a specific branch with shallow depth
+        clone_cmd = ["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]
         return await _run_git_command(*clone_cmd)
 
-    except (RuntimeError, asyncio.TimeoutError, AsyncTimeoutError):
-        raise  # Re-raise the exception
+    # Scenario 3: Clone the default branch with shallow depth
+    clone_cmd = ["git", "clone", "--depth=1", "--single-branch", url, local_path]
+    return await _run_git_command(*clone_cmd)
 
 
 async def _check_repo_exists(url: str) -> bool:
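
The three clone scenarios in the hunk above can be summarized as a pure function that only decides which git commands to run. This is a sketch for reading the diff, not code from the commit: `build_git_commands` is a hypothetical name, and the real `clone_repo` runs each command through `_run_git_command` rather than returning a list.

```python
from __future__ import annotations


def build_git_commands(
    url: str,
    local_path: str,
    commit: str | None = None,
    branch: str | None = None,
) -> list[list[str]]:
    """Return the git invocations for the scenario selected by the arguments."""
    if commit:
        # Scenario 1: full-history clone, then checkout of the given commit.
        return [
            ["git", "clone", "--single-branch", url, local_path],
            ["git", "-C", local_path, "checkout", commit],
        ]

    if branch and branch.lower() not in ("main", "master"):
        # Scenario 2: shallow clone of a specific, non-default branch.
        return [["git", "clone", "--depth=1", "--single-branch", "--branch", branch, url, local_path]]

    # Scenario 3: shallow clone of the default branch.
    return [["git", "clone", "--depth=1", "--single-branch", url, local_path]]
```

Removing the `try`/`except ...: raise` wrapper does not change which exceptions reach the caller, since a handler that only re-raises is equivalent to having no handler; the diff simply dedents the body by one level.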

src/gitingest/exceptions.py

Lines changed: 24 additions & 2 deletions
@@ -1,12 +1,13 @@
+""" Custom exceptions for the GitIngest package. """
+
+
 class InvalidPatternError(ValueError):
     """
     Exception raised when a pattern contains invalid characters.
-
     This exception is used to signal that a pattern provided for some operation
     contains characters that are not allowed. The valid characters for the pattern
     include alphanumeric characters, dash (-), underscore (_), dot (.), forward slash (/),
     plus (+), and asterisk (*).
-
     Parameters
     ----------
     pattern : str
@@ -27,3 +28,24 @@ class AsyncTimeoutError(Exception):
     This exception is used by the `async_timeout` decorator to signal that the wrapped
     asynchronous function has exceeded the specified time limit for execution.
     """
+
+
+class MaxFilesReachedError(Exception):
+    """Exception raised when the maximum number of files is reached."""
+
+    def __init__(self, max_files: int) -> None:
+        super().__init__(f"Maximum number of files ({max_files}) reached.")
+
+
+class MaxFileSizeReachedError(Exception):
+    """Raised when the maximum file size is reached."""
+
+    def __init__(self, max_size: int):
+        super().__init__(f"Maximum file size limit ({max_size/1024/1024:.1f}MB) reached.")
+
+
+class AlreadyVisitedError(Exception):
+    """Exception raised when a symlink target has already been visited."""
+
+    def __init__(self, path: str) -> None:
+        super().__init__(f"Symlink target already visited: {path}")
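
The commit message says these exceptions replace multiple return statements in `_process_item` and `_scan_directory`, but those functions are not part of this excerpt. The sketch below shows one way such a raise-and-catch flow could look. The two exception classes are copied from the diff above; `process_item`, `scan`, the `stats` dictionary, and both limits are hypothetical.

```python
class MaxFilesReachedError(Exception):
    """Exception raised when the maximum number of files is reached."""

    def __init__(self, max_files: int) -> None:
        super().__init__(f"Maximum number of files ({max_files}) reached.")


class MaxFileSizeReachedError(Exception):
    """Raised when the maximum file size is reached."""

    def __init__(self, max_size: int) -> None:
        super().__init__(f"Maximum file size limit ({max_size/1024/1024:.1f}MB) reached.")


MAX_FILES = 3         # hypothetical limit, for illustration only
MAX_TOTAL_SIZE = 100  # hypothetical limit in bytes


def process_item(size: int, stats: dict) -> None:
    """Account for one file, raising when a limit would be exceeded."""
    if stats["total_files"] + 1 > MAX_FILES:
        raise MaxFilesReachedError(MAX_FILES)
    if stats["total_size"] + size > MAX_TOTAL_SIZE:
        raise MaxFileSizeReachedError(MAX_TOTAL_SIZE)
    stats["total_files"] += 1
    stats["total_size"] += size


def scan(sizes: list) -> dict:
    """Process items until done or until a limit error stops the loop."""
    stats = {"total_files": 0, "total_size": 0}
    for size in sizes:
        try:
            process_item(size, stats)
        except (MaxFilesReachedError, MaxFileSizeReachedError) as exc:
            print(f"Stopping scan: {exc}")
            break
    return stats
```

With this shape the helper has a single success path, and the caller decides in one place how each limit is handled.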
