Commit 7e65aac

WIP: Refactor backend to a REST API and make minimal changes to the frontend to keep it working
1 parent 2f447ae commit 7e65aac

14 files changed: +783 -338 lines changed

docs/api_models.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
+# API Models Documentation
+
+This document describes the Pydantic models used for the `/api/ingest` endpoint in the Gitingest API.
+
+## Overview
+
+The `/api/ingest` endpoint uses structured Pydantic models for both input validation and response formatting. This ensures type safety, automatic validation, and consistent API responses.
+
+## Models
+
+### PatternType Enum
+
+```python
+class PatternType(str, Enum):
+    INCLUDE = "include"
+    EXCLUDE = "exclude"
+```
+
+Defines the two types of file filtering patterns:
+- `INCLUDE`: Only include files matching the pattern
+- `EXCLUDE`: Exclude files matching the pattern
+
+### IngestRequest
+
+Input model for the `/api/ingest` endpoint.
+
+```python
+class IngestRequest(BaseModel):
+    input_text: str = Field(..., description="Git repository URL or slug to ingest")
+    max_file_size: int = Field(..., ge=0, le=500, description="File size slider position (0-500)")
+    pattern_type: PatternType = Field(default=PatternType.EXCLUDE, description="Pattern type for file filtering")
+    pattern: str = Field(default="", description="Glob/regex pattern for file filtering")
+    token: str | None = Field(default=None, description="GitHub PAT for private repositories")
+```
+
+**Validation Rules:**
+- `input_text`: Must not be empty (stripped of whitespace)
+- `max_file_size`: Must be between 0 and 500 (inclusive)
+- `pattern_type`: Defaults to "exclude"
+- `pattern`: Stripped of whitespace, defaults to empty string
+- `token`: Optional GitHub personal access token
+
+**Example:**
+```json
+{
+  "input_text": "https://github.com/cyclotruc/gitingest",
+  "max_file_size": 243,
+  "pattern_type": "exclude",
+  "pattern": "*.md",
+  "token": null
+}
+```
+
+### IngestSuccessResponse
+
+Response model for successful ingestion operations.
+
+```python
+class IngestSuccessResponse(BaseModel):
+    result: Literal[True] = True
+    repo_url: str = Field(..., description="Original repository URL")
+    short_repo_url: str = Field(..., description="Short repository URL (user/repo)")
+    summary: str = Field(..., description="Ingestion summary with token estimates")
+    tree: str = Field(..., description="File tree structure")
+    content: str = Field(..., description="Processed file content")
+    ingest_id: str = Field(..., description="Unique ingestion identifier")
+    default_file_size: int = Field(..., description="File size slider position used")
+    pattern_type: str = Field(..., description="Pattern type used")
+    pattern: str = Field(..., description="Pattern used")
+    token: str | None = Field(None, description="Token used (if any)")
+```
+
+**Example:**
+```json
+{
+  "result": true,
+  "repo_url": "https://github.com/cyclotruc/gitingest",
+  "short_repo_url": "cyclotruc/gitingest",
+  "summary": "Processed 50 files, estimated tokens: 15,000",
+  "tree": "gitingest/\n├── src/\n│ ├── server/\n│ └── gitingest/\n└── README.md",
+  "content": "Repository content here...",
+  "ingest_id": "abc123",
+  "default_file_size": 243,
+  "pattern_type": "exclude",
+  "pattern": "*.md",
+  "token": null
+}
+```
+
+### IngestErrorResponse
+
+Response model for failed ingestion operations.
+
+```python
+class IngestErrorResponse(BaseModel):
+    error: str = Field(..., description="Error message")
+    repo_url: str = Field(..., description="Repository URL that failed")
+    default_file_size: int = Field(..., description="File size slider position used")
+    pattern_type: str = Field(..., description="Pattern type used")
+    pattern: str = Field(..., description="Pattern used")
+    token: str | None = Field(None, description="Token used (if any)")
+```
+
+**Example:**
+```json
+{
+  "error": "Repository not found or is private",
+  "repo_url": "https://github.com/private/repo",
+  "default_file_size": 243,
+  "pattern_type": "exclude",
+  "pattern": "",
+  "token": null
+}
+```
+
+### IngestResponse
+
+Union type for API responses.
+
+```python
+IngestResponse = Union[IngestSuccessResponse, IngestErrorResponse]
+```
+
+This allows the endpoint to return either a success or error response with proper typing.
+
+## Usage in FastAPI
+
+The models are used in the `/api/ingest` endpoint as follows:
+
+```python
+@router.post("/api/ingest",
+             response_model=IngestResponse,
+             responses={
+                 200: {"model": IngestSuccessResponse, "description": "Successful ingestion"},
+                 400: {"model": IngestErrorResponse, "description": "Bad request or processing error"},
+                 500: {"model": IngestErrorResponse, "description": "Internal server error"}
+             })
+async def api_ingest(
+    request: Request,
+    input_text: str = Form(...),
+    max_file_size: int = Form(...),
+    pattern_type: str = Form("exclude"),
+    pattern: str = Form(""),
+    token: Optional[str] = Form(None),
+) -> IngestResponse:
+    # Implementation...
+```
+
+## Benefits
+
+1. **Type Safety**: All inputs and outputs are properly typed
+2. **Automatic Validation**: Pydantic validates all inputs according to defined rules
+3. **API Documentation**: FastAPI automatically generates OpenAPI documentation
+4. **Consistent Responses**: Structured error and success responses
+5. **IDE Support**: Better autocomplete and error detection in IDEs
+
+## Error Handling
+
+The models provide structured error handling:
+
+- **Validation Errors**: Invalid input parameters are caught and returned as `IngestErrorResponse`
+- **Processing Errors**: Repository processing failures return detailed error information
+- **Unexpected Errors**: Internal server errors are caught and formatted consistently
+
+## Migration from Previous Implementation
+
+The previous implementation used raw dictionaries and manual JSON responses. The new models provide:
+
+- Better type safety
+- Automatic validation
+- Consistent error handling
+- Improved API documentation
+- Better developer experience
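
Because the endpoint signature in the added documentation declares its parameters with `Form(...)`, a caller would presumably send form-encoded fields rather than a JSON body. The following is a minimal standard-library sketch of building such a request. The base URL and the helper name `build_ingest_request` are assumptions for illustration and do not appear in the commit; the field names are taken from the documented signature.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; the commit does not say where the server is hosted.
BASE_URL = "http://localhost:8000"


def build_ingest_request(
    input_text: str,
    max_file_size: int,
    pattern_type: str = "exclude",
    pattern: str = "",
    token: str | None = None,
) -> Request:
    """Build (but do not send) a form-encoded POST to /api/ingest."""
    fields = {
        "input_text": input_text,
        "max_file_size": str(max_file_size),
        "pattern_type": pattern_type,
        "pattern": pattern,
    }
    # The token is optional, so it is left out of the form when not set.
    if token is not None:
        fields["token"] = token
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{BASE_URL}/api/ingest",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_ingest_request("https://github.com/cyclotruc/gitingest", 243, pattern="*.md")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a server.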

src/server/main.py

Lines changed: 5 additions & 24 deletions
@@ -12,7 +12,8 @@
 from slowapi.errors import RateLimitExceeded
 from starlette.middleware.trustedhost import TrustedHostMiddleware
 
-from server.routers import download, dynamic, index
+from server.routers import dynamic, index
+from server.routers.ingest import router as ingest
 from server.server_config import templates
 from server.server_utils import lifespan, limiter, rate_limit_exception_handler
 
@@ -58,7 +59,7 @@ async def health_check() -> dict[str, str]:
     return {"status": "healthy"}
 
 
-@app.head("/")
+@app.head("/", include_in_schema=False)
 async def head_root() -> HTMLResponse:
     """Respond to HTTP HEAD requests for the root URL.
 
@@ -72,27 +73,7 @@ async def head_root() -> HTMLResponse:
     """
     return HTMLResponse(content=None, headers={"content-type": "text/html; charset=utf-8"})
 
-
-@app.get("/api/", response_class=HTMLResponse)
-@app.get("/api", response_class=HTMLResponse)
-async def api_docs(request: Request) -> HTMLResponse:
-    """Render the API documentation page.
-
-    Parameters
-    ----------
-    request : Request
-        The incoming HTTP request.
-
-    Returns
-    -------
-    HTMLResponse
-        A rendered HTML page displaying API documentation.
-
-    """
-    return templates.TemplateResponse("api.jinja", {"request": request})
-
-
-@app.get("/robots.txt")
+@app.get("/robots.txt", include_in_schema=False)
 async def robots() -> FileResponse:
     """Serve the ``robots.txt`` file to guide search engine crawlers.
 
@@ -120,5 +101,5 @@ async def llm_txt() -> FileResponse:
 
 # Include routers for modular endpoints
 app.include_router(index)
-app.include_router(download)
+app.include_router(ingest)
 app.include_router(dynamic)
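
With the `ingest` router mounted in place of `download`, a caller would presumably receive one of the two JSON shapes described in `docs/api_models.md`. The sketch below shows one way a client might tell them apart, relying only on the documented field names (`result`, `error`, `repo_url`, `short_repo_url`, `summary`). The helper name `describe_ingest_response` is hypothetical, and the sample payloads are trimmed copies of the documentation examples.

```python
import json


def describe_ingest_response(payload: dict) -> str:
    """Return a one-line description of a decoded /api/ingest JSON body."""
    # Success bodies carry result == True; error bodies carry an "error" message.
    if payload.get("result") is True:
        return f"ok: {payload['short_repo_url']} ({payload['summary']})"
    if "error" in payload:
        return f"failed: {payload['repo_url']}: {payload['error']}"
    raise ValueError("payload matches neither documented response shape")


# Trimmed versions of the example bodies from docs/api_models.md.
success = json.loads(
    '{"result": true, "repo_url": "https://github.com/cyclotruc/gitingest",'
    ' "short_repo_url": "cyclotruc/gitingest",'
    ' "summary": "Processed 50 files, estimated tokens: 15,000"}'
)
failure = json.loads(
    '{"error": "Repository not found or is private",'
    ' "repo_url": "https://github.com/private/repo"}'
)

print(describe_ingest_response(success))
print(describe_ingest_response(failure))
```

Checking the body shape rather than only the HTTP status is a design choice of this sketch, not something the commit prescribes.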

src/server/models.py

Lines changed: 119 additions & 1 deletion
@@ -2,12 +2,130 @@
 
 from __future__ import annotations
 
-from pydantic import BaseModel
+from enum import Enum
+from typing import Literal, Union
+
+from pydantic import BaseModel, Field, field_validator
 
 # needed for type checking (pydantic)
 from server.form_types import IntForm, OptStrForm, StrForm  # noqa: TC001 (typing-only-first-party-import)
 
 
+class PatternType(str, Enum):
+    """Enumeration for pattern types used in file filtering."""
+
+    INCLUDE = "include"
+    EXCLUDE = "exclude"
+
+
+class IngestRequest(BaseModel):
+    """Request model for the /api/ingest endpoint.
+
+    Attributes
+    ----------
+    input_text : str
+        The Git repository URL or slug to ingest.
+    max_file_size : int
+        Maximum file size slider position (0-500) for filtering files.
+    pattern_type : PatternType
+        Type of pattern to use for file filtering (include or exclude).
+    pattern : str
+        Glob/regex pattern string for file filtering.
+    token : str | None
+        GitHub personal access token (PAT) for accessing private repositories.
+    """
+
+    input_text: str = Field(..., description="Git repository URL or slug to ingest")
+    max_file_size: int = Field(..., ge=0, le=500, description="File size slider position (0-500)")
+    pattern_type: PatternType = Field(default=PatternType.EXCLUDE, description="Pattern type for file filtering")
+    pattern: str = Field(default="", description="Glob/regex pattern for file filtering")
+    token: str | None = Field(default=None, description="GitHub PAT for private repositories")
+
+    @field_validator('input_text')
+    @classmethod
+    def validate_input_text(cls, v):
+        """Validate that input_text is not empty."""
+        if not v.strip():
+            raise ValueError('input_text cannot be empty')
+        return v.strip()
+
+    @field_validator('pattern')
+    @classmethod
+    def validate_pattern(cls, v):
+        """Validate pattern field."""
+        return v.strip() if v else ""
+
+
+class IngestSuccessResponse(BaseModel):
+    """Success response model for the /api/ingest endpoint.
+
+    Attributes
+    ----------
+    result : Literal[True]
+        Always True for successful responses.
+    repo_url : str
+        The original repository URL that was processed.
+    short_repo_url : str
+        Short form of repository URL (user/repo).
+    summary : str
+        Summary of the ingestion process including token estimates.
+    tree : str
+        File tree structure of the repository.
+    content : str
+        Processed content from the repository files.
+    default_file_size : int
+        The file size slider position used.
+    pattern_type : str
+        The pattern type used for filtering.
+    pattern : str
+        The pattern used for filtering.
+    token : str | None
+        The token used (if any).
+    """
+
+    result: Literal[True] = True
+    repo_url: str = Field(..., description="Original repository URL")
+    short_repo_url: str = Field(..., description="Short repository URL (user/repo)")
+    summary: str = Field(..., description="Ingestion summary with token estimates")
+    tree: str = Field(..., description="File tree structure")
+    content: str = Field(..., description="Processed file content")
+    default_file_size: int = Field(..., description="File size slider position used")
+    pattern_type: str = Field(..., description="Pattern type used")
+    pattern: str = Field(..., description="Pattern used")
+    token: str | None = Field(None, description="Token used (if any)")
+
+
+class IngestErrorResponse(BaseModel):
+    """Error response model for the /api/ingest endpoint.
+
+    Attributes
+    ----------
+    error : str
+        Error message describing what went wrong.
+    repo_url : str
+        The repository URL that failed to process.
+    default_file_size : int
+        The file size slider position that was used.
+    pattern_type : str
+        The pattern type that was used.
+    pattern : str
+        The pattern that was used.
+    token : str | None
+        The token that was used (if any).
+    """
+
+    error: str = Field(..., description="Error message")
+    repo_url: str = Field(..., description="Repository URL that failed")
+    default_file_size: int = Field(..., description="File size slider position used")
+    pattern_type: str = Field(..., description="Pattern type used")
+    pattern: str = Field(..., description="Pattern used")
+    token: str | None = Field(None, description="Token used (if any)")
+
+
+# Union type for API responses
+IngestResponse = Union[IngestSuccessResponse, IngestErrorResponse]
+
+
 class QueryForm(BaseModel):
     """Form data for the query.
 
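
The validation rules that `IngestRequest` encodes (non-empty stripped `input_text`, `max_file_size` between 0 and 500, stripped `pattern`, and one of two pattern types) can be illustrated without Pydantic. The following is a plain-dataclass stand-in, not the model from this commit: the class name `IngestRequestSketch` and the error messages for the range and pattern-type checks are assumptions, and real Pydantic validation also coerces types and reports errors differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IngestRequestSketch:
    """Plain-dataclass stand-in (not Pydantic) for the IngestRequest rules."""

    input_text: str
    max_file_size: int
    pattern_type: str = "exclude"
    pattern: str = ""
    token: str | None = None

    def __post_init__(self) -> None:
        # Mirrors validate_input_text: reject blank input, keep the stripped value.
        self.input_text = self.input_text.strip()
        if not self.input_text:
            raise ValueError("input_text cannot be empty")
        # Mirrors Field(ge=0, le=500) on max_file_size.
        if not 0 <= self.max_file_size <= 500:
            raise ValueError("max_file_size must be between 0 and 500")
        # Mirrors the two PatternType enum values.
        if self.pattern_type not in ("include", "exclude"):
            raise ValueError("pattern_type must be 'include' or 'exclude'")
        # Mirrors validate_pattern: strip whitespace, fall back to "".
        self.pattern = self.pattern.strip() if self.pattern else ""


req = IngestRequestSketch("  cyclotruc/gitingest  ", 243, pattern=" *.md ")
print(req.input_text, repr(req.pattern))
```

Constructing the sketch with a blank `input_text` or an out-of-range `max_file_size` raises `ValueError`, which is the same exception type the committed validators raise inside Pydantic.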
