46 changes: 46 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,46 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [3.0.0] - 2026-01-30

### Security

- **CRITICAL**: Removed client-side URL fetching to prevent SSRF vulnerabilities
- URLs are now passed to the server for secure server-side fetching
- Restricted `sign()` method to local files only (API limitation)

### Changed

- **BREAKING**: `sign()` only accepts local files (paths, bytes, file objects) - no URLs
- **BREAKING**: Most methods now accept `FileInputWithUrl` - URLs passed to server
- **BREAKING**: Removed client-side PDF parsing - leverage API's negative index support
- Methods like `rotate()`, `split()`, `delete_pages()` now support negative indices (-1 = last page)
- All methods except `sign()` accept URLs that are passed securely to the server

### Removed

- **BREAKING**: Removed `process_remote_file_input()` from public API (security risk)
- **BREAKING**: Removed `get_pdf_page_count()` from public API (client-side PDF parsing)
- **BREAKING**: Removed `is_valid_pdf()` from public API (internal use only)
- Removed ~200 lines of client-side PDF parsing code

### Added

- SSRF protection documentation in README
- Migration guide (docs/MIGRATION.md)
- Security best practices for handling remote files
- Support for negative page indices in all page-based methods

## [2.0.0] - 2025-01-09

- Initial stable release with full API coverage
- Async-first design with httpx and aiofiles
- Comprehensive type hints and mypy strict mode
- Workflow builder with staged pattern
- Error hierarchy with typed exceptions
75 changes: 75 additions & 0 deletions docs/MIGRATION.md
@@ -0,0 +1,75 @@
# Migration Guide: v2.x to v3.0

## Overview

Version 3.0.0 introduces SSRF protection and removes client-side PDF parsing.

## Key Changes

### 1. `sign()` No Longer Accepts URLs (API Limitation)

**Before (v2.x)**:
```python
result = await client.sign('https://example.com/document.pdf', {...})
```

**After (v3.0)** - Fetch file first:
```python
import httpx

async with httpx.AsyncClient() as http:
    url = 'https://example.com/document.pdf'

    # IMPORTANT: Validate the URL before fetching it
    if not url.startswith('https://trusted-domain.com/'):
        raise ValueError('URL not from trusted domain')

    response = await http.get(url, timeout=10.0)
    response.raise_for_status()
    pdf_bytes = response.content

result = await client.sign(pdf_bytes, {...})
```
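
The prefix check above can also be written as a parsed-hostname check, which may be easier to extend to several trusted hosts. The following is a minimal sketch, not part of the library: `TRUSTED_HOSTS` and `is_trusted_url()` are hypothetical names introduced here for illustration, and the allowlist contents are an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your application trusts.
TRUSTED_HOSTS = frozenset({"trusted-domain.com"})


def is_trusted_url(url: str) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed URLs (e.g. bad IPv6 literals).
        return False
    # parts.hostname is lowercased and excludes any userinfo or port.
    return parts.scheme == "https" and parts.hostname in TRUSTED_HOSTS
```

With this helper, the guard in the example becomes `if not is_trusted_url(url): raise ValueError(...)`. Comparing the parsed hostname exactly means look-alike hosts such as `trusted-domain.com.evil.example` are rejected.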

### 2. Most Methods Now Accept URLs (Passed to Server)

Good news! These methods now support URLs passed securely to the server:
- `rotate()`, `split()`, `add_page()`, `duplicate_pages()`, `delete_pages()`
- `set_page_labels()`, `set_metadata()`, `optimize()`
- `flatten()`, `apply_instant_json()`, `apply_xfdf()`
- All redaction methods
- `convert()`, `ocr()`, `watermark_*()`, `extract_*()`, `merge()`, `password_protect()`

**Example**:
```python
# This now works!
result = await client.rotate('https://example.com/doc.pdf', 90, pages={'start': 0, 'end': 5})
```

### 3. Negative Page Indices Now Supported

Use negative indices for "from end" references:
- `-1` = last page
- `-2` = second-to-last page
- etc.

**Examples**:
```python
# Rotate last 3 pages
await client.rotate(pdf, 90, pages={'start': -3, 'end': -1})

# Delete first and last pages
await client.delete_pages(pdf, [0, -1])

# Split: keep middle pages, excluding first and last
await client.split(pdf, [{'start': 1, 'end': -2}])
```
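
To reason about which pages a negative index selects, it can help to resolve it by hand. The sketch below assumes the server follows the same convention as Python sequence indexing (the list above only states that `-1` is the last page and `-2` the second-to-last); `resolve_page_index()` is a hypothetical helper for illustration and is not part of the client.

```python
def resolve_page_index(index: int, page_count: int) -> int:
    """Map a possibly negative page index to a zero-based absolute index."""
    # Assumption: negative indices count back from the end, as in Python lists.
    resolved = index + page_count if index < 0 else index
    if not 0 <= resolved < page_count:
        raise IndexError(f"page index {index} out of range for {page_count} pages")
    return resolved
```

For a 10-page document, `{'start': -3, 'end': -1}` would resolve to pages 7 through 9 under this assumption, which matches the "last 3 pages" example if `end` is inclusive.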

### 4. Removed from Public API

- `process_remote_file_input()` - No longer needed (URLs passed to server)
- `get_pdf_page_count()` - Use negative indices instead
- `is_valid_pdf()` - Let server validate (internal use only)

**Still Available:**
- `is_remote_file_input()` - Helper to detect if input is a URL (still public)
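
Because `sign()` now rejects URLs, callers that accept mixed input may want to check for a URL before calling it. The sketch below uses a hypothetical stand-in, `looks_like_url()`, rather than the library's `is_remote_file_input()`, whose exact signature and matching rules are not shown in this guide; the `http(s)://` pattern is an assumption.

```python
import re

# Assumption: a remote input is a string starting with http:// or https://.
_URL_PATTERN = re.compile(r"^https?://", re.IGNORECASE)


def looks_like_url(file_input: object) -> bool:
    """Hypothetical stand-in for a URL check on a file input."""
    return isinstance(file_input, str) and _URL_PATTERN.match(file_input) is not None


def require_local_input(file_input: object) -> object:
    """Return the input unchanged, or raise if it looks like a URL."""
    if looks_like_url(file_input):
        raise ValueError("sign() accepts local files only; fetch the URL first")
    return file_input
```

A caller could wrap the argument as `await client.sign(require_local_input(source), {...})` and fall back to the fetch-first pattern from section 1 when the check fails.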
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -18,7 +18,7 @@ nutrient_dws_scripts = [

[project]
name = "nutrient-dws"
version = "2.0.0"
version = "3.0.0"
description = "Python client library for Nutrient Document Web Services API"
readme = "README.md"
requires-python = ">=3.10"
6 changes: 4 additions & 2 deletions src/nutrient_dws/__init__.py
@@ -12,16 +12,19 @@
ValidationError,
)
from nutrient_dws.inputs import (
FileInputWithUrl,
LocalFileInput,
is_remote_file_input,
process_file_input,
process_remote_file_input,
validate_file_input,
)
from nutrient_dws.utils import get_library_version, get_user_agent

__all__ = [
"APIError",
"AuthenticationError",
"FileInputWithUrl",
"LocalFileInput",
"NetworkError",
"NutrientClient",
"NutrientError",
@@ -30,6 +33,5 @@
"get_user_agent",
"is_remote_file_input",
"process_file_input",
"process_remote_file_input",
"validate_file_input",
]
14 changes: 7 additions & 7 deletions src/nutrient_dws/builder/builder.py
@@ -26,7 +26,7 @@
NutrientClientOptions,
)
from nutrient_dws.inputs import (
FileInput,
FileInputWithUrl,
NormalizedFileData,
is_remote_file_input,
process_file_input,
@@ -76,16 +76,16 @@ def __init__(self, client_options: NutrientClientOptions) -> None:
"""
super().__init__(client_options)
self.build_instructions: BuildInstructions = {"parts": []}
self.assets: dict[str, FileInput] = {}
self.assets: dict[str, FileInputWithUrl] = {}
self.asset_index = 0
self.current_step = 0
self.is_executed = False

def _register_asset(self, asset: FileInput) -> str:
def _register_asset(self, asset: FileInputWithUrl) -> str:
"""Register an asset in the workflow and return its key for use in actions.

Args:
asset: The asset to register
asset: The asset to register (must be local, not URL)

Returns:
The asset key that can be used in BuildActions
@@ -188,7 +188,7 @@ def _cleanup(self) -> None:

def add_file_part(
self,
file: FileInput,
file: FileInputWithUrl,
options: FilePartOptions | None = None,
actions: list[ApplicableAction] | None = None,
) -> WorkflowWithPartsStage:
@@ -229,8 +229,8 @@ def add_file_part(

def add_html_part(
self,
html: FileInput,
assets: list[FileInput] | None = None,
html: FileInputWithUrl,
assets: list[FileInputWithUrl] | None = None,
options: HTMLPartOptions | None = None,
actions: list[ApplicableAction] | None = None,
) -> WorkflowWithPartsStage:
16 changes: 8 additions & 8 deletions src/nutrient_dws/builder/constant.py
@@ -1,7 +1,7 @@
from collections.abc import Callable
from typing import Any, Literal, Protocol, TypeVar, cast

from nutrient_dws.inputs import FileInput
from nutrient_dws.inputs import FileInputWithUrl
from nutrient_dws.types.build_actions import (
ApplyInstantJsonAction,
ApplyRedactionsAction,
@@ -53,7 +53,7 @@ class ActionWithFileInput(Protocol):
"""Internal action type that holds FileInput for deferred registration."""

__needsFileRegistration: bool
fileInput: FileInput
fileInput: FileInputWithUrl
createAction: Callable[[FileHandle], BuildAction]


@@ -133,7 +133,7 @@ def watermark_text(

@staticmethod
def watermark_image(
image: FileInput, options: ImageWatermarkActionOptions | None = None
image: FileInputWithUrl, options: ImageWatermarkActionOptions | None = None
) -> ActionWithFileInput:
"""Create an image watermark action.

@@ -163,7 +163,7 @@ class ImageWatermarkActionWithFileInput(ActionWithFileInput):
__needsFileRegistration = True

def __init__(
self, file_input: FileInput, opts: ImageWatermarkActionOptions
self, file_input: FileInputWithUrl, opts: ImageWatermarkActionOptions
):
self.fileInput = file_input
self.options = opts
@@ -196,7 +196,7 @@ def flatten(annotation_ids: list[str | int] | None = None) -> FlattenAction:
return result

@staticmethod
def apply_instant_json(file: FileInput) -> ActionWithFileInput:
def apply_instant_json(file: FileInputWithUrl) -> ActionWithFileInput:
"""Create an apply Instant JSON action.

Args:
@@ -209,7 +209,7 @@ def apply_instant_json(file: FileInput) -> ActionWithFileInput:
class ApplyInstantJsonActionWithFileInput(ActionWithFileInput):
__needsFileRegistration = True

def __init__(self, file_input: FileInput):
def __init__(self, file_input: FileInputWithUrl):
self.fileInput = file_input

def createAction(self, fileHandle: FileHandle) -> ApplyInstantJsonAction:
@@ -222,7 +222,7 @@ def createAction(self, fileHandle: FileHandle) -> ApplyInstantJsonAction:

@staticmethod
def apply_xfdf(
file: FileInput, options: ApplyXfdfActionOptions | None = None
file: FileInputWithUrl, options: ApplyXfdfActionOptions | None = None
) -> ActionWithFileInput:
"""Create an apply XFDF action.

@@ -240,7 +240,7 @@ class ApplyXfdfActionWithFileInput(ActionWithFileInput):
__needsFileRegistration = True

def __init__(
self, file_input: FileInput, opts: ApplyXfdfActionOptions | None
self, file_input: FileInputWithUrl, opts: ApplyXfdfActionOptions | None
):
self.fileInput = file_input
self.options = opts or {}
8 changes: 4 additions & 4 deletions src/nutrient_dws/builder/staged_builders.py
@@ -10,7 +10,7 @@
from nutrient_dws.types.build_actions import BuildAction

if TYPE_CHECKING:
from nutrient_dws.inputs import FileInput
from nutrient_dws.inputs import FileInputWithUrl
from nutrient_dws.types.analyze_response import AnalyzeBuildResponse
from nutrient_dws.types.build_output import (
ImageOutputOptions,
@@ -114,7 +114,7 @@ class WorkflowInitialStage(ABC):
@abstractmethod
def add_file_part(
self,
file: FileInput,
file: FileInputWithUrl,
options: FilePartOptions | None = None,
actions: list[ApplicableAction] | None = None,
) -> WorkflowWithPartsStage:
@@ -124,8 +124,8 @@ def add_file_part(
@abstractmethod
def add_html_part(
self,
html: FileInput,
assets: list[FileInput] | None = None,
html: FileInputWithUrl,
assets: list[FileInputWithUrl] | None = None,
options: HTMLPartOptions | None = None,
actions: list[ApplicableAction] | None = None,
) -> WorkflowWithPartsStage: