Commit ca100c6

committed
Release v0.7.6: The 0.7.6 Update

- Updated version to 0.7.6
- Added comprehensive demo and release notes
- Updated all documentation
- Updated the version in Dockerfile to 0.7.6

1 parent 7e8fb3a commit ca100c6

File tree

9 files changed: +1021 −19 lines

Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 FROM python:3.12-slim-bookworm AS build
 
 # C4ai version
-ARG C4AI_VER=0.7.0-r1
+ARG C4AI_VER=0.7.6
 ENV C4AI_VERSION=$C4AI_VER
 LABEL c4ai.version=$C4AI_VER
 
```
README.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -27,13 +27,13 @@
 
 Crawl4AI turns the web into clean, LLM ready Markdown for RAG, agents, and data pipelines. Fast, controllable, battle tested by a 50k+ star community.
 
-[✨ Check out latest update v0.7.5](#-recent-updates)
+[✨ Check out latest update v0.7.6](#-recent-updates)
 
-✨ New in v0.7.5: Docker Hooks System with function-based API for pipeline customization, Enhanced LLM Integration with custom providers, HTTPS Preservation, and multiple community-reported bug fixes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)
+**New in v0.7.6**: Complete Webhook Infrastructure for Docker Job Queue API! Real-time notifications for both `/crawl/job` and `/llm/job` endpoints with exponential backoff retry, custom headers, and flexible delivery modes. No more polling! [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.6.md)
 
-✨ Recent v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
+✨ Recent v0.7.5: Docker Hooks System with function-based API for pipeline customization, Enhanced LLM Integration with custom providers, HTTPS Preservation, and multiple community-reported bug fixes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md)
 
-✨ Previous v0.7.3: Undetected Browser Support, Multi-URL Configurations, Memory Monitoring, Enhanced Table Extraction, GitHub Sponsors. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.3.md)
+✨ Previous v0.7.4: Revolutionary LLM Table Extraction with intelligent chunking, enhanced concurrency fixes, memory management refactor, and critical stability improvements. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.4.md)
 
 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
```

crawl4ai/__version__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 # crawl4ai/__version__.py
 
 # This is the version that will be used for stable releases
-__version__ = "0.7.5"
+__version__ = "0.7.6"
 
 # For nightly builds, this gets set during build process
 __nightly_version__ = None
```

deploy/docker/README.md

Lines changed: 7 additions & 9 deletions
````diff
@@ -59,15 +59,13 @@ Pull and run images directly from Docker Hub without building locally.
 
 #### 1. Pull the Image
 
-Our latest release candidate is `0.7.0-r1`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.
-
-> ⚠️ **Important Note**: The `latest` tag currently points to the stable `0.6.0` version. After testing and validation, `0.7.0` (without -r1) will be released and `latest` will be updated. For now, please use `0.7.0-r1` to test the new features.
+Our latest stable release is `0.7.6`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.
 
 ```bash
-# Pull the release candidate (for testing new features)
-docker pull unclecode/crawl4ai:0.7.0-r1
+# Pull the latest stable version (0.7.6)
+docker pull unclecode/crawl4ai:0.7.6
 
-# Or pull the current stable version (0.6.0)
+# Or use the latest tag (points to 0.7.6)
 docker pull unclecode/crawl4ai:latest
 ```
 
@@ -102,7 +100,7 @@ EOL
   -p 11235:11235 \
   --name crawl4ai \
   --shm-size=1g \
-  unclecode/crawl4ai:0.7.0-r1
+  unclecode/crawl4ai:0.7.6
 ```
 
 * **With LLM support:**
@@ -113,7 +111,7 @@ EOL
   --name crawl4ai \
   --env-file .llm.env \
   --shm-size=1g \
-  unclecode/crawl4ai:0.7.0-r1
+  unclecode/crawl4ai:0.7.6
 ```
 
 > The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -186,7 +184,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach
 ```bash
 # Pulls and runs the release candidate from Docker Hub
 # Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.7.0-r1 docker compose up -d
+IMAGE=unclecode/crawl4ai:0.7.6 docker compose up -d
 ```
 
 * **Build and Run Locally:**
````

docs/blog/release-v0.7.6.md

Lines changed: 314 additions & 0 deletions
# Crawl4AI v0.7.6 Release Notes

*Release Date: October 22, 2025*

I'm excited to announce Crawl4AI v0.7.6, featuring a complete webhook infrastructure for the Docker job queue API! This release eliminates polling and brings real-time notifications to both crawling and LLM extraction workflows.

## 🎯 What's New

### Webhook Support for Docker Job Queue API

The headline feature of v0.7.6 is comprehensive webhook support for asynchronous job processing. No more constant polling to check if your jobs are done - get instant notifications when they complete!

**Key Capabilities:**

- **Universal Webhook Support**: Both `/crawl/job` and `/llm/job` endpoints now support webhooks
- **Flexible Delivery Modes**: Choose notification-only or include full data in the webhook payload
- **Reliable Delivery**: Exponential backoff retry mechanism (5 attempts: 1s → 2s → 4s → 8s → 16s)
- **Custom Authentication**: Add custom headers for webhook authentication
- **Global Configuration**: Set a default webhook URL in `config.yml` for all jobs
- **Task Type Identification**: Distinguish between `crawl` and `llm_extraction` tasks

### How It Works

Instead of constantly checking job status:

**OLD WAY (Polling):**
```python
import time

import requests

payload = {"urls": ["https://example.com"]}

# Submit job
response = requests.post("http://localhost:11235/crawl/job", json=payload)
task_id = response.json()['task_id']

# Poll until complete
while True:
    status = requests.get(f"http://localhost:11235/crawl/job/{task_id}")
    if status.json()['status'] == 'completed':
        break
    time.sleep(5)  # Wait and try again
```

**NEW WAY (Webhooks):**
```python
import requests

# Submit job with webhook
payload = {
    "urls": ["https://example.com"],
    "webhook_config": {
        "webhook_url": "https://myapp.com/webhook",
        "webhook_data_in_payload": True
    }
}
response = requests.post("http://localhost:11235/crawl/job", json=payload)

# Done! The webhook will notify you when the job completes.
# Your webhook handler receives the results automatically.
```
### Crawl Job Webhooks

```bash
curl -X POST http://localhost:11235/crawl/job \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "browser_config": {"headless": true},
    "crawler_config": {"cache_mode": "bypass"},
    "webhook_config": {
      "webhook_url": "https://myapp.com/webhooks/crawl-complete",
      "webhook_data_in_payload": false,
      "webhook_headers": {
        "X-Webhook-Secret": "your-secret-token"
      }
    }
  }'
```
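The `X-Webhook-Secret` entry above is simply a custom header forwarded with each delivery, so your receiver can use it to authenticate incoming calls. A minimal sketch of the receiving side (illustrative only - the header name and token are whatever you configured in `webhook_headers`, and this helper is not part of Crawl4AI itself):

```python
import hmac

# Must match the value you set in webhook_headers when submitting the job.
EXPECTED_SECRET = "your-secret-token"

def is_authentic(received_secret) -> bool:
    """Constant-time check of the X-Webhook-Secret value sent with a delivery."""
    return hmac.compare_digest(received_secret or "", EXPECTED_SECRET)

# Inside a handler you might call:
#   is_authentic(request.headers.get("X-Webhook-Secret"))
```

`hmac.compare_digest` avoids leaking how many leading characters matched, which a plain `==` comparison can do.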
### LLM Extraction Job Webhooks (NEW!)

```bash
curl -X POST http://localhost:11235/llm/job \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "q": "Extract the article title, author, and publication date",
    "schema": "{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"}}}",
    "provider": "openai/gpt-4o-mini",
    "webhook_config": {
      "webhook_url": "https://myapp.com/webhooks/llm-complete",
      "webhook_data_in_payload": true
    }
  }'
```
### Webhook Payload Structure

**Success (with data):**
```json
{
  "task_id": "llm_1698765432",
  "task_type": "llm_extraction",
  "status": "completed",
  "timestamp": "2025-10-22T10:30:00.000000+00:00",
  "urls": ["https://example.com/article"],
  "data": {
    "extracted_content": {
      "title": "Understanding Web Scraping",
      "author": "John Doe",
      "date": "2025-10-22"
    }
  }
}
```

**Failure:**
```json
{
  "task_id": "crawl_abc123",
  "task_type": "crawl",
  "status": "failed",
  "timestamp": "2025-10-22T10:30:00.000000+00:00",
  "urls": ["https://example.com"],
  "error": "Connection timeout after 30s"
}
```
### Simple Webhook Handler Example

```python
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    payload = request.json

    task_id = payload['task_id']
    task_type = payload['task_type']
    status = payload['status']

    if status == 'completed':
        if 'data' in payload:
            # Process data directly
            data = payload['data']
        else:
            # Fetch from API
            endpoint = 'crawl' if task_type == 'crawl' else 'llm'
            response = requests.get(f'http://localhost:11235/{endpoint}/job/{task_id}')
            data = response.json()

        # Your business logic here
        print(f"Job {task_id} completed!")

    elif status == 'failed':
        error = payload.get('error', 'Unknown error')
        print(f"Job {task_id} failed: {error}")

    return jsonify({"status": "received"}), 200

app.run(port=8080)
```
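Because the retry mechanism can deliver the same webhook more than once, a handler like the one above should be idempotent. A minimal in-memory sketch (the `task_id` field comes from the payloads shown earlier; the in-memory set is illustrative - a production receiver would track processed IDs in Redis or a database):

```python
# Track which task IDs have already been handled, so a retried
# delivery of the same webhook is processed only once.
_seen_task_ids = set()

def process_once(payload: dict) -> bool:
    """Return True if this delivery was processed, False if it was a duplicate."""
    task_id = payload["task_id"]
    if task_id in _seen_task_ids:
        return False  # duplicate delivery caused by a retry
    _seen_task_ids.add(task_id)
    # ... run your business logic exactly once here ...
    return True
```

You would call `process_once(request.json)` at the top of the handler and return early on a duplicate.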
## 📊 Performance Improvements

- **Reduced Server Load**: Eliminates constant polling requests
- **Lower Latency**: Instant notification vs. polling interval delay
- **Better Resource Usage**: Frees up client connections while jobs run in the background
- **Scalable Architecture**: Handles high-volume crawling workflows efficiently

## 🐛 Bug Fixes

- Fixed webhook configuration serialization for Pydantic HttpUrl fields
- Improved error handling in the webhook delivery service
- Enhanced Redis task storage for webhook config persistence

## 🌍 Expected Real-World Impact

### For Web Scraping Workflows
- **Reduced Costs**: Fewer API calls = lower bandwidth and server costs
- **Better UX**: Instant notifications improve user experience
- **Scalability**: Handle hundreds of concurrent jobs without polling overhead

### For LLM Extraction Pipelines
- **Async Processing**: Submit LLM extraction jobs and move on
- **Batch Processing**: Queue multiple extractions, get notified as they complete
- **Integration**: Easy integration with workflow automation tools (Zapier, n8n, etc.)

### For Microservices
- **Event-Driven**: Perfect for event-driven microservice architectures
- **Decoupling**: Decouple job submission from result processing
- **Reliability**: Automatic retries ensure webhooks are delivered

## 🔄 Breaking Changes

**None!** This release is fully backward compatible.

- Webhook configuration is optional
- Existing code continues to work without modification
- Polling is still supported for jobs without a webhook config
## 📚 Documentation

### New Documentation
- **[WEBHOOK_EXAMPLES.md](../deploy/docker/WEBHOOK_EXAMPLES.md)** - Comprehensive webhook usage guide
- **[docker_webhook_example.py](../docs/examples/docker_webhook_example.py)** - Working code examples

### Updated Documentation
- **[Docker README](../deploy/docker/README.md)** - Added webhook sections
- API documentation with webhook examples

## 🛠️ Migration Guide

No migration needed! Webhooks are opt-in:

1. **To use webhooks**: Add `webhook_config` to your job payload
2. **To keep polling**: Continue using your existing code

### Quick Start

```python
# Just add webhook_config to your existing payload
payload = {
    # Your existing configuration
    "urls": ["https://example.com"],
    "browser_config": {...},
    "crawler_config": {...},

    # NEW: Add webhook configuration
    "webhook_config": {
        "webhook_url": "https://myapp.com/webhook",
        "webhook_data_in_payload": True
    }
}
```
## 🔧 Configuration

### Global Webhook Configuration (config.yml)

```yaml
webhooks:
  enabled: true
  default_url: "https://myapp.com/webhooks/default"  # Optional
  data_in_payload: false
  retry:
    max_attempts: 5
    initial_delay_ms: 1000
    max_delay_ms: 32000
    timeout_ms: 30000
  headers:
    User-Agent: "Crawl4AI-Webhook/1.0"
```
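The retry settings above imply the 1s → 16s schedule from the feature list. As a sketch of how such a schedule can be derived from these values (field names mirror `config.yml`; capping at `max_delay_ms` is my assumption, not documented behavior):

```python
def retry_schedule_ms(max_attempts: int = 5,
                      initial_delay_ms: int = 1000,
                      max_delay_ms: int = 32000) -> list:
    """Exponentially doubling retry delays, capped at max_delay_ms."""
    return [min(initial_delay_ms * 2 ** i, max_delay_ms)
            for i in range(max_attempts)]

print(retry_schedule_ms())  # [1000, 2000, 4000, 8000, 16000]
```

With the default five attempts the cap never triggers; it only matters if you raise `max_attempts` past six.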
## 🚀 Upgrade Instructions

### Docker

```bash
# Pull the latest image
docker pull unclecode/crawl4ai:0.7.6

# Or use latest tag
docker pull unclecode/crawl4ai:latest

# Run with webhook support
docker run -d \
  -p 11235:11235 \
  --env-file .llm.env \
  --name crawl4ai \
  unclecode/crawl4ai:0.7.6
```

### Python Package

```bash
pip install --upgrade crawl4ai
```

## 💡 Pro Tips

1. **Use notification-only mode** for large results - fetch data separately to avoid large webhook payloads
2. **Set custom headers** for webhook authentication and request tracking
3. **Configure a global default webhook** for consistent handling across all jobs
4. **Implement idempotent webhook handlers** - the same webhook may be delivered multiple times on retry
5. **Use structured schemas** with LLM extraction for predictable webhook data

## 🎬 Demo

Try the release demo:

```bash
python docs/releases_review/demo_v0.7.6.py
```

This comprehensive demo showcases:
- Crawl job webhooks (notification-only and with data)
- LLM extraction webhooks (with JSON schema support)
- Custom headers for authentication
- Webhook retry mechanism
- Real-time webhook receiver

## 🙏 Acknowledgments

Thank you to the community for the feedback that shaped this feature! Special thanks to everyone who requested webhook support for asynchronous job processing.

## 📞 Support

- **Documentation**: https://docs.crawl4ai.com
- **GitHub Issues**: https://github.com/unclecode/crawl4ai/issues
- **Discord**: https://discord.gg/crawl4ai

---

**Happy crawling with webhooks!** 🕷️🪝

*- unclecode*
