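# aiohttp
`aiohttp` is an asynchronous HTTP client/server framework for Python, built on `asyncio`. It suits workloads that issue many network requests concurrently, such as crawlers and API clients. Below I cover the basics of `aiohttp`: installation, sending asynchronous requests, handling responses, and common use cases.

---
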
### 1. **Installing `aiohttp`**
First, make sure `aiohttp` is installed:
```bash
pip install aiohttp
```

Optionally, install `aiodns` for faster DNS resolution. Older guides also recommend `cchardet` for faster charset detection, but that package is no longer maintained and recent `aiohttp` releases no longer rely on it, so it is left out here:
```bash
pip install aiodns
```

---

### 2. **Basic usage: sending an asynchronous HTTP request**

`aiohttp` runs on the `asyncio` event loop, so its client calls must be made from inside a coroutine. The following simple example sends a GET request and reads the response:

```python
import aiohttp
import asyncio

async def fetch_data(url):
    # Create a ClientSession
    async with aiohttp.ClientSession() as session:
        # Send the GET request
        async with session.get(url) as response:
            # Check the status code
            if response.status != 200:
                print(f"Error: Status {response.status}")
                return None
            # Read the response body as text
            data = await response.text()
            return data

async def main():
    url = "https://api.example.com/data"
    result = await fetch_data(url)
    print(result)

# Run the event loop
if __name__ == "__main__":
    asyncio.run(main())
```

---

### 3. **Core concepts**

#### (1) **`ClientSession`**
- `ClientSession` is the core object in `aiohttp`; it manages the connection pool used for HTTP requests.
- Reuse one `ClientSession` for many requests instead of creating a new session for each request.
- Use `async with` so the session is closed properly.

#### (2) **Request methods**
`aiohttp.ClientSession` supports the common HTTP methods:
- `session.get(url)`: GET request
- `session.post(url, data=data)`: POST request (`data` is keyword-only)
- `session.put(url, data=data)`: PUT request (`data` is keyword-only)
- `session.delete(url)`: DELETE request

#### (3) **Response handling**
- `response.status`: the HTTP status code.
- `await response.text()`: the response body decoded as a string.
- `await response.json()`: the response body parsed as JSON.
- `await response.read()`: the raw response bytes.
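
The points above can be tried without touching the network. The following is a minimal sketch, assuming a throwaway local app started with `aiohttp.test_utils.TestServer`; the `greet` handler, the `/greet` route, and the `read_three_ways` helper are hypothetical names made up for this illustration. One session issues three requests and reads the same body as JSON, text, and bytes:

```python
import asyncio

from aiohttp import ClientSession, web
from aiohttp.test_utils import TestServer


async def greet(request):
    # Hypothetical handler: exists only so the client has a local URL to call.
    return web.json_response({"message": "hello"})


async def read_three_ways():
    app = web.Application()
    app.add_routes([web.get("/greet", greet)])

    # TestServer binds the app to an unused local port while the block runs.
    async with TestServer(app) as server:
        url = server.make_url("/greet")
        # One session serves all three requests and reuses its connection pool.
        async with ClientSession() as session:
            async with session.get(url) as response:
                status = response.status
                as_json = await response.json()   # parsed JSON (a dict here)
            async with session.get(url) as response:
                as_text = await response.text()   # decoded str
            async with session.get(url) as response:
                as_bytes = await response.read()  # raw bytes
    return status, as_json, as_text, as_bytes


if __name__ == "__main__":
    print(asyncio.run(read_three_ways()))
```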

---

### 4. **Sending a POST request**

The following example sends a POST request, for cases where you need to submit data (such as a login or a form submission):

```python
import aiohttp
import asyncio

async def post_data(url, payload):
    async with aiohttp.ClientSession() as session:
        # json=payload serializes the dict and sets Content-Type: application/json
        async with session.post(url, json=payload) as response:
            if response.status != 200:
                print(f"Error: Status {response.status}")
                return None
            data = await response.json()
            return data

async def main():
    url = "https://api.example.com/submit"
    payload = {"username": "user", "password": "pass"}
    result = await post_data(url, payload)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
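
Whether to pass `json=` or `data=` depends on what the server expects: `json=` sends a JSON body, while `data=` with a dict sends URL-encoded form fields. The sketch below makes the difference visible, assuming a hypothetical local `echo` handler (started with `TestServer`) that reports the content type and body it received; none of these names come from a real API you would be calling:

```python
import asyncio

from aiohttp import ClientSession, web
from aiohttp.test_utils import TestServer


async def echo(request):
    # Hypothetical echo handler: reports how the request body arrived.
    if request.content_type == "application/json":
        body = await request.json()
    else:
        body = dict(await request.post())  # form fields
    return web.json_response({"content_type": request.content_type, "body": body})


async def post_both_ways(payload):
    app = web.Application()
    app.add_routes([web.post("/submit", echo)])
    async with TestServer(app) as server, ClientSession() as session:
        url = server.make_url("/submit")
        # JSON body
        async with session.post(url, json=payload) as response:
            as_json = await response.json()
        # URL-encoded form body
        async with session.post(url, data=payload) as response:
            as_form = await response.json()
    return as_json, as_form


if __name__ == "__main__":
    for result in asyncio.run(post_both_ways({"username": "user", "password": "pass"})):
        print(result)
```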

---

### 5. **Concurrent requests**

The main advantage of `aiohttp` is that many requests can be in flight at once, which shortens the total time for I/O-bound work. The following example fetches several URLs concurrently:

```python
import aiohttp
import asyncio

async def fetch_data(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            return None
        return await response.text()

async def main():
    urls = [
        "https://api.example.com/data1",
        "https://api.example.com/data2",
        "https://api.example.com/data3"
    ]

    async with aiohttp.ClientSession() as session:
        # Use asyncio.gather to run the requests concurrently
        tasks = [fetch_data(session, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        for url, result in zip(urls, results):
            if isinstance(result, Exception):
                # With return_exceptions=True, a failed task yields its exception object
                print(f"Request to {url} raised {result!r}")
            elif result is None:
                print(f"Failed to fetch {url}")
            else:
                print(f"Data from {url}: {result[:100]}...")

if __name__ == "__main__":
    asyncio.run(main())
```
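
With `return_exceptions=True`, `asyncio.gather` puts the exception object into the result list instead of raising it, so the caller has to tell results and errors apart. Here is a small sketch of that step using only the standard library; `fake_fetch` is a stand-in for a real `session.get` call and simply fails for any URL containing `"bad"`:

```python
import asyncio


async def fake_fetch(url):
    # Stand-in for a real HTTP request (an assumption made for this sketch).
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"body of {url}"


async def fetch_all(urls):
    results = await asyncio.gather(
        *(fake_fetch(url) for url in urls), return_exceptions=True
    )
    succeeded, failed = {}, {}
    # gather preserves input order, so zip pairs each URL with its outcome.
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            failed[url] = result
        else:
            succeeded[url] = result
    return succeeded, failed


if __name__ == "__main__":
    ok, errors = asyncio.run(fetch_all(["https://a.example", "https://bad.example"]))
    print(ok)
    print(errors)
```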

---

### 6. **Setting request headers and timeouts**

Request headers, timeouts, and similar options are passed as parameters:

```python
import aiohttp
import asyncio

async def fetch_with_headers(url):
    # Set timeouts (10 seconds in total, 5 seconds to establish a connection)
    timeout = aiohttp.ClientTimeout(total=10, connect=5)

    # Set request headers
    headers = {
        "User-Agent": "MyApp/1.0",
        "Authorization": "Bearer your-token"
    }

    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url, headers=headers) as response:
            if response.status != 200:
                print(f"Error: Status {response.status}")
                return None
            return await response.json()

async def main():
    url = "https://api.example.com/protected"
    result = await fetch_with_headers(url)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
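
Headers and timeouts can also be set once on the session and overridden per request. The sketch below assumes a local test app with two hypothetical handlers, `whoami` (echoes the `User-Agent` it received) and `slow` (answers after 0.5 seconds). It sets a default header and a 5-second budget on the session, then tightens the timeout to 0.1 seconds for a single call:

```python
import asyncio

from aiohttp import ClientSession, ClientTimeout, web
from aiohttp.test_utils import TestServer


async def whoami(request):
    # Hypothetical handler: echoes the User-Agent header it received.
    return web.json_response({"agent": request.headers.get("User-Agent")})


async def slow(request):
    # Hypothetical handler that answers too late for the client below.
    await asyncio.sleep(0.5)
    return web.Response(text="late")


async def demo():
    app = web.Application()
    app.add_routes([web.get("/whoami", whoami), web.get("/slow", slow)])
    async with TestServer(app) as server:
        # Session-wide defaults: every request gets this header and a 5 s budget.
        async with ClientSession(
            headers={"User-Agent": "MyApp/1.0"},
            timeout=ClientTimeout(total=5),
        ) as session:
            async with session.get(server.make_url("/whoami")) as response:
                agent = (await response.json())["agent"]
            try:
                # Per-request override: this call alone must finish within 0.1 s.
                short = ClientTimeout(total=0.1)
                async with session.get(server.make_url("/slow"), timeout=short) as response:
                    await response.text()
                timed_out = False
            except asyncio.TimeoutError:
                timed_out = True
    return agent, timed_out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```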

---

### 7. **Error handling**

Asynchronous requests can fail with network errors, timeouts, and so on, and these need to be handled properly:

```python
import aiohttp
import asyncio

async def fetch_with_error_handling(url):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                # Raises ClientResponseError for 4xx and 5xx status codes
                response.raise_for_status()
                return await response.json()
    except aiohttp.ClientError as e:
        print(f"Client error: {e}")
        return None
    except asyncio.TimeoutError:
        print("Request timed out")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

async def main():
    url = "https://invalid-url.example.com"
    result = await fetch_with_error_handling(url)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
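
When `raise_for_status()` fails, the `aiohttp.ClientResponseError` it raises carries the HTTP status, which is often more useful than a generic error message. A minimal sketch, assuming a local test app that defines only an `/ok` route, so any other path yields a 404; the `status_or_error` helper is a hypothetical name:

```python
import asyncio

from aiohttp import ClientResponseError, ClientSession, web
from aiohttp.test_utils import TestServer


async def ok(request):
    return web.Response(text="fine")


async def status_or_error(session, url):
    try:
        async with session.get(url) as response:
            response.raise_for_status()  # raises for 4xx and 5xx responses
            return ("ok", response.status)
    except ClientResponseError as exc:
        # exc.status holds the HTTP status code that triggered the error.
        return ("http_error", exc.status)


async def demo():
    app = web.Application()
    app.add_routes([web.get("/ok", ok)])
    async with TestServer(app) as server, ClientSession() as session:
        return [
            await status_or_error(session, server.make_url("/ok")),
            await status_or_error(session, server.make_url("/missing")),
        ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```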

---

### 8. **Common use cases**

#### (1) **Crawlers**
- Use `aiohttp` to crawl many pages concurrently. Because the requests overlap while waiting on the network, this is typically much faster than issuing them one at a time with a synchronous client such as `requests`.

#### (2) **API calls**
- Call RESTful APIs with GET, POST, and other requests, and read the JSON responses.

#### (3) **File downloads**
- Use `response.read()` to download a file. It loads the whole body into memory, so it is best suited to small files:
  ```python
  import aiohttp

  async def download_file(url, filename):
      async with aiohttp.ClientSession() as session:
          async with session.get(url) as response:
              with open(filename, 'wb') as f:
                  f.write(await response.read())
  ```
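
For larger files, one option is to stream the body in chunks with `response.content.iter_chunked()` rather than calling `read()`. The sketch below assumes a local test app serving 200,000 bytes from a hypothetical `/file` route and writes them to a temporary file. The chunk size is an arbitrary choice, and the blocking `f.write` is kept for simplicity:

```python
import asyncio
import os
import tempfile

from aiohttp import ClientSession, web
from aiohttp.test_utils import TestServer

PAYLOAD = b"x" * 200_000  # hypothetical file contents served by the local test app


async def serve_file(request):
    return web.Response(body=PAYLOAD)


async def download_file(session, url, filename, chunk_size=64 * 1024):
    async with session.get(url) as response:
        response.raise_for_status()
        with open(filename, "wb") as f:
            # Write the body chunk by chunk instead of loading it all at once.
            async for chunk in response.content.iter_chunked(chunk_size):
                f.write(chunk)


async def demo():
    app = web.Application()
    app.add_routes([web.get("/file", serve_file)])
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "download.bin")
        async with TestServer(app) as server, ClientSession() as session:
            await download_file(session, server.make_url("/file"), target)
        return os.path.getsize(target)


if __name__ == "__main__":
    print(asyncio.run(demo()), "bytes written")
```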

#### (4) **Web server**
- `aiohttp` can also act as a server framework for building asynchronous web applications:
  ```python
  from aiohttp import web

  async def handle(request):
      return web.Response(text="Hello, world!")

  app = web.Application()
  app.add_routes([web.get('/', handle)])
  web.run_app(app, port=8080)
  ```
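
Handlers usually need more than a fixed string. As a sketch under assumed names (the `/hello/{name}` route, `create_app`, and `call_local` are hypothetical), the following app reads a path parameter via `request.match_info` and answers with JSON. It is exercised through `TestServer` so that it runs and exits rather than blocking in `web.run_app`:

```python
import asyncio

from aiohttp import ClientSession, web
from aiohttp.test_utils import TestServer


async def hello(request):
    # match_info holds the values captured from the route pattern.
    name = request.match_info.get("name", "world")
    return web.json_response({"greeting": f"Hello, {name}!"})


def create_app():
    app = web.Application()
    app.add_routes([web.get("/hello/{name}", hello)])
    return app


async def call_local(path):
    # Start the app on a free local port, call it once, then shut it down.
    async with TestServer(create_app()) as server, ClientSession() as session:
        async with session.get(server.make_url(path)) as response:
            return response.status, await response.json()


if __name__ == "__main__":
    print(asyncio.run(call_local("/hello/Ada")))
```

To serve the same app for real, pass `create_app()` to `web.run_app` as in the snippet above.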

---

### 9. **Things to watch out for**

1. **Event loop**:
   - Coroutines must be run through `asyncio.run()` or inside an already running event loop.
   - Don't call `asyncio.run()` in a Jupyter Notebook. The notebook already runs an event loop, so `await` the coroutine directly in a cell (for example, `await main()`).

2. **Connection management**:
   - Always use `async with` so the `ClientSession` is closed properly.
   - Avoid creating many `ClientSession` objects; reuse one where possible.

3. **Concurrency limits**:
   - Too many concurrent requests can cause the server to reject you (429 Too Many Requests).
   - Use `aiohttp.TCPConnector` to cap the number of simultaneous connections:
     ```python
     connector = aiohttp.TCPConnector(limit=50)  # at most 50 simultaneous connections
     async with aiohttp.ClientSession(connector=connector) as session:
         # ...
     ```

4. **Performance tuning**:
   - Install `aiodns` to speed up DNS resolution.
   - `cchardet` used to speed up charset detection, but it is unmaintained and recent `aiohttp` releases no longer rely on it.
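
The connector limit caps connections. If you want to cap the number of tasks in flight instead, a common alternative (not part of `aiohttp` itself) is an `asyncio.Semaphore`. The sketch below uses only the standard library; `fake_request` stands in for a real `session.get` call, and `run_limited` is a hypothetical helper name. It also records the peak concurrency to show that the limit holds:

```python
import asyncio


async def run_limited(jobs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # waits here while `limit` jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(total=10, limit=3):
    in_flight = 0
    peak = 0

    async def fake_request(i):
        # Stand-in for a real HTTP call; tracks how many run at the same time.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i * i

    jobs = [lambda i=i: fake_request(i) for i in range(total)]
    results = await run_limited(jobs, limit)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(results, "peak concurrency:", peak)
```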

---

### Summary

- **Install**: `pip install aiohttp`
- **Core**: `ClientSession` manages requests; `async`/`await` makes them asynchronous.
- **Features**: supports GET, POST, and other request methods, and handles JSON, text, and binary responses.
- **Strengths**: high concurrency, well suited to crawlers, API calls, and similar workloads.
- **Caveats**: manage the event loop and connections correctly to avoid resource leaks.
