Description
Linkis Component
linkis-public-enhancements/linkis-ps-public-service
What happened
When using the Openfile API to query large file result sets, the service encounters Out Of Memory (OOM) errors due to loading the entire file content into memory at once.
Problem Description:
The current implementation loads the complete file content into memory before returning results, which causes OOM when dealing with large files (e.g., files over 1GB). This affects system stability and prevents users from accessing large result sets.
What you expected to happen
The Openfile API should implement streaming processing so that large files can be handled without OOM:
- Use streaming/chunked reading instead of loading the entire file into memory
- Support pagination or range-based queries for large result sets (see the sketch after this list)
- Implement memory-efficient file processing with configurable buffer sizes
- Add file size limits and warnings for extremely large files
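A minimal sketch of what a range-based (offset/limit) read could look like; the class, method, and constant names here are illustrative assumptions, not existing Linkis code, and the per-request byte cap stands in for a configurable limit:
// Example (illustrative): read at most `limit` bytes starting at `offset`, with a bounded buffer
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class RangeFileReader {
    // Hypothetical cap on how much a single request may return; not an existing Linkis setting.
    private static final long MAX_BYTES_PER_REQUEST = 8L * 1024 * 1024; // 8 MB

    /** Copies at most `limit` bytes starting at `offset` from the file into `out`; returns bytes copied. */
    public static long readRange(String path, long offset, long limit, OutputStream out) throws IOException {
        long toRead = Math.min(limit, MAX_BYTES_PER_REQUEST);
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset); // jump to the requested range; nothing before it is loaded
            byte[] buffer = new byte[64 * 1024]; // bounded buffer; size could be made configurable
            long copied = 0;
            int read;
            while (copied < toRead
                    && (read = file.read(buffer, 0, (int) Math.min(buffer.length, toRead - copied))) != -1) {
                out.write(buffer, 0, read);
                copied += read;
            }
            return copied; // the caller can use this to compute the next offset
        }
    }
}
For result-set files the offset would more likely be a row index than a byte position, but the memory behaviour is the same: only one buffer's worth of data is in memory at any time.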
How to reproduce
- Upload or generate a large result file (>1GB) in Linkis (a helper for generating such a file is sketched after this list)
- Call the Openfile API to query the file content
- Monitor service memory usage
- Observe the OOM error when processing large files
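For the first step, any file over 1GB with many rows will do; a plain Java helper that writes roughly 1.3GB of CSV-like rows (the file path and row count are arbitrary):
// Example: generate a large test file
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class GenerateLargeFile {
    public static void main(String[] args) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("/tmp/large_result.csv"))) {
            writer.write("id,name,value\n");
            // ~20 million rows of ~65-70 bytes each comes to roughly 1.3GB
            for (long i = 0; i < 20_000_000L; i++) {
                writer.write(i + ",row_" + i + ",value_" + i + "_padding_padding_padding_padding\n");
            }
        }
    }
}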
Anything else
Suggested Implementation:
- Streaming Read: Use BufferedInputStream or NIO FileChannel for chunked reading (both are sketched below)
- Pagination Support: Add offset and limit parameters to the API
- Response Streaming: Use an HTTP streaming response instead of loading the full content
- Memory Limits: Configure the maximum file size that can be read at once
- Async Processing: Consider asynchronous processing for very large files
Technical Approach:
// Example: Streaming file read
try (InputStream input = new BufferedInputStream(new FileInputStream(file));
     OutputStream output = response.getOutputStream()) {
    byte[] buffer = new byte[8192]; // fixed-size chunk; only this much is held in memory at once
    int bytesRead;
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
        output.flush(); // push each chunk to the client instead of accumulating the file in memory
    }
}
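As an alternative to the buffered-stream loop above, NIO FileChannel.transferTo can delegate the copy to the operating system so the file contents never accumulate on the Java heap. A minimal sketch, assuming the servlet OutputStream is obtained elsewhere (this is not Linkis-specific code):
// Example: chunked transfer with NIO FileChannel
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioFileStreamer {
    /** Streams the whole file to `out` in fixed-size chunks, keeping memory use flat. */
    public static void stream(Path file, OutputStream out) throws IOException {
        final long chunkSize = 8L * 1024 * 1024; // 8 MB per transferTo call; could be made configurable
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            WritableByteChannel target = Channels.newChannel(out);
            long size = channel.size();
            long position = 0;
            while (position < size) {
                // transferTo may copy fewer bytes than requested, so advance by the actual count
                position += channel.transferTo(position, Math.min(chunkSize, size - position), target);
            }
        }
    }
}
How much of the copy is truly zero-copy depends on the target channel; when the target is a servlet OutputStream the JDK falls back to a bounded internal buffer, which still keeps memory use constant regardless of file size.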