[Feature][Filesystem] Optimize OOM issue when querying large file results via Openfile API #5313

@v-kkhuang

Description

Linkis Component

linkis-public-enhancements/linkis-ps-public-service

What happened

When using the Openfile API to query large file result sets, the service encounters Out Of Memory (OOM) errors due to loading the entire file content into memory at once.

Problem Description:
The current implementation loads the complete file content into memory before returning results, which causes OOM when dealing with large files (e.g., files over 1GB). This affects system stability and prevents users from accessing large result sets.


What you expected to happen

The Openfile API should implement streaming processing so that large files can be handled without OOM (see the sketch after this list):

  1. Use streaming/chunked reading instead of loading entire file into memory
  2. Support pagination or range-based queries for large result sets
  3. Implement memory-efficient file processing with configurable buffer sizes
  4. Add file size limits and warnings for extremely large files
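
Below is a minimal sketch of items 1, 3, and 4, assuming hypothetical configuration keys (linkis.openfile.buffer.size, linkis.openfile.max.size) and a hypothetical SafeFileReader helper; none of these names exist in Linkis today:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeFileReader {
    // Hypothetical configurable limits; real values would come from linkis.properties.
    private static final int BUFFER_SIZE = Integer.getInteger("linkis.openfile.buffer.size", 8192);
    private static final long MAX_FILE_SIZE = Long.getLong("linkis.openfile.max.size", 2L << 30); // 2 GiB

    public static void copyWithLimit(Path file, OutputStream out) throws IOException {
        long size = Files.size(file);
        if (size > MAX_FILE_SIZE) {
            // Item 4: reject outright instead of attempting to buffer the whole file.
            throw new IOException("File too large for Openfile API: " + size + " bytes");
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Items 1 and 3: forward each chunk to the response stream;
                // memory use stays bounded by BUFFER_SIZE regardless of file size.
                out.write(buffer, 0, n);
            }
        }
    }
}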


How to reproduce

  1. Upload or generate a large result file (>1GB) in Linkis (see the generation snippet after these steps)
  2. Call Openfile API to query the file content
  3. Monitor service memory usage
  4. Observe OOM error when processing large files
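
For step 1, a quick way to produce a >1GB test file; the class name and path are arbitrary examples, and a faithful reproduction would use a real result file in the engine's result directory:

import java.io.IOException;
import java.io.RandomAccessFile;

public class LargeFileGenerator {
    public static void main(String[] args) throws IOException {
        // Extend the file to 2 GB without writing it byte by byte
        // (on most filesystems this creates a sparse file almost instantly).
        try (RandomAccessFile f = new RandomAccessFile("/tmp/openfile_oom_test.txt", "rw")) {
            f.setLength(2L * 1024 * 1024 * 1024);
        }
    }
}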


Anything else

Suggested Implementation:

  1. Streaming Read: use BufferedInputStream or an NIO FileChannel for chunked reading
  2. Pagination Support: add offset and limit parameters to the API (see the range-read sketch below)
  3. Response Streaming: stream the HTTP response instead of loading the full content
  4. Memory Limits: make the maximum file size that can be read at once configurable
  5. Async Processing: consider asynchronous processing for very large files

Technical Approach:

// Example: stream the file to the response in fixed-size chunks
// instead of loading the entire content into memory.
try (InputStream input = new BufferedInputStream(new FileInputStream(file));
     OutputStream output = response.getOutputStream()) {
    byte[] buffer = new byte[8192]; // buffer size should come from configuration
    int bytesRead;
    while ((bytesRead = input.read(buffer)) != -1) {
        output.write(buffer, 0, bytesRead);
    }
    output.flush(); // flush once at the end; flushing every chunk defeats buffering
}
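
For item 2 (pagination), a hedged sketch of a range-based read using an NIO FileChannel; the offset/limit parameters and the RangedFileReader class are proposals from this issue, not existing Openfile API fields:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangedFileReader {
    // Stream only the byte range [offset, offset + limit) to the caller.
    public static void copyRange(Path file, long offset, long limit, OutputStream out)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            WritableByteChannel target = Channels.newChannel(out);
            long remaining = Math.min(limit, Math.max(0, channel.size() - offset));
            long position = offset;
            while (remaining > 0) {
                // transferTo may move fewer bytes than requested, so loop until the range is done.
                long moved = channel.transferTo(position, remaining, target);
                if (moved <= 0) {
                    break;
                }
                position += moved;
                remaining -= moved;
            }
        }
    }
}

Byte offsets suit raw downloads; for structured result sets, row-oriented pagination would likely fit the existing result-set reader better, but either approach avoids materializing the whole file in memory.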

