Remote Log Parsing via PostgreSQL Server Functions

## Problem Statement

Currently, pgwatch can only parse PostgreSQL server logs when it runs on the same host as the monitored database. This requires direct filesystem access to log files. This limitation prevents monitoring remote PostgreSQL instances for log-based metrics like `server_log_event_counts`.

## Solution Overview

Enable pgwatch to parse PostgreSQL server logs remotely using PostgreSQL's Generic File Access Functions. These functions are available to users with the `pg_monitor` role (PostgreSQL 10+) and allow reading log files through SQL queries instead of direct filesystem access.

## Key PostgreSQL Functions to Use

- `pg_ls_logdir()` - List log files in the log directory
- `pg_read_file(filename, offset, length)` - Read portions of text files
- `pg_stat_file(filename)` - Get file metadata (size, modification time)

## Implementation Approach

### 1. Mode Detection and Selection

Create automatic detection logic that:
- Checks if pgwatch is on the same host as PostgreSQL (prefer local for performance)
- Tests if remote functions are available and accessible (PostgreSQL 10+, proper permissions)
- Falls back gracefully if remote access is not possible

### 2. Remote File Discovery

Replace filesystem operations with SQL-based equivalents:
- Use `pg_ls_logdir()` to list available log files (replaces `filepath.Glob()`)
- Use `pg_stat_file()` to get file sizes and modification times (replaces `os.Stat()`)
- Identify the latest log file based on modification timestamp

### 3. Remote File Reading

Implement reading log content through SQL queries:
- Track read position (byte offset) in memory between intervals
- Use `pg_read_file(path, offset, length)` to read new content since last check
- Parse CSV log lines using existing regex logic
- Handle log rotation by detecting new files and resetting position

### 4. Refactor Main ParseLogs Function

Split current implementation into two paths:
- `parseLogsLocal()` - Current filesystem-based approach (renamed, unchanged)
- `parseLogsRemote()` - New SQL-based approach
- Main `ParseLogs()` function detects mode and calls appropriate implementation

### 5. Configuration

Maintain backward compatibility (no config changes required for existing setups)

### 6. Documentation Updates

Update user documentation to cover:
- Remote log parsing capability and requirements
- Required permissions (`pg_monitor` role)
- Performance considerations and recommended intervals
- Limitations and supported log formats

## Testing Requirements

### Unit Tests
- Mock PostgreSQL responses for `pg_ls_logdir()`, `pg_read_file()`, `pg_stat_file()`
- Test mode detection logic
- Test file discovery and reading with various scenarios
- Verify regex parsing works identically for both remote and local modes

### Integration Tests
- Set up remote PostgreSQL container
- Generate test log entries
- Verify metrics are collected correctly
- Test permission denied scenarios
- Test fallback behavior when functions unavailable
- Test log rotation handling

## Performance Considerations

- Remote parsing will have higher latency than local filesystem access
- Recommend longer collection intervals for remote mode (e.g., 120s vs 60s)
- Read log content in reasonable chunks to balance network overhead and memory usage
- Works best with frequent log rotation (smaller individual files)
- Consider using standby/replica for monitoring to reduce primary load

## Security Considerations

- Validate file paths to prevent directory traversal
- Handle permission errors gracefully without exposing system details
- Ensure logs don't contain sensitive data the monitoring user shouldn't access

## Limitations

- Requires PostgreSQL 10 or later (pgwatch is already v11+)
- Only supports CSVLOG format (same as current local parsing, check https://github.com/kmoppel/pgweasel for different formats and ideas)
- Reads log content in chunks, not true streaming
- May have slight delay compared to local parsing
- File access functions respect PostgreSQL's security restrictions

## Success Criteria

1. **Functional**: pgwatch successfully parses logs from remote PostgreSQL instances without filesystem access
2. **Compatible**: No breaking changes to existing local log parsing functionality
3. **Automatic**: Detects and uses appropriate method without requiring configuration
4. **Performant**: Adds less than 1 second overhead per collection interval
5. **Robust**: Handles errors gracefully with clear logging
6. **Documented**: Clear documentation with examples and troubleshooting guide
7. **Tested**: Comprehensive unit and integration test coverage
8. **Cloud-Ready**: Works with managed PostgreSQL services (AWS RDS, Azure Database)

## Open Questions for Discussion

1. **Reading Strategy**: Read entire new content since last position, or implement chunked streaming?
   - Recommendation: Simple approach first (read new content), optimize if needed

2. **Log Rotation Handling**: How to handle files that disappear or get renamed during reading?
   - Recommendation: Use `pg_stat_file()` to detect changes, gracefully handle errors

3. **Default Behavior**: Should remote parsing be attempted automatically if local fails?
   - Recommendation: Yes, with clear logging about which mode is active

4. **Large Files**: How to handle very large log files (multi-GB)?
   - Recommendation: Set reasonable maximum read size per interval or implement windowing

5. **Non-CSVLOG Support**: Should we support other log formats remotely?
   - Recommendation: CSVLOG only initially, expand based on user feedback

## References

- [PostgreSQL Generic File Access Functions](https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-GENFILE)
- [PostgreSQL pg_monitor role](https://www.postgresql.org/docs/current/predefined-roles.html)
- Current implementation: `internal/metrics/logparse.go`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remote Log Parsing via PostgreSQL Server Functions #1070

Problem Statement

Solution Overview

Key PostgreSQL Functions to Use

Implementation Approach

1. Mode Detection and Selection

2. Remote File Discovery

3. Remote File Reading

4. Refactor Main ParseLogs Function

5. Configuration

6. Documentation Updates

Testing Requirements

Unit Tests

Integration Tests

Performance Considerations

Security Considerations

Limitations

Success Criteria

Open Questions for Discussion

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Remote Log Parsing via PostgreSQL Server Functions #1070

Description

Problem Statement

Solution Overview

Key PostgreSQL Functions to Use

Implementation Approach

1. Mode Detection and Selection

2. Remote File Discovery

3. Remote File Reading

4. Refactor Main ParseLogs Function

5. Configuration

6. Documentation Updates

Testing Requirements

Unit Tests

Integration Tests

Performance Considerations

Security Considerations

Limitations

Success Criteria

Open Questions for Discussion

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions