-
Notifications
You must be signed in to change notification settings - Fork 66
Description
Problem Statement
Currently, pgwatch can only parse PostgreSQL server logs when it runs on the same host as the monitored database. This requires direct filesystem access to log files. This limitation prevents monitoring remote PostgreSQL instances for log-based metrics like server_log_event_counts.
Solution Overview
Enable pgwatch to parse PostgreSQL server logs remotely using PostgreSQL's Generic File Access Functions. These functions are available to users with the pg_monitor role (PostgreSQL 10+) and allow reading log files through SQL queries instead of direct filesystem access.
Key PostgreSQL Functions to Use
pg_ls_logdir()- List log files in the log directorypg_read_file(filename, offset, length)- Read portions of text filespg_stat_file(filename)- Get file metadata (size, modification time)
Implementation Approach
1. Mode Detection and Selection
Create automatic detection logic that:
- Checks if pgwatch is on the same host as PostgreSQL (prefer local for performance)
- Tests if remote functions are available and accessible (PostgreSQL 10+, proper permissions)
- Falls back gracefully if remote access is not possible
2. Remote File Discovery
Replace filesystem operations with SQL-based equivalents:
- Use
pg_ls_logdir()to list available log files (replacesfilepath.Glob()) - Use
pg_stat_file()to get file sizes and modification times (replacesos.Stat()) - Identify the latest log file based on modification timestamp
3. Remote File Reading
Implement reading log content through SQL queries:
- Track read position (byte offset) in memory between intervals
- Use
pg_read_file(path, offset, length)to read new content since last check - Parse CSV log lines using existing regex logic
- Handle log rotation by detecting new files and resetting position
4. Refactor Main ParseLogs Function
Split current implementation into two paths:
parseLogsLocal()- Current filesystem-based approach (renamed, unchanged)parseLogsRemote()- New SQL-based approach- Main
ParseLogs()function detects mode and calls appropriate implementation
5. Configuration
Maintain backward compatibility (no config changes required for existing setups)
6. Documentation Updates
Update user documentation to cover:
- Remote log parsing capability and requirements
- Required permissions (
pg_monitorrole) - Performance considerations and recommended intervals
- Limitations and supported log formats
Testing Requirements
Unit Tests
- Mock PostgreSQL responses for
pg_ls_logdir(),pg_read_file(),pg_stat_file() - Test mode detection logic
- Test file discovery and reading with various scenarios
- Verify regex parsing works identically for both remote and local modes
Integration Tests
- Set up remote PostgreSQL container
- Generate test log entries
- Verify metrics are collected correctly
- Test permission denied scenarios
- Test fallback behavior when functions unavailable
- Test log rotation handling
Performance Considerations
- Remote parsing will have higher latency than local filesystem access
- Recommend longer collection intervals for remote mode (e.g., 120s vs 60s)
- Read log content in reasonable chunks to balance network overhead and memory usage
- Works best with frequent log rotation (smaller individual files)
- Consider using standby/replica for monitoring to reduce primary load
Security Considerations
- Validate file paths to prevent directory traversal
- Handle permission errors gracefully without exposing system details
- Ensure logs don't contain sensitive data the monitoring user shouldn't access
Limitations
- Requires PostgreSQL 10 or later (pgwatch is already v11+)
- Only supports CSVLOG format (same as current local parsing, check https://github.com/kmoppel/pgweasel for different formats and ideas)
- Reads log content in chunks, not true streaming
- May have slight delay compared to local parsing
- File access functions respect PostgreSQL's security restrictions
Success Criteria
- Functional: pgwatch successfully parses logs from remote PostgreSQL instances without filesystem access
- Compatible: No breaking changes to existing local log parsing functionality
- Automatic: Detects and uses appropriate method without requiring configuration
- Performant: Adds less than 1 second overhead per collection interval
- Robust: Handles errors gracefully with clear logging
- Documented: Clear documentation with examples and troubleshooting guide
- Tested: Comprehensive unit and integration test coverage
- Cloud-Ready: Works with managed PostgreSQL services (AWS RDS, Azure Database)
Open Questions for Discussion
-
Reading Strategy: Read entire new content since last position, or implement chunked streaming?
- Recommendation: Simple approach first (read new content), optimize if needed
-
Log Rotation Handling: How to handle files that disappear or get renamed during reading?
- Recommendation: Use
pg_stat_file()to detect changes, gracefully handle errors
- Recommendation: Use
-
Default Behavior: Should remote parsing be attempted automatically if local fails?
- Recommendation: Yes, with clear logging about which mode is active
-
Large Files: How to handle very large log files (multi-GB)?
- Recommendation: Set reasonable maximum read size per interval or implement windowing
-
Non-CSVLOG Support: Should we support other log formats remotely?
- Recommendation: CSVLOG only initially, expand based on user feedback
References
- PostgreSQL Generic File Access Functions
- PostgreSQL pg_monitor role
- Current implementation:
internal/metrics/logparse.go