Summary
Add support for reading and writing Parquet files in sqlcmd, similar to how CSV and other formats are supported.
Motivation
Parquet is a columnar storage format widely used in data engineering and analytics pipelines. Supporting Parquet would enable:
- Efficient data export: Column-oriented compression for large query results
- Interoperability: Direct integration with tools like Spark, Pandas, DuckDB, Polars, and cloud data lakes
- Type preservation: Parquet preserves data types better than CSV (dates, decimals, nulls)
- Performance: Faster reads for analytical workloads due to columnar format
Proposed Features
Output (-o parquet or similar)
- Export query results to Parquet files (see the writer sketch after this list)
- Support for common compression codecs (snappy, gzip, zstd)
- Proper SQL Server to Parquet type mapping
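A minimal sketch of what the export path could look like, assuming the parquet-go library (github.com/parquet-go/parquet-go) is chosen; the Row struct, column names, and compression choice below are illustrative assumptions, not a settled design for go-sqlcmd:

```go
package main

import (
	"log"
	"os"

	"github.com/parquet-go/parquet-go"
)

// Illustrative row shape; in go-sqlcmd this would be derived from the
// result set's column metadata rather than a hand-written struct.
type Row struct {
	ID     int64   `parquet:"id"`
	Name   string  `parquet:"name,optional"`
	Amount float64 `parquet:"amount"`
}

func main() {
	f, err := os.Create("results.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The writer derives the Parquet schema from the struct tags.
	// The codec here would be driven by a flag such as the proposed
	// --parquet-compression (snappy, gzip, zstd).
	w := parquet.NewGenericWriter[Row](f, parquet.Compression(&parquet.Snappy))

	// Write in batches as rows are fetched from the server, rather than
	// buffering the entire result set in memory.
	batch := []Row{
		{ID: 1, Name: "alpha", Amount: 10.5},
		{ID: 2, Name: "beta", Amount: 20.0},
	}
	if _, err := w.Write(batch); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```

Writing batch by batch (and flushing row groups periodically) would keep memory bounded even for very large result sets.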
Input (optional, lower priority)
- Read Parquet files as input for bulk insert operations
- Could complement the -i flag for file input
Implementation Considerations
- Go Parquet libraries: parquet-go, apache/arrow-go
- Type mapping from SQL Server types to Parquet logical types (see the sketch after this list)
- Memory management for large result sets (streaming vs buffered)
- Schema inference vs explicit schema definition
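A rough sketch of the type-mapping piece, again assuming parquet-go; the SQL Server type names (as reported by the driver's ColumnType.DatabaseTypeName) and the mapping itself are partial, illustrative assumptions rather than a complete specification:

```go
package main

import (
	"fmt"

	"github.com/parquet-go/parquet-go"
)

// parquetNode maps a SQL Server type name to a Parquet schema node.
// Partial and illustrative only.
func parquetNode(sqlType string) (parquet.Node, error) {
	switch sqlType {
	case "BIT":
		return parquet.Leaf(parquet.BooleanType), nil
	case "TINYINT", "SMALLINT", "INT":
		return parquet.Int(32), nil
	case "BIGINT":
		return parquet.Int(64), nil
	case "REAL":
		return parquet.Leaf(parquet.FloatType), nil
	case "FLOAT":
		return parquet.Leaf(parquet.DoubleType), nil
	case "CHAR", "NCHAR", "VARCHAR", "NVARCHAR", "TEXT", "NTEXT":
		return parquet.String(), nil
	case "DATE":
		return parquet.Date(), nil
	case "DATETIME", "DATETIME2":
		return parquet.Timestamp(parquet.Microsecond), nil
	case "BINARY", "VARBINARY":
		return parquet.Leaf(parquet.ByteArrayType), nil
	default:
		return nil, fmt.Errorf("unsupported SQL Server type: %s", sqlType)
	}
}

// Columns from the result set would then be assembled into an explicit
// schema, with nullable columns wrapped in parquet.Optional.
func buildSchema(names, types []string, nullable []bool) (*parquet.Schema, error) {
	fields := parquet.Group{}
	for i, name := range names {
		node, err := parquetNode(types[i])
		if err != nil {
			return nil, err
		}
		if nullable[i] {
			node = parquet.Optional(node)
		}
		fields[name] = node
	}
	return parquet.NewSchema("query_result", fields), nil
}
```

Deriving the schema from the result set metadata this way avoids schema inference from sampled data and keeps nullability explicit.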
Related
- Similar to existing CSV output functionality
- Would complement the data export/import story for go-sqlcmd
Tasks
- Research Go Parquet libraries and choose one
- Design CLI flags (-f parquet, --parquet-compression, etc.)
- Implement SQL Server → Parquet type mapping
- Implement streaming Parquet writer for query results
- Add tests
- Document usage
- (Optional) Implement Parquet input for bulk operations