Open
Labels
enhancement (New feature or request)
Description
Problem
Currently, CPS datasets from 2019-2023 use frac = 0.5 to randomly downsample to 50% of observations before reweighting. This reduces memory usage but throws away half the data randomly, potentially losing important observations.
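To illustrate the current behavior: a frac = 0.5 sample keeps a random half of the records before reweighting. This is a hypothetical sketch, not the repository's actual downsampling code; the names n_records and kept are illustrative.

```python
import numpy as np

# Illustrative only: frac = 0.5 keeps a uniformly random half of the
# record indices, regardless of how informative each record is.
rng = np.random.default_rng(seed=0)
n_records = 100
kept = rng.choice(n_records, size=n_records // 2, replace=False)

# Half the records survive; which half is pure chance.
print(len(kept))  # 50
```

Any record's chance of being dropped is 50%, independent of its contribution to calibration targets, which is the problem this issue addresses.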
Proposed Solution
Remove the 50% downsampling and instead use a larger L0 penalty in the reweighting step. The L0 penalty encourages sparsity by setting some weights to exactly zero, achieving similar computational benefits but in a learned way that preserves important observations.
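As a rough sketch of why an L0 penalty produces exact zeros (this is not the project's reweighting code; l0_prox and its arguments are hypothetical): the proximal operator of an L0 penalty is hard thresholding, so weights whose magnitude does not justify the per-nonzero cost are set exactly to zero rather than merely shrunk.

```python
import numpy as np

def l0_prox(weights, penalty):
    # Elementwise minimizer of 0.5 * (w - v)**2 + penalty * 1[w != 0]:
    # keep v when v**2 / 2 >= penalty, otherwise set the weight to 0.
    out = weights.copy()
    out[weights**2 < 2.0 * penalty] = 0.0
    return out

w = np.array([0.1, 2.0, 0.05, 3.0])
print(l0_prox(w, penalty=0.5))  # [0. 2. 0. 3.] -- small weights zeroed exactly
```

A larger penalty zeroes more weights, so sparsity can be tuned toward the ~50% level the random downsampling currently achieves, but with the zeroed observations chosen by the optimization rather than by chance.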
Benefits
- Better accuracy: The model decides which observations to effectively zero out based on their contribution to targets
- Same memory footprint: Similar sparsity level (≈50% zeros) but chosen optimally
- More principled: Uses optimization rather than random sampling
Implementation
- Remove frac = 0.5 from the CPS_2019 through CPS_2023 classes
- Increase the L0 penalty parameter in the reweighting function to achieve approximately 50% sparsity
- Test to find the L0 value that gives similar sparsity to current approach
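The last step, finding the L0 value that matches the current ~50% sparsity, could be automated by searching over the penalty, since sparsity is monotone in it. A minimal self-contained sketch, assuming a simple hard-threshold model of the L0 effect (hard_threshold and find_penalty_for_sparsity are hypothetical names, not functions in the repository):

```python
import numpy as np

def hard_threshold(weights, penalty):
    # Elementwise minimizer of 0.5 * (w - v)**2 + penalty * 1[w != 0].
    return np.where(weights**2 < 2.0 * penalty, 0.0, weights)

def find_penalty_for_sparsity(weights, target=0.5, iters=50):
    # Bisect on the penalty: a larger penalty zeroes more weights,
    # so the fraction of zeros is monotone non-decreasing in it.
    lo = 0.0
    hi = 0.5 * float(np.max(weights**2)) + 1.0  # large enough to zero everything
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        frac_zero = float(np.mean(hard_threshold(weights, mid) == 0))
        if frac_zero < target:
            lo = mid
        else:
            hi = mid
    # hi is (approximately) the smallest penalty reaching the target sparsity.
    return hi
```

In practice the search would wrap the actual reweighting run rather than a closed-form threshold, but the same bisection logic applies.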
Related to #391 (L0 exploration)