
Replace 50% downsampling with larger L0 penalty for better accuracy #427

@MaxGhenis

Description


Problem

Currently, the CPS datasets for 2019–2023 use `frac = 0.5` to randomly downsample to 50% of observations before reweighting. This reduces memory usage, but it discards half the data at random, potentially losing observations that matter for hitting calibration targets.

Proposed Solution

Remove the 50% downsampling and instead use a larger L0 penalty in the reweighting step. The L0 penalty encourages sparsity by setting some weights to exactly zero, achieving similar computational benefits but in a learned way that preserves important observations.
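As a rough illustration (not the actual reweighting code in this repo), one standard way to handle an L0 penalty is iterative hard thresholding, i.e. proximal gradient descent: after each gradient step, any weight too small to justify its fixed per-observation cost is set exactly to zero. All names below are hypothetical.

```python
import numpy as np

def reweight_l0(loss_matrix, targets, penalty, lr=1e-3, n_iter=2000):
    """Toy sketch: minimise ||loss_matrix @ w - targets||^2 + penalty * ||w||_0
    with w >= 0, via iterative hard thresholding (proximal gradient for L0).
    Illustrative only; the repo's real reweighting objective may differ."""
    w = np.ones(loss_matrix.shape[1])
    for _ in range(n_iter):
        grad = 2 * loss_matrix.T @ (loss_matrix @ w - targets)
        w = np.maximum(w - lr * grad, 0.0)  # gradient step; weights stay non-negative
        # Proximal operator of lr * penalty * ||w||_0 is hard thresholding:
        # zero any weight too small to be worth its fixed L0 cost.
        w[w ** 2 < 2 * lr * penalty] = 0.0
    return w
```

A larger `penalty` zeroes more weights, so sparsity can be tuned toward the ~50% level that the random downsample currently achieves, while the surviving observations are chosen by the optimizer rather than by chance.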

Benefits

  • Better accuracy: The model decides which observations to effectively zero out based on their contribution to targets
  • Same memory footprint: Similar sparsity level (≈50% zeros) but chosen optimally
  • More principled: Uses optimization rather than random sampling

Implementation

  1. Remove `frac = 0.5` from the `CPS_2019` through `CPS_2023` classes
  2. Increase the L0 penalty parameter in the reweighting function to achieve approximately 50% sparsity
  3. Test to find the L0 value that gives similar sparsity to current approach
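Step 3 above amounts to a one-dimensional search: the fraction of zero weights should be (roughly) monotone in the penalty, so a log-space bisection can find the value giving ~50% zeros. A hypothetical sketch, where `sparsity_for(penalty)` is assumed to refit the weights and return the resulting zero-weight fraction:

```python
import math

def find_penalty(sparsity_for, target=0.5, lo=1e-8, hi=1e4, tol=0.02, max_iter=40):
    """Bisect in log space for a penalty whose resulting zero-weight fraction
    is within `tol` of `target`. Assumes sparsity_for is non-decreasing."""
    for _ in range(max_iter):
        mid = math.sqrt(lo * hi)      # geometric midpoint of the bracket
        sparsity = sparsity_for(mid)
        if abs(sparsity - target) <= tol:
            return mid
        if sparsity < target:
            lo = mid                  # not sparse enough: raise the penalty
        else:
            hi = mid                  # too sparse: lower the penalty
    return math.sqrt(lo * hi)
```

Since each refit is expensive, bisection keeps the number of full reweighting runs to a few dozen at most.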

Related to #391 (L0 exploration)


    Labels

    enhancement (New feature or request)
