transpose=true returns plausible but incorrect results #1172

@LilithHafner

I was doing some data handling with CSV.jl, and at the end of my pipeline the graphs looked plausible but the results were unexpected. I tracked it down to this correctness bug in CSV.jl:

julia> using CSV; CSV.File(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), transpose=true)
12-element CSV.File:
 (Alpha = String7("40*"), Beta = String3("0*"), Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = 1641)
 (Alpha = missing, Beta = missing, Gamma = 1707)
 (Alpha = missing, Beta = missing, Gamma = 1762)
 (Alpha = missing, Beta = missing, Gamma = 1814)
 (Alpha = String7("95"), Beta = String3("15"), Gamma = 1861)
 (Alpha = String7("4027"), Beta = String3("833"), Gamma = 1913)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)

Or, more readably but with more dependencies:

julia> using CSV, DataFrames; CSV.read(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), DataFrame, transpose=true)
12×3 DataFrame
 Row │ Alpha     Beta      Gamma   
     │ String7?  String7?  Int64?  
─────┼─────────────────────────────
   1 │ 40*       0*        missing 
   2 │ 95        15        missing 
   3 │ 4027      833       missing 
   4 │ missing   missing      1641
  ⋮  │    ⋮         ⋮         ⋮
  10 │ missing   missing   missing 
  11 │ missing   missing   missing 
  12 │ missing   missing   missing 
                     5 rows omitted

julia> using CSV, DataFrames; CSV.read(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), DataFrame, transpose=false)
2×13 DataFrame
 Row │ Alpha    Column2  3982     16603    Column5  Column6  Column7  40*      95     4027   Column11  Column12  Column13 
     │ String7  Missing  Missing  Int64?   Int64    Int64?   Int64?   String7  Int64  Int64  Missing   Missing   Missing  
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Beta     missing  missing     2664     2716  missing  missing  0*          15    833   missing   missing   missing 
   2 │ Gamma    missing  missing  missing     1641     1707     1762  1814      1861   1913   missing   missing   missing 

This also segfaults from time to time.

It didn't happen before I introduced the *s (the quoted "40*" and "0*" fields).
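For comparison, here is a minimal sketch of what a transposed read of this input would be expected to contain, computed in plain Julia without CSV.jl. It is not how CSV.jl parses anything: the naive `split` on commas only works because no field in this sample contains a comma or newline, the helper `cell` and the variables `rows`, `colnames`, and `expected` are names made up for this illustration, and values are left as strings rather than parsed to integers.

```julia
data = """
Alpha,,3982,16603,,,,"40*",95,4027,,,
Beta,,,2664,2716,,,"0*",15,833,,,
Gamma,,,,1641,1707,1762,1814,1861,1913,,,
"""

# Naive parse (assumption: no embedded commas or newlines in any field).
rows = [split(line, ',') for line in split(strip(data), '\n')]

# Empty fields become `missing`; surrounding double quotes are dropped.
cell(s) = isempty(s) ? missing : String(strip(s, '"'))

# With transpose=true, the first field of each input line is a column name
# and the remaining fields are that column's values.
colnames = Tuple(Symbol(first(r)) for r in rows)
expected = [NamedTuple{colnames}(Tuple(cell(r[i]) for r in rows))
            for i in 2:length(first(rows))]

foreach(println, expected)
```

Under this reading, 3982, 16603, 2664, and 2716 land in rows 2 through 4 and the starred values in row 7, whereas the `transpose=true` output above puts the starred values in row 1 and never shows those four numbers.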
