Hi,
I am creating a DataFrame from 3.5M records with about 25 vectors, and it is taking over a minute.
require 'daru'
require 'benchmark'

# Construct data: 3.5M records, each hash with the same ~25 keys.
data = [
  {m: 'abc', a: 1.2, b: 2.1, c: 2.3},
  {m: 'xyz', a: 1.1, b: 22.1, c: 223.3},
  # ...
]
# Convert from an array of hashes to a hash of arrays
vc = {}
data.first.keys.each do |ky|
  vc[ky] = data.map { |dt| dt[ky] }
end
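Side note: the conversion above scans all of data once per key, so roughly 25 full passes over 3.5M records. A single-pass version that fills preallocated column arrays might cut that down; a minimal sketch (not benchmarked at this scale):

# Single-pass conversion: walk the records once and fill
# preallocated column arrays, instead of one data.map per key.
keys = data.first.keys
vc = keys.each_with_object({}) { |ky, h| h[ky] = Array.new(data.size) }
data.each_with_index do |dt, i|
  keys.each { |ky| vc[ky][i] = dt[ky] }
end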
Benchmark.bm do |x|
x.report("df array_of_hash: ") { Daru::DataFrame.new(data, clone: false) }
x.report("df hash_of_array: ") { Daru::DataFrame.new(vc, clone: false) }
end
##
#                          user     system      total        real
# df array_of_hash:   86.398855   0.311986  86.710841  ( 86.850770)
# df hash_of_array:   21.745897   0.027261  21.773158  ( 21.814447)

After converting the data (which itself took about a minute), the hash-of-arrays form is a little faster, but 21 seconds is still a lot of time to create a DataFrame.
Any ideas how to speed this up?
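In case it points anywhere useful: one thing I have not tried yet is building the Daru::Vector columns up front and passing those in. I am assuming (not verified) that clone: false then lets the DataFrame adopt them without another copy:

# Untested sketch: wrap each column array in a Daru::Vector first,
# hoping the DataFrame can adopt the vectors as-is with clone: false.
vectors = vc.each_with_object({}) do |(ky, arr), h|
  h[ky] = Daru::Vector.new(arr)
end
df = Daru::DataFrame.new(vectors, clone: false)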