Description
While exploring this dataset, I noticed that the metadata is about 4x larger than it needs to be: it shrinks from 1.5 MB to 384 KB once the redundant fields are dropped. It's probably not a big deal, but I figured I'd leave this script here in case it becomes useful.
Current format:

```
<id>: {
  "image_filepath": "images/<id>.jpg",
  "anomaly_class": <class>
},
...
```
Reduced format:

```
<id>: <class>,
...
```
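To see where the savings come from, here is a minimal sketch that serializes a single hypothetical entry (ID `"0"` and class `"Cell"` are made up for illustration) in each of the two formats above and prints the byte counts. Most of the per-entry cost in the current format is the repeated field names and the path string, which the reduced format drops.

```ruby
require 'json'

# Hypothetical single entry in the current format (ID and class are illustrative).
current = { "0" => { "image_filepath" => "images/0.jpg", "anomaly_class" => "Cell" } }

# The same entry in the reduced format: the key maps straight to the class.
reduced = current.transform_values { |v| v["anomaly_class"] }

current_json = JSON.generate(current)
reduced_json = JSON.generate(reduced)

puts "current: #{current_json.bytesize} bytes  #{current_json}"
puts "reduced: #{reduced_json.bytesize} bytes  #{reduced_json}"
```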
This changes the image filepath requirement: each image path must follow from its top-level key (`images/<id>.jpg`) rather than being read from the `image_filepath` field. That already holds for the current dataset, and the script below raises an error if any entry violates it.
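Consumers of the reduced file would then rebuild each image path from its key. A minimal sketch, assuming the `images/<id>.jpg` layout described above; the helper name `image_path` and the inline sample data are hypothetical:

```ruby
require 'json'

# Hypothetical helper: derive the image path from a metadata key,
# assuming every image lives at images/<id>.jpg.
def image_path(id)
  File.join("images", "#{id}.jpg")
end

# Illustrative reduced-format metadata (IDs and classes are made up).
reduced = JSON.parse('{"0": "Cell", "1": "No-Anomaly"}')

reduced.each do |id, anomaly_class|
  puts "#{image_path(id)} -> #{anomaly_class}"
end
```

Because the path is now implied rather than stored, the key/filepath check in the script is what guarantees this reconstruction matches the original `image_filepath` values.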
```ruby
require 'json'
require 'pathname'

# Rewrite the metadata so each top-level key maps directly to its anomaly
# class, after checking that each image filename is <key>.jpg.
def shrink(old_path)
  old = JSON.parse(File.read(old_path))

  reduced = old.map do |key, value|
    # The reduced format drops image_filepath, so the filename must be
    # recoverable from the key alone.
    if key != File.basename(value["image_filepath"], ".jpg")
      raise "key (#{key}) / filepath (#{value["image_filepath"]}) mismatch"
    end
    [key, value["anomaly_class"]]
  end.to_h

  # e.g. module_metadata.json -> module_metadata_new.json
  new_path = Pathname(old_path).sub_ext("_new.json").to_s
  File.open(new_path, 'w') { |f| JSON.dump(reduced, f) }
end

shrink("./InfraredSolarModules/module_metadata.json")
```

Further size reduction and faster random access could be achieved by assuming the image IDs form a contiguous set, so that an entry's offset in the metadata can serve directly as its index. This would avoid loading the entire metadata file just to look up one image, which matters if the set grows large.
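One way that idea could look, sketched under loudly stated assumptions: the IDs are exactly the integers `0...n` with no gaps, there are at most 256 distinct classes, and the metadata is stored as one byte per image (the class index) plus a separate list of class names. Looking up an image is then a single seek to byte offset `id`, without reading the rest of the file. The binary layout and the helper names `write_index` / `read_class` are hypothetical, not part of the dataset.

```ruby
require 'tempfile'

# Hypothetical layout: byte i of the index file holds the class index of image i.
# Assumes IDs are exactly 0...n with no gaps and at most 256 distinct classes.
def write_index(path, reduced)
  classes = reduced.values.uniq.sort
  raise "too many classes for one byte each" if classes.size > 256

  # Parse keys as base-10 integers and order entries by ID.
  entries = reduced.map { |k, cls| [Integer(k, 10), cls] }.sort_by(&:first)
  ids = entries.map(&:first)
  raise "IDs are not contiguous from 0" unless ids == (0...ids.size).to_a

  lookup = classes.each_with_index.to_h # class name -> byte value
  File.binwrite(path, entries.map { |_, cls| lookup.fetch(cls) }.pack("C*"))
  classes # must be stored separately, e.g. in a small JSON file
end

# Random access: seek straight to the record for `id` and read one byte.
def read_class(path, classes, id)
  File.open(path, "rb") do |f|
    f.seek(id)
    byte = f.read(1)
    raise IndexError, "no record for id #{id}" if byte.nil?
    classes[byte.unpack1("C")]
  end
end

# Usage with illustrative data.
Tempfile.create("index") do |tmp|
  classes = write_index(tmp.path, { "0" => "Cell", "1" => "No-Anomaly", "2" => "Cell" })
  puts read_class(tmp.path, classes, 2)
end
```

The fixed one-byte record width is what makes the offset computable; with variable-length records (such as JSON) a separate offset table would be needed instead.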