Skip to content

Conversation

@headius
Copy link
Contributor

@headius headius commented Dec 11, 2025

This feature leaked out without much discussion, but there's many reasons why it should not be supported:

  • Each instance of an anonymous Data in the stream will cause a new Data subtype to be defined.
  • Multiple objects of the same anonymous Data cannot be round-tripped, since they will no longer be of the same type after loading.

As with Marshal's behavior for anonymous classes and structs, Psych should reject such objects rather than attempt to fake a Data class that appears to fit the original structure.

@headius
Copy link
Contributor Author

headius commented Dec 11, 2025

@nevans @eregon Here's the PR I promised. I believe my justifications for removing this feature are sound.

While Psych does allow dumping and loading anonymous classes and structs, Marshal does not. I believe Psych should not allow serializing things that Marshal disallows:

$ ruby -e 'p Marshal.dump(Class.new)'         
-e:1:in 'Marshal.dump': can't dump anonymous class #<Class:0x000000011b6f19b0> (TypeError)
	from -e:1:in '<main>'
$ ruby -e 'p Marshal.dump(Struct.new(:foo))'
-e:1:in 'Marshal.dump': can't dump anonymous class #<Class:0x000000011dd71900> (TypeError)
	from -e:1:in '<main>'

I also don't think the feature is actually worth keeping, since it produces Data objects that no longer reflect their original class:

$ ruby -ryaml -e 'd = Data.define(:foo); d1 = d[1]; d2 = d[2]; p d1.class == d2.class; d1p, d2p = YAML.unsafe_load(YAML.dump([d1, d2])); p d1p.class == d2p.class'
true
false

And a large stream with many anonymous Data objects in it will create as many different anonymous Data types, all potentially identical in structure.

$ ruby -ryaml -e 'd = Data.define(:foo); ds = 100_000.times.map {d[it]}; YAML.unsafe_load(YAML.dump(100_000.times.map{ds})); p ObjectSpace.each_object(Class).select{it.superclass == Data}.size'
100001

None of these are what people want or expect.

This feature leaked out without much discussion, but there's many
reasons why it should not be supported:

* Each instance of an anonymous Data in the stream will cause a new
  Data subtype to be defined.
* Multiple objects of the same anonymous Data cannot be round-
  tripped, since they will no longer be of the same type after
  loading.

As with Marshal's behavior for anonymous classes and structs, Psych
should reject such objects rather than attempt to fake a Data class
that appears to fit the original structure.
raise TypeError, "can't dump anonymous Data: #{o}" unless o.class.name
ivars = o.instance_variables
if ivars.empty?
tag = ['!ruby/data', o.class.name].compact.join(':')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tag = ['!ruby/data', o.class.name].compact.join(':')
tag = "!ruby/data:#{o.class.name}"

and same below.

@eregon
Copy link
Member

eregon commented Dec 12, 2025

That makes sense to me, especially the last case where one Data subclass becomes 100_000 Data subclasses after "roundtripping".

I think we should also disallow loading anonymous Data for consistency & simplicity, but that's probably best done after #765 is merged to avoid conflicts.

@nevans
Copy link
Contributor

nevans commented Dec 12, 2025

I'm still against this. My long term objection to it is a weak objection. But I am strongly against it without at least giving a deprecation period (until at least ruby 4.1). If someone wants to load anonymous data (or struct), that's their prerogative, IMO.

To me, YAML is not only (or even primarily) about round-tripping, and the inability to fully round-trip something shouldn't be the only justification for being able to load (or dump) it. My projects over the years have loaded orders of magnitude more YAML than they've serialized. And even when I do dump YAML, sometimes that's more about printing a quick and dirty debug file that I can easily read through, and I have no intention of ever loading it into anything but my own brain (or maybe a quick one-off awk | diff pipeline, or a inspection text output onto a loki graphana dashboard or admin web page).

And LOTS of data can't be round-tripped safely. If that is the justification for disabling this, that justifies breaking many other cases, but in particular all of the other anonymous class round tripping, right? IMO, that would break many things.

irb(main):022> oclass = Class.new do attr :ivar; def initialize(...) super; @ivar = "hello obj" end end
=> #<Class:0x00007a986f106558>
irb(main):023> oclass.new.to_yaml
=> "--- !ruby/object\nivar: hello obj\n"
irb(main):024> YAML.unsafe_load oclass.new.to_yaml
=> #<Object:0x00007a986ecf6788 @ivar="hello obj">
irb(main):025> _.class
Object

Anonymous Data/Struct loading is far more useful than this, because the only way to use the anonymous object I've just loaded is manually modifying the singleton class! At which point, you're creating another class per object again.

Of course, anonymous Data (and Struct) do have the extra issue beyond other anonymous classes: they create a brand new class per object. But I still don't think that should be a necessary restriction. They also create a custom object per object, so some sort of linear memory use is already baked in. If someone wants to do that (and bust a bunch of their caches), let them. (Adding a flag to enable/disable it would be fine, of course.)

On the other hand, changing how anonymous data (and structs and objects) are dumped to be more like how anonymous Array subclasses and Hash subclasses are dumped would be much more acceptable to me. It would still be a breaking change, and a conservative approach would still give it a deprecation period or at least be careful about timing which release changes the default behavior. But it would force people to confront any round-tripping problems head-on, rather than hiding them.

irb(main):001> aclass = Class.new(Array) do def initialize(...) super; @ivar = "hello array" end end
=> #<Class:0x00007448bdd45bf0>
irb(main):002> aclass.new(10) { rand }.to_yaml
=> "--- !ruby/array:%23%3CClass:0x00007448bdd45bf0%3E\ninternal:\n- 0.8122580707133161\n- 0.9037981601701021\n- 0.04788691095832298\n- 0.5915898581951498\n- 0.01001794374303544\n- 0.3938141475427078\n- 0.9484047343467139\n- 0.6925541802973968\n- 0.29676774864966593\n- 0.6565246077519113\nivars:\n  :@ivar: hello array\n"

See the array.class.inspect in there: %23%3CClass:0x00007448bdd45bf0%3E

irb(main):003> YAML.unsafe_load aclass.new(10) { rand }.to_yaml
/home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:57:in `path2class': can't retrieve anonymous class #<Class:0x00007448bdd45bf0> (ArgumentError)

        path2class(name)
                   ^^^^
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:57:in `resolve'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:49:in `find'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:29:in `load'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:465:in `resolve_class'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:266:in `visit_Psych_Nodes_Mapping'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in `visit'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:345:in `visit_Psych_Nodes_Document'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in `visit'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/nodes/node.rb:49:in `to_ruby'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych.rb:275:in `unsafe_load'
	from (irb):3:in `<main>'
	from <internal:kernel>:187:in `loop'
	... 3 levels...
/home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:57:in `path2class': undefined class/module Struct::#<Class (ArgumentError)

        path2class(name)
                   ^^^^
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:57:in `resolve'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:49:in `find'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/class_loader.rb:29:in `load'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:465:in `resolve_class'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:266:in `visit_Psych_Nodes_Mapping'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in `visit'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:345:in `visit_Psych_Nodes_Document'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in `visit'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych/nodes/node.rb:49:in `to_ruby'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/psych-5.2.6/lib/psych.rb:275:in `unsafe_load'
	from (irb):3:in `<main>'
	from /home/user/.local/share/rbenv/versions/3.3.10/lib/ruby/gems/3.3.0/gems/irb-1.15.3/lib/irb/workspace.rb:101:in `eval'
	... 17 levels...

So, my counter-proposal is this:

  • At the very least, convert this into two PRs: one to print a warning (edited to add: for the deprecation period) and one to remove support (edited to add: for the next breaking changes release).
  • Add an option to disable loading anonymous Data/Struct/Object classes, but unsafe_load should continue to allow anonymous class Data/Struct/Object loading.
  • But when dumping objects with anonymous classes, take the Array/Hash approach and print out the obviously unloadable obj.class.inspect as their class name. While this is still technically backwards incompatible and should wait until at least the next breaking changes release, it's also arguably "fixing a bug" or "making it more consistent".

@nevans
Copy link
Contributor

nevans commented Dec 12, 2025

Since the vast majority of production code that is loading YAML is (or should be) using safe_load, not unsafe_load, they will never encounter this issue. If that same code is also using YAML to serialize (and later deserialize), it should also be using safe_dump.

But sometimes people may use permitted_classes: [Data, Struct] as a shorthand for all non-anonymous data and structs.

So, I'll add the following details to my counter-proposal:

  • permitted_classes should special-case Data, Struct, and Object (edited to add: and any other anonymous subclasses supported by Psych) to only allow anonymous subclasses of permitted classes based on the following new option(s).
  • Add a permitted_anonymous_subclasses: [] option. unsafe_load would default to [Object], allowing everything.
    • Alternatively, anonymous_object: false, anonymous_struct: false, anoymous_data: false would be the defaults for safe_load. unsafe_load would default to true.
      (Edited to add:) But I prefer the permit_anonymous_subclasses for being both simpler, more versatile (You could specify your own custom superclass or included module, rather than just Object), and the obvious symmetry with permitted_classes.
  • Restrictions on dumping any sort of objects, anonymous class or otherwise, should go on safe_dump, not dump. dump should err on the side of dumping anything which is dumpable, round-tripping considerations be damned. safe_dump defaults should mirror the safe_load defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants