Deep copying data structures is surprisingly undocumented.
First, let us define a deep copy
cpy of an object
Any actions on
cpy must have the same results as on
and actions on
cpy must not change the state of
This means, deep copy must consider the graph-shape of references.
If there are multiple references to an identical object within
the references in
cpy must point to an identical object as well.
This requirement includes the problem of cyclic data structures.
A deep copy can simply return its argument if the object is immutable, since the state of immutable data never changes.
There are usual more practical requirements. For example, it should work for everything by default, since it becomes tedious to implement a copy method for each and every class. Nevertheless, the copy operation should fail for specific classes, which cannot be copied, like Lock, Thread, or singletons.
Some people think additional modifications should be possible, for example converting a LinkedList into an ArrayList. While this fulfills the definition above, a linked list behaves differently than an array. Random access becomes more expensive, while adding elements becomes cheaper.
The basic algorithm of deep copy must remember already copied objects for the multiple-references-to-identical-object requirement. Let a map be an additional argument to deep copy and initialize it as an empty map.
deepCopy(Object orig, Map<int,Object> m) id := orig.hashCode() if id in m, return m[id] cpy := alloc(sizeof(orig)) m[id] := cpy for each (non static) field f of orig: cpy.f := deepCopy(orig.f, m) return cpy
For native data types the field is copied by value. References to other objects are handled recursively.
To implement a deep copy operation, various approaches are possible.
In dynamically typed languages like Python
a reflection-based method is a good solution.
Using runtime type information you can easily get a list of all fields from an object.
It is possible in a language like Java as well.
However, Java prominently provides a shallow copy through
which can be implemented as a
A second variant, which is often recommended in Java due to its (relative) simplicity, is to rely on the serialization mechanisms. Serializing and deserializing provides you with a legitimate deep copy. The algorithm of serialization is mostly the same, as it needs a map as well. Performance suffers of course.
As a third option, a compiler could generate this loop (in an unrolled form) and provide an internal fieldCopy method like this:
class MyObj int foo static MyObj bar MyObj baz MyObj::fieldCopy(MyObj cpy, Map<int,Object> m) cpy.foo = this.foo // native type // 'bar' is static cpy.baz = deepCopy(this.baz, m) // reference
The literature for this approach is not found under deep copying, but marshalling or serialization in high-performance computing.
Obviously, creating a copy constructs a new object.
Therefore, the operation should take place in a constructor.
Just think about
which can only be modified in constructors.
So, whatever implementation approach we take
it should be implemented in the copy constructor.
It is unspecified whether the copy constructor should create a shallow or a deep copy.
Hence, the copy constructor should actually take an additional argument
to distinguish between shallow and deep copy.
For a different behavior (neither shallow nor deep),
the programmer is free to implement
a copy constructor without an additional argument.