Deep copying data structures is surprisingly undocumented.
Definition
First, let us define a deep copy cpy
of an object orig
:
Any actions on cpy
must have the same results as on orig
and actions on cpy
must not change the state of orig
.
This means, deep copy must consider the graph-shape of references.
If there are multiple references to an identical object within orig
,
the references in cpy
must point to an identical object as well.
This requirement includes the problem of cyclic data structures.
A deep copy can simply return its argument if the object is immutable, since the state of immutable data never changes.
Requirements
There are usual more practical requirements. For example, it should work for everything by default, since it becomes tedious to implement a copy method for each and every class. Nevertheless, the copy operation should fail for specific classes, which cannot be copied, like Lock, Thread, or singletons.
Some people think additional modifications should be possible, for example converting a LinkedList into an ArrayList. While this fulfills the definition above, a linked list behaves differently than an array. Random access becomes more expensive, while adding elements becomes cheaper.
Algorithm
The basic algorithm of deep copy must remember already copied objects for the multiple-references-to-identical-object requirement. Let a map be an additional argument to deep copy and initialize it as an empty map.
deepCopy(Object orig, Map<int,Object> m)
id := orig.hashCode()
if id in m, return m[id]
cpy := alloc(sizeof(orig))
m[id] := cpy
for each (non static) field f of orig:
cpy.f := deepCopy(orig.f, m)
return cpy
For native data types the field is copied by value. References to other objects are handled recursively.
Approaches
To implement a deep copy operation, various approaches are possible.
In dynamically typed languages like Python
a reflection-based method is a good solution.
Using runtime type information you can easily get a list of all fields from an object.
It is possible in a language like Java as well.
However, Java prominently provides a shallow copy through Object.clone
,
which can be implemented as a memcpy
.
A second variant, which is often recommended in Java due to its (relative) simplicity, is to rely on the serialization mechanisms. Serializing and deserializing provides you with a legitimate deep copy. The algorithm of serialization is mostly the same, as it needs a map as well. Performance suffers of course.
As a third option, a compiler could generate this loop (in an unrolled form) and provide an internal fieldCopy method like this:
class MyObj
int foo
static MyObj bar
MyObj baz
MyObj::fieldCopy(MyObj cpy, Map<int,Object> m)
cpy.foo = this.foo // native type
// 'bar' is static
cpy.baz = deepCopy(this.baz, m) // reference
The literature for this approach is not found under deep copying, but marshalling or serialization in high-performance computing.
Constructor Behavior
Obviously, creating a copy constructs a new object.
Therefore, the operation should take place in a constructor.
Just think about final
fields,
which can only be modified in constructors.
So, whatever implementation approach we take
it should be implemented in the copy constructor.
It is unspecified whether the copy constructor should create a shallow or a deep copy.
Hence, the copy constructor should actually take an additional argument
to distinguish between shallow and deep copy.
For a different behavior (neither shallow nor deep),
the programmer is free to implement
a copy constructor without an additional argument.