Why Is Copying an Object a Terrible Thing to Do?
Venkat Subramaniam
venkats@durasoftcorp.com
http://www.durasoftcorp.com
Abstract
In this article we address the issue of copying objects and the correct way to write the code for copying objects. We address the issue of extensibility and the problems with copy constructors and clone methods in languages like Java, C# and C++.
Copy Constructor Problem in C++ and Java (and C#)
Classes are used to model concepts in object-oriented systems. Instances, or objects, of these classes are created and used throughout an application. It is not unusual to want to make a copy of an object at run time. How does one make such a copy? In languages like C++ (unfortunately), the writer of a class does not have to do anything to make its objects copyable. By default, an object may be copied to create another “equal” instance. Let’s consider an example of a Person class:
// C++ code sample. Other examples will be in Java.
class Person
{
	private:
		Brain* pBrain;
		int age;
	public:
		Person(Brain* pABrain, int theAge)
		{
			pBrain = pABrain; 
			age = theAge;
		}
	…
};
Here is an example where we are making a copy of a Person object:
Person sam(new Brain(), 1);
Person bob = sam;
The problem with the above code is that both sam and bob end up pointing to the same Brain! This is widely known as a shallow copy, and it is the default behavior in C++. Of course, as a user of the Person class, you intended (in good faith!) for the two Persons, sam and bob, to have their own Brains. One way to address this would be for C++ to make a deep copy of every object that an object points to. This would, however, lead to trouble as well. Consider the example below:
// Last C++ code sample. 
class Person
{
	private:
		Brain* pBrain;
		int age;
		City* pCityOfResidence;
	public:
		Person(Brain* pABrain, int theAge, …)
		{
		 pBrain = pABrain; 
		 age = theAge;
		 …
		}
	…
};
In this case, if C++ were to make deep copies of every pointed-to object, then two copies of Brain would exist, one for sam and one for bob. However, we would also end up with two copies of the City where they reside! This, again, is surely not what we want. We want to make a copy of the Brain, but we do not want to copy the City. Why?

The answer to that goes beyond the code. In object-oriented programming, we have at least two relationships between objects: aggregation and association. Association is a relationship in which objects are merely related, while aggregation is one in which an object claims ownership of another. In the above example, Person is associated with the City but aggregates the Brain. We want to deep copy what is aggregated; what we want to do with associated objects, however, is not clear. At the code level, a pointer (a reference in Java and C#) is used to represent both association and aggregation. There is a semantic mismatch between the code and the object model.
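To make the distinction concrete, here is a minimal sketch in Java (the language used for the rest of the article). The neurons and name fields, the copy method and the city name are assumptions made purely for illustration. The aggregated Brain is duplicated, while the associated City is shared. The way the Brain is duplicated here is deliberately naive; the sections that follow discuss why.

```java
// Hypothetical classes for illustration only; fields and method names are assumptions.
class Brain {
    int neurons;
    Brain(int neurons) { this.neurons = neurons; }
}

class City {
    final String name;
    City(String name) { this.name = name; }
}

class Person {
    Brain brain;            // aggregation: the Person owns its Brain
    City cityOfResidence;   // association: many Persons may share one City
    int age;

    Person(Brain brain, City city, int age) {
        this.brain = brain;
        this.cityOfResidence = city;
        this.age = age;
    }

    // Copies the owned Brain, but shares the associated City.
    Person copy() {
        return new Person(new Brain(brain.neurons), cityOfResidence, age);
    }
}

class AggregationDemo {
    public static void main(String[] args) {
        Person sam = new Person(new Brain(100), new City("Houston"), 1);
        Person bob = sam.copy();
        System.out.println(sam.brain != bob.brain);                     // true: separate Brains
        System.out.println(sam.cityOfResidence == bob.cityOfResidence); // true: same City
    }
}
```

Nothing in the field declarations tells the two cases apart; only the author of copy knows which reference to duplicate and which to share.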

C++ took the wrong approach to this problem. It simply said: by default, we will make a shallow copy and let the objects share whatever they point to. Experienced C++ programmers will tell you how much time and money has been wasted due to this problem. The generally recommended solution in C++ is to (remember to) write a copy constructor! The copy constructor, written by the author of the Person class, takes care of copying Person objects properly.

Java avoids this problem by refusing to copy a Person object unless the class explicitly allows it. Java’s argument is: looking at the code, I can’t tell whether this is aggregation or association, so I will not guess. I will leave it to the author of the class, who knows what it is supposed to model. Now, how do you fix this in Java? Should you write a copy constructor for the Person? In both C++ and Java, writing a public copy constructor is a terrible idea. Why?

Do not provide a public copy constructor
While the rest of the examples are given in Java, they map closely to C++ and C# as well. Let us consider the class Person written in Java.
// Java is used in the rest of the article.
public class Person
{
	private Brain brain;
	private int age;
	public Person(Brain aBrain, int theAge)
	{
		brain = aBrain; 
		age = theAge;
	}
	public Person(Person another)
	{
		age = another.age;
		brain = new Brain(another.brain);
		// we assume we have a copy constructor for Brain
	}
	public String toString()
	{
		return "This is person with " + brain;
		// Not meant to sound rude as it reads!
	}
	…
}

public class Brain
{
	public Brain() {}
	public Brain(Brain another) {} // Assume proper copying of the Brain
}
The Person class has a copy constructor to make a “proper” copy of the object.

Now try the following code written in a main method of a User class:

Person sam = new Person(new Brain(), 1);
Person bob = new Person(sam);
System.out.println(sam);
System.out.println(bob);
The output from the above statements will be:
This is person with Brain@3fbdb0
This is person with Brain@3e86d0
You notice that the two Persons have their own brain! Problem solved?

Not really. The copy constructor of Person depends on the Brain class. It creates an instance of the Brain. What if we have a class, say SmarterBrain which extends Brain and we have a user writing code like the following:

Person sam = new Person(new SmarterBrain(), 1);
Person bob = new Person(sam);
System.out.println(sam);
System.out.println(bob);
The output from the above code will be:
This is person with SmarterBrain@3e86d0
This is person with Brain@50169
This does not meet our expectations. While sam has a SmarterBrain, bob ends up with just a regular Brain. The Person class is not extensible for the addition of new types of Brains. It fails the Open-Closed Principle (OCP) [2]. The OCP, formulated by Bertrand Meyer, states that software entities should be open for extension but closed for modification.
Let’s try to fix it
One approach would be to write the copy constructor of the Person as follows:
public Person(Person another)
{
    age = another.age;
    if (another.brain instanceof SmarterBrain)
       brain = new SmarterBrain((SmarterBrain)another.brain);
    else
       brain = new Brain(another.brain);
}
This code relies on the use of Java’s run-time type identification (RTTI) and will result in code that needs to be modified if another type of Brain (a derived class of Brain) is introduced in the system. This code fails OCP as well.

A good way to copy an object is to let the object do it. In other words, rather than having the Person create a Brain from another instance of Brain by calling new Brain(brain), why not ask the brain to make a copy of itself? That is, why not say brain.createOneJustLikeYourself()?

The advantage of this approach is that the Person object does not have to guess what the real type of the Brain is. This is based on the Prototype pattern [3].
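As a rough sketch of this idea, the code below gives Brain a createOneJustLikeYourself method (the name comes from the text; the bodies, the copy method on Person and the getBrain accessor are assumptions for illustration). Each Brain subclass overrides the method to return its own type, so Person never has to inspect the type of its Brain.

```java
// A sketch of "ask the object to copy itself"; not the article's final design.
class Brain {
    Brain createOneJustLikeYourself() {
        return new Brain();
    }
}

class SmarterBrain extends Brain {
    @Override
    SmarterBrain createOneJustLikeYourself() {   // covariant return type
        return new SmarterBrain();
    }
}

class Person {
    private final Brain brain;
    private final int age;

    Person(Brain brain, int age) {
        this.brain = brain;
        this.age = age;
    }

    Person copy() {
        // No instanceof checks: dynamic dispatch picks the right Brain type.
        return new Person(brain.createOneJustLikeYourself(), age);
    }

    Brain getBrain() { return brain; }
}

class PrototypeDemo {
    public static void main(String[] args) {
        Person sam = new Person(new SmarterBrain(), 1);
        Person bob = sam.copy();
        System.out.println(bob.getBrain().getClass().getSimpleName()); // SmarterBrain
    }
}
```

Adding another subclass of Brain requires no change to Person, which is the property the instanceof version lacked.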

The base class Object in Java provides a default implementation of the clone method, which performs a shallow copy (in C#/.NET, the corresponding method is called MemberwiseClone). However, the clone method in Object is protected, and its implementation checks whether the class implements the Cloneable interface. This makes it impossible for anyone to make an accidental copy of an object unless the author of the class overrides the clone method and makes it public.
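The two safeguards can be seen in a small sketch. The class names NotCloneable and OptedIn, and the tryClone helper, are hypothetical and exist only to expose the behavior of Object.clone.

```java
// Hypothetical class that forgets to implement Cloneable.
class NotCloneable {
    Object tryClone() throws CloneNotSupportedException {
        return super.clone();   // Object.clone() checks for Cloneable at run time
    }
}

// Hypothetical class that opts in by implementing Cloneable.
class OptedIn implements Cloneable {
    int value = 42;

    Object tryClone() throws CloneNotSupportedException {
        return super.clone();   // allowed: field-by-field (shallow) copy
    }
}

class CloneableDemo {
    public static void main(String[] args) throws CloneNotSupportedException {
        // new NotCloneable().clone();   // does not compile: clone() is protected in Object

        OptedIn copy = (OptedIn) new OptedIn().tryClone();
        System.out.println(copy.value);          // 42

        try {
            new NotCloneable().tryClone();
        } catch (CloneNotSupportedException e) {
            System.out.println("clone refused"); // printed
        }
    }
}
```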

The code below shows the usage of cloning in Java for the Brain class, its sub-class and the Person class.

public class Person implements Cloneable
{
	private Brain brain;
	private int age;
	public Person(Brain aBrain, int theAge)
	{
		brain = aBrain; 
		age = theAge;
	}
	public String toString()
	{
		return "This is person with " + brain;
		// Not meant to sound rude as it reads!
	}
	public Object clone()
	{
		Person another = null;
		try
		{
			another = (Person) super.clone();
			// shallow copy made so far. Now we will make it deep
			another.brain = (Brain) brain.clone();
		}
		catch(CloneNotSupportedException e)
		{
			// Cannot occur: Person and Brain both implement Cloneable
			throw new AssertionError(e);
		}
	
		return another;
	}
	…
}

public class Brain implements Cloneable
{
	public Brain() {}
	public Object clone() throws CloneNotSupportedException
	{ return super.clone(); }
		// Shallow copy adequate for this class.
}
public class SmarterBrain extends Brain
{
	public SmarterBrain() {}
	public Object clone() throws CloneNotSupportedException
	{
		SmarterBrain another = (SmarterBrain) super.clone();
		//… take care of any deep copies to be made here
		return another;
	}
}
Based on this code, the following User code produces the output indicated below the code:
Person sam = new Person(new SmarterBrain(), 1);
Person bob = (Person) sam.clone();
System.out.println(sam);
System.out.println(bob);
This is person with SmarterBrain@50169
This is person with SmarterBrain@1fcc69
Are we there yet?
While the above approach seems to be working fine and is extensible, clone brings its share of problems!

The first problem is that no constructor is called on the object being cloned. As a result, it is your responsibility, as a writer of the clone method, to make sure all the members have been properly set. Here is an example of where things could go wrong. Consider a class keeping track of the total number of objects of that type, using a static int member. In the constructors you would increase the count. However, if you clone the object, since no constructor is called, the count will not truly reflect the number of objects!
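The counting problem described above can be reproduced with a few lines. The class name Counted and its count field are assumptions for illustration.

```java
// Hypothetical class that counts its instances in the constructor.
class Counted implements Cloneable {
    static int count = 0;

    Counted() {
        count++;               // runs for "new", but not for clone()
    }

    @Override
    public Counted clone() {
        try {
            return (Counted) super.clone();   // no constructor is invoked here
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);      // cannot happen: we implement Cloneable
        }
    }
}

class CountDemo {
    public static void main(String[] args) {
        Counted first = new Counted();
        Counted second = first.clone();
        System.out.println(Counted.count);    // 1, although two objects exist
        System.out.println(first != second);  // true
    }
}
```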

Further, if the class has final fields, they can’t be assigned in the clone method. This makes it difficult to initialize the clone’s final fields properly. If a final field refers to some internal state of the object, then the cloned object ends up sharing that internal state, which is surely not correct for mutable objects.

Consider the following class for example:

public class Person implements Cloneable
{
	private final Brain brain; // brain is final since I do not want 
				// any transplant on it once created!
	private int age;
	public Person(Brain aBrain, int theAge)
	{
		brain = aBrain; 
		age = theAge;
	}
	public String toString()
	{
		return "This is person with " + brain;
		// Not meant to sound rude as it reads!
	}
	public Object clone()
	{
		try
		{
			Person another = (Person) super.clone();
			// shallow copy made so far. Now we will make it deep
			
			another.brain = (Brain) brain.clone();
//ERROR: you can't set another.brain
			
			return another;
		}
		catch(CloneNotSupportedException e)
		{
			// Cannot occur: Person and Brain both implement Cloneable
			throw new AssertionError(e);
		}
	}
	…
}
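The listing above is rejected by the compiler at the marked line. If the offending assignment is simply left out, the class compiles, but the clone silently shares whatever the final field refers to. The sketch below shows that effect with a hypothetical Notebook class (not part of the Person example) whose final field holds a mutable list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: the final field refers to mutable internal state.
class Notebook implements Cloneable {
    private final List<String> notes = new ArrayList<>();

    void add(String note) { notes.add(note); }
    int size() { return notes.size(); }

    @Override
    public Notebook clone() {
        try {
            // Shallow copy: the clone's final "notes" field points at the same list,
            // and it cannot be reassigned here because it is final.
            return (Notebook) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);   // cannot happen: we implement Cloneable
        }
    }
}

class SharedStateDemo {
    public static void main(String[] args) {
        Notebook original = new Notebook();
        Notebook copy = original.clone();
        copy.add("written in the copy");
        System.out.println(original.size());   // 1: the original sees the change
    }
}
```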

These issues are addressed in detail in Item 10 of Joshua Bloch's Effective Java [1].

Joshua concludes his discussion on cloning by saying "... you are probably better off providing some alternative means of object copying or simply not providing the capability." He goes on to say "A fine approach to object copying is to provide a copy constructor."

I agree with him on the idea of "simply not providing the capability" to copy. However, his suggestion on providing a copy constructor could lead to issues raised above!

The final shot
A solution to the problem is to implement the clone method and a protected copy constructor! Let’s modify the Person class to do just that.
public class Person implements Cloneable
{
	private final Brain brain; // brain is final since I do not want 
				// any transplant on it once created!
	private int age;
	public Person(Brain aBrain, int theAge)
	{
		brain = aBrain; 
		age = theAge;
	}
	protected Person(Person another)
	{
		Brain refBrain = null;
		try
		{
			refBrain = (Brain) another.brain.clone();
			// You can set the brain in the constructor
		}
		catch(CloneNotSupportedException e)
		{
			// Cannot occur as long as every Brain implements Cloneable
			throw new AssertionError(e);
		}
		brain = refBrain;
		age = another.age;
	}
	public String toString()
	{
		return "This is person with " + brain;
		// Not meant to sound rude as it reads!
	}
	public Object clone()
	{
		return new Person(this);
	}
	…
}
Now consider having a class derive from Person.
public class SkilledPerson extends Person
{
	private String theSkills;
	public SkilledPerson(Brain aBrain, int theAge, String skills)
	{
		super(aBrain, theAge);
		theSkills = skills;
	}
	protected SkilledPerson(SkilledPerson another)
	{
		super(another);
		theSkills = another.theSkills;
	}
	
	public Object clone()
	{
		return new SkilledPerson(this);
	}
	public String toString()
	{
		return "SkilledPerson: " + super.toString();
	}
}
You may try running the following code:
public class User
{
	public static void play(Person p)
	{
		Person another = (Person) p.clone();
		System.out.println(p);
		System.out.println(another);
	}
	public static void main(String[] args)
	{
		Person sam = new Person(new Brain(), 1);
		play(sam);
		SkilledPerson bob = new SkilledPerson(new SmarterBrain(), 1, "Writer");
		play(bob);
	}
}
The output produced will be:
This is person with Brain@1fcc69
This is person with Brain@253498
SkilledPerson: This is person with SmarterBrain@1fef6f
SkilledPerson: This is person with SmarterBrain@209f4e
The above example shows how you can implement clone safely by relying on the construction process. This is especially useful in the cases where your class contains final fields. Observe that, if we keep a count of the number of objects, the clone as implemented here will keep a correct count of the number of objects.
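The remark about counting can be checked with a reduced sketch of the same pattern. The names TrackedPerson and TrackedSkilledPerson and the count field are hypothetical, and the Brain is left out to keep the example short.

```java
// Hypothetical variant of the pattern above with an instance counter added.
class TrackedPerson implements Cloneable {
    static int count = 0;          // number of objects constructed so far
    private final int age;

    TrackedPerson(int age) {
        this.age = age;
        count++;
    }

    // Protected copy constructor: a real constructor, so the count is updated.
    protected TrackedPerson(TrackedPerson another) {
        this.age = another.age;
        count++;
    }

    @Override
    public Object clone() {
        return new TrackedPerson(this);   // construction path, not Object.clone()
    }
}

class TrackedSkilledPerson extends TrackedPerson {
    private final String skills;

    TrackedSkilledPerson(int age, String skills) {
        super(age);
        this.skills = skills;
    }

    protected TrackedSkilledPerson(TrackedSkilledPerson another) {
        super(another);
        this.skills = another.skills;
    }

    @Override
    public Object clone() {
        return new TrackedSkilledPerson(this);
    }
}

class TrackedDemo {
    public static void main(String[] args) {
        TrackedPerson sam = new TrackedPerson(1);
        TrackedPerson bob = new TrackedSkilledPerson(1, "Writer");
        Object samCopy = sam.clone();
        Object bobCopy = bob.clone();
        System.out.println(TrackedPerson.count);                      // 4
        System.out.println(bobCopy instanceof TrackedSkilledPerson);  // true
    }
}
```

The sketch relies on every subclass overriding clone to call its own copy constructor; a subclass that forgets to do so would be copied as a plain TrackedPerson.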
Conclusion
Copying an object by specifying new followed by the class name often leads to code that is not extensible. Using clone, an application of the Prototype pattern, is a better way to achieve this. However, using clone as it is provided in Java (and C#) can be problematic as well. It is better to provide a protected (non-public) copy constructor and invoke it from the clone method. This lets us delegate the task of creating a copy to the instance itself, which provides extensibility, while the protected copy constructor ensures that the copy is constructed safely.
References
  1. Joshua Bloch, Effective Java Programming Language Guide, Addison-Wesley, Boston, MA, 2001.
  2. Robert C. Martin, The Open-Closed Principle, C++ Report, 1996. http://www.objectmentor.com/resources/articles/ocp.pdf
  3. Erich Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Boston, MA, 1994.