Is it safe to deserialize untrusted data, provided my code makes no assumptions about the state or class of the deserialized object, or can the mere act of deserializing cause undesired operation?
(Threat model: The attacker may freely modify the serialized data, but that's all he can do)
What Is Deserialization? Insecure deserialization vulnerabilities involve the use of unknown or untrusted data and can result in attacks such as denial of service (DoS), malicious code execution, bypassing authentication measures or other abuses of application logic.
Serialization, however, offers an attack surface to hackers as they can edit objects in byte streams or deploy malicious scripts in the runtime environment to create unexpected things. Java serialization vulnerability is often exploited as an entry point for Remote Code Execution and data tampering attacks.
Deserialization is the opposing process which takes data from a file, stream or network and rebuilds it into an object. Serialized objects can be structured in text such as JSON, XML or YAML. Serialization and deserialization are safe, common processes in web applications.
Exploits from insecure deserialization occur when untrusted user input executes malicious code or abuses the system's logic. This can either manifest itself in the form of a DDoS attack or arbitrary code execution when the data gets deserialized. Insecure deserialization is a major threat to web applications.
Deserialization itself can already be unsafe. A serializable class may define a readObject method (see also the specification), which is called when an object of this class is going to be deserialized from the stream. The attacker cannot provide this code, but using a crafted input she can invoke any such readObject method that is on your classpath, with any input. 
It is possible to make a readObject implementation that opens the door to arbitrary bytecode injection. Simply read a byte array from the stream and pass it to ClassLoader.defineClass and ClassLoader.resolveClass() (see the javadoc for the former and the later). I don't know what the use of such an implementation would be, but it is possible.
Writing secure readObject methods is hard. Up until somewhat recently the readObject method of HashMap contained the following lines.
int numBuckets = s.readInt();
table = new Entry[numBuckets];
This makes it very easy for an attacker to allocate several gigabytes of memory with just a few dozen bytes of serialized data, which will have your system down with an OutOfMemoryError in no time.
The current implementation of Hashtable seems to still be vulnerable to a similar attack; it computes the size of the allocated array based on the number of elements and the load factor, but there is no guard in place against unreasonable values in loadFactor, so we can easily request a billion slots be allocated for each element in the table. 
Fixing the vulnerability in HashMap was done as part of changes to address another security issue related to hash-based maps. CVE-2012-2739 describes a denial-of-servic attack based on CPU consumption by creating a HashMap with very many colliding keys (i.e. distinct keys with the same hash value). The documented attacks are based on query parameters in URLs or keys in HTTP POST data, but deserialization of a HashMap is also vulnerable to this attack. 
The safeguards that were put into HashMap to prevent this type of attack are focussed on maps with String keys. This is adequate to prevent the HTTP-based attacks, but is easily circumvented with deserialization, e.g. by wrapping each String with an ArrayList (whose hashCode is also predictable). Java 8 includes a proposal (JEP-180) to further improve the behaviour of HashMap in the face of many collisions, which extends the protection to all key types that implements Comparable, but  that still allows an attack based on ArrayList keys. 
The upshot of this is that is possible for the attacker to engineer a byte streams such that the CPU effort it takes to deserialize an object from this stream grows quadratically with the size of the stream.
By controlling the input to the deserialization process an attacker can trigger the invocation of any readObject deserialization-method. It is theoretically possible for such a method to allow bytecode injection. In practice it is certainly possible to easily exhaust memory or CPU resources this way, resulting in denial-of-service attacks. Auditing your system against such vulnerabilities is very difficult: you have to check every implementation of readObject, including those in third-party libraries and the runtime library.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With