A bug hunting story

Today I found a bug. It was so interesting that I decided to write a longer post here about it.
I created a strip down solution with the only classes and methods I need to demonstrate the bug. This is the reason if the story wont seem too realistic.

A long long time ago I need a dictionary to store some integers with a key which was based on a string but has some other features (not shown here). So I created MyKey class for this:

[Serializable]
public class MyKey
{
    private string key = null;

    public MyKey(string key)
    {
        if (key == null)
        {
            throw new ArgumentNullException("key");
        }

        this.key = key;
    }

    private int? hashCode = null;
    public override int GetHashCode()
    {
        int ret = 0;

        if (hashCode == null)
        {
            hashCode = this.key.GetHashCode();
        }

        ret = hashCode.Value;

        return ret;
    }

    public override bool Equals(object obj)
    {
        bool ret = false;

        MyKey other = obj as MyKey;
        if (other != null)
        {
            ret = Equals(other);
        }

        return ret;
    }

    public bool Equals(MyKey other)
    {
        bool ret = false;

        if (other != null)
        {
            if (this.hashCode == other.hashCode)
            {
                if (this.key == other.key)
                {
                    ret = true;
                }
            }
        }

        return ret;
    }

    public override string ToString()
    {
        string ret = String.Concat("\"", key, "\"");
        return ret;
    }
}

It was used happily like this:

// create data
var data = new Dictionary<MyKey, int>();
data[new MyKey("alma")] = 1;

Later I wrote some code to persist these data via serialization.
Everything was working like a charm.

// serialize and save it
var serializedData = Serializer.Serialize(data);
SaveToFile(serializedData);

...

// load and deserialize data
var serializedData = LoadFromFile();
var data = Serializer.Deserialize(serializedData);

There was a usecase when after deserialization some of the values in data must be changed:

// as in deserialized data
var specificKey = new MyKey("alma");
if (data[specificKey] == 1) // a KeyNotFoundException occures here!
{
    data[specificKey] = 2;
}

KeyNotFoundException? I was sure that there should be a value in all of data instances with the given key! Lets see in QuickView:

There is an “alma” key!
Let’s comment out the line causing the exception and check data after the expected value modification to “2”:

Much more interesting isnt it?
I quickly put all the data creation, serialization, deserialization code into one unit test to have a working chunk of code I can use for bug hunting:

[TestMethod]
public void TestMethod1()
{
    var d = new Dictionary<mykey, int="">();
    d[new MyKey("alma")] = 1;

    var serialized = Serializer.Serialize(d);

    var data = Serializer.Deserialize(serialized);

    var specificKey = new MyKey("alma");
    {
        data[specificKey] = 2;
    }
}

But in the unit test everything was working! I simply cant reproduce the bug in such a way.
But when running App1, which was creating and serializing the data and running App2 which was deserializing and modifying it the bug always presents itself.
How can be a duplicate key in a Dictionary<,>? MyKey‘s implemetation, especially the Equals() override is so trivial that it cannot allow two instances created from
same string to be not equal.

But wait a minute!

How can the hashCode’s differ?!?!?!

Yes. A quick search on the net answers everything. MSDN clearly describes in a big “Important” box:

The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across versions of the .NET Framework and across platforms (such as 32-bit and 64-bit) for a single version of the .NET Framework. In some cases, they can even differ by application domain.

As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.

App1 was running in x86 and App2 in x64 environment. Thats why the string hashcodes differ.

The fix is really easy. Just turn off hashCode calculation optimalization for serialization:

[Serializable]
public class MyKey
{
   ...

   [NonSerialized]
   private int? hashCode = null;
   ...
}

Now hashCode will be recalculated once in all runtime environments.

I never thought about the possibility of unstable hashcodes.
I hope I am not the only man in the world with such wasteful brain.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.