Optimizing searches through lists

I just finished completing a significantly complex piece of work, and thought I would pen a few words on how I improved the performance of this code. This will help me remember how I did this as well as help the novice programmer:

This is how my initial code looked like:

for (int i = 0; i < small_list.Count; i++)
{
      item = small_list[i];

      result = (from p in big_list where p.ToLower().StartsWith(item.ToLower()) && String.Compare(p, item, true) != 0 select p).ToList();
}

If the size of the small list is “S” and the size of the big list is “B”, every time the code will go through “B” items to get the result.

I optimized this code so that I use a dictionary to lookup the results of the search as shown below:

        for (int i = 0; i < small_list.Count; i++)
        {
                item = small_list[i];

                if (lookup.ContainsKey(item))
                {
                    result = lookup[item];
                }
            }

The dictionary is populated by only going through the big list once, as shown below:

            for (int i = 0; i < big_list.Count; i++)
            {
                searched = big_list[i];

                for (int j = 0; j < small_list.Count; j++)
                {
                    key = small_list[j];

                    if (String.Compare(searched, key, true) != 0 &&
                        searched.ToLower().StartsWith(key.ToLower()))
                    {
                        if (lookup.ContainsKey(key))
                        {
                            lookup[key].Add(searched);
                        }
                        else
                        {
                            lookup[key] = new List<string>() { searched };
                        }
                    }
                }
            }

Then, I realized that we could optimize this even further, because not all items in the small_list would need to be searched for every item in the big list, because the “key” in the above code, needs to be smaller in length than the searched item.

So, we could improved performance even further by creating another lookup which would store the items of the small_list indexed by the length of the items, so, what that would mean is that, before we iterate through the small_list, we would use this new lookup to get only those items from the small list through which it makes sense to search for.

The code would then change to:

            for (int i = 0; i < big_list.Count; i++)
            {
                searched = big_list[i];
        smaller_list.clear()

        foreach (KeyValuePair<int, List<string>> pair in small_list_lookup)
                {
                    if (pair.Key > excludedFolderKey.Length)
                    {
                        smaller_list.AddRange(small_list_lookup[pair.Key]);
                    }
                }

                for (int j = 0; j < smaller_list.Count; j++)
                {
                    key = smaller_list[j];

                    if (String.Compare(searched, key, true) != 0 &&
                        searched.ToLower().StartsWith(key.ToLower()))
                    {
                        if (lookup.ContainsKey(key))
                        {
                            lookup[key].Add(searched);
                        }
                        else
                        {
                            lookup[key] = new List<string>() { searched };
                        }
                    }
                }
            }

And the dictionary we are using here can be created by the below code:

Dictionary<int, List<string>> small_list_lookup = new Dictionary<int, List<string>>();

            for (int i = 0; i < small_list.Count; i++)
            {
                int length = small_list[i].Length;

                if (small_list_lookup.ContainsKey(length))
                {
                    small_list_lookup[length].Add(small_list[i]);
                }
                else
                {
                    small_list_lookup[length] = new List<string>() { small_list[i] };
                }
            }

Comments

Popular posts from this blog

Tutorial: Using Google Cloud Storage from C# and .NET

Late 2008 Macbook only giving 1.5 gb/s speed with 6 gb/s Intel SSD?

Enable Visual Studio to use more than 2GB of memory