Sorting .Net Collections

(by John Roberts for Blayd Software)

Note: this article relates to the .Net Framework version 2.0, some of the classes and class members referenced in this article were not available in earlier framework versions.

Introduction

Collections with sorting capabilities are required in many situations, for example, we may want to provide advanced or more efficient searching capabilities, such as binary searching, which require the collection to be sorted, we may want to provide greater flexibility to users of the collection or we may want to provide application developers with the functionality to data bind the collection to a user interface control that supports sorting. For collections or lists that are only used internally, within a component, the .Net Framework provides a few, ready to use, classes that provide sorting functionality. The basic Array class has been in the framework from version 1.0 and provides sorting and a binary search method and from version 2.0 of the .Net Framework has provided support for generics. The SortedDictionary<TKey, TValue> and SortedList<TKey, TValue> classes, from the System.Collections.Generic namespace both store their data is sorted order, based on their keys but neither class provides (explicit) sort or binary search methods. The System.Collections.Generic.List<T> class provides (explicit) sorting and a binary search method. The List<T> class uses the Array class, internally, to provide its sorting functionality and is fine for use internally within a component, however, it relies on the caller to supply a comparer, rather than specifying the required sort property, when sorting multi-property business objects and therefore it may not be ideal for public exposure from within a component.

Extending the List<t> class to provide a collection with sorting capability from a component is a possibility, however, if you don't like the idea of exposing the existing List<T> sort methods to external callers and we don't, then a different solution is needed. If you are wondering why we don't like the idea of exposing the List<T> sort methods, it is because we prefer to expose the sort methods in a more caller or user friendly way and there is also the matter of controlling the sorting process. If we let the caller specify the comparer then we have no way of knowing whether the sort criteria are valid and if they are not then any exception that occurs in the comparer may pass through our sort method an will almost certainly leave our collection in an unknown state. Fortunately there is a fairly simple way around this potential problem, derive from System.Collections.ObjectModel.Collection<T> rather than List<T>. The Collection<T> base class does not have any existing sort methods and we can, therefore, implement our sort methods in any way we choose and as an added bonus we can, as you will see later in the article, still use the List<T> class internally to sort the collection, which, interestingly, means that we will ultimately be using the Array class to sort our collection!

In this article we going to discuss the development of a collection class with sorting and binary search capabilities. The collection is primarily intended to be used from code, however, as you will see later in the article, the collection can be used by application developers to data bind to a user interface control that has sorting capabilities e.g. the DataGridView control. If the primary objective was to provide a collection with sorting and searching functionality for data binding then developing a collection with an IBindingList implementation would be the preferred solution. However, for the purpose of this article we are assuming that a full IBindingList implementation would be overkill and our intention is to produce a collection that may be used for data binding but that is not its primary purpose. If you are interested in and want to know more about IBindingList then the Blayd Software DataSource and BindingList article details the development of a BindingList<T> derived collection class and a data source class that is used to supply the collection to binding clients.

We will show as much of the source code as we have space for, however, we will not able to show it all. You can download the full source code for this article using the link provided in the "Article Options" on the left of the page. The source code shown in this article will be in C# but the download sample code is available as either a C# or a VB Visual Studio 2005 project.

Collected Class

In order to develop the collection class we need something to put in it i.e. the collected class and for the purpose of this article we need a business type object that implements several properties on which the collection can be sorted and to make it more interesting and relevant the properties need to be of differing types i.e. String, Integer, DateTime, etc. The Identity class is to be our collected class, instances of this class will be stored in the collection and the collection will provide the functionality to allow callers to sort the collection on any of the Identity class property values.

ExpandPartial listing of the Identity class:

There is nothing particularly remarkable about the Identity class, it simply provides the names, gender, date of birth, etc. of a fictitious (for the purpose of this article) person. However, it does provide us with several property types to sort on, Long, Integer, String, DateTime and the Gender enumeration.

Comparer Class

As you will see later in this article we are going to utilize the List<T> class to perform the sorting and searching. To make use of the List<T> sorting and searching functionality we will need to supply a comparer to both the sort and the binary search methods. There are, typically, several available options when specifying a comparer, the Comparison<T> delegate or an IComparer<T> implementation either in a class derived from the Comparer<T> abstract base class or in your own class. If you are searching or sorting on a single type of value, for example, a String or an Integer then the StringComparer or Comparer<T>.Default (for the relevant type and providing the type implements the IComparable<T> interface) would be available.

We want to provide the functionality to sort on any of the Identity class properties and we also want to make use of the List<T> sort and binary search methods internally, therefore we need to develop an implementation of the generic IComparer<T> interface. Because we are sorting the list based on the value of a caller specified Identity class property, our comparer needs to know what the sort property is and it also needs to have the ability to retrieve the value of the specified property from the two Identity instances passed to the Compare method. This seems like a fairly straight forward requirement, however, it can be tricky to implement this kind of functionality without creating either an explicit or an implicit dependency between the comparer and the Identity class.

We are going to use the System.ComponentModel.PropertyDescriptor class to allow callers to specify the required sort property because this technique is compatible with the approach specified in the IBindingList interface and a PropertyDescriptor instance is fairly easy for the caller to create and provides all of the information the collection needs to sort its data. This is all starting to look pretty good, we can pass the PropertyDescriptor to our comparer, in the constructor, to specify which Identity property to use in the comparison. It gets even better when we examine the members available in the PropertyDescriptor class, it has a PropertyType property which returns the Type of the property it is representing and a GetValue method that returns the value of the property for a specified instance of the Identity class. However, there is one major drawback to using this approach, the GetValue method is, typically, implemented using reflection and reflection calls are relatively slow. The comparer can resolve the Type of the required sort property when it is initialized as this value remains constant throughout the sort but it will have to make a GetValue call on each Identity instance passed to its Compare method and it will have to make these two calls on every call to its Compare method, therefore, sorting a large list with a fairly random distribution of objects, based on the sort property, is going to be relatively slow.

An alternative approach to the PropertyDescriptor, reflection, technique is to provide our comparer class with knowledge of the Identity class. If the comparer knows about the properties of the Identity class it can access the properties directly. However, there is also a drawback to this approach, it introduces a dependency between the comparer and the Identity class that may not be apparent to a developer who, at sometime in the future, extends the Identity class by adding one or more new properties. If the comparer class is not updated at the same time, it will not know about the new properties and will not, therefore, be able to sort on any of the new properties. Although this is, undoubtedly, a pain, it should be immediately obvious to the developer making the changes to the Identity class as, surely, they will regression test the sorting process.

We now have a choice to make, do we go for the PropertyDescriptor, reflection, technique which requires very little code, is fairly intuitive and providing the Type of each Identity property implements the IComparable<T> interface as a fallback, does not introduce a dependency between the compare and the Identity classes. Well no, we don't, in the tests that we have made, this technique performs poorly in relation to the performance achieved by directly accessing the Identity class properties. Performance degrades as the size of the collection grows but the difference can be noticeable in relatively small collections e.g. around 50 objects, therefore, if your collection will never contain a lot of objects or if the distribution of the objects is fairly regular you might take a different view. However, even though there is more code required to resolve the PropertyDescriptor to the relevant Identity class property, the code is not very pretty and remembering to update the comparer when updating the Identity class is not ideal, we still prefer the option that provides the best performance.

Note

If you prefer a more generic approach and require a comparer that can be used to compare any property of any class then you should check out the comparers in the Code Generator section of the site.

PropertyComparer<T> Standard Introduction

Generating and Using Comparer Classes

ExpandPartial listing of the IdentityComparer class:

It is immediately evident from the above listing that the IdentityComparer class is tied to a particular version of the Identity class, the _sortProperties array field is populated with the names of the Identity class properties. The sort array index of the Name of the PropertyDescriptor specified in the constructor is retrieved and is then resolved in the CompareCore method to the relevant Identity class property to use in the comparison. There are other, more maintenance friendly, ways of retrieving the Identity class property names, we could, for example, use reflection or we could just use the Identity class PropertyDescriptorCollection directly, however, ultimately we would still have to resolve the property name to the actual property in the CompareCore method, so the comparer would still be tied to a particular Identity class version. The technique of explicitly naming the Identity properties in the static constructor does, at least, make the dependency obvious.

Collection Class That Supports Sorting

To provide a collection with sorting and binary searching capabilities we have chosen to extend the generic Collection<T> base class. The Collection<T> class is located in the System.Collections.ObjectModel namespace and can be used as is i.e. it is a concrete base class or can be extended to provide extra functionality. We can use the constructors of our derived collection class to ensure that the Collection<T> base class uses an instance of the List<T> class as its internal storage, the Collection<T> base class will wrap the List<T> instance that we specify and will make it available to our derived collection class in the protected Items property. Using this technique enables our derived collection class to use the sorting and searching methods provided by the List<T> class rather than explicitly implementing its own sorting and searching functionality.

ExpandPartial listing of the IdentityCollection class:

The IdentityCollection class provides its own BinarySearch and Sort overloaded methods, however, ultimately both methods delegate to the List<T> instance that is used internally as storage. The IdentityCollection class constructors ensure that the Collection<T> base class uses an instance of the List<T> class for internal storage. Both the BinarySearch and Sort methods pass an instance of the IdentityComparer class to the List<T> instance to enable the search and sort to be performed on the Identity property specified in the PropertyDescriptor. The BinarySearch method uses the current sort property by passing the _sortProperty field to the IdentityComparer constructor because the binary search will only work as expected if the internal list is already sorted on the property that is being searched. The Sort method uses either the default sort property, by calling the GetDefaultSortProperty method or the property in the specified PropertyDescriptor, depending on which overload is called.

Data Binding the Collection Class

As stated earlier in the article, if we were creating a collection class that is intended primarily to be used by user interface components or controls, the recommend option would be to develop a collection that provides an implementation of the IBindingList interface. The easiest way of providing an IBindingList implementation is by deriving your own collection class from the generic BindingList<T> class which is contained in the System.ComponentModel namespace. However, this option is overkill in a lot of situations and it is certainly easier just to add the functionality you require to a class derived from Collection<T> than it is to implement all of the requirements of the IBindingList interface even though the BindingList<T> class does ease the burden. The IdentityCollection class that we have developed here may be intended primarily to be used from code but it can still be used by application developers and it can still be data bound to a user interface component or control. Application develops can also make use of the sorting functionality provided by the IdentityCollection class, they will just have to sort the collection from code, rather than have the user interface component or control sort the collection automatically.

To data bind the IdentityCollection to the DataGridView control:

  • Add a DataGridView control to a Form and position, size and anchor it as required.
  • Add a BindingSource component to the Form.
  • Set the BindingSource.DataSource property to the IdentityCollection class. You will have to add the collection as a project data source as part of the setting process.
  • Set the DataGridView.DataSource property to the BindingSource component.
  • In the DataGridView properties window, click the ellipsis button in the Columns property setting to display the edit column collection dialog. In the dialog, arrange the Identity class property columns in the required order and set the SortMode of each column to Programmatic.
  • In the DataGridView properties window, set the MultiSelect property to False.

The rest of the data binding and sorting has to be achieved in code within the Form. Create a Form level field of type IdentityCollection and a field of type DataGridViewColumn. Create a helper method to populate the IdentityCollection, in this example the method is named LoadCollection. Define and connect a handler to the Form.Load event.

ExpandIn the Form and Form.Load event handler, place the following code:

At this point if you run the application you will see the contents of the IdentityCollection displayed in the DataGridView control, however, you won't be able to sort the collection. A bit more code is required to perform sorting. Define and connect a handler to the DataGridView.ColumnHeaderMouseClick event.

ExpandIn the DataGridView.ColumnHeaderMouseClick event handler, place the following code:

If you run the application now you will see the contents of the IdentityCollection in the DataGridView control and you will be able to sort the grid by left clicking any of the grid header cells. Clicking the current sort header cell again will reverse the sort order. In addition to the sorting, demonstrated above, if you have set the DataGridView properties appropriately, you will be able to edit an Identity and add an Identity to or remove an Identity from the IdentityCollection.

The updating of the DataGridView works because the BindingSource component implements its own ListChanged event, so even though our collection does not implement IBindingList, which defines the ListChanged event, the BindingSource component is smart enough to raise its own event when we call the BindingSource.ResetBindings method. Incidentally, you might be tempted to call the BindingSource.ApplySort method, rather than the IdentityCollection.Sort method, however the call will fail because the BindingSource component only supports the ApplySort method for collections that implement the IBindingList interface.

Conclusion

Adding a sort and binary search capability to a collection class is beneficial in many situations and it is certainly fairly straight forward to take advantage of the sorting and searching functionality already present in some of the .Net Framework classes. If you need smarter or perhaps faster sorting then you can always use your own algorithm, however, in many situations the QuickSort algorithm provided in the Array class is adequate and you do have the opportunity to fine tune your comparer to get the desired results. Whether you take the minimalist approach described in this article or go with a full-blown IBindingList implementation depends on what you are trying to achieve with the collection class and to a certain extent how much time you have!