Kevin Greer's Stuff
[ start | index | login ]
start > 2005-05-05 > 1

2005-05-05 #1

Created by kgr. Last edited by kgr, 3 years and 155 days ago. Viewed 229 times. #1
[edit] [rdf]
labels
attachments

Compound Collections

Collection classes group together the set of objects. Different types of collections provide different semantics and access characteristics. For example, a "bag" is a collection which allows duplicates, whereas a "set" is one that doesn't. A "map" or "dictionary" lets you store a key and value together and lookup the value based on the key. Maps are usually implemented as either hash tables or some sort of tree structure. Tree structures have the advantage of storing the in sorted order. You also often have lists, queues, double ended queues, priority queues, etc.

An overview of the Java collection framework:

>>http://www.cs.sjsu.edu/faculty/kirchher/CS151/CollectionsOverview.htm

Interfaces and specifications for the Smalltalk-80 collection classes:

>>http://portal.acm.org/citation.cfm?id=141937.141938

Here is the Collection hierarchy from one Smalltalk system:

Collection
     KeyedCollection
       ArrayedCollection
         Array
         ByteArray
         List*
           OrderedCollection
         String
       Dictionary*
         IdentityDictionary*
     Set*
       IdentitySet*

While many languages now have a decent set of "simple" collection classes, what I think is really needed now is "compound", "composite", or "multi-" collections.

When programming, I often need to access a group of data in several different ways. Maybe I need to do lookups on more than one field or I need to traverse the data in more than one sort order. Often I want to store the data in a priority queue for processing but then need to lookup data from somewhere in the middle of the queue based on some key other than the queue order. With conventional collection classes I need to, either choose the access method that I plan on using most often and then perform inefficient sorts or scans for everything other than the primary means or else I need to add the data myself to multiple collections at once. For maintainability it is best to encapsulate my "collection of collections" within an Object where I can hide the details of manipulating multiple collections from the clients of the library. The problem with this is that my Object probably doesn't implement all of the collection class interfaces that it encapsulates and I can't just hand out my inner-collections to others for fear that they'll actually use them and corrupt my compound collection. Another problem is that writing this type of code is mindless and error-prone.

What we need is a composite collection: a collection which is composed of many individual collection classes, each contributing its own set of semantics and access characteristics. When I add or remove data from the compound collection it is actually added to all of the composite collections.

Non-Collection Components

Not all of the components of a compound collection need to actually be collections; they merely need to be parties which are interested in things like additions and removals from the collection. An example might be a "Size" component. Some collection implementations, for example arrays, know their size, while others, like trees, often don't. If you wanted to know the size of a collection you could simply add the "Size" component and it would just keep a running tally of the collections size as items were added and removed. You might do something similar but a little more complicated to keep a running total of balance, queue statistics, value distribution or any other type of summary information. You could also add bodyguard types of components to prevent certain types of data from being added. You could restrict the types of objects which could be added to your collection or add restrictions on individual values. You could disallow duplicate values on particular fields. You could adapt values as they enter into or out-of the collection (cloneing or freezing for example to prevent corruption of your collection). You could add filters or authentication. A very useful component would be one that serializes or journals updats to a file in order to add simple persistence.

Self Composing Collections

Some access semantics could themselves be created as Objects. This would include things like search criteria, filters, sort-orders, etc. You could then apply these generic objects to any compound collection and if the collection did not support the feature, the feature would add itself to the collection, initialize itself, and then redo the call. This means that a collection might start off small but then would grow by adding more composites as more demands where placed upon it. An example might be that when I first attempt to lookup a value by "last-name" that a composite is added to handle the lookup. Even though I "might" lookup by "middle-name", the overhead of maintaining this index is never incurred unless I actually "do" lookup by that value.

Many applications get around the single access characteristic problem of collection class libraries by storing their data in relational databases, even when all of their data would easily fit in memory. There are a lot of performance and impedance-mismatch problems with this approach.

I think that most applications need this feature and that once developers get used to it they will wonder how they ever lived without it.

I intend to support compound collections in FISH.

Please login to post a comment.
peerbox.com | Copyright 2005-2006 Kevin G. R. Greer