Map and Reduce in ColdFusion, a practical example
Traditionally, functions such as map() and reduce() have been limited to arrays, but since ColdFusion rocks hard \m/, I wanted to go one step further and extend them to structures as well.
This example uses both collection types (arrays and structs) and several Collections.cfc methods to find the most popular keywords in the descriptions of the blogs that ColdfusionBloggers.org aggregates.
Obtain our dataset
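A minimal Python sketch of obtaining the dataset, using the standard library's ElementTree in place of ColdFusion's XML functions. The OPML snippet is hypothetical: each feed is modeled as an `<outline>` element carrying a `description` attribute, and the real ColdfusionBloggers.org feed list may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML snippet standing in for the real feed list.
OPML = """<opml version="2.0">
  <head><title>Sample feed list</title></head>
  <body>
    <outline type="rss" text="Blog A" xmlUrl="http://example.com/a.xml"
             description="ColdFusion tips and tricks"/>
    <outline type="rss" text="Blog B" xmlUrl="http://example.com/b.xml"
             description="Flex, ColdFusion and AJAX"/>
    <outline type="rss" text="Blog C" xmlUrl="http://example.com/c.xml"
             description="Notes on ColdFusion frameworks"/>
  </body>
</opml>"""

root = ET.fromstring(OPML)
# XPath-style search: every <outline> element that has an xmlUrl attribute.
items = root.findall(".//outline[@xmlUrl]")
print(len(items))
```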
The blog feed list from ColdfusionBloggers.org is published as OPML, which means we can easily extract the description of each blog item using XML functions.
Extract the descriptions
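A minimal Python sketch of the map() step, using Python's built-in `map()` in place of the Collections.cfc method. The two parsed items are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed feed items (same shape as an OPML <outline> element).
items = [
    ET.fromstring('<outline text="Blog A" description="ColdFusion tips and tricks"/>'),
    ET.fromstring('<outline text="Blog B" description="Flex, ColdFusion and AJAX"/>'),
]

# map() transforms each XML element into just its description string,
# producing a lighter-weight collection: a list of strings.
descriptions = list(map(lambda item: item.get("description", ""), items))
print(descriptions)
```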
Parsing the feed gives us an array of XML objects to work with, but we only want the value of the 'description' element from each item. Using map(), we can transform the collection into a lighter-weight version containing just the description of each blog entry: an array of strings.
Break down the descriptions
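A minimal Python sketch of the forEach() step. Python has no forEach(), so a plain loop that rewrites each list slot in place plays that role here. Lower-casing the words is an assumption added for this sketch: ColdFusion struct keys are case-insensitive, but Python dict keys are not.

```python
descriptions = ["ColdFusion tips and tricks", "Flex, ColdFusion and AJAX"]

# forEach-style pass: act on each item in place; no new collection is returned.
for i, description in enumerate(descriptions):
    # Lower-case so "ColdFusion" and "coldfusion" are counted as one word later,
    # then split on whitespace into individual words.
    descriptions[i] = description.lower().split()

print(descriptions)
```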
This next step could have been combined with the earlier map(), but it segues nicely into an example of forEach(). Unlike map(), forEach() doesn't create or return another collection; it simply lets you act on (or modify) each item in it. We will use forEach() to break each description down further into individual words.
Flatten the Hierarchy
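A minimal Python sketch of flattening the array of arrays into one dimension. It only flattens a single level of nesting, which is all this data has; a general-purpose flatten would need to recurse.

```python
nested = [["coldfusion", "tips", "and", "tricks"], ["flex,", "coldfusion", "and", "ajax"]]

# Walk each inner list and collect its words into a single flat list.
words = [word for inner in nested for word in inner]
print(words)
```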
After breaking the descriptions down into individual words, we are left with an array of arrays. That works, but later steps are easier if we flatten the hierarchy into a single dimension: a one-dimensional array containing every word used in all of the blog descriptions.
Count each Word
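A minimal Python sketch of the counting step, using `functools.reduce` with a dict as the accumulator. The dict plays the role of a ColdFusion struct, and the `tally` helper is a hypothetical name for the reducer.

```python
from functools import reduce

words = ["coldfusion", "tips", "and", "tricks", "flex,", "coldfusion", "and", "ajax"]

def tally(counts, word):
    # Add one to this word's running total (starting from 0 if unseen).
    counts[word] = counts.get(word, 0) + 1
    return counts

# Reduce the flat list down to {word: number of occurrences}.
counts = reduce(tally, words, {})
print(counts)
```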
Now that we have an array of all the words from the descriptions, we can reduce the collection down to a struct of unique words and the number of times each one appears.
Ignore common words
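A minimal Python sketch of filtering the word counts. The `IGNORE` stop list and the `keep` comparator are hypothetical; a real list of common words would be much longer.

```python
counts = {"coldfusion": 2, "tips": 1, "and": 2, "tricks": 1, "flex,": 1, "ajax": 1, "-": 1}

# Hypothetical stop list: articles, conjunctions and stray symbols.
IGNORE = {"a", "an", "the", "and", "or", "of", "on", "-", "&"}

def keep(entry):
    word, _count = entry
    # The comparator: an entry passes only if its word is not in the stop list.
    return word not in IGNORE

# filter() keeps only the (word, count) pairs that pass; dict() rebuilds the struct.
filtered = dict(filter(keep, counts.items()))
print(filtered)
```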
To get meaningful results, we can use filter() to get rid of common words, articles and symbols. filter() keeps only those items in the collection that 'pass' the comparator.
Most Popular Word
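A minimal Python sketch of finding the most frequent word. It uses the built-in `max()`, whose `key` function plays the part of a custom "what counts as largest" rule. This is Python's API, not Collections.cfc's.

```python
counts = {"coldfusion": 2, "tips": 1, "tricks": 1, "ajax": 1}

# The key function decides what "largest" means:
# an entry is larger when its word count is larger.
word, count = max(counts.items(), key=lambda entry: entry[1])
print(word, count)
```

On ties, Python's `max()` returns the first maximal entry it encounters.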
Now that we have the word counts, we can search the collection with max() to find the most frequent word. We could sort the collection at this point and just pop off the first record, but Collections.cfc has a max() method that lets you define what is deemed the 'largest'.
Top 40 Popular Words
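A minimal Python sketch of the top 40 list. It reduces the dict into a list of (word, count) pairs, sorts them by count (highest first, ties broken alphabetically, which is a choice made for this sketch), and keeps at most 40. The counts are hypothetical.

```python
from functools import reduce

counts = {"coldfusion": 5, "flex": 3, "ajax": 3, "tips": 1, "tricks": 2}

# Reduce the dict into a list of (word, count) pairs so it can be sorted.
pairs = reduce(lambda acc, entry: acc + [entry], counts.items(), [])

# Sort by count descending, then alphabetically to make ties deterministic.
pairs.sort(key=lambda pair: (-pair[1], pair[0]))

top = pairs[:40]  # keeps everything here, since there are fewer than 40 words
for rank, (word, count) in enumerate(top, start=1):
    print(f"{rank}. {word} ({count})")
```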
For an extra bit of fun, the following code reduces the collection into an array so that it can be sorted, then displays a list of the top 40 keywords people have used to describe their blogs. After running all of the code above, you will end up with the following: