Last week I presented our first Sidelines paper (with Daniel Zhou and Paul Resnick) at ICWSM in San Jose. Slides (hosted on slideshare) are embedded below, or you can watch a video of most of the talk on VideoLectures.
Opinion and topic diversity in an aggregator's output sets can provide individual and societal benefits. News aggregators rely on votes and links to select subsets of the large quantity of news and opinion items generated each day. If they simply select the most popular items, the resulting sets may not contain as much diversity as is present in the overall pool of votes and links.
To help measure how well any given approach does at achieving these goals, we developed three diversity metrics that address different dimensions of diversity: inclusion/exclusion, nonalienation, and proportional representation (based on KL divergence).
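To make two of these metrics concrete, here is a rough sketch in Python. It is an illustration under my own assumptions, not the exact definitions from the paper. It reads inclusion as the fraction of voters with at least one voted-for item in the result set, and it computes a KL divergence between two distributions over groups (for example, the share of votes per political group versus the share of selected items per group). The function names, the input shapes, the direction of the divergence, and the `eps` smoothing are all hypothetical choices.

```python
import math


def inclusion(votes, selected):
    """Fraction of voters with at least one voted-for item in `selected`.

    votes: dict mapping voter -> set of items that voter voted for
    (an assumed input shape, chosen for illustration).
    """
    if not votes:
        return 0.0
    chosen = set(selected)
    included = sum(1 for items in votes.values() if chosen & set(items))
    return included / len(votes)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two distributions given as dicts of group -> probability.

    Groups missing from q are floored at `eps` so the logarithm stays
    finite; this smoothing is an assumption of the sketch.
    """
    total = 0.0
    for group, p_g in p.items():
        if p_g > 0:
            total += p_g * math.log(p_g / max(q.get(group, 0.0), eps))
    return total


# Toy data: three voters, and a result set containing only item "a".
toy_votes = {"ann": {"a", "b"}, "bob": {"c"}, "cat": {"a"}}
print(inclusion(toy_votes, ["a"]))

# A result set that mirrors the voters' group shares has zero divergence.
voter_share = {"left": 0.5, "right": 0.5}
print(kl_divergence(voter_share, {"left": 0.5, "right": 0.5}))
print(kl_divergence(voter_share, {"left": 0.9, "right": 0.1}))
```

A lower divergence means the result set's group shares sit closer to the voters' group shares, which is the sense of proportional representation used here.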
To increase diversity in result sets chosen based on user votes (or things like votes), we developed the Sidelines algorithm. This algorithm temporarily suppresses a voter's preferences after a preferred item has been selected. We compared it to collections of the most popular items, using user votes on Digg.com and links from a panel of political blogs. On both data sets, the Sidelines algorithm increased inclusion while decreasing alienation. For the blog links, a set with known political preferences, we also found that Sidelines improved proportional representation.
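The core idea can be sketched as a greedy selection loop. This is a minimal illustration of my reading of the description above, not the implementation from the paper. In particular, the `turns` parameter (how many selection rounds a voter stays sidelined), the alphabetical tie-breaking, and the function name `sidelines` are assumptions made for the example.

```python
def sidelines(votes, k, turns=1):
    """Greedily pick up to k items, sidelining voters whose item was picked.

    votes: dict mapping voter -> set of items they voted for (assumed shape).
    turns: rounds a voter's votes are ignored after one of their items is
           selected (hypothetical parameter; 0 reduces to plain popularity).
    """
    selected = []
    sidelined = {}  # voter -> number of upcoming rounds still suppressed
    candidates = {item for items in votes.values() for item in items}

    for _ in range(k):
        remaining = candidates - set(selected)
        if not remaining:
            break

        # Count votes only from voters who are not currently sidelined.
        counts = {item: 0 for item in remaining}
        for voter, items in votes.items():
            if sidelined.get(voter, 0) > 0:
                continue
            for item in items:
                if item in counts:
                    counts[item] += 1

        # Highest count wins; ties are broken by item name for determinism.
        best = min(remaining, key=lambda item: (-counts[item], item))
        selected.append(best)

        # One round has passed for voters already on the sidelines.
        sidelined = {v: t - 1 for v, t in sidelined.items() if t > 1}
        # Everyone who voted for the chosen item now sits out.
        for voter, items in votes.items():
            if best in items:
                sidelined[voter] = turns

    return selected


# Toy data: a three-voter majority backs "a" and "b"; a minority backs "c".
toy = {
    "v1": {"a", "b"}, "v2": {"a", "b"}, "v3": {"a", "b"},
    "v4": {"c"}, "v5": {"c"},
}
print(sidelines(toy, k=2, turns=0))  # plain popularity: ['a', 'b']
print(sidelines(toy, k=2, turns=1))  # with sidelining:  ['a', 'c']
```

In the toy data, pure popularity fills both slots with the majority's items, while sidelining the majority's voters for one round lets the minority's item take the second slot.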
Our approach differs from, and is complementary to, work that selects for diversity or identifies bias by classifying content (e.g., Park et al., NewsCube) or by classifying referring blogs or voters (e.g., Gamon et al., BLEWS). While Sidelines requires votes (or something like votes), it doesn't require any information about content, voters, or long-term voting histories. This is particularly useful for emerging topics and opinion groups, as well as for non-textual items.