bias mining in political bloggers’ link patterns

I was pretty excited by the work that Andy Baio and Joshua Schachter did to identify and show the political leanings in the link behavior of blogs that are monitored by Memeorandum. They used singular value decomposition [1] on an adjacency matrix between sources and items based on link data from 360 snapshots of Memeorandum’s front page.

For the political news aggregator project, we’ve been gathering link data from about 500 blogs. Our list of sources is less than half of theirs (I only include blogs that make full posts available in their feeds), but we do have full link data rather than snapshots, so I was curious if we would get similar results.

The first 10 columns of two different U matrices are below. They are both based on link data from 3 October to 7 November; the first includes items that had an in-degree of at least 4 (5934 items), the second includes items with an in-degree of at least 3 (9722 items). In the first, the second column (v2) seems to correspond fairly well to the political leaning of the blog; in the second, the third column (v3) is better.
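For anyone who wants to try this kind of analysis, here is a minimal sketch of the general idea in Python with NumPy. It is not the code behind the matrices above. The function names, the `links` dictionary, and the toy blogs are all hypothetical, and it assumes a simple 0/1 source-by-item matrix, which may differ from how the original matrices were weighted.

```python
import numpy as np

def source_item_matrix(links, min_in_degree):
    """Build a 0/1 source-by-item matrix from {source: set of item URLs}.

    Only items linked to by at least `min_in_degree` sources are kept.
    """
    in_degree = {}
    for items in links.values():
        for item in set(items):
            in_degree[item] = in_degree.get(item, 0) + 1
    kept = sorted(i for i, d in in_degree.items() if d >= min_in_degree)
    col = {item: j for j, item in enumerate(kept)}
    sources = sorted(links)
    A = np.zeros((len(sources), len(kept)))
    for r, source in enumerate(sources):
        for item in links[source]:
            if item in col:
                A[r, col[item]] = 1.0
    return sources, kept, A

def leading_u_columns(A, k=10):
    """Return the first k columns of U and the first k singular values."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

# Toy data (made up): two pairs of blogs with one item that everyone links to.
links = {
    "left1": {"a", "b", "shared"},
    "left2": {"a", "b", "shared"},
    "right1": {"c", "d", "shared"},
    "right2": {"c", "d", "shared"},
}
sources, items, A = source_item_matrix(links, min_in_degree=2)
U, s = leading_u_columns(A, k=2)
for name, row in zip(sources, U):
    print(name, row)
```

In this toy case the first column of U has the same sign for every blog, and the second column separates the two groups by sign. The sign of a singular vector is arbitrary, so which group comes out positive carries no meaning. For a matrix with hundreds of sources and thousands of items, a sparse SVD routine would probably be a better fit than a dense one.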

I’ll be the first to say that I haven’t had much time to look at these results in any detail, and, as some of the commenters on Andy’s post noted, there are probably better approaches for identifying bias than SVD. If you’d like to play too, you can download a csv file with the sources and all links with an in-degree >= 2 (21517 items, 481 sources). Each row consists of the source title, source url, and then a list of the items the source linked to from 3 October to 7 November. Some sources were added partway through this window, and I didn’t collect link data from before they were added.
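A rough sketch of reading a file in that row format with Python's standard csv module follows. It assumes there is no header row and that the linked items occupy the remaining columns of each row; I have not checked either assumption against the actual file. The `read_links` name and the sample rows are made up for illustration.

```python
import csv
import io

def read_links(fileobj):
    """Parse rows of: source title, source url, item, item, ...

    Returns (titles, links): titles maps source url -> title,
    and links maps source url -> set of item URLs.
    """
    titles, links = {}, {}
    for row in csv.reader(fileobj):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        title, url = row[0].strip(), row[1].strip()
        titles[url] = title
        items = {item.strip() for item in row[2:] if item.strip()}
        links.setdefault(url, set()).update(items)
    return titles, links

# Hypothetical sample rows standing in for the downloaded file.
sample = io.StringIO(
    "Blog A,http://a.example/,http://x.example/1,http://x.example/2\n"
    "Blog B,http://b.example/,http://x.example/1\n"
)
titles, links = read_links(sample)
print(titles)
print(links)
```

With the real file you would pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, in place of the `io.StringIO` object.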

[1] One of the more helpful singular value decomposition tutorials I found was written by Kirk Baker and is available in PDF.

{ 2 } Comments

  1. Daniel Zhou | November 22, 2008 at 5:09 pm | Permalink

    this does look nice. i’m trying to study it too

  2. Joshua Gerrish | February 8, 2009 at 2:33 pm | Permalink

    Vahed Qazvivian, Xiaodong Shi and I used co-training instead of SVD to analyze political blog posts. In addition to labeling blog posts, we also did an analysis of the accuracy of the labeling.

    There’s a link to the course paper here:

    http://www.eecs.umich.edu/~cscott/past_courses/eecs545/projects/

    We attempted to label blogs based on a mixture of post content, ingoing and outgoing links. Results weren’t that promising, but we made a couple of design decisions that were probably responsible for this. For one thing, we aggregated all posts for each blog and looked at the blog-level leaning.

    If you’re interested in any of the data or code, feel free to contact me.
