bias mining in political bloggers’ link patterns

I was pretty excited by the work that Andy Baio and Joshua Schachter did to identify and show the political leanings in the link behavior of blogs that are monitored by Memeorandum. They used singular value decomposition [1] on an adjacency matrix between sources and items based on link data from 360 snapshots of Memeorandum’s front page.

For the political news aggregator project, we’ve been gathering link data from about 500 blogs. Our list of sources is less than half of theirs (I only include blogs that make full posts available in their feeds), but we do have full link data rather than snapshots, so I was curious if we would get similar results.

The first 10 columns of two different U matrices are below. They are both based on link data from 3 October to 7 November; the first includes items that had an in-degree of at least 4 (5934 items), the second includes items with an in-degree of at least 3 (9722 items). In the first, the second column (v2) seems to correspond fairly well to the political leaning of the blog; in the second, the second column (v3) is better.

I’ll be the first to say that I haven’t had much time to look at these results in any detail, and, as some of the commenters on Andy’s post noted, there are probably better approaches for identifying bias than SVD. If you’d like to play too, you can download a csv file with the sources and all links with an in-degree >= 2 (21517 items, 481 sources). Each row consists of the source title, source url, and then a list of the items the source linked to from 3 October to 7 November. Some sources were added partway through this window, and I didn’t collect link data from before they were added.
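If you want to try the SVD approach on the CSV, the core of it is only a few lines. This is a minimal sketch on toy data in the same shape as the file (source title, source URL, then linked items); with the real data you’d read the rows from the CSV instead:

```python
import numpy as np

# Toy link data in the CSV's shape: source title, source URL, then linked items.
# (Real data would come from the downloadable CSV mentioned above.)
rows = [
    ["Blog A", "http://a.example", "item1", "item2"],
    ["Blog B", "http://b.example", "item1", "item3"],
    ["Blog C", "http://c.example", "item2", "item4"],
    ["Blog D", "http://d.example", "item3", "item4"],
]

sources = [r[0] for r in rows]
items = sorted({url for r in rows for url in r[2:]})
index = {url: i for i, url in enumerate(items)}

# Binary adjacency matrix: A[i, j] = 1 if source i linked to item j.
A = np.zeros((len(sources), len(items)))
for i, r in enumerate(rows):
    for url in r[2:]:
        A[i, index[url]] = 1.0

# Thin SVD: each row of U gives a source's coordinates in the latent link space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sorting sources by a low-order column of U groups sources with similar
# linking behavior; with the real data, one of the first few columns
# tended to track political leaning.
ranked = sorted(zip(sources, U[:, 1]), key=lambda t: t[1])
for title, score in ranked:
    print(f"{score:+.3f}  {title}")
```

Which column best separates left from right varies with the data (as the two U matrices above show), so it’s worth inspecting a few.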

[1] One of the more helpful singular value decomposition tutorials I found was written by Kirk Baker and is available in PDF.

US political news and opinion aggregation

Working with Paul Resnick and Xiaodan Zhou, I’ve started a project to build political news aggregators that better reflect diversity and represent their users, even when there is an unknown political bias in the inputs. We’ll have more on this to say later, but for now we’re making available a Google gadget based on a prototype aggregator’s results.

The list of links is generated from link data from about 500 blogs and refreshed every 30 minutes. Some of the results will be news stories, some will be op-ed columns from major media services, others will be blog posts, and there are also some other assorted links.

At this early point in our work, the results tend to be more politically diverse than an aggregator such as Digg, but suffer from problems with redundancy (we aren’t clustering links about the same story yet). As our results get better, the set of links the gadget shows should improve.

Update 15 December: I twittered last week that I’ve added bias highlighting to the widget, but I should expand a bit on that here.

Inspired by Baio and Schachter’s coloring of political bias on Memeorandum, I’ve added a similar feature to the news aggregator widget. Links are colored according to the average bias of the blogs linking to them. This is not always a good predictor of the item’s bias, or of whether it better supports a liberal or conservative view. Sometimes a conservative blogger writes a post to which more liberal bloggers than conservative bloggers link, and in that case, the link will be colored blue.
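The averaging itself is simple. Here’s an illustrative sketch — the function name, bias scale (-1.0 most liberal to +1.0 most conservative), and color thresholds are all assumptions for the example, not the widget’s actual values:

```python
def item_color(linking_blog_biases):
    """Color a link by the mean bias of the blogs linking to it.

    Bias scores run from -1.0 (most liberal) to +1.0 (most conservative);
    the scale and thresholds here are illustrative assumptions.
    """
    mean = sum(linking_blog_biases) / len(linking_blog_biases)
    if mean < -0.1:
        return "blue"    # mostly liberal linkers
    if mean > 0.1:
        return "red"     # mostly conservative linkers
    return "purple"      # mixed

# A conservative blogger's post linked mostly by liberal blogs
# ends up blue, as described above.
print(item_color([-0.8, -0.5, -0.6, 0.7]))
```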

If you don’t like the highlighting, you can turn it off in the settings.

wikis in organizations

Antero Aunesluoma presents at WikiFest

In early September, I attended WikiSym 08 in Porto, Portugal, so this post is nearly two months overdue. In addition to presenting a short paper on the use of a wiki to enhance organizational memory and sharing in a Boeing workgroup, I participated on the WikiFest panel organized by Stewart Mader.

Since then, a couple of people have asked me to post the outline of my presentation for the WikiFest panel. These notes are reflections from the Medshelf, CSS-D, SI, and Boeing workgroup wiki projects and are meant for those thinking about or getting started with deploying a wiki in a team. For those that have been working with wikis and other collaborative tools for a while, there probably aren’t many surprises here.

  1. Consider the wiki within your ecosystem of tools. For CSS-D and MedShelf, the wikis were able to offload many of the frequently asked questions (and, to an even greater extent, the frequent responses) from the corresponding email lists. This helps to increase the signal-to-noise ratio on the lists for members who have been around for a while, increasing their satisfaction with the lists and perhaps making them more likely to stick around.

    Another major benefit of moving some of this content from the mailing lists to the wiki is that new readers had less to read to get an answer. If you’ve ever searched for the answer to a problem and found part of the solution in a message board or mailing list archive, you may be familiar with the experience of having to read through several proposed, partial solutions, synthesizing as you go, before arriving at the information you need. If all of that information is consolidated as users add it to the wiki, it can reduce the burden of synthesizing information from each time it is accessed to just each time someone adds new information to the wiki.

    In addition to considering how a wiki (or really, any other new tool) will complement your existing tools, consider what it can replace. At Boeing, the wiki meant that workgroup members could stop using another tool they didn’t like. If there had been a directive to use the wiki in addition to the other tool, it probably wouldn’t have been as enthusiastically adopted. One of the reasons that the SI Wiki has floundered a bit is that there are at least three other digital places this sort of information is stored: two CTools sites and an intranet site. When people don’t know where to put things, sometimes we just don’t put them anywhere at all.

  2. Sometimes value comes from aggregation rather than synthesis. In the previous point, I made a big deal out of the value of using the wiki to synthesize information from threaded discussions and various other sources. When we started the MedShelf project, I was expecting all wikis to be used this way, but I was very wrong. With MedShelf, a lot of the value comes from individuals’ stories about coping with the illness. Trying to synthesize that into a single narrative or neutral article would have meant losing these individual voices, and for content like this, aggregation — putting it all in the same place — can be the best approach.

    The importance of these individual voices also meant that many more pages than I expected were single-authored.

  3. Don’t underestimate the value of a searchable & browsable collection. Using the workgroup wiki, team members have found the information they needed because they knew about one project and were then able to browse links to documentation for other, related projects. Browsing between a project page and a team member’s profile has also helped people to identify experts on a given topic. The previous tools for documenting projects didn’t allow for connections between different project repositories and made it hard to browse to the most helpful information. But this only works if you add links between related content on the wiki, or if your wiki engine automatically adds related links.

    For the wikis tied to mailing lists (CSS-D and Medshelf), some people arriving at the wiki through a search engine, looking for a solution to a particular problem, have browsed to the list information and eventually joined the list. This is certainly something that happens with mailing list archives, but which makes a better front door — the typical mailing list archive or a wiki?

  4. Have new users arrive in parallel rather than in serial (after seeding the wiki with content). The Boeing workgroup wiki stagnated when it was initially launched, and did not really take off until the wiki evangelist organized a “wiki party” (snacks provided) where people could come and get started on documenting their past projects. Others call this a Barn Raising. This sort of event can give potential users both a bit of peer (or management) pressure and the necessary technical support to get started adding content. It also serves the valuable additional role of giving community members a chance to express their opinions about how the tool can/should be used, and to negotiate group norms and expectations for the new wiki.

    Even if you can’t physically get people together — for the mailing list wikis, this was not practical — it’s good to have them arrive at the same time, and to have both some existing content and suggestions for future additions ready and waiting for them.

  5. Make your contributors feel appreciated. Wikis typically don’t offer the same affordances for showing gratitude as a threaded discussion, where it is usually easy to add a quick “thank you” reply or to acknowledge someone else’s contribution while adding new information. With wikis, thanks are sometimes rare, and users may see revisions to content they added as a sign that they did something wrong, rather than that they provided a good starting point to which others added. It can make a big difference to acknowledge particularly good writeups publicly in a staff meeting or on the mailing list, or to privately express thanks or give a compliment.

Continue reading ›

palace ball

For nearly a month now, I’ve been obligated to write a post explaining the rules of Palace Ball (during times of heightened nationalism, it may also be called Freedom Ball). There isn’t a whole lot to it beyond what you see in the above video, but it works roughly as follows:

Palace ball field

  1. You can play on a rectangular or square field; something about the size of a tennis court or a little larger should work. There are end zones at opposite ends. There is no out of bounds at the sides.
  2. The primary ball (or palace ball) starts in the center of the field. It should probably be 18-20″ in diameter, maybe a bit more. Something like this ball should work well. Experiment for best results.
  3. Each of the two players starts with a ball for bowling / throwing at the primary ball. These are called bowlers. They should be about 10″ in diameter and can either be kickballs or playground balls like the palace ball. The bowlers should be different colors, since each player can only use his or her own bowler. Unlike in the video, each player should have the same type and size of bowler, or it’s just unfair (I still contend that this is the only reason Ben won the match in the video).
  4. When the game starts, each player repeatedly throws / tosses / bowls their bowler at the palace ball, pushing it into the opponent’s end zone. Players must release their bowler at least 3 feet from the palace ball.
  5. You can play for a set period of time, or to a certain score.

The game would probably work quite well with doubles (still one bowler per team), but more than that is probably a bit much.

Smart Mobs, iPhone 3G, and AT&T’s Direct Fulfillment process

This is an iPhone post. I’d been waiting to replace my sometimes-barely functioning phone for a good while, so, like many others, I showed up at a local AT&T store on Friday in hopes of getting my iPhone. After spending an embarrassing amount of time in line, and shortly before getting to the front, we were told that the store was out. No problem, I’d place an order and get it when it shows up.

I didn’t think too much about it until a few days later when someone who ordered the same model and color phone at the same store several hours after me mentioned that their phone had shipped. Mine hadn’t, so the sequence of order fulfillment seemed a bit strange. Curious and confused, I turned to Google. This led me to several threads and blog posts discussing AT&T’s Direct Fulfillment system, the longest of which is a now 220-page thread on AT&T’s own customer support system. The discussion in this thread is interesting to me as a customer and as a student. Though the thread contains a bit of vitriol, misinformation, and even paranoia, the posters are able to work together to build a fairly coherent model of AT&T’s direct fulfillment process.

The thread starts out with questions about whether others have received their phones — customers’ questions that can help them calibrate their own expectations. Some eager customers soon noticed that in addition to checking their own order status, AT&T’s order status system allows users to view and track orders from others in their zip code by simply incrementing the order number in the URL. From this, users notice that some orders, placed after their own unshipped orders, have already shipped — is the system unfair somehow, or are some models just shipping sooner? The posters share information and anecdotes confirming that at least some orders for the same model and color of phone are being shipped out of order.

Elsewhere on the web, Greg de Vitry builds a tool that scrapes a range of order numbers and aggregates data from several users to count total daily shipments. The tool’s users see the tool’s shipment tally and begin questioning AT&T’s official statement that they are shipping tens of thousands of orders per day. Greg soon updates his tool to collect model numbers, which again confirms that orders are not being shipped according to first-in-first-out. As more users enter their information, it becomes plausible (if not likely) that forum readers and users of the tool have a better overview of the direct fulfillment process than many of AT&T’s own frontline employees.
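The scraping itself takes very little code, which is part of what made the collective investigation possible. A hypothetical sketch of the approach (the URL pattern and page wording are assumptions; the real system simply exposed the order number in the URL, which is what made incrementing it easy):

```python
import urllib.request

def order_status_url(base_url, order_number):
    # Hypothetical URL pattern; the real system exposed the number similarly.
    return f"{base_url}?order={order_number}"

def count_shipped(status_pages):
    # Tally status pages whose text reports a shipped order.
    return sum("shipped" in page.lower() for page in status_pages)

def fetch(url):
    # Fetch one status page (not run in this offline sketch).
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Walk a range of order numbers by editing the number in the URL:
urls = [order_status_url("https://example.com/orderstatus", n)
        for n in range(100000, 100010)]
# pages = [fetch(u) for u in urls]
# print(count_shipped(pages))
```

Aggregating model numbers and ship dates across a range of orders, as Greg’s tool did, is just a matter of parsing a few more fields from each page.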

The thread’s users eventually begin to seek media attention, hoping that if they expose the number of unshipped orders and haphazard fashion in which they are being filled, Apple and AT&T will be embarrassed enough to ship them their phones faster or compensate them. Users post to CNN’s iReport and email Fox News.

In addition to sharing information, the thread’s posters are telling jokes, commiserating together, and wishing each other luck. The conversation feels very similar to the conversations in the line outside of the AT&T store on Friday, except the forum posters have more diversity in information and can share it with the entire virtual line much more easily than they could with the local lines.

Betting to Improve the Odds

The New York Times has a nice writeup on some of the ways companies are using prediction markets. One of the examples given is Best Buy’s use of an internal prediction market to forecast potential delays in products or services, and to catch these delays in time to prevent further slips. I tend to agree that prediction markets have a lot of potential for the aggregation of both public and private information, and their accuracy in some instances is remarkable. This use of forecasting delays, though, does raise a couple of questions.

As I think about the Best Buy example or similar uses, it seems plausible that there is also a bit of a self-fulfilling prophecy effect. If my work is on schedule, but the prediction market indicates that a project I’m assigned to is going to be late by a month, I may slow down to match the forecast delay. In the example in the article, the claim is that the expected delay had the opposite effect — it directed attention to the problem so further delays could be avoided — but the self-fulfilling prophecy effect still seems like a plausible outcome in many situations.

A second question I found myself thinking about (and this one is a bit more of a stretch) is how much companies will need to make prediction market information public. If a company has announced a planned release date or expected sales, and management’s forecast shows them on target, but the company’s prediction market shows them under performing, how much do they have to tell shareholders? Might there be situations in which the management wouldn’t want to have this information, for fear of shareholder lawsuits or other consequences if they do not disclose it?

Training, Integration, and Identity: A Roundtable Discussion of Undergraduate and Professional Master’s Programs in iSchools

Libby Hemphill and I are hosting a roundtable discussion at the 2008 iConference, hosted by UCLA, at the end of February.

Professional students, whether undergraduates or masters’ students, represent a significant portion of the iSchool community. How do iSchools effectively educate those students while continuing to develop successful research programs? This roundtable discussion will focus on how iSchools educate their professional students and engage them in the research aspect of their programs. Innovative approaches to training and integration will be the central theme of this discussion. In an iSchool – where students training for professions including librarianship, information policy, human-centered computing, and preservation, and researchers exploring such topics as incentive-centered design, forensic informatics, computational linguistics, and digital libraries have both competing and complementary goals – the potentials for collaboration, innovation, misunderstanding, and disharmony are all high.

The annual iConference provides a unique opportunity for us, as a community, to discuss the roles our professional students have in shaping our identity and our practices. The proposed roundtable will invite participants to discuss questions such as:

  • What should the role of research in training information professionals be?
  • How can we best engage professional students in our research?
  • How do iSchools address the unique curricular challenges we face in preparing students for a very wide variety of careers?
  • What do we want an Information degree to signal in the marketplace?
  • What are some successes in which research and professional training have benefited one another?

Participants will share innovative approaches to professional education, best practices in engaging professional students in research programs, and remaining challenges. We intend roundtable participation to represent the diversity of iSchools’ current programs.

We’ve set up a wiki for pre-conference sharing of exemplary programs, questions, and thoughts. It’s pretty sparse right now, but we’ll be adding some of our thoughts before the conference, and we welcome your contributions!

This is a topic that I started giving more thought around the time of the 2006 iConference, and I am looking forward to the discussion in February.

social sites repurposing contacts

A month or so ago, Cory Doctorow wrote a column about how your “creepy ex-co-workers will kill Facebook,” and introduced what he calls “boyd’s law:”

Adding more users to a social network increases the probability that it will put you in an awkward social circumstance.

I think there’s an important corollary: adding more features and content types to a social network increases the probability that it will put you in an awkward social circumstance.

Recent concerns about Beacon are one example. Yes, the privacy issues of an opt-out tool that follows you around from site to site recording your behavior are huge. But there’s also the issue of having this content added to Facebook at all. Even among my close friends, I don’t want a list of their recent purchases. It’s not something we do in person, and it’s not something I want to do online. A site, though, can cause the same problem by adding content that I share with some people, but not necessarily my current friends. Facebook users presumably friend each other based on the norms for sharing content that existed on Facebook at the time; adding more content, or just changing how Facebook shares the content already there, can cause some problems. Some of the content Beacon tried so forcefully to share isn’t that much different than if LinkedIn suddenly started sharing relationship status: you don’t want software deciding to re-purpose one set of social ties into another. For now, Facebook is handling this challenge with extremely fine-grained privacy controls, but that’s a lot of overhead.

The de-placing of facebook
When Facebook was smaller and the bounds were clearer, users had less need of the privacy settings. Two years ago, I had a pretty clear distinction in my head. Facebook was for some social communication and sharing among my college friends and some friends from high school. It had a clear identity, and felt either like a place or very connected to my school as place. I knew who I would “run into” on Facebook, and I knew that the content would be related to college students’ self-expression, communication, and socialization. Within the bounds, it was possible to identify a fairly consistent set of behaviors and information that members were willing to share with each other. Not so anymore. As Facebook adds users and features, it undermines this sense of place. Anyone, including the creepy ex-coworker, might show up. With new features and new applications, I am also less able to anticipate Facebook’s content.

I’m not necessarily criticizing Facebook’s decision to reduce their placiness. Its leadership has decided to trade some of the sense of place for growth, instead becoming an application platform and contact/identity management system. That’s their gamble to take, but I am critical that they seem to be moving in this direction without clearly thinking through some of the consequences for their members.

Other examples of repurposing contacts
Facebook isn’t the only company that has recently re-purposed existing social network information to share additional content. This December, if you use Google Reader and GTalk, Google decided to share all of your shared RSS feed items with all of your GTalk contacts. Your GTalk contact list was already being populated automatically from people you email, so for many users, this exposed their shared items to many people they’d emailed only a few times. This decision seems to be based on the incredibly naïve assumption that if you share content with some people, you want to share it with everyone you email. One user reported that this “ruined Christmas.”

Unfortunately, as Google and Yahoo increasingly leverage our inboxes to compete with Facebook, we can probably look forward to more such missteps.

Pursuit of places
I do think it’s possible to grow while keeping a distinct sense of place. After purchasing Flickr, del.icio.us, and upcoming, Yahoo! kept their contact lists separate and retained the identity of each property. Some would probably criticize Yahoo! for not integrating their brand, but I think that time will show they’ve made the right choice. It’s also true that managing the separate contact lists is very similar to the overhead of Facebook’s privacy settings, but there are some key differences: managing your Flickr contacts does not interfere with the sense of Flickr as a bounded place, and you can (at least currently) be reasonably comfortable that Flickr is not going to repurpose your Flickr contacts outside of the social norms for Flickr users.

This also makes me believe that social startups like dopplr and others can succeed by creating a clear identity as a place. Even if Facebook offered better features (and perhaps more convenience) for sharing my travel status and tips with others, I’d still seek out Dopplr for its characteristics as a space — it’s a much more pleasant experience.

citizen-centered design and regulation in cabin design

This is a quick and very late heads up about Ken Erickson’s participation in a panel organized by Dori Tunstall at AAA this morning. The below description is cribbed from Dori’s blog:

Anthropologist Ken Erickson explores the world of FAA and Americans With Disabilities Act (ADA) regulations in the design of Boeing airplanes accessible to people with physical disabilities. He addresses how interdisciplinary teams handle the conflicts between the ethos of citizen-centered designing and formal government regulation.

Ken’s company, Pacific Ethnography, did some work with my group on universal cabin design.

On a mostly unrelated note, a profile of my workgroup appeared in this month’s Frontiers (pdf).

neartime: find flickr photos taken nearby in time and space

Ever since I first started geotagging photos and posting them to Flickr, I’ve wanted to use this information to find photos that were taken in roughly the same location at roughly the same time. Can I find photos with myself in them? Can I find other pictures from an event without having to use textual searches? I’m not the only person with these aspirations. Building on a post by Dave Winer about a similar experience in Social Cameras, Thomas Hawk of Zooomr talks about combining location information with timestamps to find near photos. Mor Naaman mentions this form of browsing in an October 2006 article in Computer, noting that Microsoft’s World-Wide Media Exchange (WWMX) let you browse photos by time and location in 2003.

WWMX’s photo database, though, is very small. Flickr has many, many more geotagged and timestamped photos, but it doesn’t make them particularly easy to explore by time and date within its interface. To find photos taken near a given photo, in both location and time, you would:

  1. Go to the photo’s page.
  2. Click the map.
  3. Click to explore photos taken near that location.
  4. Adjust the map to the desired zoom level.
  5. Once the map loads, open the filters.
  6. Enter a taken-on date from the original photo.
  7. If there are many results, go to the “link to this page” link.
  8. Paste the URL in your browser.
  9. Edit the time range in the URL and hit enter.
  10. Repeat.

I’m lazy; this was too much for me. Additionally, depending on the location and the event, I may want to play with the parameters a bit, and I wanted a better interface for doing that than the URL. For my own convenience, I’ve written a bookmarklet that will take you from a (geotagged) Flickr photo page to a page of thumbnails of photos taken nearby geographically and chronologically. You can try it by dragging the below bookmarklet to your toolbar:

neartime

You may get some unexpected results. There are four general contributors to this that I’ve noticed. (1) Some users just geotag photos wherever seems right. (2) Some users don’t have their camera’s date and time set right. (3) Sometimes the combination of users’ recorded photo info and time zones doesn’t work out (I’m using a time zone offset from the photo’s location, which helps a bit). (4) Sometimes the Flickr search returns incorrect results.

There are also some better ways to implement search (particularly with respect to paginating photos according to distance and time rather than the options provided by the Flickr API), but those will have to come later. In the meantime, have fun and let me know what you think.
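If you’d rather script this yourself than use the bookmarklet, the Flickr API’s flickr.photos.search method accepts lat, lon, radius, and min/max taken dates, which is all you need. A sketch (the default radius and time window below are arbitrary choices of mine, not the bookmarklet’s):

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

API_URL = "https://api.flickr.com/services/rest/"

def neartime_params(api_key, lat, lon, taken, radius_km=1, window_hours=2):
    """Build flickr.photos.search parameters for photos taken near a
    point in both space and time. Radius and window defaults are
    illustrative, not the bookmarklet's values."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": str(lat),
        "lon": str(lon),
        "radius": str(radius_km),  # kilometers
        "min_taken_date": (taken - timedelta(hours=window_hours)).strftime(fmt),
        "max_taken_date": (taken + timedelta(hours=window_hours)).strftime(fmt),
        "format": "json",
        "nojsoncallback": "1",
    }

def neartime_search(api_key, lat, lon, taken, **kw):
    # Issue the request and return the list of matching photos.
    query = urllib.parse.urlencode(neartime_params(api_key, lat, lon, taken, **kw))
    with urllib.request.urlopen(API_URL + "?" + query) as resp:
        return json.load(resp)["photos"]["photo"]

# Example parameters (no request is made here):
params = neartime_params("YOUR_API_KEY", 41.1621, -8.6220,
                         datetime(2008, 9, 8, 14, 30))
```

As noted above, the API doesn’t let you paginate by combined distance-and-time nearness, so anything smarter than a bounding window means sorting results yourself.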

Update – 24 August 2010: I’ve updated the bookmarklet to work with the new Flickr photo page.