
{ Tag Archives } research

@display

For those interested in the software that drives the SIDisplay, SI master’s student Morgan Keys has been working to make a generalized and improved version available. You can find it, under the name “@display” at this GitHub repository.

SIDisplay is a Twitter-based public display described in a CSCW paper with Paul Resnick and Emily Rosengren. We built it for the School of Information community, where it replaced a number of previous displays, including a Thank You Board (which we compare it to in the paper), a photo collage (based on the context, content & community collage), and a version of the plasma poster network. Unlike many other Twitter-based displays, SIDisplay and @display do not follow a hashtag, but instead follow @-replies to the display’s Twitter account. It also includes private tweets, so long as the Twitter user has given the display’s Twitter account permission to follow them.

Word clouds to support reflection

When preparing our Persuasive 2010 paper on Three Good Things, we ended up cutting a section on using word clouds to support reflection. The section wasn’t central to this paper, but it highlights one of the design challenges we encountered, and so I want to share it and take advantage of any feedback.

Our Three Good Things application (3GT) is based on a positive psychology exercise that encourages people to record three good things that happen to them, as well as the reasons why they happened. By focusing on the positive, rather than dwelling on the negative, it is believed that people can train themselves to be happier.

Example 3GT tag clouds

When moving the application onto a computer (and out of written diaries), I wanted to find a way to leverage a computer’s ability to analyze a user’s previous good things and reasons to help them identify trends. If people are more aware of what makes them happy, or why these things happen, they might make decisions that cause these good things to happen more often. In 3GT, I made a simple attempt to support this trend detection by generating word clouds from a participant’s good things and reasons, using simple stop-word removal, lowercasing, and no stemming.
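For the curious, the processing behind the clouds is roughly as simple as the sketch below; this is an illustrative Python version rather than the production code, and the stop list shown is only a stand-in.

```python
from collections import Counter
import re

# A stand-in stop list; the real list was longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "that", "it", "was", "i"}

def cloud_weights(entries):
    """Turn a user's good things (or reasons) into word -> frequency weights.

    Everything is lowercased, stop words are dropped, and no stemming is done,
    so 'enjoy' and 'enjoying' remain separate words in the cloud.
    """
    counts = Counter()
    for text in entries:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

# The rendering step simply maps these counts to font sizes.
print(cloud_weights(["Had a great walk with my cat",
                     "Enjoying coffee with the cat this morning"]))
```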

Limited success for word clouds

When we interviewed 3GT users, we expected to find that the participants believed the word clouds helped them notice and reinforce trends in their good things. Results here were mixed. Only one participant we interviewed described how the combination of listing reasons and seeing them summarized in the word clouds had helped her own reflection:

“You’ve got tags that show up, like tag clouds on the side, and it kind of pulls out the themes… as I was putting the reasoning behind why certain [good] things would happen, I started to see another aspect of a particular individual in my life. And so I found it very fascinating that I had pulled out that information… it’s made me more receptive to that person, and to that relationship.”

A second participant liked the word cloud but was not completely convinced of its utility:

I like having the word cloud. I noticed that the biggest thing in my reason words is “cat”. (Laughs). And the top good words isn’t quite as helpful, because I’ve written a lot of things like ‘great’ and ‘enjoying’ – evidently I’ve written these things a lot of times. So it’s not quite as helpful. But it’s got ‘cat’ pretty good there, and ‘morning’, and I’m not sure if that’s because I’ve had a lot of good mornings, or I tend to write about things in the morning.

Another participant who had examined the word cloud noticed that “people” was the largest tag in his good things cloud and “liked that… [his] happiness comes from interaction with people,” but that he did not think that this realization had any influence over his behavior outside of the application.

One participant reported looking at the word clouds shortly after beginning to post. The words selected did not feel representative of the good things or reasons he had posted, and feeling that they were “useless,” he stopped looking at them. He did say that he could imagine it “maybe” being useful as the words evolved over time, and later in the interview revisited one of the items in the word cloud: “you know the fact that it says ‘I’m’ as the biggest word is probably good – it shows that I’m giving myself some credit for these good things happening, and that’s good,” but this level of reflection was prompted by the interview, not day-to-day use of 3GT.

Another participant did not understand that word size in the word cloud was determined by frequency of usage and was even more negative:

It was like you had taken random words that I’ve typed, and some of them have gotten bigger. But I couldn’t see any reason why some of them would be bigger than the other ones. I couldn’t see a pattern to it. It was sort of weird… Some of the words are odd words… And then under the Reason words, it’s like they’ve put together some random words that make no sense.

Word clouds did sometimes help in ways that we had not anticipated. Though participants did not find that the clouds helped them identify trends that would influence future decisions, looking at the word cloud generated from her good things lifted at least one participant’s mood.

I remember ‘dissertation’ was a big thing, because for a while I was really gunning on my dissertation, and it was going so well, the proposal was going well with a first draft and everything. So that was really cool, to be able to document that and see… I can see how that would be really useful for when I get into a funk about not being able to be as productive as I was during that time… I like the ‘good’ words. They make me feel, I feel very good about them.

More work?

The importance of supporting reflection has been discussed in the original work on Three Good Things, as well as in other work showing that systems that support effective self-reflection can improve users’ ability to adopt positive behaviors and increase their feelings of self-efficacy. While some users found the word clouds helpful for reflection, a larger portion did not notice them or found them unhelpful. More explanation should be provided about how the word clouds are generated, to avoid confusion, and perhaps they should not be shown until a participant has entered a sufficient amount of data. To help participants better notice trends, improved stop-wording might be used, as well as detecting n-grams (e.g., “didn’t smoke” versus “smoke”) and grouping similar terms (e.g., combining “bread” and “pork” into “food”). Alternatively, a different kind of reflection exercise might be more effective, one where participants are asked to review their three good things posts and write a longer summary of the trends they have noticed.
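As a rough illustration of the n-gram idea, a tokenizer could merge known multi-word phrases before counting, so that “didn’t smoke” stays intact instead of being split; the phrase list and example below are made up, and grouping similar terms could be handled the same way with a synonym map.

```python
from collections import Counter

# Hypothetical bigrams to keep intact in the cloud.
PHRASES = {("didn't", "smoke"), ("good", "morning")}

def merge_phrases(words):
    """Merge known bigrams into single tokens before counting."""
    merged, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in PHRASES:
            merged.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged

print(Counter(merge_phrases("i didn't smoke and had a good morning".split())))
```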

Using Mechanical Turk for experiments

In my upcoming CHI paper, “Presenting Diverse Political Opinions: How and How Much,” we used Amazon’s Mechanical Turk (AMT) to recruit subjects and to administer the study. I’ll talk a bit more about the research questions and results in a future post, but I’ve had enough questions about using Mechanical Turk that I think a blog post may be helpful.

In this study, Paul Resnick and I explored individuals’ preferences for diversity of political opinion in online news aggregators and evaluated whether some very basic presentation techniques might affect satisfaction with the range of opinions represented in a collection of articles.

To address these questions, we needed subjects with known political preferences, from the United States, and with at least some very basic political knowledge, and we wanted to collect some demographic information about each subject. Each approved subject was then assigned to either a manipulation check group or to the experimental group. Subjects in the manipulation check group viewed individual articles and indicated their agreement or disagreement with each; subjects in the experimental group viewed entire collections and answered questions about the collection. The subjects in the experimental group were also assigned to a particular treatment (how the list would appear to them). Once approved, subjects could view a list up to once per day.

Screening. To screen subjects, we used a Qualification test in AMT. When unqualified subjects viewed a task (a HIT, or human intelligence task, in mTurk parlance), they were informed that they needed to complete a qualification. The qualification test asked subjects two questions about their political preferences, three multiple-choice questions about US politics, and a number of demographic questions. Responses were automatically downloaded and evaluated to complete screening and assignment.
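The automatic evaluation amounted to scoring each downloaded response against the screening criteria, along the lines of the sketch below; the field names and the passing threshold are illustrative, not the actual values we used.

```python
# Hypothetical answer key for the three multiple-choice US politics questions.
KNOWLEDGE_KEY = {"q_house_speaker": "b", "q_veto_override": "c", "q_senate_term": "a"}

def passes_screening(response):
    """Decide whether one worker's qualification-test answers pass screening.

    `response` is a dict of the worker's downloaded answers.
    """
    correct = sum(response.get(q) == answer for q, answer in KNOWLEDGE_KEY.items())
    has_preferences = response.get("liberal_conservative") is not None
    return correct >= 2 and has_preferences

print(passes_screening({"q_house_speaker": "b", "q_veto_override": "c",
                        "q_senate_term": "d", "liberal_conservative": 3}))  # True
```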

To limit our subjects to US residents, we also used the automatic locale qualification.

Assignment. We handled subject assignment in two ways. To distinguish between the treatment group and the manipulation check group, we created two additional qualifications that were automatically assigned; an approved subject would be granted only one of these qualifications, and thus could only complete the associated task type.

Tasks (HITs). The task implementation was straightforward. We hosted tasks on our own server using the external question HIT type. When a subject loaded a task, AMT passed us the subject’s worker ID. We verified that the subject was qualified for the task and loaded the appropriate presentation for that subject. Each day, we uploaded one task of each type, with many assignments; assignments are the number of turkers that can complete each task.

Because we needed real-time access to the manipulation check data, the responses to this task were stored in our own database after a subject submitted the form; the subject could then return to AMT. This was not necessary for the experimental data, and so the responses were sent directly to AMT for later retrieval.
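For readers who haven’t used the external question HIT type: AMT loads your page in a frame and passes the worker and assignment IDs as query parameters, and the form either posts back to AMT or, as with our manipulation check, to your own server first. A minimal sketch of a handler is below; it uses Flask for brevity, and the qualification lookup, form contents, and route names are hypothetical.

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical stand-in for our qualification lookup (worker -> task type).
WORKER_GROUPS = {"EXAMPLEWORKER1": "check", "EXAMPLEWORKER2": "experiment"}

@app.route("/hit")
def show_task():
    # AMT passes these query parameters when it loads an external question.
    worker_id = request.args.get("workerId")
    assignment_id = request.args.get("assignmentId")
    submit_base = request.args.get("turkSubmitTo", "https://www.mturk.com")

    group = WORKER_GROUPS.get(worker_id)
    if group is None:
        return "Please complete the qualification test first.", 403

    # The manipulation-check form posts to our own server so responses are
    # available in real time; the experimental form posts straight back to AMT.
    action = ("/submit_check" if group == "check"
              else submit_base + "/mturk/externalSubmit")
    return (f"<form method='post' action='{action}'>"
            f"<input type='hidden' name='assignmentId' value='{assignment_id}'>"
            "<!-- the article list and questions for this worker's treatment -->"
            "<input type='submit' value='Submit'></form>")
```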

Quality control. Careless clicking or hurrying through the task is a potential problem on mTurk. Using multiple raters does not work when asking subjects about their opinions. Kittur, Chi, and Suh recommend asking Turkers verifiable questions as a way to deal with the problem [1]. We did not, however, ask verifiable questions about any of the articles or the list, because that might have changed how turkers read the list and responded to our other questions. Instead, we randomly repeated a demographic question from the qualification test. Five subjects changed their answer substantially (e.g., aging more than one year or in reverse, or shifting on either of the political spectrum questions by 2 points or more). There are many possible explanations for these shifts, such as shared accounts within a household, careless clicking, easily shifting political opinions, deliberate deception, or lack of effort, but none of them is desirable in a study subject, so these subjects were excluded. We also examined how long each subject took to complete tasks (looking for implausibly fast responses); this did not lead to the exclusion of any additional subjects or responses.
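The repeated-question check itself was a simple comparison of the two answers, roughly as in the sketch below (field names are illustrative; the thresholds match the rule described above).

```python
def inconsistent(qual_answers, repeat_answers):
    """Flag a subject whose repeated answers shifted substantially.

    Both arguments are dicts of answers, from the qualification test and from
    the repeated in-task question.
    """
    age_shift = repeat_answers["age"] - qual_answers["age"]
    spectrum_shift = max(
        abs(repeat_answers["liberal_conservative"] - qual_answers["liberal_conservative"]),
        abs(repeat_answers["party_preference"] - qual_answers["party_preference"]),
    )
    # Aging more than a year, aging in reverse, or moving 2+ points on either
    # political-spectrum question was grounds for exclusion.
    return age_shift > 1 or age_shift < 0 or spectrum_shift >= 2

print(inconsistent({"age": 30, "liberal_conservative": 3, "party_preference": 4},
                   {"age": 30, "liberal_conservative": 5, "party_preference": 4}))  # True
```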

Some reflection. We had to pay turkers a bit more than we expected (~$12/hr), and we recruited fewer subjects than we anticipated. The unpaid qualification test may be a bit of a barrier, especially because potential subjects could only complete one of our paid tasks per day (and only one was listed at a time). We might instead have implemented the qualification as a paid task, but that might have resulted in paying for subjects who would never return to complete an actual task.

Further resources

1. Kittur, A., Chi, E. H., and Suh, B. (2009). “Crowdsourcing User Studies With Mechanical Turk,” Proc. CHI 2009: 453-456. (ACM | PARC)
2. Mason, W. and Watts, D. J. (2009). “Financial incentives and the ‘performance of crowds,’” SIGKDD Workshop on Human Computation: 77-85. (ACM | Yahoo)

This study is part of the BALANCE project and was funded by NSF award #IIS-0916099.

updated viz of political blogs’ link similarity

I’ve been meaning to post a simple update to my previous visualization of political blogs’ link similarities. In the previous post, I used GEM for layout, which was not, in hindsight, the best choice.

In the visualization in this post, the edges between blogs (the nodes, colorized as liberal, independent, and conservative) are weighted as the Jaccard similarity between any two blogs. The visualization is then laid out in GUESS using multidimensional scaling (MDS) based on the Jaccard similarities.
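For anyone who wants to reproduce something like this outside GUESS, the computation is just a Jaccard matrix over each blog’s set of linked items fed into an MDS layout; a small sketch using scikit-learn (with toy data in place of the real link sets) follows.

```python
import numpy as np
from sklearn.manifold import MDS

# Each blog maps to the set of items it linked to (toy data).
links = {
    "liberal_blog": {"item1", "item2", "item3"},
    "conservative_blog": {"item3", "item4", "item5"},
    "independent_blog": {"item2", "item3", "item6"},
}

names = sorted(links)
n = len(names)
sim = np.zeros((n, n))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        union = links[a] | links[b]
        sim[i, j] = len(links[a] & links[b]) / len(union) if union else 0.0

# MDS expects dissimilarities, so lay out on 1 - Jaccard similarity.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(1 - sim)
print(dict(zip(names, coords.round(2))))
```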

three good things

The first of my social software for wellness applications, Three Good Things, is available on Facebook (info page).

Three Good Things supports a positive psychology exercise in which participants record three good things, and why these things happened. When completed daily – even on the bad days – over time, participants report increased happiness and decreased symptoms of depression. The good things don’t have to be major events – a good meal, a phone call with a friend or a family member, or a relaxing walk are all good examples.

I’m interested in identifying best practices for deploying these interventions on new or existing social websites, where adding social features may make the intervention more or less effective for participants, or may just make some participants more likely to complete the exercise on a regular basis. Anyway, feel free to give the app a try – you’ll be helping my research and you may end up a bit happier.

Sidelines at ICWSM

Last week I presented our first Sidelines paper (with Daniel Zhou and Paul Resnick) at ICWSM in San Jose. Slides (hosted on slideshare) are embedded below, or you can watch a video of most of the talk on VideoLectures.

Opinion and topic diversity in the output sets of news aggregators can provide individual and societal benefits. News aggregators rely on votes and links to select subsets of the large quantity of news and opinion items generated each day, but simply selecting the most popular items may not yield as much diversity as is present in the overall pool of votes and links.

To help measure how well any given approach does at achieving these goals, we developed three diversity metrics that address different dimensions of diversity: inclusion/exclusion, nonalienation, and proportional representation (based on KL divergence).
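The proportional representation metric is the only one that really needs a formula. One plausible reading (a sketch only; the paper’s exact formulation, including any smoothing, differs in details) is the KL divergence between the share of each preference group in the overall vote pool and that group’s share of the selected items.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) over two aligned lists of proportions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy example: liberal / independent / conservative share of the vote pool
# versus their share of the items actually selected.
pool_share = [0.45, 0.10, 0.45]
result_share = [0.70, 0.05, 0.25]
print(kl_divergence(pool_share, result_share))  # larger = less proportional
```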

To increase diversity in result sets chosen based on user votes (or things like votes), we developed the Sidelines algorithm. This algorithm temporarily suppresses a voter’s preferences after a preferred item has been selected. In comparison to collections of the most popular items, computed from user votes on Digg.com and from links by a panel of political blogs, the Sidelines algorithm increased inclusion while decreasing alienation. For the blog links, a set with known political preferences, we also found that Sidelines improved proportional representation.
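A rough sketch of the idea is below. It is not the paper’s exact pseudocode; in particular, the length of the suppression window and the tie-breaking are arbitrary choices for illustration.

```python
from collections import defaultdict

def sidelines(votes, k, sideline_turns=3):
    """Greedily pick k items, temporarily sidelining satisfied voters.

    `votes` maps item -> set of voter ids. After an item is selected, the
    voters who supported it sit out the next `sideline_turns` rounds, so
    later picks are driven by voters who have not yet gotten their way.
    """
    sidelined = defaultdict(int)      # voter -> rounds left on the sideline
    remaining = dict(votes)
    selected = []
    for _ in range(k):
        if not remaining:
            break
        # Score each item by votes from voters who are not currently sidelined.
        best = max(remaining,
                   key=lambda item: sum(sidelined[v] == 0 for v in remaining[item]))
        selected.append(best)
        supporters = remaining.pop(best)
        for voter in sidelined:       # existing sidelines tick down each round
            sidelined[voter] = max(0, sidelined[voter] - 1)
        for voter in supporters:      # the winners' supporters sit out a while
            sidelined[voter] = sideline_turns
    return selected

print(sidelines({"a": {1, 2, 3}, "b": {1, 2}, "c": {4}}, k=2))  # ['a', 'c']
```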

Our approach differs from, and is complementary to, work that selects for diversity or identifies bias by classifying content (e.g., Park et al., NewsCube) or by classifying referring blogs or voters (e.g., Gamon et al., BLEWS). While Sidelines requires votes (or something like votes), it doesn’t require any information about content, voters, or long-term voting histories. This is particularly useful for emerging topics and opinion groups, as well as for non-textual items.

visualizing political blogs’ linking

There are a number of visualizations of political bloggers’ linking behavior, notably Adamic and Glance’s 2005 work that found political bloggers of one bias tend to link to others of the same bias. Also check out Linkfluence’s Presidential Watch 08 map, which indicates similar behavior.

These visualizations are based on graphs of when one blog links to another. I was curious to what extent this two-community behavior occurs if you include all of the links from these blogs (such as links to news items, etc). Since I have link data for about 500 blogs from the news aggregator work, it was straightforward to visualize a projection of the bipartite blog->item graph. To classify each blog as liberal, conservative, or independent, I used a combination of the coding from Presidential Watch, Wonkosphere, and my own reading.

Projection of links from political blogs to items (Oct - Nov 2008). Layout using GEM algorithm in GUESS.

The visualization shows blogs as nodes. Edges represent shared links (at least 6 items must be shared before drawing an edge) and are sized based on their weight. Blue edges run between liberal blogs, red edges between conservative blogs, maroon between conservative and independent, violet blue between liberal and independent, purple between independent blogs, and orange between liberal and conservative blogs. Nodes are sized as a log of their total degree. This visualization is formatted to appear similar to the Adamic and Glance graph, though there are some important differences, principally because this graph is undirected and because I have included independent blogs in the sample.
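The projection itself is just counting shared items for each pair of blogs and keeping pairs above the threshold; a sketch with networkx is below (the function is illustrative, with `blog_links` standing in for the real blog -> item link sets).

```python
from itertools import combinations
import networkx as nx

def project_blog_graph(blog_links, min_shared=6):
    """Project the bipartite blog -> item graph onto blogs.

    `blog_links` maps each blog to the set of item URLs it linked to. Two
    blogs get an edge when they share at least `min_shared` items, with the
    shared count as the edge weight.
    """
    g = nx.Graph()
    g.add_nodes_from(blog_links)
    for a, b in combinations(blog_links, 2):
        shared = len(blog_links[a] & blog_links[b])
        if shared >= min_shared:
            g.add_edge(a, b, weight=shared)
    return g

# Node sizes and edge colors (by the pair of blog leanings) are then applied
# in the visualization tool, GUESS in this case.
```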

This is just a quick look, but we can see that the overall linking behavior still produces two fairly distinct communities, though a bit more connected than the graph of blog-to-blog links alone. It’d be fun to remove the linked blog posts from this data (leaving mostly linked news items) to see if that changes the picture much. Are some media sources setting the agenda for bloggers of both parties, or are the conservative bloggers reading and reacting to one set of news items and liberal bloggers reading and reacting to another? I.e., is the homophily primarily in links to opinion articles, or does it also extend to the linked news items?

I’m out of time at this point in the semester, though, so that will have to wait.

bias mining in political bloggers’ link patterns

I was pretty excited by the work that Andy Baio and Joshua Schachter did to identify and show the political leanings in the link behavior of blogs that are monitored by Memeorandum. They used singular value decomposition [1] on an adjacency matrix between sources and items based on link data from 360 snapshots of Memeorandum’s front page.

For the political news aggregator project, we’ve been gathering link data from about 500 blogs. Our list of sources is less than half of theirs (I only include blogs that make full posts available in their feeds), but we do have full link data rather than snapshots, so I was curious if we would get similar results.

The first 10 columns of two different U matrices are below. They are both based on link data from 3 October to 7 November; the first includes items that had an in-degree of at least 4 (5934 items), the second includes items with an in-degree of at least 3 (9722 items). In the first, the second column (v2) seems to correspond fairly well to the political leaning of the blog; in the second, the second column (v3) is better.

I’ll be the first to say that I haven’t had much time to look at these results in any detail, and, as some of the commenters on Andy’s post noted, there are probably better approaches than SVD for identifying bias. If you’d like to play too, you can download a csv file with the sources and all links with an in-degree >= 2 (21517 items, 481 sources). Each row consists of the source title, the source url, and then a list of the items the source linked to from 3 October to 7 November. Some sources were added part way through this window, and I didn’t collect link data from before they were added.
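If you do grab the CSV, the computation is short; the sketch below shows the general shape of it with numpy (the filename is a placeholder, and it applies the in-degree >= 4 filter described for the first matrix).

```python
import csv
from collections import Counter

import numpy as np

# Each row: source title, source url, then the items that source linked to.
with open("blog_links.csv") as f:                 # placeholder filename
    rows = [row for row in csv.reader(f) if len(row) > 2]

sources = [row[0] for row in rows]
links = [set(row[2:]) for row in rows]

# Keep items linked by at least 4 sources (the first matrix described above).
item_counts = Counter(item for linked in links for item in linked)
items = sorted(item for item, count in item_counts.items() if count >= 4)
col = {item: j for j, item in enumerate(items)}

# Binary source x item adjacency matrix.
m = np.zeros((len(sources), len(items)))
for i, linked in enumerate(links):
    for item in linked:
        if item in col:
            m[i, col[item]] = 1.0

# Columns of U are candidate leaning dimensions; inspect the second one (v2)
# by listing the sources at each extreme.
u, s, vt = np.linalg.svd(m, full_matrices=False)
ranked = sorted(zip(u[:, 1], sources))
print(ranked[:5])    # one end of the dimension
print(ranked[-5:])   # the other end
```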

[1] One of the more helpful singular value decomposition tutorials I found was written by Kirk Baker and is available in PDF.

US political news and opinion aggregation

Working with Paul Resnick and Xiaodan Zhou, I’ve started a project to build political news aggregators that better reflect diversity and represent their users, even when there is an unknown political bias in the inputs. We’ll have more to say on this later, but for now we’re making available a Google gadget based on a prototype aggregator’s results.

The list of links is generated from link data from about 500 blogs and refreshed every 30 minutes. Some of the results will be news stories, some will be op-ed columns from major media services, others will be blog posts, and there are also some other assorted links.

At this early point in our work, the results tend to be more politically diverse than an aggregator such as Digg, but suffer from problems with redundancy (we aren’t clustering links about the same story yet). As our results get better, the set of links the gadget shows should improve.

Update 15 December: I twittered last week that I’ve added bias highlighting to the widget, but I should expand a bit on that here.

Inspired by Baio and Schachter’s coloring of political bias on Memeorandum, I’ve added a similar feature to the news aggregator widget. Links are colored according to the average bias of the blogs linking to them. This is not always a good predictor of the item’s bias, or of whether it better supports a liberal or conservative view: sometimes a conservative blogger writes a post that attracts links from more liberal bloggers than conservative ones, and in that case the link will be colored blue.
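The coloring is a straightforward average over the hand-coded blog leanings, something like the sketch below; the scores and cutoffs here are illustrative rather than the widget’s actual values.

```python
# Hand-coded leaning of each linking blog: -1 liberal, 0 independent, +1 conservative.
BLOG_BIAS = {"liberal_blog": -1.0, "indy_blog": 0.0, "conservative_blog": 1.0}

def item_color(linking_blogs):
    """Color an item by the average bias of the blogs linking to it."""
    known = [BLOG_BIAS[b] for b in linking_blogs if b in BLOG_BIAS]
    if not known:
        return "#888888"      # gray: no coded blogs link to this item
    average = sum(known) / len(known)
    if average < -0.2:
        return "#3333cc"      # mostly liberal linkers: blue
    if average > 0.2:
        return "#cc3333"      # mostly conservative linkers: red
    return "#884488"          # mixed: purple

print(item_color(["liberal_blog", "conservative_blog", "liberal_blog"]))  # blue
```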

If you don’t like the highlighting, you can turn it off in the settings.

wikis in organizations

Antero Aunesluoma presents at WikiFest

In early September, I attended WikiSym 08 in Porto, Portugal, so this post is nearly two months overdue. In addition to presenting a short paper on the use of a wiki to enhance organizational memory and sharing in a Boeing workgroup, I participated on the WikiFest panel organized by Stewart Mader.

Since then, a couple of people have asked me to post the outline of my presentation for the WikiFest panel. These notes are reflections from the Medshelf, CSS-D, SI, and Boeing workgroup wiki projects and are meant for those thinking about or getting started with deploying a wiki in a team. For those that have been working with wikis and other collaborative tools for a while, there probably aren’t many surprises here.

  1. Consider the wiki within your ecosystem of tools. For CSS-D and MedShelf, the wikis were able to offload many of the frequently asked questions (and, to an even greater extent, the frequent responses) from the corresponding email lists. This helps to increase the signal-to-noise ratio on the lists for members who have been around for a while, increasing their satisfaction with the lists and perhaps making them more likely to stick around.

    Another major benefit of moving some of this content from the mailing lists to the wiki is that new readers had less to read to get an answer. If you’ve ever searched for the answer to a problem and found part of the solution in a message board or mailing list archive, you may be familiar with the experience of having to read through several proposed, partial solutions, synthesizing as you go, before arriving at the information you need. If all of that information is consolidated as users add it to the wiki, the burden of synthesizing shifts from every time the information is accessed to just the times when someone adds new information to the wiki.

    In addition to considering how a wiki (or really, any other new tool) will complement your existing tools, consider what it can replace. At Boeing, the wiki meant that workgroup members could stop using another tool they didn’t like. Had there been a directive to use the wiki in addition to the other tool, it probably wouldn’t have been adopted as enthusiastically. One of the reasons that the SI Wiki has floundered a bit is that there are at least three other digital places this sort of information is stored: two CTools sites and an intranet site. When people don’t know where to put things, sometimes we just don’t put them anywhere at all.

  2. Sometimes value comes from aggregation rather than synthesis. In the previous point, I made a big deal out of the value of using the wiki to synthesize information from threaded discussions and various other sources. When we started the MedShelf project, I was expecting all wikis to be used this way, but I was very wrong. With MedShelf, a lot of the value comes from individuals’ stories about coping with the illness. Trying to synthesize these into a single narrative or neutral article would have meant losing the individual voices, and for content like this, aggregation (putting it all in the same place) can be the best approach.

    The importance of these individual voices also meant that many more pages than I expected were single-authored.

  3. Don’t underestimate the value of a searchable & browsable collection. Using the workgroup wiki, team members have found the information they needed because they knew about one project and were then able to browse links to documentation for other, related projects. Browsing between a project page and a team member’s profile has also helped people to identify experts on a given topic. The previous tools for documenting projects didn’t allow for connections between different project repositories and made it hard to browse to the most helpful information. But this only works if you are adding links between related content on the wiki, or if your wiki engine automatically adds related links.

    For the wikis tied to mailing lists (CSS-D and Medshelf), some people arriving at the wiki through a search engine, looking for a solution to a particular problem, have browsed to the list information and eventually joined the list. This is certainly something that happens with mailing list archives, but which makes a better front door — the typical mailing list archive or a wiki?

  4. Have new users arrive in parallel rather than serially (after seeding the wiki with content).

    The Boeing workgroup wiki stagnated when it was initially launched, and did not really take off until the wiki evangelist organized a “wiki party” (snacks provided) where people could come and get started on documenting their past projects. Others call this a Barn Raising. This sort of event can give potential users both a bit of peer (or management) pressure and the necessary technical support to get started adding content. It also serves the valuable additional role of giving community members a chance to express their opinions about how the tool can and should be used, and to negotiate group norms and expectations for the new wiki.

    Even if you can’t physically get people together — for the mailing list wikis, this was not practical — it’s good to have them arrive at the same time, and to have both some existing content and suggestions for future additions ready and waiting for them.

  5. Make your contributors feel appreciated. Wikis typically don’t offer the same affordances for showing gratitude as a threaded discussion, where it is usually easy to add a quick “thank you” reply or to acknowledge someone else’s contribution while adding new information. With wikis, thanks are sometimes rare, and users may see revisions to content they added as a sign that they did something wrong, rather than as a sign that they provided a good starting point to which others added. It can make a big difference to acknowledge particularly good writeups publicly in a staff meeting or on the mailing list, or to privately express thanks or give a compliment.
