Tuesday, May 29, 2012

Finding the "Top Dogs": Recoding for the Top Quartile (in R)


This post deals with separating out the top (or bottom) quartile of a given variable.

As part of our final project, my group redefined the concept of opinion leaders in the diffusion study our class looked at. (That's mostly background, but it may help you spot similar situations where this kind of recoding could come in handy.) The study had originally defined opinion leaders (think “the cool kids that everyone wants to be like”) as users with admin status on the site, in this case Wikipedia. We added two more variables: barnstars (awarded by peers, i.e., other Wikipedia editors) and the number of edits to a user's own profile page. It’s this “user edits” variable that we’re concerned with in this post.
  
Since the original dataset covered several timeframes, we decided to work with three user edits variables: the number of user profile page edits before the study’s time period, the number during it, and the two combined.

First we created a combined variable that adds the "Pre" and "Period" counts together into a "Both" total, thusly:

userEditsBoth <- userEditsPre + userEditsPeriod
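One thing to watch when adding two count variables this way: in R, `+` returns NA for any user who is missing either value. The minimal sketch below uses made-up toy numbers (these vectors are hypothetical, not the study's data) to show plain addition next to an alternative that treats a missing count as zero. That alternative is an assumption and only makes sense if a missing count really means "no edits."

```r
# Hypothetical toy data standing in for the real userEditsPre / userEditsPeriod
userEditsPre    <- c(0, 12, NA, 60, 3)
userEditsPeriod <- c(1,  0,  5,  9, NA)

# Plain addition: NA in either timeframe gives NA for the total
userEditsBoth <- userEditsPre + userEditsPeriod
print(userEditsBoth)     # 1 12 NA 69 NA

# Alternative (assumption): treat a missing count as zero edits
userEditsBothNA0 <- rowSums(cbind(userEditsPre, userEditsPeriod), na.rm = TRUE)
print(userEditsBothNA0)  # 1 12 5 69 3
```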

Next we ran summary() on the userEdits variable for each of the three timeframes to find where the upper quartile starts (the "3rd Qu." value) and where it ends (the maximum). The syntax looks like this:

summary(userEditsPre)
summary(userEditsPeriod)
summary(userEditsBoth)

The results looked something like this:


> summary(userEditsPeriod)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    1.00   12.56    8.00 3239.00 

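See the 8.00 under "3rd Qu." and the 3239.00 under "Max."? Those are the endpoints of the range we used for highuserEditsPeriod below; the ranges for the Pre and Both variables came from their own summaries in the same way. Simple enough. We recoded each variable as a binary variable, with values from the third quartile up to the maximum = 1 and everything else = 0. Like this: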

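library(car)  # recode() with this "range='value'; else='value'" syntax comes from the car package
highuserEditsPre <- recode(userEditsPre, "51:6143='1'; else='0'")
highuserEditsPeriod <- recode(userEditsPeriod, "8:3239='1'; else='0'")
highuserEditsBoth <- recode(userEditsBoth, "66:9382='1'; else='0'")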

(Yes, someone really did make 3,239 edits to their profile page in a single month. I know!)
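Reading the cutoffs off the printed summary works, but the same numbers can be computed with base R's quantile(), which avoids hard-coding values like 8 and 3239. Here is a minimal base-R sketch, assuming a plain numeric vector of counts (the toy vector below is hypothetical). It flags the top quartile with `>=` and, for the opposite case, the bottom quartile with `<=`. With count data full of ties, "at or above the third quartile" can cover well over 25% of users, so it's worth checking the share actually flagged.

```r
# Hypothetical toy counts standing in for userEditsPeriod
userEditsPeriod <- c(0, 0, 0, 1, 1, 2, 8, 8, 15, 3239)

# Third-quartile cutoff (default type 7, the same method summary() uses)
q3 <- quantile(userEditsPeriod, probs = 0.75, na.rm = TRUE, names = FALSE)

# Binary flag: 1 = at or above the cutoff, 0 = everything else (NA stays NA)
highuserEditsPeriod <- as.integer(userEditsPeriod >= q3)

# Bottom quartile, for the opposite case
q1 <- quantile(userEditsPeriod, probs = 0.25, na.rm = TRUE, names = FALSE)
lowuserEditsPeriod <- as.integer(userEditsPeriod <= q1)

# Share of users actually flagged; ties at the cutoff can push this above 0.25
print(mean(highuserEditsPeriod, na.rm = TRUE))
```

In this toy vector the cutoff is 8, and because two users sit exactly at 8, four of ten users (40%) end up flagged. If you would rather stay with recode(), car also accepts the special values lo and hi in ranges, e.g. "8:hi='1'; else='0'".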


After that, we could use these new binary variables (highuserEdits...) in the index behind our new opinion-leader variable, which we then used to examine whether opinion leaders were more or less likely to adopt a new tool that suggests pages for them to edit.

But that is another story…

The members of the group involved in this class project are Heather Dumas, Bree Stewart, and Xuan He. 
