This website was extremely helpful to me while doing my regression analysis. I hope it is helpful to you too!
http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf
Wednesday, June 6, 2012
Recoding Variables
This post is a combined effort of Megan H, Denise M, and Steven B.
In case you are in the middle of recoding your data, here are some tips and an example from our paper and syntax. First, when you recode the data you need to find a way to put your independent variables on the same scale. As seen below, we took a number of different survey questions and coded them all 0-2. This let us sort respondents into the same categories even though the questions asked different things. In our example there are three different types of questions, but we were able to recode them so we could measure and compare the variables with one another. How you do this is at the discretion of the researcher, but you should explain why you coded the variables as you did. This is the example from our paper.
In our study we decided to code all of our independent variables on a 0-2 scale: 0 for a non-gamer, 1 for a moderate gamer, and 2 for an extreme gamer. We decided this scale was the best way to get measurable and meaningful results for our research. We have five independent variables, all of which have to do with playing computer or internet games.
First, Xbox Live was measured. In the original survey each respondent was asked their Xbox Live status, with the options Never Used, Previous User, and Currently Active. We coded Never Used as 0 (non-gamer), Previous User as 1 (moderate gamer), and Currently Active as 2 (extreme gamer).
Students were asked the same question about World of Warcraft, which was measured and coded in exactly the same way as Xbox Live.
Students were also asked whether or not they played computer games in general. We coded those who do not as 0 and those who do as 2.
Students who completed the survey were asked how often they played Facebook games. The possible responses were Hourly; Several times a day; Once a day; Several times a week; Once a week; Rarely; Never. We coded this on the 0-2 scale as well: Never and not applicable were coded as 0; Rarely, Once a week, and Several times a week were coded as 1; and Once a day, Several times a day, and Hourly were coded as 2.
Students were also asked how frequently they played Internet games in general. This question had the same possible responses as the Facebook games question, and we coded it the same way.
This was our syntax for recoding. Check what numbers the responses were originally coded as before you recode. If your codebook is not clear, you can run summaries and histograms of the variables to work out what the codes are.
library(car)  # recode() with this "old=new; old=new" syntax comes from the car package
S1.SNS.XboxLive<-recode(S1.SNS.XboxLive, "1=0; 2=1; 3=2")
S1.SNS.WoW<-recode(S1.SNS.WoW, "1=0; 2=1; 3=2")
S1.OUT.GameCon<-recode(S1.OUT.GameCon, "NA=0; 7=2")
S1.CU.Games<-recode(S1.CU.Games, "NA=0; 2=2")
S1.FBU.Game<-recode(S1.FBU.Game, "7=0; NA=0; 6=1; 5=1; 4=1; 3=2; 2=2; 1=2")
S1.IU.Games<-recode(S1.IU.Games, "7=0; NA=0; 6=1; 5=1; 4=1; 3=2; 2=2; 1=2")
When you are finished recoding, check that your recodes are accurate by running histograms of the variables. This helped us spot several mistakes before our recodes were finally correct.
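Alongside histograms, a cross-tabulation of the original codes against the new ones can make a recoding mistake easy to spot. The sketch below is only an illustration in base R: the `raw` vector is made-up data, and the assumption that 1 = Hourly through 7 = Never is inferred from the recode strings above, so check it against your own codebook.

```r
# Hypothetical raw responses on a 1-7 scale (assumed: 1 = Hourly ... 7 = Never),
# including one missing value
raw <- c(1, 2, 3, 4, 5, 6, 7, NA, 7, 3)

# Lookup vector: position i holds the new 0-2 code for old value i
lookup <- c(2, 2, 2, 1, 1, 1, 0)

recoded <- lookup[raw]        # indexing by the old code returns the new code
recoded[is.na(raw)] <- 0      # treat missing responses as 0, as in our recodes

# Each original value should fall in exactly one recoded column
check <- table(original = raw, recoded = recoded, useNA = "ifany")
print(check)
```

If any row of the table has counts in more than one column, or a value lands in a column you did not intend, the recode needs another look.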
Correlation Tables in R flagged with significance level stars (*, **, and ***)
If you want to create a lower triangle correlation matrix which is flagged with stars (*, **, and ***) according to levels of statistical significance, this syntax may be helpful (found it here). All you have to do is copy and paste it into R and pass in your data frame. You will need the Hmisc and xtable packages.
corstarsl <- function(x){
require(Hmisc)
x <- as.matrix(x)
R <- rcorr(x)$r
p <- rcorr(x)$P
## define notation for significance levels; spacing is important (each label is 3 characters wide).
mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "*  ", "   ")))
## truncate the matrix that holds the correlations to two decimals
R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1]
## build a new matrix that includes the correlations with their appropriate stars
Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x))
diag(Rnew) <- paste(diag(R), " ", sep="")
rownames(Rnew) <- colnames(x)
colnames(Rnew) <- paste(colnames(x), "", sep="")
## remove upper triangle
Rnew <- as.matrix(Rnew)
Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
Rnew <- as.data.frame(Rnew)
## remove last column and return the matrix (which is now a data frame)
## parentheses make the intended range 1..(n-1) explicit
Rnew <- cbind(Rnew[1:(length(Rnew)-1)])
return(Rnew)
}
##Create table _insert your dataframe below
New_table<-corstarsl(yourdataframe)
## exporting tables to either html or .tex (I prefer .tex but you will have to install TeX)
library(xtable)
## wrap the data frame in xtable() first, and use the New_table object created above
print(xtable(New_table), type="latex", file="filename.tex")
print(xtable(New_table), type="html", file="filename.html") ## see here for formatting tips
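To see what the stars mean for a single pair of variables, here is a small base R sketch. It is not part of the function above: `p_to_stars` is a hypothetical helper name, the cut-offs are copied from corstarsl, and the built-in mtcars data set stands in for your own data.

```r
# Hypothetical helper: map p-values to stars using the same cut-offs as corstarsl
p_to_stars <- function(p) {
  ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "*  ", "   ")))
}

# Correlation test for one pair of variables from the built-in mtcars data
ct <- cor.test(mtcars$mpg, mtcars$wt)

# Round the estimate to two decimals and attach the stars for its p-value
labelled <- paste0(format(round(unname(ct$estimate), 2), nsmall = 2),
                   p_to_stars(ct$p.value))
print(labelled)
```

Each cell in the corstarsl table is built the same way, just for every pair of columns at once.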
Monday, June 4, 2012
Comparison of Data Analysis Packages
This is a link to an interesting page I found that compares different statistical packages...
http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
Reading this made me grateful for being exposed to R, but also for learning other programs as well. To me, which program is better largely depends on what you're trying to do.
Matched Sets of Graphs in R
Sometimes you may want to see graphs side by side in R. To accomplish this you can use the function
par(mfcol=c(2,4))
You can change the numbers within this function depending on how many graphs you want to appear together and which ones you want next to each other. The first number specifies the number of rows of graphs, in this case 2. The second number specifies the number of columns, that is, how many graphs appear in each row, in this case 4. With mfcol the panels are filled column by column; use mfrow instead if you want them filled row by row.
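As a rough illustration of a matched set, the sketch below draws a histogram and a boxplot for each of four variables. The variables come from the built-in mtcars data set and are only stand-ins for your own; the choice of plot types is also just an example.

```r
# 2 rows x 4 columns; mfcol fills down each column before moving to the next
old_par <- par(mfcol = c(2, 4))

vars <- c("mpg", "disp", "hp", "wt")   # stand-in variables from mtcars
for (v in vars) {
  hist(mtcars[[v]], main = v, xlab = v)   # top panel of this column
  boxplot(mtcars[[v]], main = v)          # bottom panel of the same column
}

par(old_par)   # restore the previous layout so later plots are not affected
```

Because the layout is filled column by column, each variable's histogram ends up directly above its boxplot.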
Hi guys,
Just some quick input on how to do Poisson regression, which is a form of regression for when your dependent variable is a count. It is very similar to running a binomial regression, except you use the code below, replacing the quoted placeholders with your own variable names (without the quotes).
summary(regress<-glm("Y-variable"~"X-variable1"+"Xvariable2"+... +"X-variableLAST", family=poisson))
Make sure you have the "car" package loaded and that should work (glm() itself is part of base R).
This should give you your quartiles, coefficients, standard errors, and significance, along with the null and residual deviance you need to calculate the pseudo R^2.
hope that helps!
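Here is a small worked sketch of the same idea on simulated data. The variable names (`games`, `hours`, `age`) and the numbers used to simulate them are made up for illustration, and the pseudo R^2 shown is one common deviance-based version, 1 minus the residual deviance over the null deviance; other definitions exist.

```r
set.seed(42)   # make the simulated data reproducible

# Hypothetical data: a count outcome and two predictors
n     <- 200
hours <- runif(n, 0, 10)
age   <- rnorm(n, mean = 20, sd = 2)
games <- rpois(n, lambda = exp(0.2 + 0.15 * hours))   # count depends on hours only

# Poisson regression with glm() from base R
fit <- glm(games ~ hours + age, family = poisson)
print(summary(fit))

# Deviance-based pseudo R^2 from the null and residual deviance
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
print(pseudo_r2)
```

The null and residual deviance used here are the same two numbers printed at the bottom of the summary output.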
Almost done!
Spring quarter of 450/550 is almost done!
Things to do:
- Make sure that you have your participation all squared away. This means:
- double check that you watched at least a couple of Khan academy videos on relevant topics (while logged in to the account where you selected me as a coach).
- Right now Shanique and Nathan have plenty of Khan academy views on their accounts.
- But other folks who
- Make sure that you created your two videos.
- Make sure that you made at least one helpful blog post.
- I have records of class attendance and my subjective sense of in-class participation.
- Make sure too that you have tagged your contributions to
- helpful links page
- crash course in statistics
- anywhere else?
- Take a look at the updated turn in form for the HW.
- Just turn in one project for your group.
- Any questions?
- I will be in the lab starting at 2:00 on Monday and 10:00 on Tuesday.
- You are encouraged to show me your progress.