Tuesday, November 13, 2012

Crash course: Quick overview of ANOVA and Epidemiology from Module 5 and 6

The following is a quick crash course on the basic ideas and concepts of ANOVA and the Epi statistics Risk Difference (RD), Relative Risk (RR), and Odds Ratio (OR). Please note that for the ANOVA section the dataset is here, and the presentation is available in both PowerPoint (PPTX) and Adobe PDF (PDF) formats.

Sunday, October 7, 2012

epiR package

Q: I tried doing the examples at the back of the module and am getting the error message below. I installed EpiR so not sure what I'm doing wrong, as I think I'm following the instructions from the module??? Pg 28 and 29 of module 6.

> a <- 621
> b <- 440034
> c <- 117
> d <- 96531
> epi.2by2(a, b, c, d, method="case.control")
Error in c[i] : object of type 'builtin' is not subsettable

A: The issue is that epiR does not accept the four cell counts as separate arguments. Assume a 2 x 2 contingency table with exposed/unexposed as the rows and disease/healthy as the columns, so that a and b form row 1 and c and d form row 2 (equivalently, a and c form column 1, and b and d form column 2). epiR requires that the "data" be in the form of a matrix/table, so what we need to do is:

> a <- 621
> b <- 440034
> c <- 117
> d <- 96531
> data <- matrix(c(a, c, b, d), nrow = 2, ncol = 2)
> epi.2by2(data, method="case.control")

where the matrix looks like so:
          Disease +  Disease - 
Expose +  a          b         
Expose -  c          d
As a confirmation, the OR should be 1.16. Be careful about how you order the data in the matrix. The current setup fills the matrix going down the columns; if you want to fill it by row instead, you need to pass the option byrow = TRUE to matrix(). R is case-sensitive, so the argument name must be lowercase.
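The ordering point can be checked in base R without epiR. The sketch below uses the counts from the question, builds the matrix both ways, and computes the odds ratio by hand as ad/bc. The variable names are illustrative only, chosen to avoid reusing c as a variable name alongside the c() function.

```r
# Cell counts from the question (names are illustrative, not from the module)
exp_dis     <- 621     # a: exposed, disease +
exp_nodis   <- 440034  # b: exposed, disease -
unexp_dis   <- 117     # c: unexposed, disease +
unexp_nodis <- 96531   # d: unexposed, disease -

# Default fill is column by column, so the values go in as a, c, b, d
by_col <- matrix(c(exp_dis, unexp_dis, exp_nodis, unexp_nodis),
                 nrow = 2, ncol = 2)

# With byrow = TRUE the values go in row by row, as a, b, c, d
by_row <- matrix(c(exp_dis, exp_nodis, unexp_dis, unexp_nodis),
                 nrow = 2, ncol = 2, byrow = TRUE)

# Both calls produce the same 2 x 2 table
print(identical(by_col, by_row))  # TRUE

# Odds ratio computed by hand: (a * d) / (b * c)
odds_ratio <- (by_col[1, 1] * by_col[2, 2]) / (by_col[1, 2] * by_col[2, 1])
print(round(odds_ratio, 2))  # 1.16
```

If the hand-computed value does not match the OR reported by epi.2by2(), the most likely cause is that the cells were entered in the wrong order.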

Tuesday, September 25, 2012

Determining Probabilities and Quantiles in R and SAS for a Normal Distribution

In R, let's say we wanted to determine the probability P(X ≤ c). If X is already standardized, i.e. it follows a normal distribution with mean 0 and standard deviation 1, then we would do:
> pnorm(c)
If not, then we would do:
> pnorm(c,mu,sd)
where c is the value of interest, mu is the mean of the distribution, and sd is its standard deviation. In SAS, we can use the PROC IML step:
PROC IML;
   prob = CDF('Normal',c);
   PRINT prob;
QUIT;
or
PROC IML;
   prob = CDF('Normal',c,mu,sd);
   PRINT prob;
QUIT;
Example: If we let c = 1.96, mu = 0, and sd = 1, then the probability associated with this particular example is 0.975. You should get familiar with this number because, when we do a two-sided hypothesis test at the usual α = 0.05, the critical value is the 100(1-α/2) = 100(1-0.05/2) = 97.5th percentile, i.e. the value with probability 1-α/2 = 0.975 below it.
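The example above can be checked directly in R. The second half of this sketch uses hypothetical values (mean 100, standard deviation 15, value of interest 110) to show that passing the mean and standard deviation to pnorm() gives the same answer as standardizing by hand first.

```r
# P(Z <= 1.96) for a standard normal variable
p_std <- pnorm(1.96)
print(round(p_std, 3))  # 0.975

# Hypothetical non-standard case: X ~ Normal(mean = 100, sd = 15)
x_val <- 110
mu    <- 100
sigma <- 15

p_raw <- pnorm(x_val, mu, sigma)       # let pnorm() do the standardizing
p_z   <- pnorm((x_val - mu) / sigma)   # standardize by hand first

print(all.equal(p_raw, p_z))  # TRUE
```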

In the case where we want to determine the quantile associated with a particular probability, i.e. what is the 100(n)th percentile (assuming X follows a normal distribution with mean mu and standard deviation sd), then in R we use:
> qnorm(n,mu,sd)
and in SAS we do:
PROC IML;
   quant = QUANTILE('Normal',n,mu,sd);
   PRINT quant;
QUIT;
NOTE: n is a value between 0 and 1. For example, if we are interested in the 90th percentile, then for either R or SAS, the input value is 0.90.
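As a quick check in R, qnorm() is the inverse of pnorm(), so the 0.975 quantile of the standard normal recovers the familiar 1.96. The mean of 100 and standard deviation of 15 in the last step are hypothetical values chosen only for illustration.

```r
# 97.5th percentile of the standard normal
q975 <- qnorm(0.975)
print(round(q975, 2))  # 1.96

# 90th percentile: the input is 0.90, not 90
q90 <- qnorm(0.90)
print(round(q90, 3))  # 1.282

# qnorm() undoes pnorm()
print(all.equal(pnorm(q90), 0.90))  # TRUE

# Hypothetical non-standard case: the quantile is mu + sigma * z
mu    <- 100
sigma <- 15
print(all.equal(qnorm(0.90, mu, sigma), mu + sigma * q90))  # TRUE
```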

Monday, September 17, 2012

Last day of the Ramp-up course available

I have posted videos corresponding to Day 3 of the ramp-up course for R and SAS.

The videos cover how to create tables, contingency tables, and graphical and numerical summaries in R (1 video) and SAS (2 videos). They cover various commands in R, including table(), barplot(), boxplot(), hist(), the commands for numerical summaries, and using tapply() to apply a function to a variable based on subgroups; and in SAS as well, including PROC FREQ, PROC TABULATE, PROC FORMAT, and SET in DATA steps. Hopefully the videos are clear (resolution wise) and comprehensive enough. If I have left out anything please message me about it. The other 2 SAS videos can be found here:


It should be noted that I did not make videos for the last section, Testing and Regression; we will address this during the course of the semester. However, there is one more thing we should discuss: how to create a subset in SAS. To do this we use the DATA step in a manner similar to creating a new variable, which was mentioned in Creating Tables in SAS. If, for example, you wanted to look only at the hotdogs where the type is Beef or Poultry, we do:
DATA hotdogs_subset;
   SET hotdogs;
   WHERE Type = "Beef" OR Type = "Poultry";
RUN;
Another way is to do:
DATA hotdogs_subset;
   SET hotdogs;
   WHERE Type NE "Meat";
RUN;
This option only works because there are only 3 types (Beef, Meat, and Poultry), so excluding Meat leaves exactly Beef and Poultry; NE stands for "Not Equal". We can also use other conditions, such as looking at only the healthy hotdogs, i.e. Calories < 150:
DATA hotdogs_subset;
   SET hotdogs;
   IF Calories < 150; /* Comment: IF and WHERE are often interchangeable */
RUN;
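
For comparison, the same three subsets can be sketched in R with subset(), which was covered in the Day 2 videos. The small hotdogs data frame below is made-up toy data used only to make the example runnable; it is not the course dataset.

```r
# Toy data for illustration only (NOT the actual hotdogs dataset)
hotdogs <- data.frame(
  Type     = c("Beef", "Meat", "Poultry", "Beef", "Meat", "Poultry"),
  Calories = c(186, 173, 129, 149, 135, 102)
)

# Keep Beef or Poultry (like WHERE Type = "Beef" OR Type = "Poultry")
beef_or_poultry <- subset(hotdogs, Type == "Beef" | Type == "Poultry")

# Drop Meat (like WHERE Type NE "Meat")
not_meat <- subset(hotdogs, Type != "Meat")

# With only three types, both conditions select the same rows
print(identical(beef_or_poultry, not_meat))  # TRUE

# Keep the lower-calorie hotdogs (like IF Calories < 150)
low_cal <- subset(hotdogs, Calories < 150)
print(nrow(low_cal))  # 4
```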

Hopefully this is clear enough; if not, I'll make a video tutorial to cover this concept.

Day 2 of the Ramp-up course available

I have posted videos corresponding to Day 2 of the ramp-up course for R and SAS.

The videos cover how to manipulate and extract data in R (3 videos) and SAS (1 video). They cover various commands in R, including attach(), $, coordinates and matrix properties, sort(), order(), subset(), and boolean statements; and in SAS as well, including PROC PRINT, PROC SORT, and SET in DATA steps. Hopefully the videos are clear (resolution wise) and comprehensive enough. If I have left out anything please message me about it. The other 3 videos can be found here:



Friday, September 14, 2012

First set of SAS videos are up and running

The first set of SAS videos relating to the ramp-up course are online. You can find the appropriate ones through the labels to your right or reading the description in the video on the YouTube channel. The first video is about getting familiar with the SAS GUI (Graphical User Interface)

while the other videos cover the Basic functions in SAS and How to read in datasets using the DATA step and using PROC PRINT. Hopefully the videos are clear (resolution wise) and comprehensive enough. If I have left out anything please message me about it.

Wednesday, September 12, 2012

First set of R videos are up and running

The first set of R videos relating to the ramp-up course are online. You can find the appropriate ones through the labels to your right or reading the description in the video on the YouTube channel. This video is about getting familiar with R and the basic functions.

Hopefully the videos are clear (resolution wise) and comprehensive enough. If I have left out anything please message me about it. I also have videos on How to save an object and load an image in R and How to read in datasets for Windows and Mac users.