Cochran-Mantel-Haenszel Statistics Using PROC FREQ

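The Cochran-Mantel-Haenszel (CMH) test is one of the statistical measures commonly used in case-control studies and clinical trials. It tests for an association between two categorical variables after adjusting for a stratification variable, and it also yields an estimate of the common odds ratio across strata. To obtain these statistics, specify the CMH option on the TABLES statement in PROC FREQ.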

General Syntax

PROC FREQ DATA=dataset_name;
TABLES stratum*X*Y / CMH;
RUN;


Example:

The data set Migraine contains hypothetical data for a clinical trial of migraine treatment. Subjects of both genders receive either a new drug therapy or a placebo. Their response to treatment is coded as ‘Better’ or ‘Same’. The data are recorded as cell counts, and the number of subjects for each treatment and response combination is recorded in the variable Count.


data Migraine;
input Gender $ Treatment $ Response $ Count @@;
datalines;
female Active Better 16 female Active Same 11
female Placebo Better 5 female Placebo Same 20
male Active Better 12 male Active Same 16
male Placebo Better 7 male Placebo Same 19
;
Run;

The following PROC FREQ statements create a multiway table stratified by Gender, where Treatment forms the rows and Response forms the columns. The CMH option produces the Cochran-Mantel-Haenszel statistics. For this stratified table, estimates of the common relative risk and the Breslow-Day test for homogeneity of the odds ratios are also displayed. The NOPRINT option suppresses the display of the contingency tables. 


proc freq data=Migraine;
tables Gender*Treatment*Response / cmh noprint;
weight Count;
title 'Clinical Trial for Treatment of Migraine Headaches';
run;




For a stratified 2×2 table, the three CMH statistics displayed in the output test the same hypothesis. The significant p-value (0.004) indicates that the association between treatment and response remains strong after adjusting for gender.
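As a rough cross-check of the reported p-value, the CMH statistic for a set of 2×2 tables can be computed by hand in a DATA step. The sketch below re-enters the Migraine counts in wide form; the data set name cmh_by_hand and the variables a, b, c and d are illustrative names that do not appear in the original example. It assumes the usual formula without a continuity correction.

```sas
/* Hand computation of the CMH statistic for two 2x2 strata.      */
/* a = Active/Better, b = Active/Same,                            */
/* c = Placebo/Better, d = Placebo/Same (illustrative names).     */
data cmh_by_hand;
  input stratum $ a b c d;
  n        = a + b + c + d;
  /* expected count and variance of cell a under no association */
  expected = (a + b) * (a + c) / n;
  variance = (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1));
  /* running totals across strata (sum statements retain values) */
  sum_a + a;
  sum_e + expected;
  sum_v + variance;
  /* the value on the LAST row is the CMH statistic for all strata */
  cmh = (sum_a - sum_e)**2 / sum_v;
  p   = 1 - probchi(cmh, 1);
  datalines;
female 16 11 5 20
male 12 16 7 19
;
run;

proc print data=cmh_by_hand;
  var stratum expected variance cmh p;
run;
```

Only the last printed row uses both strata. Its statistic should come out near 8.3 on 1 degree of freedom, which corresponds to the p-value of 0.004 quoted above.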

The CMH option also produces a table of relative risks, as shown in the next output. Because this is a prospective study, the relative risk estimate assesses the effectiveness of the new drug; the “Cohort (Col1 Risk)” values are the appropriate estimates for the first column (the risk of improvement). The probability of migraine improvement with the new drug is just over two times the probability of improvement with the placebo.
The large p-value for the Breslow-Day test (0.2218) indicates no significant gender difference in the odds ratios.
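To see what the Breslow-Day test is comparing, it can help to look at each gender separately. The following sketch is one possible way to do that, not part of the original example: it sorts a copy of the data (the name MigraineSorted is illustrative) and asks PROC FREQ for the odds ratio and relative risks within each stratum using the RELRISK option.

```sas
/* BY-group processing requires the data to be sorted by Gender */
proc sort data=Migraine out=MigraineSorted;
  by Gender;
run;

/* One 2x2 table per gender, with stratum-specific odds ratio */
/* and relative risk estimates                                */
proc freq data=MigraineSorted;
  by Gender;
  tables Treatment*Response / relrisk;
  weight Count;
run;
```

From the cell counts, the stratum odds ratios work out to roughly 5.8 for females (16*20 / (11*5)) and 2.0 for males (12*19 / (16*7)). The Breslow-Day result says that a difference of this size is not statistically significant in samples this small.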








An overview of statistical tests in SAS

1. Introduction and description of data

We will illustrate doing some basic statistical tests in SAS, including  t-tests, chi square, correlation, regression, and analysis of variance.  We demonstrate this using the auto data file.  The program below reads the data and creates a temporary data file called auto.  (Please note that we have made the values of mpg to be missing for the AMC cars.  This differs from the other example data files where the AMC cars have valid data for mpg.)

DATA auto ;
LENGTH make $ 20 ;
INPUT make $ 1-17 price mpg rep78 hdroom trunk weight
length turn displ gratio foreign ;
CARDS;
AMC Concord        4099 . 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer          4749 . 3 3.0 11 3350 173 40 258 2.53 0
AMC Spirit         3799 . . 3.0 12 2640 168 35 121 3.08 0
Audi 5000          9690 17 5 3.0 15 2830 189 37 131 3.20 1
Audi Fox           6295 23 3 2.5 11 2070 174 36 97 3.70 1
BMW 320i           9735 25 4 2.5 12 2650 177 34 121 3.64 1
Buick Century      4816 20 3 4.5 16 3250 196 40 196 2.93 0
Buick Electra      7827 15 4 4.0 20 4080 222 43 350 2.41 0
Buick LeSabre      5788 18 3 4.0 21 3670 218 43 231 2.73 0
Buick Opel         4453 26 . 3.0 10 2230 170 34 304 2.87 0
Buick Regal        5189 20 3 2.0 16 3280 200 42 196 2.93 0
Buick Riviera      10372 16 3 3.5 17 3880 207 43 231 2.93 0
Buick Skylark      4082 19 3 3.5 13 3400 200 42 231 3.08 0
Cad. Deville       11385 14 3 4.0 20 4330 221 44 425 2.28 0
Cad. Eldorado      14500 14 2 3.5 16 3900 204 43 350 2.19 0
Cad. Seville       15906 21 3 3.0 13 4290 204 45 350 2.24 0
Chev. Chevette     3299 29 3 2.5 9 2110 163 34 231 2.93 0
Chev. Impala       5705 16 4 4.0 20 3690 212 43 250 2.56 0
Chev. Malibu       4504 22 3 3.5 17 3180 193 31 200 2.73 0
Chev. Monte Carlo  5104 22 2 2.0 16 3220 200 41 200 2.73 0
Chev. Monza        3667 24 2 2.0 7 2750 179 40 151 2.73 0
Chev. Nova         3955 19 3 3.5 13 3430 197 43 250 2.56 0
Datsun 200         6229 23 4 1.5 6 2370 170 35 119 3.89 1
Datsun 210         4589 35 5 2.0 8 2020 165 32 85 3.70 1
Datsun 510         5079 24 4 2.5 8 2280 170 34 119 3.54 1
Datsun 810         8129 21 4 2.5 8 2750 184 38 146 3.55 1
Dodge Colt         3984 30 5 2.0 8 2120 163 35 98 3.54 0
Dodge Diplomat     4010 18 2 4.0 17 3600 206 46 318 2.47 0
Dodge Magnum       5886 16 2 4.0 17 3600 206 46 318 2.47 0
Dodge St. Regis    6342 17 2 4.5 21 3740 220 46 225 2.94 0
Fiat Strada        4296 21 3 2.5 16 2130 161 36 105 3.37 1
Ford Fiesta        4389 28 4 1.5 9 1800 147 33 98 3.15 0
Ford Mustang       4187 21 3 2.0 10 2650 179 43 140 3.08 0
Honda Accord       5799 25 5 3.0 10 2240 172 36 107 3.05 1
Honda Civic        4499 28 4 2.5 5 1760 149 34 91 3.30 1
Linc. Continental  11497 12 3 3.5 22 4840 233 51 400 2.47 0
Linc. Mark V       13594 12 3 2.5 18 4720 230 48 400 2.47 0
Linc. Versailles   13466 14 3 3.5 15 3830 201 41 302 2.47 0
Mazda GLC          3995 30 4 3.5 11 1980 154 33 86 3.73 1
Merc. Bobcat       3829 22 4 3.0 9 2580 169 39 140 2.73 0
Merc. Cougar       5379 14 4 3.5 16 4060 221 48 302 2.75 0
Merc. Marquis      6165 15 3 3.5 23 3720 212 44 302 2.26 0
Merc. Monarch      4516 18 3 3.0 15 3370 198 41 250 2.43 0
Merc. XR-7         6303 14 4 3.0 16 4130 217 45 302 2.75 0
Merc. Zephyr       3291 20 3 3.5 17 2830 195 43 140 3.08 0
Olds 98            8814 21 4 4.0 20 4060 220 43 350 2.41 0
Olds Cutl Supr     5172 19 3 2.0 16 3310 198 42 231 2.93 0
Olds Cutlass       4733 19 3 4.5 16 3300 198 42 231 2.93 0
Olds Delta 88      4890 18 4 4.0 20 3690 218 42 231 2.73 0
Olds Omega         4181 19 3 4.5 14 3370 200 43 231 3.08 0
Olds Starfire      4195 24 1 2.0 10 2730 180 40 151 2.73 0
Olds Toronado      10371 16 3 3.5 17 4030 206 43 350 2.41 0
Peugeot 604        12990 14 . 3.5 14 3420 192 38 163 3.58 1
Plym. Arrow        4647 28 3 2.0 11 3260 170 37 156 3.05 0
Plym. Champ        4425 34 5 2.5 11 1800 157 37 86 2.97 0
Plym. Horizon      4482 25 3 4.0 17 2200 165 36 105 3.37 0
Plym. Sapporo      6486 26 . 1.5 8 2520 182 38 119 3.54 0
Plym. Volare       4060 18 2 5.0 16 3330 201 44 225 3.23 0
Pont. Catalina     5798 18 4 4.0 20 3700 214 42 231 2.73 0
Pont. Firebird     4934 18 1 1.5 7 3470 198 42 231 3.08 0
Pont. Grand Prix   5222 19 3 2.0 16 3210 201 45 231 2.93 0
Pont. Le Mans      4723 19 3 3.5 17 3200 199 40 231 2.93 0
Pont. Phoenix      4424 19 . 3.5 13 3420 203 43 231 3.08 0
Pont. Sunbird      4172 24 2 2.0 7 2690 179 41 151 2.73 0
Renault Le Car     3895 26 3 3.0 10 1830 142 34 79 3.72 1
Subaru             3798 35 5 2.5 11 2050 164 36 97 3.81 1
Toyota Celica      5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla     3748 31 5 3.0 9 2200 165 35 97 3.21 1
Toyota Corona      5719 18 5 2.0 11 2670 175 36 134 3.05 1
Volvo 260          11995 17 5 2.5 14 3170 193 37 163 2.98 1
VW Dasher          7140 23 4 2.5 12 2160 172 36 97 3.74 1
VW Diesel          5397 41 5 3.0 15 2040 155 35 90 3.78 1
VW Rabbit          4697 25 4 3.0 15 1930 155 35 89 3.78 1
VW Scirocco        6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;

2. T-tests

We can use proc ttest to perform a t-test to determine whether the average mpg for domestic cars differs from that of foreign cars.

PROC TTEST DATA=auto;
CLASS foreign;
VAR mpg;
RUN;

Here is the output produced by the proc ttest.  The results show that foreign cars have significantly higher gas mileage ( mpg ) than domestic cars. Note that the overall N is 71 (not 74).  This is because mpg was missing for 3 of the observations, so those observations were omitted from the analysis.

TTEST PROCEDURE

Variable: MPG

FOREIGN N Mean Std Dev Std Error Minimum Maximum
--------------------------------------------------------------------------------
0 49 19.79591837 4.85188791 0.69312684 12.00000000 34.00000000
1 22 24.77272727 6.61118690 1.40950978 14.00000000 41.00000000

Variances T DF Prob>|T|
---------------------------------------
Unequal -3.1685 31.6 0.0034
Equal -3.5597 69.0 0.0007

For H0: Variances are equal, F' = 1.86    DF = (21,48)    Prob>F' = 0.0776

Note that the output provides two t values, one assuming that the variances are Unequal and another assuming that the variances are Equal, and below that is shown a test of whether the variances are equal.  The test for equal variances has an F value of 1.86 with a p value of 0.0776, indicating that the variances of the two groups do not differ significantly, so the Equal variance t-test is the appropriate test to use.  In this case, we would report a t value of -3.5597 with a p value of 0.0007, concluding that the mean mpg for foreign cars is significantly greater than the mean mpg for domestic cars.  Had the F test of equal variances been significant, the Unequal variance t value (-3.1685) would have been the appropriate value to use.  This is especially important when the sample sizes of the two groups differ: when both the variances and the sample sizes differ, the results assuming Equal variances can be quite inaccurate and can differ from the Unequal variance result.
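The Equal variance t value and the F' value can be reproduced from the group summary statistics printed in the output above. The DATA step below is only a sketch for checking the arithmetic; the variable names are illustrative, and the numbers are copied from the proc ttest output rather than computed from the auto data file.

```sas
/* Re-derive the pooled (Equal variance) t-test from summary statistics */
data _null_;
  /* group 0 = domestic, group 1 = foreign (values from the output) */
  n0 = 49; mean0 = 19.79591837; sd0 = 4.85188791;
  n1 = 22; mean1 = 24.77272727; sd1 = 6.61118690;

  df         = n0 + n1 - 2;
  pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df;
  t          = (mean0 - mean1) / sqrt(pooled_var * (1/n0 + 1/n1));
  p          = 2 * (1 - probt(abs(t), df));     /* two-tailed p value */

  /* folded F statistic: larger variance over smaller variance */
  f_ratio    = max(sd0, sd1)**2 / min(sd0, sd1)**2;

  put t= 8.4 df= p= 8.4 f_ratio= 6.2;
run;
```

The values written to the log should agree with the Equal row of the output (t of about -3.56 on 69 degrees of freedom) and with the F' value of 1.86.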

3. Chi-square tests

We can use proc freq to examine the repair records of the cars (rep78, where 1 is the worst repair record, 5 is the best repair record) by foreign (foreign coded 1, domestic coded 0).  Using the chisq option we can request a chi-square test that tests if these two variables are independent, as shown below.

PROC FREQ DATA=auto;
TABLES rep78*foreign / CHISQ ;
RUN;

The results are shown below, first giving the crosstab and then the chi-square test.

TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
        1|      2 |      0 |      2
         |   2.90 |   0.00 |   2.90
         | 100.00 |   0.00 |
         |   4.17 |   0.00 |
---------+--------+--------+
        2|      8 |      0 |      8
         |  11.59 |   0.00 |  11.59
         | 100.00 |   0.00 |
         |  16.67 |   0.00 |
---------+--------+--------+
        3|     27 |      3 |     30
         |  39.13 |   4.35 |  43.48
         |  90.00 |  10.00 |
         |  56.25 |  14.29 |
---------+--------+--------+
        4|      9 |      9 |     18
         |  13.04 |  13.04 |  26.09
         |  50.00 |  50.00 |
         |  18.75 |  42.86 |
---------+--------+--------+
        5|      2 |      9 |     11
         |   2.90 |  13.04 |  15.94
         |  18.18 |  81.82 |
         |   4.17 |  42.86 |
---------+--------+--------+
Total          48       21       69
            69.57    30.43   100.00

Frequency Missing = 5


STATISTICS FOR TABLE OF REP78 BY FOREIGN
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 4 27.264 0.001
Likelihood Ratio Chi-Square 4 29.912 0.001
Mantel-Haenszel Chi-Square 1 23.851 0.001
Phi Coefficient 0.629
Contingency Coefficient 0.532
Cramer's V 0.629

Effective Sample Size = 69
Frequency Missing = 5
WARNING: 40% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.


Notice the warning that SAS gave at the end of the results. The chi-square test may not be valid when cells have small expected counts (less than 5), which is common when a table has empty or sparse cells. In such cases, you can request Fisher’s exact test (which is valid under such circumstances) with the exact option as shown below.

PROC FREQ DATA=auto;
  TABLES rep78*foreign / CHISQ EXACT ;
RUN;

The results are shown below (omitting the crosstab, which is exactly the same as the prior results).  Fisher’s Exact Test is significant, showing that there is an association between rep78 and foreign.   In other words, the repair records of the domestic cars differ from the repair records of the foreign cars.

STATISTICS FOR TABLE OF REP78 BY FOREIGN

Statistic DF Value Prob
------------------------------------------------------
Chi-Square 4 27.264 0.001
Likelihood Ratio Chi-Square 4 29.912 0.001
Mantel-Haenszel Chi-Square 1 23.851 0.001
Fisher's Exact Test (2-Tail) 6.27E-06
Phi Coefficient 0.629
Contingency Coefficient 0.532
Cramer's V 0.629
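
To see which cells are responsible for the warning about expected counts, one option is to print the expected frequencies alongside the observed ones. The sketch below assumes the same auto data file and uses the EXPECTED option on the TABLES statement, with the percentages suppressed to keep the table small.

```sas
/* Show observed and expected counts for each cell of the crosstab */
proc freq data=auto;
  tables rep78*foreign / expected norow nocol nopercent;
run;
```

Working from the row and column totals in the crosstab, four of the ten cells have expected counts below 5 (both cells for rep78 of 1, and the foreign cells for rep78 of 2 and 5), which is the 40% quoted in the warning.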

4. Correlation

Let’s use proc corr to examine the correlations among price, mpg, and weight.

PROC CORR DATA=auto;
  VAR price mpg weight ;
RUN;

The results of the proc corr are shown below. 

Correlation Analysis
3 'VAR' Variables: PRICE MPG WEIGHT

Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
PRICE 74 6165 2949 456229 3291 15906
MPG 71 21.33803 5.88447 1515 12.00000 41.00000
WEIGHT 74 3019 777.19357 223440 1760 4840

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations

             PRICE        MPG     WEIGHT
PRICE      1.00000   -0.47774    0.53861
            0.0        0.0001     0.0001
                74         71         74

MPG       -0.47774    1.00000   -0.80749
            0.0001     0.0        0.0001
                71         71         71

WEIGHT     0.53861   -0.80749    1.00000
            0.0001     0.0001     0.0
                74         71         74

The top portion of the output shows simple descriptive statistics for the variables (note that the N for mpg is 71 because it has 3 missing observations).  The second part of the output shows the correlation matrix for the variables price, mpg, and weight.  Each entry shows the correlation, and below that the 2 tailed p value for the hypothesis test that the correlation is 0, and below that is the sample size (N) on which the correlation is based.
By looking at the sample sizes, we can see how proc corr handled the missing values.  Since mpg had 3 missing values, all the correlations that involved it have an N of 71, whereas the rest of the correlations were based on an N of 74.  This is called pairwise deletion of missing data since SAS used the maximum number of non-missing values for each pair of variables.  It is possible to ask SAS to only perform the correlations on the records which had complete data for all of the variables on the var statement.  This is called listwise deletion of missing data, meaning that when any of the variables are missing, the entire record will be omitted from analysis.  You can request listwise deletion with the nomiss option as illustrated below.

PROC CORR DATA=auto NOMISS ;
  VAR price mpg weight ;
RUN;

The results are shown below.  Notice that the N for all the simple statistics is 71, and notice that the N is not displayed along with the correlations.  That is because the N is 71 for all of them (as shown in the title, N = 71).

Correlation Analysis
3 'VAR' Variables: PRICE MPG WEIGHT

Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
PRICE 71 6248 2983 443582 3291 15906
MPG 71 21.33803 5.88447 1515 12.00000 41.00000
WEIGHT 71 3021 791.31589 214520 1760 4840

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 71

             PRICE        MPG     WEIGHT
PRICE      1.00000   -0.47774    0.54176
            0.0        0.0001     0.0001

MPG       -0.47774    1.00000   -0.80749
            0.0001     0.0        0.0001

WEIGHT     0.54176   -0.80749    1.00000
            0.0001     0.0001     0.0

5. Regression

Let’s perform a regression analysis where we predict price from mpg and weight.   The proc reg example below does just this.

PROC REG DATA=auto;
MODEL price = mpg weight ;
RUN;

The results are shown below.  Two interesting things to note are:
    – Only 71 observations are used (not all 74) because mpg had three missing values.  Proc reg deletes missing cases using listwise deletion.   If you have lots of missing data, this is important to notice.
    – Looking at the predictors, the results show that weight is the only variable that significantly predicts price (with a t-value of 2.603 and a p-value of 0.0113).

NOTE: 74 observations read.
NOTE: 3 observations have missing values.
NOTE: 71 observations used in computations.

Model: MODEL1
Dependent Variable: PRICE

Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F

Model 2 185670655.62 92835327.809 14.444 0.0001
Error 68 437038564.86 6427037.7185
C Total 70 622709220.48

Root MSE 2535.16029 R-square 0.2982
Dep Mean 6247.63380 Adj R-sq 0.2775
C.V. 40.57793

Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 2394.284967 3647.8753623 0.656 0.5138
MPG 1 -58.668896 87.29400011 -0.672 0.5038
WEIGHT 1 1.689685 0.64914497 2.603 0.0113
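
One way to read the Parameter Estimates table is as a prediction equation: price is estimated as the intercept plus each coefficient times its variable. The sketch below applies that equation to the auto data file by hand. The data set name predicted and the variables price_hat and residual are illustrative names, and the coefficients are copied from the output above rather than retrieved from proc reg.

```sas
/* Apply the fitted equation: price = b0 + b1*mpg + b2*weight */
data predicted;
  set auto;
  /* coefficients copied from the Parameter Estimates table */
  price_hat = 2394.284967 - 58.668896 * mpg + 1.689685 * weight;
  residual  = price - price_hat;
run;

proc print data=predicted(obs=5);
  var make price mpg weight price_hat residual;
run;
```

The first three rows are the AMC cars, whose mpg is missing, so price_hat and residual are missing for them as well (SAS notes this in the log). These are the same three observations that proc reg dropped.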

6. Analysis of variance (and analysis of covariance)

Let’s compare the average miles per gallon (mpg) among the cars in the different repair groups using Analysis of Variance. You might think to use proc anova for such an analysis, but proc anova is designed for balanced data (equal sample sizes in all groups), a condition that is frequently not met.   Instead, we will use proc glm to perform an ANOVA comparing mpg among the repair groups.  Since there are so few cars with a repair record (rep78) of 1 or 2, we will use a where statement to omit them, allowing us to concentrate on the cars with repair records of 3, 4 and 5.  The proc glm below performs an Analysis of Variance testing whether the average mpg for the 3 repair groups (rep78) are the same.  It also produces the means for the 3 repair groups.

PROC GLM DATA=auto;
WHERE (rep78 = 3) OR (rep78 = 4) OR (rep78 = 5);
CLASS rep78;
MODEL mpg = rep78 ;
MEANS rep78 ;
RUN;

The results of the proc glm are shown below.  SAS informs us that it used only 57 observations (due to the missing values of mpg).  The results suggest that there are significant differences in mpg among the three repair groups (based on the F value of 8.08 with a p value of 0.0009).  The means for groups 3, 4 and 5 were 19.43, 21.67, and 27.36.

General Linear Models Procedure
Class Level Information

Class Levels Values
REP78 3 3 4 5

Number of observations in data set = 59
NOTE: Due to missing values, only 57 observations can be used in this analysis.
General Linear Models Procedure

Dependent Variable: MPG
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 497.26406926 248.63203463 8.08 0.0009
Error 54 1661.40259740 30.76671477
Corrected Total 56 2158.66666667

R-Square C.V. Root MSE MPG Mean
0.230357 25.60050 5.5467752 21.666667

Source DF Type I SS Mean Square F Value Pr > F
REP78 2 497.26406926 248.63203463 8.08 0.0009

Source DF Type III SS Mean Square F Value Pr > F
REP78 2 497.26406926 248.63203463 8.08 0.0009

Level of -------------MPG-------------
REP78 N Mean SD

3 28 19.4285714 4.23764934
4 18 21.6666667 4.93486992
5 11 27.3636364 8.73238487

You can use the tukey option on the means statement to request Tukey tests for pairwise comparisons among the three means. 

PROC GLM DATA=auto;
WHERE (rep78 = 3) OR (rep78 = 4) OR (rep78 = 5);
  CLASS rep78;
  MODEL mpg = rep78 ;
  MEANS rep78 / TUKEY ;
RUN;

The results just for the Tukey tests are shown below (the rest of the output is identical).  The Tukey comparisons that are significant are indicated by “***”.  The group with rep78 of 5 is significantly different from 3 and significantly different from 4.  However, the group with rep78 of 3 is not significantly different from rep78 of 4.

Tukey's Studentized Range (HSD) Test for variable: MPG

NOTE: This test controls the type I experimentwise error rate.

Alpha= 0.05 Confidence= 0.95 df= 54 MSE= 30.76671
Critical Value of Studentized Range= 3.408

Comparisons significant at the 0.05 level are indicated by '***'.

Simultaneous Simultaneous
Lower Difference Upper
REP78 Confidence Between Confidence
Comparison Limit Means Limit

5 - 4 0.581 5.697 10.813 ***
5 - 3 3.178 7.935 12.692 ***

4 - 5 -10.813 -5.697 -0.581 ***
4 - 3 -1.800 2.238 6.277

3 - 5 -12.692 -7.935 -3.178 ***
3 - 4 -6.277 -2.238 1.800
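
The confidence limits in the Tukey table can be reproduced from values already printed in the output. The sketch below assumes the Tukey-Kramer form of the interval for unequal group sizes and checks the 5 - 4 comparison; the variable names are illustrative, and the critical value, MSE, group sizes and means are copied from the output above.

```sas
/* Re-derive the Tukey interval for the 5 - 4 comparison */
data _null_;
  q   = 3.408;        /* critical value of the studentized range */
  mse = 30.76671;     /* error mean square from the ANOVA table   */
  n5 = 11; mean5 = 27.3636364;
  n4 = 18; mean4 = 21.6666667;

  diff       = mean5 - mean4;
  /* half-width: q * sqrt( (MSE/2) * (1/n_i + 1/n_j) ) */
  half_width = q * sqrt(mse / 2 * (1/n5 + 1/n4));
  lower      = diff - half_width;
  upper      = diff + half_width;

  put diff= 8.3 lower= 8.3 upper= 8.3;
run;
```

The log should show limits close to the 0.581 and 10.813 reported for the 5 - 4 row. Because the interval excludes zero, that comparison is flagged with '***'.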

7. Problems to look out for

  • If you have lots of missing data, be sure to check the N when you do correlations, regression, or ANOVA.
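
A quick way to act on this advice is to count the missing values before running any analysis. The following is a minimal sketch using the N and NMISS statistics of proc means on the variables used in this overview.

```sas
/* Count non-missing and missing values for the analysis variables */
proc means data=auto n nmiss;
  var price mpg rep78 weight;
run;
```

Based on the outputs shown earlier, this should report 3 missing values for mpg and 5 for rep78, with none for price and weight.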