Analysis of Covariance Strengthens Predictive Statistics

Suppose you're a medical researcher trying to analyze the results of a clinical trial of three kinds of disease treatment. The trial results show three different sets of survival times for the patients in the three treatment groups. As the researcher, you're trying to determine which of the drugs involved helps patients live longer.

However, there's another variable in the mix: The patients in all three groups are different ages. How do you statistically "control" for this age difference – which naturally has an effect on life span – when analyzing the results? The mathematical method for accomplishing this task is known as "analysis of covariance."

In statistics and probability theory, covariance measures how much two variables change together. Its sign shows the direction of the relationship, while the strength of the relationship is measured by the correlation, a rescaled version of the covariance. Analysis of Covariance (ANCOVA) is a method that incorporates additional variables (covariates) into the statistical model. This allows researchers to account for variation associated not with the "treatments" themselves, but with one or more covariates. It extends Analysis of Variance (ANOVA), which is used to determine whether differences among group means are larger than would be expected from random variation alone.
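To make the idea of covariance concrete, here is a minimal sketch in plain Python. The ages and survival times are invented purely for illustration, and the function names are hypothetical helpers rather than part of any statistics package.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: average product of deviations (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Invented illustration data: patient ages (years) and survival times (months).
ages = [50, 60, 70, 80]
survival_months = [40, 34, 30, 22]

cov_age_survival = covariance(ages, survival_months)
corr_age_survival = correlation(ages, survival_months)

# The same survival times in years: covariance changes with the units,
# but correlation does not.
survival_years = [m / 12 for m in survival_months]
cov_in_years = covariance(ages, survival_years)
corr_in_years = correlation(ages, survival_years)
```

With these numbers the covariance is negative (older patients tend to have shorter survival times), and the correlation is the same whether survival is recorded in months or years, which is why correlation rather than raw covariance is used to describe the strength of a relationship.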

In this example of a clinical trial of drugs for a specific disease, the supplementary information is each patient's age. Analysis of Covariance allows the researcher to adjust the measured outcome of each treatment (how long the patients survive) to a common age, such as the mean age of all patients. In this case, age would be called a "covariate." It isn't related to the treatment being tested (drugs), but it can affect the test results (survival time).

By adjusting for this variable using ANCOVA, a researcher can reduce the variation in results (survival time) observed among the three test groups that relates to their ages, but not to the drug treatment. In this way researchers can get a clearer picture of how well a treatment is likely to perform, which can lead to a prediction of how long a patient receiving the treatment would be likely to survive.
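The adjustment described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a clinical analysis: the three groups and their numbers are invented, the function names are hypothetical, and it assumes the relationship between age and survival has the same slope in every group.

```python
def mean(values):
    return sum(values) / len(values)

def pooled_slope(groups):
    """Common within-group slope of the outcome on the covariate."""
    sxy = 0.0
    sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def adjusted_means(groups):
    """Shift each group's mean outcome to the overall mean of the covariate."""
    slope = pooled_slope(groups)
    grand_mean_x = mean([x for xs, _ in groups.values() for x in xs])
    return {
        name: mean(ys) - slope * (mean(xs) - grand_mean_x)
        for name, (xs, ys) in groups.items()
    }

# Invented data: (ages in years, survival in months) for each treatment group.
trial = {
    "drug_a": ([50, 60, 70], [45, 40, 35]),
    "drug_b": ([60, 70, 80], [45, 40, 35]),
    "drug_c": ([40, 50, 60], [45, 40, 35]),
}

raw = {name: mean(ys) for name, (_, ys) in trial.items()}
adjusted = adjusted_means(trial)
```

In this made-up trial all three groups have the same raw mean survival of 40 months. After adjusting to the overall mean age of 60, the means become 40, 45, and 35 months, because the patients on drug B were older and those on drug C were younger than average.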

In layperson's terms, whenever one hears a prediction such as "one in three patients survives one year or longer using this treatment," this statistic probably has been determined from clinical trials in which the results were analyzed using analysis of covariance to account for variables such as age, weight, and sex.


Here's how it works in mathematical analysis:

The first step in performing an ANCOVA is to compute a linear regression for each group: in graphical terms, to show the results as points on a graph and to draw a straight line through them to show their relationship. The goal of linear regression is to find the equation (slope and intercept) of the line that best fits the points. This line visually summarizes the relationship between the variables.
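As a minimal sketch of this first step, the least-squares slope and intercept for one group can be computed directly. The function name `fit_line` and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line through the points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

# Invented data for one treatment group: ages (years), survival (months).
ages = [50, 60, 70, 80]
survival_months = [40, 34, 30, 22]

slope, intercept = fit_line(ages, survival_months)

# Predicted survival for a hypothetical 65-year-old in this group.
predicted_at_65 = intercept + slope * 65
```

In an ANCOVA this fit would be repeated once per treatment group, giving one slope and one intercept for each.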

The next step is to compare the slopes of the regression lines. The "null hypothesis," meaning the statement being tested, always assumes that the slopes of the regression lines are the same. In other words, the lines should trend in the same direction at the same angle.
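One common way to compare the slopes is an F statistic that asks how much worse the fit becomes when all groups are forced to share a single slope. The sketch below uses invented data and hypothetical function names. It stops at the F statistic and its degrees of freedom, since converting these to a p-value requires an F distribution (available, for example, in SciPy's `scipy.stats.f`).

```python
def within_group_sums(xs, ys):
    """Sums of squares and cross-products about the group's own means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy

def slope_homogeneity_f(groups):
    """F statistic for the null hypothesis that all group slopes are equal."""
    sums = [within_group_sums(xs, ys) for xs, ys in groups]
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    # Residual variation when every group gets its own slope and intercept.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    # Residual variation when groups keep their own intercepts but share a slope.
    total_sxx = sum(s[0] for s in sums)
    total_sxy = sum(s[1] for s in sums)
    total_syy = sum(s[2] for s in sums)
    sse_common = total_syy - total_sxy ** 2 / total_sxx
    df_num = k - 1
    df_den = n_total - 2 * k
    f_stat = ((sse_common - sse_separate) / df_num) / (sse_separate / df_den)
    return f_stat, df_num, df_den

# Invented data: a shared covariate and a fixed scatter around each line.
x = [1, 2, 3, 4, 5]
scatter = [1, -1, 0, 1, -1]

def make_group(slope, intercept):
    return (x, [slope * xi + intercept + e for xi, e in zip(x, scatter)])

parallel_groups = [make_group(2, 1), make_group(2, 5), make_group(2, 9)]
crossing_groups = [make_group(2, 1), make_group(5, 5), make_group(2, 9)]

f_parallel, df1, df2 = slope_homogeneity_f(parallel_groups)
f_crossing, _, _ = slope_homogeneity_f(crossing_groups)
```

For the parallel groups the statistic is essentially zero, so there is no evidence against equal slopes. For the groups with one much steeper line it is large, which would lead to rejecting the null hypothesis.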

However, if the slopes differ significantly from one another, it's not possible to complete the final step in the analysis of covariance, which is comparing where the regression lines intersect the Y-axis on the graph (the Y-intercepts). That's because if the slopes differ, the lines will cross each other at some point rather than being parallel, so the gap between them depends on the value of the covariate. If the slopes are the same, the lines are parallel and the gap between them is the same everywhere. The final step then tests a second null hypothesis: that the Y-intercepts of the regression lines are also the same.
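When the slopes can be treated as equal, the comparison of Y-intercepts can be sketched as a second F statistic: it compares the fit of parallel lines (one shared slope, separate intercepts) against a single line for all the data. As before, the data and function names are invented for illustration, and a p-value would additionally require an F distribution.

```python
def sums_about_means(xs, ys):
    """Sums of squares and cross-products about the means of xs and ys."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy

def intercept_f(groups):
    """F statistic for equal intercepts, assuming a common slope."""
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    # Parallel lines: pool the within-group sums to get one shared slope.
    within = [sums_about_means(xs, ys) for xs, ys in groups]
    w_sxx = sum(s[0] for s in within)
    w_sxy = sum(s[1] for s in within)
    w_syy = sum(s[2] for s in within)
    sse_parallel = w_syy - w_sxy ** 2 / w_sxx
    # Single line: ignore group membership entirely.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    t_sxx, t_sxy, t_syy = sums_about_means(all_x, all_y)
    sse_single = t_syy - t_sxy ** 2 / t_sxx
    df_num = k - 1
    df_den = n_total - k - 1
    f_stat = ((sse_single - sse_parallel) / df_num) / (sse_parallel / df_den)
    return f_stat, df_num, df_den

# Invented data: a shared covariate and a fixed scatter around each line.
x = [1, 2, 3, 4, 5]
scatter = [1, -1, 0, 1, -1]

def make_group(slope, intercept):
    return (x, [slope * xi + intercept + e for xi, e in zip(x, scatter)])

shifted_groups = [make_group(2, 1), make_group(2, 5), make_group(2, 9)]
identical_groups = [make_group(2, 5), make_group(2, 5), make_group(2, 5)]

f_shifted, df1, df2 = intercept_f(shifted_groups)
f_identical, _, _ = intercept_f(identical_groups)
```

A large statistic, as for the shifted groups here, indicates that the parallel lines sit at different heights, meaning the groups differ after the covariate has been accounted for.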

If the slopes differ significantly, the analysis of covariance is finished. All that can be determined is that the slopes aren't the same. In other words, the covariate affects the groups differently, so there's no single covariate-adjusted difference between the groups that can be reported.

Data for an analysis of covariance are shown on a scatter graph. Typically the independent variable is plotted on the X-axis and the dependent variable plotted on the Y-axis. In most cases, the regression lines for each set of points are shown individually, even if the slopes aren't significantly different. By showing all linear regressions, the graph allows researchers to see the similarities and differences among the slopes.

In summary, by controlling for factors such as age that affect test results but are unrelated to the treatment, an analysis of covariance gives researchers a clearer measure of how strongly a treatment predicts the outcome.
