Understanding statistical terms: 2

Abstract

An increasing number of statistical terms appear in journal articles and other medical information. A working knowledge of these is essential in assessing clinical evidence. With this in mind, we are producing a series of explanatory articles covering various statistical terms and their uses. This, the second article in the series, will focus on some of the most common terms used in reporting the results of randomised controlled trials.1


TERMS RELATING TO TRIAL RESULTS

Significance

The differences found between groups in a study may be ‘statistically significant’ and/or ‘clinically significant’; the two are not the same. For example, a study showing that one treatment was 8% more effective than another could be statistically significant (i.e. the statistical analysis shows the two treatments differ) but not clinically significant (i.e. an 8% difference might, in reality, be so small that a patient would not notice, or not value, a benefit of this size). Conversely, the benefit from a highly effective treatment could be clinically significant (i.e. give benefit to the patient in real terms), yet a particular study might be too small to demonstrate the effect (not statistically significant).

P values

The p value represents how likely it is that a particular result in a study would occur by chance alone, assuming that the null hypothesis2 is in fact true and there is, in reality, no difference between the treatments being compared. For example, a study might suggest that a drug reduces mortality from 30% (the rate with no treatment) to 20%, and report a p value of p<0.05 as evidence of this. This statistic indicates that if the trial were performed repeatedly, a difference in outcome between the two study groups as big as this (i.e. 10 percentage points), or larger, could be expected to occur by chance alone in fewer than 5% of such studies. The smaller the p value, the lower the likelihood that the result happened by chance, and the more certain it is that there really is a difference between the two treatments being compared. The conventional cut-off for rejecting the null hypothesis is p = 0.05, which represents a 1 in 20 chance that the result occurred by chance rather than because of a real difference; p = 0.01 represents a 1 in 100 chance and p = 0.001 a 1 in 1,000 chance.
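
As a rough illustration of where such a p value comes from, the sketch below applies Fisher's exact test (from the scipy library) to the mortality example above. The group size of 300 patients per arm is an assumption made purely for illustration; the article does not specify one.

```python
# A minimal sketch: could a fall in mortality from 30% to 20% plausibly
# have arisen by chance? Group sizes of 300 per arm are assumed purely
# for illustration.
from scipy.stats import fisher_exact

deaths_treated, n_treated = 60, 300    # 20% mortality on treatment
deaths_control, n_control = 90, 300    # 30% mortality with no treatment

table = [
    [deaths_treated, n_treated - deaths_treated],   # treated: died, survived
    [deaths_control, n_control - deaths_control],   # control: died, survived
]

_, p_value = fisher_exact(table)
print(f"p = {p_value:.4f}")  # well below 0.05, as in the example above
```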

Confidence intervals

Confidence intervals (CIs) are often reported for studies, and are a measure of the certainty that can be assumed for results. For example, “mortality rate 10%, 95% CI 8% to 12%” means that the study found a rate of 10%, and that if the study were repeated many times, 95% of the confidence intervals calculated in this way would be expected to contain the true underlying rate. This is a reminder that the trial result (10%) is derived from a sample (the people in the study), so is merely an estimate of the underlying population value. The bigger the sample, the more precise the estimate, and the narrower the resulting confidence interval. If the 95% confidence interval for the difference in absolute risk (see below) between two groups includes the value ‘0’, the trial has not identified a significant difference between the groups (i.e. the result is compatible with no difference). Similarly, if the 95% confidence interval for a relative risk or an odds ratio (see below) includes the value ‘1’, the result is, again, compatible with no difference between the groups.
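
As a rough sketch of where such an interval comes from, the normal approximation for a proportion gives the observed rate plus or minus 1.96 standard errors. The sample size of 900 below is an assumption, chosen so that the result matches the “10%, 95% CI 8% to 12%” example in the text.

```python
# A minimal sketch: 95% CI for a proportion via the normal approximation.
# n = 900 is an assumed sample size chosen so the interval matches the
# "10%, 95% CI 8% to 12%" example in the text.
from math import sqrt

deaths, n = 90, 900
p = deaths / n                       # observed mortality rate: 0.10
se = sqrt(p * (1 - p) / n)           # standard error of the proportion
lower, upper = p - 1.96 * se, p + 1.96 * se
print(f"{p:.0%}, 95% CI {lower:.1%} to {upper:.1%}")  # 10%, 95% CI 8.0% to 12.0%
```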

Effect size

One way of describing the magnitude of a difference in a variable over time is by using a statistic known as the effect size. For each group (or sample) in the trial, the effect size is based on the difference between the mean value of the variable at two time points (e.g. at baseline and at the end of the study). The effect size also takes into account the standard deviation2 of the variable values and the size of the groups.

Calculating effect size

An effect size can be derived by first calculating the ‘pooled’ standard deviation (SD). This is obtained by putting the number of participants before (n1) and after (n2) treatment, and the standard deviation before (s1) and after (s2) treatment, into the following formula:

$$SD_\text{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

The difference between means before (mean1) and after (mean2) treatment is then divided by the pooled SD to give the effect size:

$$\text{effect size} = \frac{\text{mean}_2 - \text{mean}_1}{SD_\text{pooled}}$$

The effect size commonly lies between 0 and 1, although values greater than 1 are possible; a value of 0.2 is regarded as a small effect size, 0.5 as medium and 0.8 as large (with ‘0’ indicating no effect). Since the effect size has no units, and takes into account the means, variation and group size, it can be used for indirect comparison between studies.
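
A minimal sketch of the calculation described above, using the two formulas directly; the baseline and end-of-study figures are invented for illustration.

```python
# A minimal sketch of the pooled-SD effect size described above.
from math import sqrt

def effect_size(mean1: float, s1: float, n1: int,
                mean2: float, s2: float, n2: int) -> float:
    """Difference between means divided by the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean2 - mean1) / pooled_sd

# Invented example: mean blood pressure falls from 150 (SD 20) at baseline
# to 140 (SD 20) at the end of the study, with 50 patients at each point.
print(round(effect_size(150, 20, 50, 140, 20, 50), 2))  # -0.5: a medium effect
```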

Risk

This is the number of events that occur divided by the total number of events that could possibly occur (e.g. if 1 person died out of 100, the risk of death, the mortality rate, is 1/100, i.e. 1% or 0.01).

Absolute risk and absolute risk reduction

The absolute risk (AR) is the probability that an individual will experience the specified outcome (risk) during a specified period. It lies in the range 0 (definitely will not occur) to 1 (definitely will occur), or is expressed as a percentage.

In the context of a randomised controlled trial, absolute risk reduction (ARR) is the amount by which a treatment reduces the risk of an event. For example, reducing the risk from 30% to 23% gives an absolute risk reduction of 7%.

Relative risk or risk ratio

In the context of a randomised controlled trial, the relative risk (RR) is the chance of an outcome (e.g. death) while on a specific treatment compared with the chance while on an alternative treatment (or no treatment). It is calculated by dividing the rate of the event (number of events divided by the total possible) in one group of patients within a study by the rate in another, comparative, group. The result is a proportion or a fraction. For example, if the risk of mortality on a drug is 1% and the baseline risk without treatment is 4%, the relative risk is 1% divided by 4%, that is, ¼ (or 25% or 0.25).
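
A minimal sketch pulling these measures together, using the mortality figures from the relative risk example above; the group sizes of 100 are assumed purely for illustration.

```python
# A minimal sketch of absolute risk, absolute risk reduction and relative
# risk. Group sizes of 100 per arm are assumed purely for illustration.
deaths_treated, n_treated = 1, 100      # 1% mortality on the drug
deaths_control, n_control = 4, 100      # 4% mortality without treatment

ar_treated = deaths_treated / n_treated     # absolute risk on treatment: 0.01
ar_control = deaths_control / n_control     # absolute risk off treatment: 0.04

arr = ar_control - ar_treated               # absolute risk reduction: 0.03
rr = ar_treated / ar_control                # relative risk: 0.25

print(f"ARR = {arr:.0%}, RR = {rr:.2f}")    # ARR = 3%, RR = 0.25
```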

Odds

The ‘odds’ of an event occurring is another way of describing the chance of it happening. Whereas a risk is calculated by dividing the number of events by the total number of possible events, the odds are derived by dividing the number of people with an event by the number who did not have the event. The difference can be seen by considering an example: if 1 person died out of 100, the mortality rate is 1/100, whereas the odds of dying are 1 (the person who died) divided by 99 (those who did not die). In this example, the odds give a similar value to the rate (around 1%). When the outcome is more frequent, however, the rate and the odds may be quite different. For example, if tossing a coin 10 times gives 4 heads, the rate of heads is 4 out of 10 (4/10; 40%; 0.4), but the odds of heads are 4 heads compared with 6 tails (4/6; 67%; 0.67).
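
The coin-toss example above, as a minimal sketch:

```python
# A minimal sketch contrasting risk (rate) with odds for the same data.
heads, tosses = 4, 10
tails = tosses - heads

risk = heads / tosses          # 4/10 = 0.40: events over all possibilities
odds = heads / tails           # 4/6  = 0.67: events over non-events

print(f"risk = {risk:.2f}, odds = {odds:.2f}")  # risk = 0.40, odds = 0.67
```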

Odds ratio

The odds ratio (OR) is similar to the relative risk but is particularly useful where an outcome is rare. It is calculated by dividing the odds of an outcome in one group by the odds of the outcome in another group.
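
A minimal sketch with invented counts for two trial arms, showing the odds ratio alongside the relative risk for comparison; with rarer outcomes than these, the two values would converge.

```python
# A minimal sketch: odds ratio from a hypothetical 2x2 table (invented counts).
events_a, no_events_a = 10, 90     # treatment group: event vs no event
events_b, no_events_b = 20, 80     # control group: event vs no event

odds_a = events_a / no_events_a    # 10/90 = 0.11
odds_b = events_b / no_events_b    # 20/80 = 0.25
odds_ratio = odds_a / odds_b       # 0.44

n_a = events_a + no_events_a
n_b = events_b + no_events_b
relative_risk = (events_a / n_a) / (events_b / n_b)   # 0.50, for comparison

print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}")
```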

Number needed to treat

If a treatment reduces the likelihood of an unwanted outcome (e.g. death), the size of this benefit can be calculated as the absolute risk reduction (ARR; see above). Calculating the reciprocal of the absolute risk reduction (1 divided by the ARR) gives the number of patients that would need to be treated (number needed to treat; NNT) for a defined period in order to prevent one unwanted outcome (e.g. death). For example, if treatment with a drug for 1 year reduces the risk of death from 21% to 17%, the ARR is 4% (= 0.04); 1 divided by 0.04 = 25, so 25 patients would need to be treated for 1 year to prevent 1 death. The NNT gives an indication of the effectiveness of a treatment: an intervention that produces a large reduction in risk will have a small NNT (i.e. fewer patients need to be treated to demonstrate benefit). NNT is also sometimes written as NNTB (where ‘B’ stands for ‘benefit’).

Number needed to harm

If a treatment increases the likelihood of an unwanted outcome (e.g. an adverse event), a number needed to harm (NNH) can be calculated in the same way as a number needed to treat (i.e. the reciprocal of the difference in risk; 1 divided by the additional amount of risk). This gives information on the likelihood of unwanted effects; ideally, a treatment would have a small NNT (a benefit is expected frequently) and a large NNH (i.e. many patients would need to take the treatment before one was harmed by it, such that harm is expected infrequently). NNH is also sometimes written as NNTH (where ‘H’ stands for ‘harm’).
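
A minimal sketch covering both measures; the mortality figures are taken from the NNT example above, while the adverse-event rates used for the NNH are invented for illustration.

```python
# A minimal sketch: NNT and NNH as reciprocals of risk differences.
def number_needed(risk_without: float, risk_with: float) -> float:
    """Reciprocal of the absolute difference in risk between two groups."""
    return 1 / abs(risk_without - risk_with)

# NNT: the drug reduces 1-year mortality from 21% to 17% (figures from the
# text above): 25 patients treated for 1 year to prevent 1 death.
print(round(number_needed(0.21, 0.17)))   # 25

# NNH: invented example in which the drug raises adverse-event risk from
# 2% to 4%: 50 patients treated before 1 extra patient is harmed.
print(round(number_needed(0.02, 0.04)))   # 50
```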

REFERENCES

[M=meta-analysis; R=randomised controlled trial]