Why p-values, effect sizes, and clinically meaningful change are not enough
When clinical trials are conducted, statistical significance is generally set at a p-value of 0.05. In rough terms, this means that if the intervention truly had no effect, a result at least this extreme would be expected less than 5 percent of the time. A common criticism of this approach is that a statistically significant finding is not necessarily a meaningful one. For example, an intervention may reduce symptom scores by only 2 points; on a 100-point scale, this is unlikely to produce a change that a person would notice.
To address this problem, researchers and statisticians advocated the use of effect sizes. An effect size is a value that measures the magnitude of the difference between two interventions (or between an intervention and a control). Although definitions vary, effect sizes of 0.2, 0.5, and 0.8 are conventionally considered small, medium, and large, respectively.
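As a rough illustration only, the sketch below (in Python, using made-up symptom scores for a hypothetical intervention and control group) shows how a standardised effect size such as Cohen's d is commonly calculated: the difference between the group means divided by the pooled standard deviation. The data and the two-point difference are invented to echo the 100-point-scale example above.

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardised mean difference (Cohen's d) between two groups of scores."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    n_a, n_b = len(group_a), len(group_b)
    # Pooled standard deviation across both groups
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Hypothetical symptom scores on a 0-100 scale (lower = fewer symptoms)
control = [40, 60, 47, 54, 62, 37]        # mean 50
intervention = [38, 58, 45, 52, 60, 35]   # mean 48, i.e. a 2-point reduction

print(round(cohens_d(control, intervention), 2))  # ~0.19: a 'small' effect
```

In this toy example, a real 2-point reduction still yields an effect size below 0.2, which is exactly the kind of result the argument below is concerned with.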
In addition to effect sizes, researchers also refer to ‘clinically meaningful change’. This is a change that has clinical or practical importance: a level of change that an individual finds noticeable and that improves his or her quality of life. The magnitude of a clinically meaningful change is influenced by the condition or symptom being treated and by the outcome measure used to assess change. It is often argued that if an intervention does not produce a clinically meaningful change, it is ineffective.
While effect sizes and clinically meaningful change make theoretical sense, they rest on the logic that treatments are delivered as stand-alone interventions. This logic is flawed, as most solutions to enhance physical and mental wellbeing comprise multiple steps, and it is the collective implementation of all, or many, of these strategies that enhances overall wellbeing. For example, eating 2 to 3 pieces of fruit daily is unlikely to produce miraculous changes in one’s wellbeing on its own; as part of an overall wellbeing plan, however, it is an important contributor. The effect size of this behaviour is unlikely to be large and may in fact be less than 0.2, and the change will not be noticeable, but I would argue that it is still valuable.
A further problem with effect sizes and clinically meaningful changes as metrics is that they do not consider the financial costs, time demands, required resources, treatment duration, and adverse effects associated with an intervention. For example, if an intervention has an effect size of 0.8 (large) but costs tens of thousands of dollars, requires several hours a day to implement, takes 6 months to work, and is associated with several serious adverse reactions, can we really say that it is more valuable than one that has an effect size of 0.2 (small) but can be easily implemented alongside other changes, costs less than $100, is safe, takes only several minutes a day, and works after 2 to 4 weeks?
Therefore, several metrics must be considered when we examine interventions and compare their value. These could include the following:
TREATMENT VALUE = effect size ÷ cost ($) ÷ time (minutes) ÷ treatment duration ÷ risk and severity of adverse reactions (possibly expressed as some type of risk-severity rating)
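As a minimal sketch of how this formula might behave, the snippet below (in Python) compares the two hypothetical interventions described above. Every figure is illustrative: the function name, the units (dollars, minutes per day, weeks of treatment), and the 1-to-5 risk-severity rating are all assumptions made for the example, not an established scoring system.

```python
def treatment_value(effect_size, cost_dollars, minutes_per_day,
                    duration_weeks, risk_rating):
    """Crude 'value' score: effect size penalised by cost, daily time demand,
    treatment duration, and a risk/severity rating (all hypothetical units)."""
    return effect_size / cost_dollars / minutes_per_day / duration_weeks / risk_rating

# Intervention A: large effect, but expensive, time-consuming, slow, and risky
a = treatment_value(effect_size=0.8, cost_dollars=20_000,
                    minutes_per_day=120, duration_weeks=26, risk_rating=4)

# Intervention B: small effect, but cheap, quick, fast-acting, and safe
b = treatment_value(effect_size=0.2, cost_dollars=100,
                    minutes_per_day=5, duration_weeks=3, risk_rating=1)

print(f"A: {a:.2e}   B: {b:.2e}")  # B scores orders of magnitude higher than A
```

Under these illustrative numbers, the intervention with the small effect size comes out far ahead once cost, time, duration, and risk are taken into account, which is the point of the comparison above.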
Although this simplistic formula is unlikely to be easily calculated for most interventions, it comprises components that are essential to consider when determining an intervention’s overall ‘value’. The practice of making simplistic comparisons between interventions, without considering these important contextual variables, is one that requires an overhaul.