Transformations, means, and confidence intervals.

JM Bland, DG Altman - BMJ: British Medical Journal, 1996 - ncbi.nlm.nih.gov
When we use transformed data in analyses [1], this affects the final estimates that we obtain. Figure 1 shows some serum triglyceride measurements, which have a skewed distribution. A logarithmic transformation is often useful for data which have positive skewness like this, and here the approximation to a normal distribution is greatly improved. For the untransformed data the mean is 0.51 mmol/l and the standard deviation 0.22 mmol/l. The mean of the log10 transformed data is -0.33 and the standard deviation is 0.17. If we take the mean on the transformed scale and back transform by taking the antilog, we get 10^-0.33 = 0.47 mmol/l. We call the value estimated in this way the geometric mean. The geometric mean will be less than the mean of the raw data.

When triglyceride is measured in mmol/l the log of a single observation is the log of a measurement in mmol/l. The average of n such transformed measurements is also the log of a number in mmol/l, so the antilog is back in the original units, mmol/l. The antilog of the standard deviation, however, is not measured in mmol/l. Calculation of the standard deviation of the log transformed data requires taking the difference between each log observation and the log geometric mean. The difference between the logs of two numbers is the log of their ratio [2]. As a ratio is a dimensionless pure number, the units in which serum triglyceride was measured would not matter; the standard deviation on the log scale would be the same. As a result, we cannot transform the standard deviation back to the original scale.

If we want to use the standard deviation or standard error it is easiest to do all calculations on the transformed scale and transform back, if necessary, at the end. For example, the 95% confidence interval for the mean on the log scale is -0.35 to -0.31. To get back to the original scale we antilog the confidence limits on the log scale to give a
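
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own computation: the function name geometric_mean_with_ci, the multiplier z = 1.96 (a large-sample normal approximation), and the small sample list are all assumptions introduced here. Only the figures -0.33, -0.35 and -0.31 come from the text.

```python
import math
import statistics


def geometric_mean_with_ci(values, z=1.96):
    """Return (geometric mean, lower limit, upper limit).

    All work is done on the log10 scale; only the final mean and
    confidence limits are back transformed with the antilog.
    z = 1.96 is an assumed large-sample normal multiplier.
    """
    logs = [math.log10(v) for v in values]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    lower_log = mean_log - z * se_log
    upper_log = mean_log + z * se_log
    return 10 ** mean_log, 10 ** lower_log, 10 ** upper_log


# Back transform the summary figures quoted in the text.
gm_reported = 10 ** -0.33
print(round(gm_reported, 2))  # 0.47
ci_reported = (10 ** -0.35, 10 ** -0.31)
print(ci_reported)

# Hypothetical measurements in mmol/l (made up for illustration only).
sample = [0.25, 0.38, 0.44, 0.51, 0.62, 0.95]
print(geometric_mean_with_ci(sample))

# The standard deviation on the log scale does not depend on the units:
# converting mmol/l to micromol/l multiplies every value by 1000, which
# adds a constant 3 to every log10 value and leaves the spread unchanged.
sd_log_mmol = statistics.stdev(math.log10(v) for v in sample)
sd_log_umol = statistics.stdev(math.log10(v * 1000) for v in sample)
print(math.isclose(sd_log_mmol, sd_log_umol))
```

The last check illustrates why the standard deviation of the logs cannot be back transformed to mmol/l: it has the same value whatever units the measurements were recorded in.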