Okay, here is my final deep dive post. I ended up computing the 2-sided p-values for all 6 eligible FX-322 participants on the WR test-retest. Recall that 4/6 are statistically (and clinically) significant by the Thornton and Raffin 95% confidence intervals. Hence, we will have 4 individual p-values less than .05 and 2 greater than .05. Note that everyone saw improvement, even if it wasn't statistically significant. One non-significant person was right on the cusp of significance.
Background: I was thinking about the bear/bull divide and it occurred to me where it is. Roughly speaking, since the data were imbalanced by chance, the placebos started off closer to a ceiling effect. Hence, even though all group-level differences except PTA were statistically significant, the bears would argue that the placebos had a harder fight because of their higher starting baselines. There is probably some validity to this, even though the scoring is percentage based.
But here's where the divide really is. Frequency Therapeutics have said (in the Tinnitus Talk Podcast) that there shouldn't really be a placebo effect with hearing words. Looking at the placebo patients, this is what we would expect. They all land within the 95% confidence intervals, scattered above and below.
The bulls think that, even with the data imbalance, we should focus on the responders. Placebo control is a little less important to us because we think the placebo effect is smaller for hearing. We see the 4 WR responders with massive gains and think that even though it's only 4, there's no way that's by chance or just retesting a little better. I think so much of the bull/bear position comes down to what you think of a placebo effect. However, I do want to reiterate that the FX-322 group did outperform the placebo group in every test except one, so it's not just zooming in on 4 ears.
Okay, here's my work and calculations from Thornton and Raffin. I am writing it out for completeness, but it's perfectly fine to skip right to the p-values. From the patent submission:
[Attachment 44095]
By the Central Limit Theorem (justified), we apply angular transformations to the baseline and day 90 scores. Then, using the Freeman and Tukey uniform variance formula as adjusted by Mosteller and Youtz (1961), we have the formula
Z = [ (arcsin(sqrt(x/(n+1))) + arcsin(sqrt((x+1)/(n+1)))) - (arcsin(sqrt(y/(n+1))) + arcsin(sqrt((y+1)/(n+1)))) ] / sqrt(2/(n+1/2)).
Here,
- x = # correct out of 50 at baseline,
- y = # correct out of 50 at day 90,
- n = 50.
This leads to the following Z-scores and corresponding 2-sided p-values. Note that alpha=0.05.
Patient 918: Z=-1.8972, p=2*P(Z<-1.8972)=0.0578, which is not less than alpha (but close) so not statistically significant.
Patient 932: Z=-0.9795, p=2*P(Z<-0.9795)=0.3273, which is not less than alpha and not close.
Patient 919: Z=-2.1268, p=2*P(Z<-2.1268)=0.0334, which is less than alpha so statistically significant.
Patient 916: Z=-4.0547, p=2*P(Z<-4.0547)=0.00005, which is way less than alpha so very statistically significant.
Patient 936: Z=-3.9133, p=2*P(Z<-3.9133)=0.00009, which is way less than alpha so very statistically significant.
Patient 937: Z=-5.0387, p=2*P(Z<-5.0387)=0.00000047, which is way less than alpha so very statistically significant.
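If you want to reproduce the p-values from the Z-scores, here is a small Python sketch. It assumes a standard normal reference distribution and uses the identity 2*P(Z < -|z|) = erfc(|z|/sqrt(2)); the function name is my own, and the Z-scores are the ones listed in this post.

```python
import math

ALPHA = 0.05

def two_sided_p(z):
    # 2 * P(Z < -|z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Z-scores as listed in this post, keyed by patient number.
z_scores = {
    918: -1.8972,
    932: -0.9795,
    919: -2.1268,
    916: -4.0547,
    936: -3.9133,
    937: -5.0387,
}

for patient, z in z_scores.items():
    p = two_sided_p(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"Patient {patient}: Z={z:.4f}, p={p:.2g} ({verdict})")
```

The one-sided p-value would be exactly half of each of these, since the normal distribution is symmetric.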
So I will conclude with some commentary on these p-values. The last three (patients 916, 936, and 937) have very small p-values. It's even crazier because this is two-sided, which produces bigger p-values than a one-sided test would (double, in fact). It is practically impossible to believe that gains this large occurred by chance.
Let's just say that I'm a bull and there are 3 patients in the Phase 1b study riding me hard.