Description
When performing the bootstrap, multiple combinations of the hypergeometric parameters can produce the same minimum p-value. In that case, all of those minimum p-values should be returned and added to the empirical distribution.
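A minimal sketch of the requested behavior for a single bootstrap iteration, assuming the iteration's p-values are already available as a numeric vector. The helper name `all_min_pvalues`, the tolerance `tol`, and the input numbers are hypothetical and are not taken from the DTO code.

```r
# Hypothetical helper (not part of DTO): return every p-value tied for the
# minimum in one bootstrap iteration, instead of a single representative.
# A small tolerance guards against floating-point noise between p-values
# that are mathematically equal.
all_min_pvalues <- function(pvalues, tol = 1e-12) {
  min_p <- min(pvalues)
  pvalues[abs(pvalues - min_p) <= tol]
}

# Made-up p-values for one iteration; two parameter combinations tie at 0.01.
iteration_pvalues <- c(0.04, 0.01, 0.20, 0.01, 0.07)

# Both tied minima are kept and would be appended to the empirical distribution.
tied_minima <- all_min_pvalues(iteration_pvalues)
print(tied_minima)
```

Whether an exact comparison or a tolerance is appropriate depends on how the p-values are computed, so `tol` is a design choice to revisit.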
Example
This is our empirical distribution. Note that duplicate values are included because, in a single bootstrap iteration, there were multiple 'minimum' p-values. This is possible because different combinations of the hypergeometric parameters can produce the same p-value.
> # Simulated minimum p-values from bootstrap samples
> empirical_distribution <- c(0.01, 0.01, 0.01, 0.02, 0.03, 0.03, 0.04, 0.05, 0.10, 0.15)
We set an observed p-value:
> # Observed p-value from the original data
> observed_p <- 0.01
We then calculate the empirical p-value by comparing observed_p to the empirical distribution, including the duplicate minimum p-values:
> # Case 1: Including all ties
> empirical_all <- mean(empirical_distribution <= observed_p)
In case 2, we mimic what we currently do in DTO by keeping just one of the tied p-values from each iteration:
> # Case 2: Including only one of each tied value (simulate duplicate exclusion)
> # unique() keeps a single copy of each value, so only one of the 0.01s (and one of the 0.03s) is counted
> empirical_unique <- mean(unique(empirical_distribution) <= observed_p)
The result is a different empirical p-value:
> cat("Observed p-value: ", observed_p, "\n")
Observed p-value: 0.01
> cat("Empirical p-value (all ties): ", empirical_all, "\n")
Empirical p-value (all ties): 0.3
> cat("Empirical p-value (one per tie):", empirical_unique, "\n")
Empirical p-value (one per tie): 0.1428571
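The same comparison can be sketched at the level of individual iterations, which is closer to how the empirical distribution is assembled. This is an illustration under assumptions: the `iterations` list and its numbers are made up, and the variable names are hypothetical rather than taken from DTO.

```r
# Made-up p-values; each vector holds the p-values from one bootstrap iteration.
iterations <- list(
  c(0.01, 0.01, 0.05),
  c(0.03, 0.08),
  c(0.02, 0.02, 0.02, 0.10),
  c(0.01, 0.04)
)

# Proposed behavior: keep every p-value tied for the minimum in each iteration.
with_ties <- unlist(lapply(iterations, function(p) p[p == min(p)]))

# Behavior described as current: keep a single minimum per iteration.
one_per_iteration <- vapply(iterations, min, numeric(1))

observed_p <- 0.01
empirical_with_ties <- mean(with_ties <= observed_p)
empirical_one_each <- mean(one_per_iteration <= observed_p)

cat("Empirical p-value (all ties):         ", empirical_with_ties, "\n")
cat("Empirical p-value (one per iteration):", empirical_one_each, "\n")
```

With these made-up inputs the two approaches give 3/7 and 2/4 respectively, so the choice of how to handle ties changes the result here as well.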