In cases where multiple sets produce the same minimum p-value, all of those should be included in the empirical distribution #13

When performing the bootstrap, multiple combinations of the hypergeometric parameters can produce the same minimum p-value. In that case, all of those minimum p-values should be returned and added to the empirical distribution.
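
As a rough sketch of how such ties can arise and be collected: the hypergeometric distribution is symmetric in the number of successes and the number of draws, so two different parameter sets can give the same upper-tail p-value. The helper `tied_minima`, its `tol` default, and the parameter values below are all hypothetical and are not taken from the DTO code.

```R
# Hypothetical helper: return every p-value tied for the minimum in one iteration.
# `tol` guards against floating-point noise; its default is an assumption.
tied_minima <- function(pvals, tol = 1e-12) {
  pvals[abs(pvals - min(pvals)) <= tol]
}

# Illustrative parameter sets (made up):
# overlap q, successes m, failures n, draws k.
params <- data.frame(
  q = c(3, 3, 1),
  m = c(10, 5, 10),
  n = c(40, 45, 40),
  k = c(5, 10, 5)
)

# Upper-tail hypergeometric p-value P(X >= q) for each row.
pvals <- with(params, phyper(q - 1, m, n, k, lower.tail = FALSE))

# Rows 1 and 2 swap m and k, so they share the same (minimum) p-value.
mins <- tied_minima(pvals)
length(mins)
```

Under this approach, every element of `mins` would be appended to the empirical distribution rather than a single value.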

Example

This is our empirical distribution. Note that duplicate values have been included, in this case because a single bootstrap iteration produced multiple 'minimum' p-values. This is possible because different combinations of the hypergeometric parameters can produce the same p-value.

```R
> # Simulated minimum p-values from bootstrap samples
> empirical_distribution <- c(0.01, 0.01, 0.01, 0.02, 0.03, 0.03, 0.04, 0.05, 0.10, 0.15)
```

We set an observed p-value:

```R
> # Observed p-value from the original data
> observed_p <- 0.01
```

We then calculate the empirical p-value by comparing `observed_p` to the empirical distribution, including the duplicate minimum p-values:

```R
> # Case 1: Including all ties
> empirical_all <- mean(empirical_distribution <= observed_p)
```

In Case 2, we mimic what DTO currently does, which is to keep just one of those tied p-values from each iteration:

```R
> # Case 2: Including only one of each tied min (simulate duplicate exclusion)
> # unique() keeps one copy of each repeated value (here the 0.01s and the 0.03s)
> empirical_unique <- mean(unique(empirical_distribution) <= observed_p)
```

The result is a different empirical p-value:

```R
> cat("Observed p-value: ", observed_p, "\n")
Observed p-value:  0.01 
> cat("Empirical p-value (all ties):   ", empirical_all, "\n")
Empirical p-value (all ties):    0.3 
> cat("Empirical p-value (one per tie):", empirical_unique, "\n")
Empirical p-value (one per tie): 0.1428571
```
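
The `unique()` call above deduplicates across the whole distribution, which only approximates "one minimum per iteration". A minimal per-iteration sketch of the same comparison follows, assuming each bootstrap iteration yields a vector of p-values. The `iterations` list is made-up illustrative data, not DTO output, and the variable names are hypothetical.

```R
# Made-up p-values for four bootstrap iterations (illustrative only).
iterations <- list(
  c(0.01, 0.01, 0.20),
  c(0.03, 0.03, 0.50),
  c(0.02, 0.40),
  c(0.10, 0.30)
)
observed_p <- 0.01

# Keep every p-value tied for the minimum within an iteration.
all_ties <- unlist(lapply(iterations, function(p) p[p == min(p)]))

# Keep a single minimum per iteration.
one_per_iter <- vapply(iterations, min, numeric(1))

# Empirical p-values under each rule.
empirical_all <- mean(all_ties <= observed_p)
empirical_one <- mean(one_per_iter <= observed_p)
```

With all ties kept, the empirical distribution has six values instead of four, so the two rules can give different empirical p-values for the same observed value.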
