Performs bootstrap hypothesis testing on the given dataset to find which elements have significant contributions.
The null hypothesis is that elements have, on average, no contribution, unless a reference mean is given. This can be
used both for a dataset of Shapley values (a Shapley table) and for a dataset of lesions if there are many samples for
each element, e.g., to check whether lesioning an element significantly impacted some feature of the system.
For more information, watch this brilliant tutorial:
https://www.youtube.com/watch?v=isEcgoCmlO0&t=893s
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `DataFrame` | The common use case is a Shapley table, but anything else works too, as long as it follows the same structure/format and you know what you're doing. | *required* |
| `p_value` | `float` | Significance threshold. Please correct for multiple comparisons first; the easiest and most conservative way is the Bonferroni correction: `p_value / n_elements`. | `0.05` |
| `bootstrap_samples` | `int` | Number of bootstrap resamples. The default of 10,000 is common practice. | `10000` |
| `reference_mean` | `Optional[int]` | Mean to test against, in case the means should not be compared with a zero-mean distribution. | `None` |
Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Distributions that are significantly different from H0. |
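As a quick illustration of the Bonferroni correction recommended for `p_value` (the element count below is hypothetical):

```python
# Bonferroni correction: divide the desired family-wise p-value by the number
# of elements (columns) being tested. n_elements here is a made-up example.
n_elements = 20
family_wise_p = 0.05
corrected_p = family_wise_p / n_elements
print(corrected_p)
```

The corrected threshold (0.0025 here) is what you would pass as `p_value`.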
Source code in msapy/utils.py
```python
# Imports as used in msapy/utils.py; `typechecked` is assumed to come from
# typeguard, and `sorter` is a helper defined elsewhere in the same module.
import warnings
from typing import Optional

import numpy as np
import pandas as pd
from tqdm import tqdm
from typeguard import typechecked


@typechecked
def bootstrap_hypothesis_testing(*,
                                 dataset: pd.DataFrame,
                                 p_value: float = 0.05,
                                 bootstrap_samples: int = 10_000,
                                 reference_mean: Optional[int] = None) -> pd.DataFrame:
    """
    Performs bootstrap hypothesis testing on the given dataset to find which elements have significant contributions.
    The null hypothesis is that elements have, on average, no contribution, unless a reference mean is given. This
    can be used both for a dataset of Shapley values (a Shapley table) and for a dataset of lesions if there are many
    samples for each element, e.g., to check whether lesioning an element significantly impacted some feature of the
    system.
    For more information, watch this brilliant tutorial:
    https://www.youtube.com/watch?v=isEcgoCmlO0&t=893s

    Args:
        dataset (pd.DataFrame):
            The common use case is a Shapley table, but anything else works too,
            as long as it follows the same structure/format and you know what you're doing.

        p_value (float):
            Significance threshold, default 0.05. **Please correct for multiple comparisons first**; the easiest
            and most conservative way is the Bonferroni correction: p_value / n_elements.

        bootstrap_samples (int):
            Number of bootstrap resamples. The default of 10_000 is common practice.

        reference_mean (Optional[int]):
            Mean to test against, in case the means should not be compared with a zero-mean distribution.

    Returns:
        pd.DataFrame: Distributions that are significantly different from H0.

    #TODO: really needs some performance optimization, probably with Numba and a change in the algorithm!
    """
    mean_adjusted_distributions = pd.DataFrame()
    bootstrapped_distributions = pd.DataFrame()
    significants = pd.DataFrame()
    percentile = (1 - p_value) * 100

    if p_value <= 0.:
        raise ValueError("A zero/negative value for p_value? What?")
    if bootstrap_samples <= 0:
        raise ValueError("A zero/negative value for bootstrap_samples? What?")
    elif bootstrap_samples < 1_000:
        warnings.warn("Bootstrap sample size is small, please go above 1000.", stacklevel=2)

    for distribution in tqdm(dataset.columns, total=len(dataset.columns), desc='Bootstrapping: '):
        if reference_mean is not None:  # adjusting the distribution to have the same mean as the reference.
            mean_adjusted_distributions[distribution] = \
                dataset[distribution] - dataset[distribution].mean() + reference_mean
        else:  # adjusting the distribution to center around zero.
            mean_adjusted_distributions[distribution] = \
                dataset[distribution] - dataset[distribution].mean()

        # resampling (with replacement) from the mean-adjusted distribution.
        values = mean_adjusted_distributions[distribution].to_numpy()
        bootstrapped_distributions[distribution] = [
            np.random.choice(values, values.size, replace=True).mean()
            for _ in range(bootstrap_samples)]

    # checking if the observed means fall outside the bootstrap percentile bounds.
    for bootstrapped_distribution in bootstrapped_distributions.columns:
        percentiles = np.percentile(bootstrapped_distributions[bootstrapped_distribution], [0, percentile])
        if not percentiles[0] <= dataset[bootstrapped_distribution].mean() <= percentiles[1]:
            significants[bootstrapped_distribution] = dataset[bootstrapped_distribution]

    significants = sorter(significants)
    return significants
```
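To see the procedure in action without installing msapy, here is a compact standalone sketch of the same steps (center each column under H0, bootstrap the mean, flag columns whose observed mean falls outside the percentile bounds), run on a hypothetical Shapley table where one element has a clearly nonzero mean. The column names, sample sizes, and `significant_columns` helper are all made up for illustration; this is not msapy's implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical Shapley table: 200 samples per element. "signal" has a clearly
# nonzero mean contribution; "noise" is centered around zero.
shapley_table = pd.DataFrame({
    "signal": rng.normal(loc=2.0, scale=1.0, size=200),
    "noise": rng.normal(loc=0.0, scale=1.0, size=200),
})


def significant_columns(dataset, p_value=0.05, bootstrap_samples=10_000, seed=1):
    """Names of columns whose observed mean lies outside the bootstrap bounds."""
    rng = np.random.default_rng(seed)
    percentile = (1 - p_value) * 100
    hits = []
    for col in dataset.columns:
        values = dataset[col].to_numpy()
        centered = values - values.mean()  # impose H0: zero mean
        # bootstrap distribution of the mean, vectorized via index resampling
        idx = rng.integers(0, values.size, size=(bootstrap_samples, values.size))
        boot_means = centered[idx].mean(axis=1)
        low, high = np.percentile(boot_means, [0, percentile])
        if not low <= values.mean() <= high:
            hits.append(col)
    return hits


print(significant_columns(shapley_table))
```

With a mean of 2.0 against bootstrap means spread on the order of 1/√200, "signal" is flagged essentially every time, which matches what `bootstrap_hypothesis_testing` would report for such a column.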