Interface

A wrapper function that calls other related functions internally and produces an easy-to-use pipeline.

Parameters:

- n_permutations (int, required):
  Number of permutations (samples) per element.

- elements (list, required):
  List of the players (elements). Can be strings (names), integers (indices), or tuples.

- objective_function (Callable, required):
  The game (in-silico experiment). It should take the complement set and return one numeric value, either int or float. This function simply calls it as: objective_function(complement, **objective_function_params)

  An example using networkx, with a tip: you sometimes need to specify what should happen during edge cases, like an all-lesioned network.

  ```python
  import networkx as nx

  def local_efficiency(complements, graph):
      if len(complements) == 0:
          # the network is intact, so:
          return nx.local_efficiency(graph)
      elif len(complements) == len(graph):
          # the network is fully lesioned, so:
          return 0.0
      else:
          # lesion the system, calculate things
          lesioned = graph.copy()
          lesioned.remove_nodes_from(complements)
          return nx.local_efficiency(lesioned)
  ```

- objective_function_params (Dict, default {}):
  Kwargs for the objective_function.

- permutation_space (Optional[list], default None):
  An already generated permutation space, in case you want reproducibility and the same lesion combinations across many metrics (see the sketch below).
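A minimal sketch of that reuse pattern. It assumes the package is imported as `from msapy import msa` and that `make_permutation_space` is exposed there (it is the helper that `interface` calls internally, as the source below shows); the element list and the two games are hypothetical toys.

```python
import numpy as np
from msapy import msa  # assumed package entry point

elements = ["A", "B", "C", "D"]  # hypothetical players

def game_a(complements):  # toy game: count surviving players
    return float(len(elements) - len(complements))

def game_b(complements):  # toy game: squared version of game_a
    return game_a(complements) ** 2

rng = np.random.default_rng(42)

# generate the lesion combinations once...
perms = msa.make_permutation_space(elements=elements,
                                   n_permutations=1000,
                                   pair=None,
                                   rng=rng)

# ...then reuse them for several metrics (a warning notes that
# n_permutations falls back to what the given space specifies)
tables = {game.__name__: msa.interface(n_permutations=1000,
                                       elements=elements,
                                       objective_function=game,
                                       permutation_space=perms)
          for game in (game_a, game_b)}
```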
- pair (Optional[Tuple], default None):
  Pair of elements that will always be together in every combination.

- lesioned (Optional[Any], default None):
  Lesioned element that will not be present in any combination.

- multiprocessing_method (str, default 'joblib'):
  So far, two methods of parallelization are implemented: 'joblib' and 'ray', and the default is 'joblib'. If you use ray, though, you need to decorate your objective function with the @ray.remote decorator (see the sketch below); visit Ray's documentation for details. Ray probably works better on HPC clusters (if they support it!) and probably doesn't suffer from joblib's sneaky "memory leakage", but from playing around, joblib seems faster for tasks that are small themselves. Remedies are here: https://docs.ray.io/en/latest/auto_examples/tips-for-first-time.html

  Note: generally, multiprocessing isn't always faster, as explained above. Use it when the game itself is slow, say longer than 0.5 seconds per call. For example, a function that sleeps for a second on a set of 10 elements with 1000 permutations each (1024 games) performs as follows:

  - no parallel: 1020 sec
  - joblib: 63 sec
  - ray: 65 sec

  That makes sense since I have 16 cores and 1000/16 is around 62.
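Decorating the game for Ray might look like the following sketch; the graph-based game mirrors the objective_function example above, and the ray.init arguments are illustrative.

```python
import networkx as nx
import ray

ray.init(ignore_reinit_error=True)  # start (or reuse) a local Ray runtime

@ray.remote  # required when multiprocessing_method='ray'
def local_efficiency(complements, graph):
    # same edge-case handling as the objective_function example above
    if len(complements) == 0:
        return nx.local_efficiency(graph)
    elif len(complements) == len(graph):
        return 0.0
    lesioned = graph.copy()
    lesioned.remove_nodes_from(complements)
    return nx.local_efficiency(lesioned)
```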
- rng (Optional[Generator], default None):
  Numpy random generator object used for reproducible results.

- random_seed (Optional[int], default None):
  Sets the random seed of the sampling process. Only used when rng is None.

- n_parallel_games (int, default -1):
  Number of parallel jobs (number of to-be-occupied cores); -1 means all CPU cores and 1 means a serial process. I suggest using 1 for debugging since things get messy in parallel!

- lazy (bool, default True):
  If set to True, the objective function is called lazily instead of being called all at once with the outputs stored in a dict. Setting it to True saves a lot of memory and might even be faster in certain cases.

- save_permutations (bool, default False):
  If set to True, the Shapley values are computed as a running mean over the permutations instead of storing the permutations. This parameter is ignored if the objective function returns a scalar.

- dual_progress_bars (bool, default True):
  If set to True, you get two progress bars: a parent that tracks the permutations and a child that tracks the elements. Ignored if mbar is provided.

- mbar (Optional[MasterBar], default None):
  A fastprogress MasterBar. Use it when you're calling the interface multiple times and want a nested progress bar (see the sketch below).
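Nesting the progress bars might look like this sketch, reusing the hypothetical toy games from the permutation_space sketch above; fastprogress's master_bar drives the parent bar.

```python
from fastprogress.fastprogress import master_bar
from msapy import msa  # assumed package entry point

games = [game_a, game_b]  # hypothetical toy games from the sketch above
mbar = master_bar(games)
tables = {}
for game in mbar:  # the parent bar tracks the outer loop
    tables[game.__name__] = msa.interface(n_permutations=1000,
                                          elements=["A", "B", "C", "D"],
                                          objective_function=game,
                                          mbar=mbar)  # dual_progress_bars is ignored here
```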

Returns:

pd.DataFrame: shapley_table, with one column per element.

Note that, internally, contributions and lesion effects are the same values, addressed differently. For example: if from a set of ABCD removing AC ends with some value x, you can say the contribution of BD = x and the effect of removing AC = x. Of course, it makes more sense to compare the lesion effects with the intact system, but who am I to judge.
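Putting it all together, a minimal end-to-end sketch. It assumes the package is imported as `from msapy import msa`; the karate-club graph and the closing .mean() call are illustrative, and the game is the networkx example from the objective_function parameter above.

```python
import networkx as nx
from msapy import msa  # assumed package entry point

G = nx.karate_club_graph()

def local_efficiency(complements, graph):
    if len(complements) == 0:
        return nx.local_efficiency(graph)   # intact network
    elif len(complements) == len(graph):
        return 0.0                          # fully lesioned network
    lesioned = graph.copy()
    lesioned.remove_nodes_from(complements)
    return nx.local_efficiency(lesioned)

shapley_table = msa.interface(n_permutations=1000,
                              elements=list(G.nodes()),
                              objective_function=local_efficiency,
                              objective_function_params={"graph": G})

shapley_table.mean()  # average Shapley value per element, across permutations
```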

Source code in msapy/msa.py
@typechecked
def interface(*,
              n_permutations: int,
              elements: list,
              objective_function: Callable,
              objective_function_params: Dict = {},
              permutation_space: Optional[list] = None,
              pair: Optional[Tuple] = None,
              lesioned: Optional[Any] = None,
              multiprocessing_method: str = 'joblib',
              rng: Optional[np.random.Generator] = None,
              random_seed: Optional[int] = None,
              n_parallel_games: int = -1,
              lazy: bool = True,
              save_permutations: bool = False,
              dual_progress_bars: bool = True,
              mbar: Optional[MasterBar] = None
              ) -> pd.DataFrame:
    """
    A wrapper function that calls other related functions internally and produces an easy-to-use pipeline.

    Args:
        n_permutations (int):
            Number of permutations (samples) per element.

        elements (list):
            List of the players (elements). Can be strings (names), integers (indices), or tuples.

        objective_function (Callable):
            The game (in-silico experiment). It should take the complement set and return one numeric value,
            either int or float.
            This function simply calls it as: objective_function(complement, **objective_function_params)

            An example using networkx with some tips:
            (you sometimes need to specify what should happen during edge-cases like an all-lesioned network)

            def local_efficiency(complements, graph):
                if len(complements) == 0:
                    # the network is intact so:
                    return nx.local_efficiency(graph)

                elif len(complements) == len(graph):
                    # the network is fully lesioned so:
                    return 0.0

                else:
                    # lesion the system, calculate things
                    lesioned = graph.copy()
                    lesioned.remove_nodes_from(complements)
                    return nx.local_efficiency(lesioned)

        objective_function_params (Dict):
            Kwargs for the objective_function.

        permutation_space (Optional[list]):
            An already generated permutation space, in case you want reproducibility and the same
            lesion combinations across many metrics.

        pair (Optional[Tuple]):
            Pair of elements that will always be together in every combination.

        lesioned (Optional[Any]):
            Lesioned element that will not be present in any combination.

        multiprocessing_method (str):
            So far, two methods of parallelization are implemented: 'joblib' and 'ray', and the default is 'joblib'.
            If you use ray, though, you need to decorate your objective function with the @ray.remote decorator.
            Visit Ray's documentation to see how to go about it. Ray probably works better on HPC clusters
            (if they support it!) and probably doesn't suffer from joblib's sneaky "memory leakage", but from
            playing around, joblib seems faster for tasks that are small themselves. Remedies are here:
            https://docs.ray.io/en/latest/auto_examples/tips-for-first-time.html

            Note: generally, multiprocessing isn't always faster, as explained above. Use it when the game
            itself is slow, say longer than 0.5 seconds per call. For example, a function that sleeps for a
            second on a set of 10 elements with 1000 permutations each (1024 games) performs as follows:

                - no parallel: 1020 sec
                - joblib: 63 sec
                - ray: 65 sec

            That makes sense since I have 16 cores and 1000/16 is around 62.

        rng (Optional[np.random.Generator]): Numpy random generator object used for reproducible results. Default is None.

        random_seed (Optional[int]):
            Sets the random seed of the sampling process. Only used when `rng` is None. Default is None.

        n_parallel_games (int):
            Number of parallel jobs (number of to-be-occupied cores),
            -1 means all CPU cores and 1 means a serial process.
            I suggest using 1 for debugging since things get messy in parallel!

        lazy (bool): If set to True, the objective function is called lazily instead of being called all at once with the outputs stored in a dict.
            Setting it to True saves a lot of memory and might even be faster in certain cases.

        save_permutations (bool): If set to True, the Shapley values are computed as a running mean over the permutations instead of
            storing the permutations. This parameter is ignored if the objective function returns a scalar.

        dual_progress_bars (bool): If set to True, you get two progress bars: a parent that tracks the permutations and a child that
            tracks the elements. Ignored if mbar is provided.

        mbar (MasterBar): A fastprogress MasterBar. Use it when you're calling the interface multiple times and want a nested progress bar.


    Returns:
        pd.DataFrame: shapley_table, with one column per element.

    Note that, internally, contributions and lesion effects are the same values, addressed differently. For example:
    if from a set of ABCD removing AC ends with some value x, you can say the contribution of BD = x and the
    effect of removing AC = x. Of course, it makes more sense to compare the lesion effects with the
    intact system, but who am I to judge.
    """

    # create a numpy random number generator if one is not passed
    # (checking `is None` so that random_seed=0 is honored)
    if rng is None:
        rng = np.random.default_rng(random_seed)

    # create a permutation_space if one is not passed
    if not permutation_space:
        permutation_space = make_permutation_space(elements=elements,
                                                   n_permutations=n_permutations,
                                                   pair=pair,
                                                   rng=rng)
    else:
        warnings.warn("A Permutation space is given so n_permutations will fall back to what's specified there.",
                      stacklevel=2)

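    # lazy path: evaluate each game on the fly while building the Shapley
    # table, instead of storing every game result in a dict first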
    if lazy:
        shapley_table = get_shapley_table(permutation_space=permutation_space,
                                          lesioned=lesioned,
                                          lazy=True,
                                          objective_function=objective_function,
                                          objective_function_params=objective_function_params,
                                          dual_progress_bars=dual_progress_bars,
                                          save_permutations=save_permutations,
                                          mbar=mbar)[elements]
        return shapley_table

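    # eager path: enumerate every unique combination and its complement up front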
    combination_space = make_combination_space(permutation_space=permutation_space,
                                               pair=pair,
                                               lesioned=lesioned)
    complement_space = make_complement_space(combination_space=combination_space,
                                             elements=elements,
                                             lesioned=lesioned)

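    # n_parallel_games == 1 runs the games serially (easier debugging);
    # otherwise they are dispatched to worker processes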
    if n_parallel_games == 1:
        contributions, _ = take_contributions(elements=elements,
                                              complement_space=complement_space,
                                              combination_space=combination_space,
                                              objective_function=objective_function,
                                              objective_function_params=objective_function_params,
                                              mbar=mbar)
    else:
        contributions, _ = ut.parallelized_take_contributions(
            multiprocessing_method=multiprocessing_method,
            n_cores=n_parallel_games,
            complement_space=complement_space,
            combination_space=combination_space,
            objective_function=objective_function,
            objective_function_params=objective_function_params,
            mbar=mbar)

    shapley_table = get_shapley_table(contributions=contributions,
                                      permutation_space=permutation_space,
                                      dual_progress_bars=dual_progress_bars,
                                      save_permutations=save_permutations,
                                      lesioned=lesioned, mbar=mbar)[elements]
    return shapley_table