Parallelized take contributions
Same as the take_contribution function but parallelized over CPU cores to boost performance. I'd first try the single-core version on a toy example to make sure everything makes sense, then go for this one, because debugging parallel jobs is a disaster. You also don't need this if your game runs on a GPU. For HPC systems, I guess either dask or ray will be the better option.
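To make the shape of the call concrete, here's a minimal usage sketch. It assumes the function in msapy/utils.py is exposed as `parallelized_take_contributions` (matching this page's title) and that `complement_space`, `combination_space`, and `objective_function` are the objects described in the parameters table below; check your installed version for the exact name and signature.

```python
from msapy import utils as ut

# Hypothetical call; argument names follow the parameters table below.
contributions, lesion_effects = ut.parallelized_take_contributions(
    multiprocessing_method='joblib',   # or 'ray', see the parameter notes
    n_cores=-1,                        # all cores; try -2 if the system freezes
    complement_space=complement_space,
    combination_space=combination_space,
    objective_function=objective_function,
    objective_function_params=None,
)
```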
Note on returns: Contributions and lesion effects are virtually the same thing; it's just about how you're looking at them. For example, you might want to use lesion effects by conditioning on the elements' length and see the effect of single lesions, dual, triple, and so on. So, for contributions we have the value contributed by the intact coalition; the same result can be compared to the intact system to see how big the impact of lesioning the complements was. "Same same, but different, but still same!" - James Franco
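For intuition, the two views relate like this. A toy, self-contained sketch (all names here are illustrative, not part of msapy):

```python
all_elements = frozenset({"A", "B", "C"})
coalition = frozenset({"A"})

def objective_function(complement):
    # Toy game: the score is simply the number of intact elements.
    return len(all_elements - set(complement))

intact_value = objective_function(frozenset())  # nothing lesioned -> 3

# The complement is everything outside the coalition, so lesioning it
# leaves only the coalition intact:
complement = all_elements - coalition

contribution = objective_function(complement)   # the coalition's own value -> 1
lesion_effect = intact_value - contribution     # impact of lesioning the rest -> 2
```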
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`multiprocessing_method` | `str` | So far, two methods of parallelization are implemented, 'joblib' and 'ray', with joblib as the default. If you use ray, though, you need to decorate your objective function with the `@ray.remote` decorator (see the sketch after the returns table below); visit their documentation to see how to go about it. I guess ray works better on HPC clusters (if they support it!) and probably doesn't suffer from joblib's sneaky "memory leakage", but just by playing around, I found joblib faster for tasks that are themselves small. Remedies are here: https://docs.ray.io/en/latest/auto_examples/tips-for-first-time.html Note: generally, multiprocessing isn't always faster, as explained above. Use it when the function itself takes some time, like when each game takes longer than 0.5 seconds or so. For example, a function that sleeps for a second, on a set of 10 elements with 1000 permutations each (1024 games), finishes in roughly 62 seconds in parallel instead of ~1024 seconds serially. That makes sense since I have 16 cores and 1000/16 is around 62. TODO: allow more flexibility in the ray method. Scaling up to a cluster? | `'joblib'` |
`n_cores` | `int` | Number of parallel games. The default is -1, which means all cores, so it can make the system freeze for a short period; if that happens, maybe go for -2, which leaves one core out. Or just specify the number of cores you want to use! | `-1` |
`complement_space` | `OrderedSet` | The actual targets for lesioning. Shapley values are the added contributions of elements, while in MSA we calculate them by perturbation, so although it's intuitive to think the combination in the combination space is what will be lesioned, it is not: everything but the coalition is lesioned, i.e., the target coalition contains the only intact elements. | *required* |
`combination_space` | `OrderedSet` | The template; it will be copied, filled by the `objective_function`, and returned. | *required* |
`objective_function` | `Callable` | The game. It should take the complement set and return a single numeric value, either int or float. This function is called as `objective_function(complement, **objective_function_params)`, so design accordingly. You sometimes need to specify what should happen during edge cases, like an all-lesioned network; see the networkx sketch after this table. | *required* |
`objective_function_params` | `Optional[Dict]` | Kwargs for the `objective_function`. | `None` |
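As promised in the `objective_function` row above, here's a minimal networkx sketch; the graph and the choice of global efficiency as the score are illustrative assumptions, not something msapy prescribes:

```python
import networkx as nx

G = nx.watts_strogatz_graph(n=10, k=4, p=0.1, seed=1)  # toy network

def objective_function(complement):
    # `complement` holds the elements to lesion; knock them out of a copy.
    lesioned = G.copy()
    lesioned.remove_nodes_from(complement)
    # Edge case: an all-lesioned (empty) network has no defined efficiency,
    # so decide explicitly what it should score.
    if lesioned.number_of_nodes() == 0:
        return 0.0
    return nx.global_efficiency(lesioned)
```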
Returns:

Type | Description |
---|---|
`Tuple[Dict, Dict]` | The filled contributions dictionary and the corresponding lesion effects (see the note on returns above). |
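And, as noted under `multiprocessing_method`, the ray backend expects the objective function to be decorated with `@ray.remote`. A minimal sketch of what that looks like (the toy objective here is illustrative):

```python
import ray

ray.init()  # on a cluster, point this at the head node instead

@ray.remote
def objective_function(complement):
    # Toy stand-in: score the system with `complement` lesioned.
    return float(len(complement))

# ray runs decorated functions via .remote() and fetches results with ray.get():
result = ray.get(objective_function.remote(("A", "B")))  # -> 2.0
```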
Source code in msapy/utils.py