Get Shapley table
Calculates Shapley values based on the filled contribution_space. Briefly, for a permutation (A, B, C) the marginal contributions are:

- (A,B,C) - (B,C) = contribution of A to the coalition (B,C).
- (B,C) - (C) = contribution of B to the coalition formed with (C).
- (C) = contribution of C alone.

This repeats over all permutations, and the result is a distribution of Shapley values for each element. Note that the estimation method used here is an unbiased estimator, so the variance is fairly large.
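For reference, this is the standard permutation form of the Shapley value (generic notation, not tied to this codebase's identifiers):

$$
\hat{\phi}_i = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \Big[ v\big(P_i^{\pi} \cup \{i\}\big) - v\big(P_i^{\pi}\big) \Big]
$$

where $\Pi$ is the set of sampled permutations, $P_i^{\pi}$ is the set of elements preceding $i$ in permutation $\pi$, and $v$ is the objective (game) function. Averaging over all $n!$ permutations gives the exact Shapley value; averaging over a random sample gives the unbiased but higher-variance estimate described above.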
Parameters:
Name | Type | Description | Default |
---|---|---|---|
contributions | Dict | Filled dictionary of coalition:result pairs. | None |
permutation_space | list | Should be the same list that was passed to make_combination_space. | required |
lesioned | Optional[any] | Lesioned element that will not be present in any combination. | None |
objective_function | Callable | The game (in-silico experiment). It receives the complement set and must return a single numeric value, either int or float. It is called as objective_function(complement, **objective_function_params). You sometimes need to specify what should happen in edge cases such as an all-lesioned network; see the networkx sketch after this table. | None |
objective_function_params | Dict | Kwargs for the objective_function. | None |
multiprocessing_method | str | Two methods of parallelization are implemented so far, 'joblib' and 'ray'; the default is joblib. If using ray, you need to decorate your objective function with the @ray.remote decorator (see the ray documentation for how). Ray likely works better on HPC clusters (if they support it) and probably doesn't suffer from joblib's sneaky memory leakage, but in practice joblib is faster for tasks that are themselves small. Remedies are here: https://docs.ray.io/en/latest/auto_examples/tips-for-first-time.html Note: multiprocessing isn't always faster, as explained above; use it when each game takes roughly 0.5 seconds or longer. For example, a function that sleeps for a second on a set of 10 elements with 1000 permutations each (1024 games) benefits considerably from parallelization on a 16-core machine, which makes sense since 1000/16 is around 62 permutations per core. | required |
rng | Optional[Generator] | Numpy random generator object used for reproducible results. Default is None. | required |
random_seed | Optional[int] | Sets the random seed of the sampling process. Only used when | required |
n_parallel_games | int | Number of parallel jobs (number of to-be-occupied cores); -1 means all CPU cores and 1 means a serial process. Using 1 is suggested for debugging since things get messy in parallel. | required |
lazy | bool | If set to True, the objective function is called lazily instead of being called all at once and having its outputs stored in a dict. Setting it to True saves a lot of memory and might even be faster in certain cases. | False |
save_permutations | bool | If set to True, the Shapley values are calculated with a running mean over the permutations instead of storing the permutations. This parameter is ignored if the objective function returns a scalar. | False |
dual_progress_bar | bool | If set to True, two progress bars are shown: a parent bar tracking the permutations and a child bar tracking the elements. Ignored if mbar is provided. | required |
mbar | MasterBar | A fastprogress MasterBar. Use it if you're calling the interface multiple times and want a nested progress bar. | None |
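The objective_function snippet embedded in the table above is only a fragment (and its `len(complements) < 0` check can never be true; `== 0` is presumably what was meant). Below is a fuller sketch of such a game for networkx graphs; the function name local_efficiency, the graph keyword, and the specific edge-case choices are illustrative assumptions, not part of the msapy API:

```python
import networkx as nx

def local_efficiency(complements, graph):
    """Example game: local efficiency of `graph` after lesioning `complements`."""
    if len(complements) == 0:
        # Nothing was lesioned, the network is intact.
        return nx.local_efficiency(graph)

    if len(complements) == len(graph):
        # Everything is lesioned; decide this edge case explicitly.
        return 0.0

    # Remove the complement set (the lesioned nodes), then evaluate.
    lesioned_graph = graph.copy()
    lesioned_graph.remove_nodes_from(complements)
    return nx.local_efficiency(lesioned_graph)
```

Such a function would then be passed as objective_function=local_efficiency with objective_function_params={"graph": my_graph}, matching the call pattern objective_function(complement, **objective_function_params) described above.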
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Shapley table or a dict of Shapley tables. Columns will be elements and indices will be samples (permutations). It will be a Multi-Index DataFrame if the contributions are a timeseries. |
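Since the (non-timeseries) return is a plain DataFrame with one column per element and one row per sampled permutation, point estimates and spread can be read off with ordinary pandas. A minimal sketch with a toy stand-in table (the numbers are made up purely for illustration):

```python
import pandas as pd

# Toy stand-in for the returned Shapley table:
# rows = sampled permutations, columns = elements.
shapley_table = pd.DataFrame({
    "A": [0.30, 0.28, 0.35],
    "B": [0.10, 0.12, 0.08],
    "C": [0.05, 0.07, 0.04],
})

shapley_values = shapley_table.mean(axis=0)  # point estimate per element
shapley_spread = shapley_table.std(axis=0)   # spread across permutations (large for this unbiased estimator)
print(shapley_values.sort_values(ascending=False))
```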
Source code in msapy/msa.py