Estimate causal influences
Estimates the causal contribution (Shapley values) of each node to the rest of the network. In essence, this function performs MSA iteratively on each node and tracks the changes in the objective_function of the target node. For example, say we have a chain A -> B -> C and we want to know how much A and B contribute to C. We first need to define a metric for C (the objective_function); here, let's say it is the average activity of C. MSA then performs a multi-site lesioning analysis of A and B, so for each we end up with a number indicating its contribution to the average activity of C.
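To make the workflow concrete, here is a minimal usage sketch. It assumes msapy exposes the function as `msa.estimate_causal_influences`, and the toy objective function (`mean_target_activity`), which simply zeroes out the lesioned nodes in a fixed activity matrix, is an illustration only; a real experiment would simulate the lesioned system instead. Only the parameter names come from the signature documented below.

```python
import numpy as np
from msapy import msa

# Toy data: activity of 5 nodes over 200 time points (illustration only;
# your in-silico experiment would produce a readout per lesion instead).
rng = np.random.default_rng(0)
activity = rng.random((200, 5))

def mean_target_activity(complements, activity, index):
    # 'index' is the target node; the process adds {'index': index} to
    # objective_function_params so the effect on the target is tracked.
    lesioned = activity.copy()
    if len(complements) > 0:
        lesioned[:, list(complements)] = 0.0  # silence the lesioned nodes
    return lesioned[:, index].mean()

causal_influences = msa.estimate_causal_influences(
    elements=list(range(activity.shape[1])),           # players: node indices
    objective_function=mean_target_activity,
    objective_function_params={"activity": activity},  # 'index' added internally
    n_permutations=1000,
)
```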
VERY IMPORTANT NOTES:
1. The resulting causal contribution matrix does not necessarily reflect the connectome. In the example above,
there's no actual connection A -> C, but there might be one in the causal contribution matrix since A causally
influences C via B.
2. Think twice (even three times) about your objective function. With everything else the same, you will end up
with different causal contribution matrices depending on what you are tracking and how accurately it captures the
effect of lesions. Also, don't forget the edge cases. There will be weird behaviors in your system; for example,
what does it do if every node is perturbed?
3. The metric you track should preferably be non-negative and bounded (at least practically!)
4. Obviously, this will take N times longer than a normal MSA, with N being the number of nodes. So make sure your
process is as fast as it can be, for example by using Numba, but you don't need to implement any parallel
processes since that's already implemented here. Going below 1000 permutations might be an option depending on
your specific case, but based on experience, it's not a good idea.
5. Shapley values sum up (or will be close) to the value of the intact coalition. So, for example, if the
mean activity of node C here is 50, then causal_contribution_matrix.sum(axis=0) should be 50 or close to 50
(see the sanity-check sketch after this list). If not, it means:
1. the number of permutations is not enough
2. there is randomness somewhere in the process
3. your objective function is not suitable
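A quick way to test note 5 is to compare the column sums of the result against the intact values. This is a minimal sketch, assuming `causal_contribution_matrix` is the returned DataFrame and `intact_values` holds the objective evaluated on the unlesioned network for each target node; both come from your own run.

```python
import numpy as np

# Per note 5: column j should sum (approximately) to the intact
# objective value of target node j.
shapley_sums = causal_contribution_matrix.sum(axis=0)
if not np.allclose(shapley_sums, intact_values, rtol=0.05):
    print("Efficiency check failed: too few permutations, hidden "
          "randomness, or an unsuitable objective function.")
```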
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`elements` | `list` | List of the players (elements). Can be strings (names), integers (indices), and tuples. | *required* |
`objective_function` | `Callable` | The game (in-silico experiment). It should take the complement set and return one numeric value, either int or float. This function is called as: `objective_function(complement, **objective_function_params)`. Note the keyword `index`: your function should be able to track the effects on the target node, and `index` is the keyword for that. You also sometimes need to specify what should happen during edge cases, like an all-lesioned network. An example using networkx, with some tips, is sketched below this table. | *required* |
`objective_function_params` | `Optional[Dict]` | Kwargs for the `objective_function`. A pair `{'index': index}` will be added to this dictionary during the process so your function can track the lesion effect. | `None` |
`target_elements` | `Optional[list]` | List of elements for which to calculate the causal influence. | `None` |
`multiprocessing_method` | `str` | So far, two methods of parallelization are implemented, `'joblib'` and `'ray'`; the default is `'joblib'`. If using ray, though, you need to decorate your objective function with the `@ray.remote` decorator. Visit their documentation to see how to go about it. | `'joblib'` |
`n_cores` | `int` | Number of parallel games. The default is -1, which means all cores, so it can make the system freeze for a short period; if that happens, maybe go for -2, which leaves one core out. Or really just specify the number of threads you want to use! | `-1` |
`n_permutations` | `int` | Number of permutations per node. Not checked systematically yet, but based on random explorations, something around 1000 is enough. | `1000` |
`permutation_seed` | `Optional[int]` | Sets the random seed of the sampling process. The default is `None`, so if nothing is given, every call results in a different ordering. | `None` |
`parallelize_over_games` | `bool` | Whether to parallelize over games or over elements. Parallelizing over elements is generally faster. Defaults to `False`. | `False` |
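Here is the objective-function example referenced in the table above, fleshed out as a runnable sketch. Only the signature `lesion_me_senpai(complements, network, index)` comes from the original docstring; the body, including the stand-in degree metric, is an assumption for illustration.

```python
import networkx as nx

def lesion_me_senpai(complements, network, index):
    # Note "index": your function should be able to track the effects on
    # the target node, and the keyword for that is "index". The pair
    # {'index': index} is injected into objective_function_params by the
    # process itself; you only supply {'network': network}.

    # Edge case: an all-lesioned network leaves nothing to measure, so
    # decide explicitly what the metric should be here.
    if len(complements) == network.number_of_nodes():
        return 0.0

    # Lesion the network by removing the complement set.
    lesioned = network.copy()
    lesioned.remove_nodes_from(complements)

    # Stand-in metric (an assumption, not the library's own choice): the
    # degree of the target node in the lesioned network. In a real use
    # case you would simulate dynamics on `lesioned` and return, e.g.,
    # the target's mean activity.
    return float(lesioned.degree(index)) if index in lesioned else 0.0
```

Note that this metric is non-negative and bounded, in line with note 3 above.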
Returns:

Type | Description |
---|---|
`DataFrame` | causal_influences (`pd.DataFrame`) |
Source code in msapy/msa.py