To study the impact of our learning rule on network performance and to dissect the effects of its different components, we train RSNNs using five different approaches for each task. These, illustrated in Figure 3, are as follows: (i) BPTT, which updates weights using exact gradients, shown in Figure 3ai; (ii) E-prop [23], the state-of-the-art method for biologically plausible training of RSNNs, shown in Figure 3aii; (iii) TRTRL, the truncated RTRL given in (7) without the cell-type approximation, shown in Figure 3aiii; (iv) MDGL, which incorporates the cell-type approximation given in (11) and (12) using only two cell types, shown in Figure 3aiv; and (v) NL-MDGL, a nonlocal version of MDGL in which the gain is replaced by $w_{\alpha\beta} = \langle w_{jp} \rangle_{j \in \alpha,\, p \in \beta}$ even for $w_{jp} = 0$, so that the modulatory signal diffuses to all cells in the network, shown in Figure 3av. We note that the factor $\partial E / \partial z_{j,t}$, which depends on future errors as mentioned earlier, enters all MDGL training results in the main text (Figures 4-7); in the supplementary materials, we derive an online approximation to MDGL and demonstrate via simulation that it does not significantly degrade performance (Figure S3, Section S3.2).
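To make the cell-type approximation and its nonlocal variant concrete, the following minimal NumPy sketch computes the gains $w_{\alpha\beta} = \langle w_{jp} \rangle_{j \in \alpha,\, p \in \beta}$ and applies them either only where synapses exist (as in MDGL) or to all cell pairs (as in NL-MDGL). The names `W`, `types`, `cell_type_gains`, and `modulatory_weights` are illustrative, not from the paper, and averaging over nonzero synapses only is an assumption about how the gains are computed.

```python
import numpy as np

def cell_type_gains(W, types, n_types=2):
    """Gains w_ab = <w_jp> averaged over postsynaptic cells j of type a and
    presynaptic cells p of type b (W[j, p] convention). The average here is
    taken over existing (nonzero) synapses only; this is an assumption."""
    gains = np.zeros((n_types, n_types))
    for a in range(n_types):
        for b in range(n_types):
            block = W[np.ix_(types == a, types == b)]  # type-(a, b) sub-block
            nz = block[block != 0]                     # existing synapses
            gains[a, b] = nz.mean() if nz.size else 0.0
    return gains

def modulatory_weights(W, types, gains, nonlocal_variant=False):
    """Replace each weight by its cell-type gain.
    MDGL: only where a synapse exists (w_jp != 0).
    NL-MDGL: for all pairs, even w_jp = 0, so the modulation is dense."""
    G = gains[np.ix_(types, types)]       # broadcast gains to neuron pairs
    return G if nonlocal_variant else np.where(W != 0, G, 0.0)

# Illustrative usage: 6 neurons, two cell types, ~50% sparse recurrent weights.
rng = np.random.default_rng(0)
types = np.array([0, 0, 0, 1, 1, 1])
W = rng.normal(size=(6, 6)) * (rng.random((6, 6)) < 0.5)
gains = cell_type_gains(W, types)
W_mdgl = modulatory_weights(W, types, gains)                       # MDGL
W_nl = modulatory_weights(W, types, gains, nonlocal_variant=True)  # NL-MDGL
```

Under this sketch, the only difference between the two variants is where the gain is applied: MDGL respects the existing connectivity pattern, while NL-MDGL returns the dense gain matrix so that the modulatory signal reaches even cells with no direct synapse, matching the diffusion described above.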