Learning turns experience into better decisions. A key problem in learning is credit assignment: knowing how to change parameters, such as synaptic weights deep within a neural network, in order to improve behavioral performance. Artificial intelligence owes its recent bloom largely to the error-backpropagation algorithm 1, which estimates the contribution of every synapse to output errors and allows rapid weight adjustment. Biological systems, however, lack an obvious mechanism to backpropagate errors. Here we show, by combining high-throughput volume electron microscopy 2 and automated connectomic analysis 3-5, that the synaptic architecture of the songbird basal ganglia supports local credit assignment using a variant of the node-perturbation algorithm proposed in a model of songbird reinforcement learning 6,7. We find that key predictions of the model hold true. First, cortical axons that encode exploratory motor variability terminate predominantly on the dendritic shafts of striatal spiny neurons, whereas cortical axons that encode song timing terminate almost exclusively on spines. Second, synapse pairs that share a presynaptic cortical timing axon and a postsynaptic spiny dendrite are substantially more similar in size than expected, indicating Hebbian plasticity 8,9. Combined with numerical simulations, these findings provide strong evidence for a biologically plausible credit-assignment mechanism 6.

Neural circuits that control decisions and actions are recurrently connected and involve many network layers from sensory inputs to motor output. Yet, as we learn, some mechanism specifies precisely which synapses, out of trillions, are to be modified and in what way. The backpropagation algorithm 10 is powerful because it directly calculates, based on the network architecture, the derivative of output errors with respect to every synaptic weight, providing an efficient method for updating synaptic strengths.
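As a concrete illustration of the gradient computation described above, the following minimal sketch (in Python with NumPy; the network size and all numerical values are arbitrary, chosen only for demonstration) backpropagates output errors through a tiny two-layer network and obtains the derivative of the loss with respect to every weight via the chain rule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: h = tanh(W1 x), y = W2 h, loss L = 0.5 * ||y - t||^2.
# Sizes and values are arbitrary, for illustration only.
W1 = 0.5 * rng.normal(size=(3, 4))
W2 = 0.5 * rng.normal(size=(2, 3))
x = rng.normal(size=4)
t = rng.normal(size=2)

def loss(W1, W2):
    return 0.5 * np.sum((W2 @ np.tanh(W1 @ x) - t) ** 2)

# Forward pass.
h = np.tanh(W1 @ x)
y = W2 @ h

# Backward pass (chain rule): propagate the output error back to every weight.
delta_y = y - t                             # dL/dy
grad_W2 = np.outer(delta_y, h)              # dL/dW2
delta_h = (W2.T @ delta_y) * (1 - h ** 2)   # error backpropagated through tanh
grad_W1 = np.outer(delta_h, x)              # dL/dW1
```

The analytic gradients agree with finite-difference estimates of the loss, which is the sense in which backpropagation "directly calculates" the contribution of every weight to the output error.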
However, it remains unclear whether backpropagation or its variants are biologically implemented, or even plausible 11-13. An alternative approach to gradient-based learning is weight or node perturbation 14,15, in which the activity of a specific synapse or neuron is stochastically varied to determine its contribution to the output. Here we use a connectomic approach to study the biological implementation of stochastic gradient descent, which requires as-yet unknown circuit structures to inject variability, correlate that variability with reward signals, and correctly assign credit to the relevant synapses.

Node perturbation is conceptually similar to behavioral trial-and-error reinforcement learning (RL) 16,17. In the vertebrate brain, RL is thought to involve the basal ganglia 18, where a multitude of sensory and other context and state signals converge with action and outcome signals to determine which actions in which state lead to the best outcomes. In the songbird, the basal ganglia circuit dedicated to song learning 19, Area X, receives synaptic input from two cortical areas 20 : the voc...
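The node-perturbation idea can be sketched numerically. In the following toy example (a hypothetical single linear unit, not the paper's songbird model; target weights, learning rate and noise scale are invented for the demonstration), exploratory noise is injected into the unit's output, and weights change in proportion to the product of presynaptic activity, the injected noise, and the deviation of reward from a running baseline. No error is ever backpropagated, yet the expected update follows the reward gradient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Node perturbation on one linear unit y = w . x + noise.
# w_true defines the target mapping; reward is negative squared error.
w_true = np.array([0.7, -0.3, 0.5])   # hypothetical target weights
w = np.zeros(3)
eta, sigma = 0.05, 0.2                # learning rate, exploration noise scale
R_baseline = 0.0                      # running estimate of expected reward

for trial in range(5000):
    x = rng.normal(size=3)            # presynaptic (context/state) activity
    xi = sigma * rng.normal()         # variability injected at the node
    y = w @ x + xi
    R = -(y - w_true @ x) ** 2        # scalar reward signal for this trial
    # Credit assignment: if the perturbed trial beat the baseline,
    # move the weights so the unperturbed output reproduces the noise.
    w += eta * (R - R_baseline) * xi * x
    R_baseline += 0.1 * (R - R_baseline)
```

Averaged over the noise, the update is proportional to the gradient of expected reward, so the weights drift toward `w_true`; this is the sense in which stochastic perturbation implements gradient descent without any explicit derivative computation.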