We analyze the energy and training data requirements for supervised learning of an $M$-mode linear optical circuit by minimizing an empirical risk defined solely from the action of the circuit on coherent states. When the linear optical circuit acts non-trivially only on $k<M$ unknown modes (i.e., a linear optical $k$-junta), we provide an energy-efficient, adaptive algorithm that identifies the junta set and learns the circuit. We compare two schemes for allocating a total energy, $E$, to the learning algorithm. In the first scheme, each of the $T$ random training coherent states has energy $E/T$. In the second scheme, a single random $MT$-mode coherent state with energy $E$ is partitioned into $T$ training coherent states. Owing to concentration of measure on the $(2MT-1)$-sphere, the latter scheme exhibits a polynomial advantage in the training data size sufficient for the empirical risk to converge to the full risk. Specifically, we prove generalization bounds for both schemes showing that, for the empirical risk to $\epsilon$-approximate the full risk with high probability, $O(E^{2/3}M^{2/3}/\epsilon^{2/3})$ training states are sufficient for the first scheme and $O(E^{1/3}M^{1/3}/\epsilon^{2/3})$ training states are sufficient for the second scheme.
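The two energy-allocation schemes can be illustrated with a minimal sketch. It assumes (our assumption, not stated above) that a "random" coherent state means a complex amplitude vector drawn uniformly from the sphere $\sum_j |\alpha_j|^2 = \text{energy}$, sampled by normalizing a complex Gaussian vector. The helper names `random_sphere_point`, `scheme_independent`, `scheme_partitioned`, `energy` and `bound_scaling` are hypothetical, and `bound_scaling` only evaluates the leading-order expressions from the two bounds with all constants dropped.

```python
import math
import random


def random_sphere_point(dim_complex, energy, rng):
    """Uniform point on sum_j |alpha_j|^2 = energy in C^dim_complex, i.e. the
    (2*dim_complex - 1)-sphere of radius sqrt(energy) in R^(2*dim_complex).
    Assumed sampling model: normalized complex Gaussian vector."""
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim_complex)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in g))
    scale = math.sqrt(energy) / norm
    return [z * scale for z in g]


def scheme_independent(M, T, E, rng):
    """Scheme 1: T independent M-mode coherent states, each with energy exactly E/T."""
    return [random_sphere_point(M, E / T, rng) for _ in range(T)]


def scheme_partitioned(M, T, E, rng):
    """Scheme 2: one MT-mode coherent state of energy E on the (2MT-1)-sphere,
    split into T consecutive blocks of M modes. Block energies fluctuate,
    but they always sum to E."""
    big = random_sphere_point(M * T, E, rng)
    return [big[t * M:(t + 1) * M] for t in range(T)]


def energy(state):
    """Mean photon number sum_j |alpha_j|^2 of a coherent-state amplitude vector."""
    return sum(abs(a) ** 2 for a in state)


def bound_scaling(E, M, eps):
    """Leading-order training-set sizes from the two bounds (constants omitted)."""
    t1 = (E * M) ** (2 / 3) / eps ** (2 / 3)
    t2 = (E * M) ** (1 / 3) / eps ** (2 / 3)
    return t1, t2


if __name__ == "__main__":
    rng = random.Random(0)
    M, T, E = 4, 10, 50.0
    s1 = scheme_independent(M, T, E, rng)
    s2 = scheme_partitioned(M, T, E, rng)
    print([round(energy(s), 3) for s in s1])  # each state carries E/T = 5.0
    print(round(sum(energy(s) for s in s2), 6))  # blocks share the total E = 50.0
    print(bound_scaling(E, M, 0.1))
```

Under these constants-free expressions, the ratio of the two sufficient training-set sizes is $(EM)^{1/3}$, which is one way to read the polynomial advantage of the partitioned scheme.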