This paper studies a multiuser mobile edge computing (MEC) system, in which one base station (BS) serves multiple users with intensive computation tasks. We exploit the multi-antenna non-orthogonal multiple access (NOMA) technique for multiuser computation offloading, such that different users can simultaneously offload their computation tasks to the multi-antenna BS over the same time/frequency resources, and the BS can employ successive interference cancellation (SIC) to efficiently decode all users' offloaded tasks for remote execution. In particular, we pursue energy-efficient MEC designs by considering two cases, with partial and binary offloading, respectively. We aim to minimize the weighted sum-energy consumption at all users subject to their computation latency constraints, by jointly optimizing the communication and computation resource allocation as well as the BS's decoding order for SIC. For the case with partial offloading, the weighted sum-energy minimization is a convex optimization problem, for which an efficient algorithm based on the Lagrange duality method is presented to obtain the globally optimal solution. For the case with binary offloading, the weighted sum-energy minimization corresponds to a mixed Boolean convex problem, which is generally more difficult to solve. We first use the branch-and-bound (BnB) method to obtain the globally optimal solution, and then develop two low-complexity algorithms, based on the greedy method and on convex relaxation, respectively, to find high-quality suboptimal solutions in practice. Numerical results show that the proposed NOMA-based computation offloading design significantly improves the energy efficiency of the multiuser MEC system compared with other benchmark schemes. It is also shown that ...
Part of this paper has been presented at ...
... on the corresponding task models [5].
For example, partial offloading and binary offloading are two widely adopted computation offloading models in the MEC literature [5], in which the tasks at each user are fully partitionable and non-partitionable, respectively. Next, the performance optimization of computation offloading in MEC systems critically relies on the joint design of communication and computation resource allocation [7]-[11]. For example, consider the case in which one BS serves a single actively computing user. To minimize the energy consumption for task execution, the user must jointly optimize the transmit power for offloading and the central processing unit (CPU) frequencies for local computing, so as to balance the tradeoff between the energy consumed by offloading and that consumed by local computing. For the case of partial offloading, the user needs to properly partition the computation task into two parts, for offloading and local computing, respectively; for the case of binary offloading, the user needs to properly choose the operation mode between offloading and local computing for energy minimization. Furthermore, future wireless networks are expected to consist of a massive number of IoT devices, and each BS generally needs to ...