The recent popularity of programming models such as MapReduce has raised interest in running MapReduce applications over the large-scale Internet. However, the data distribution techniques currently used by Internet-wide computing platforms to deliver the high volumes of data that MapReduce jobs require are naive and need to be rethought. We therefore present SCADA-MAR, a computing platform that runs MapReduce jobs over the Internet and makes two main contributions: i) it improves data distribution by using the BitTorrent protocol to distribute all data, and ii) it improves intermediate data availability by replicating tasks or data across nodes, which avoids losing intermediate data and thus prevents large delays in the overall MapReduce execution time. Along with the design of our solution, we present an extensive set of performance results that confirm the usefulness of both contributions, improved data distribution and improved availability, making our platform a feasible approach to running MapReduce jobs.