Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark

Vladimir Korkhov, Ivan Gankevich, Oleg Iakushkin, Dmitry Gushchanskiy, Dmitry Khmel, Andrey Ivashchenko, Alexander Pyayt, Sergey Zobnin, Alexander Loginov

Modern data acquisition and processing architectures often involve low-cost, low-power devices that can be linked together to form a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.

  title={Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark},
  author={Vladimir Korkhov and Ivan Gankevich and Oleg Iakushkin and Dmitry Gushchanskiy and Dmitry Khmel and Andrey Ivashchenko and Alexander Pyayt and Sergey Zobnin and Alexander Loginov},
  howpublished={Proceedings of ICCSA'17},
  editor={Gervasi, Osvaldo and Murgante, Beniamino and Misra, Sanjay and Borruso, Giuseppe and Torre, Carmelo M. and Rocha, Ana Maria A.C. and Taniar, David and Apduhan, Bernady O. and Stankova, Elena and Cuzzocrea, Alfredo},

Publication: Proceedings of ICCSA'17
Publisher: Springer