Fast Analysis of Sensor Data over MapReduce using Spark

Authors

  • Mansi Shah, M. Tech. Scholar, Computer Science and Engineering Department, N.S.I.T, Jetalpur, Gujarat
  • Vatika Tayal, Assistant Professor, Computer Science and Engineering Department, N.S.I.T, Jetalpur, Gujarat

Keywords:

big data, Resilient Distributed Datasets, Spark, MapReduce, Hadoop

Abstract

Big data analysis is growing rapidly because of the volume of data, the velocity at which it flows into
organizations, and its variety. In recent years, with the spurt in the Internet of Things (IoT), the data
generated by sensors has grown exponentially and turned into big data. Collecting, processing, and
extracting useful information from such high-velocity, high-volume sensor data therefore poses a challenge for
researchers. Apache Spark is an open-source, general-purpose engine for fast, large-scale data processing. To
overcome the data replication and disk I/O overhead of sharing data between parallel operations in Hadoop, Spark uses
a primitive called Resilient Distributed Datasets (RDDs), which give programmers fault-tolerant, in-memory
data storage across cluster nodes without replication, increasing the processing speed of applications by
several orders of magnitude. We propose a method to analyze sensor data using Spark.

Published

2015-05-25

How to Cite

Mansi Shah, & Vatika Tayal. (2015). Fast Analysis of Sensor Data over MapReduce using Spark. International Journal of Advance Engineering and Research Development (IJAERD), 2(5), 1071–1075. Retrieved from https://ijaerd.com/index.php/IJAERD/article/view/1139