Fast Analysis of Sensor Data over MapReduce using Spark
Keywords:
big data, Resilient Distributed Datasets, Spark, MapReduce, Hadoop
Abstract
Big data analysis is growing in importance because of the volume of data, the velocity at which it flows into
organizations, and its variety. In recent years, with the rapid growth of the Internet of Things (IoT), the data
generated by sensors has been growing exponentially and has itself become big data. Collecting, processing, and
extracting useful information from this high-velocity, high-volume sensor data therefore poses a challenge for
researchers. Apache Spark is an open-source, general-purpose engine for fast, large-scale data processing. To
avoid the data replication and disk I/O overhead that Hadoop incurs when sharing data between parallel operations,
Spark uses an abstraction called Resilient Distributed Datasets (RDDs), which gives programmers fault-tolerant,
in-memory data storage across cluster nodes without replication and can speed up applications by several orders of
magnitude. We propose a method for analyzing sensor data using Spark.
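
The idea of fault-tolerant in-memory storage without replication can be illustrated with a small sketch. The code below is a toy model in plain Python, not Spark's API: the class ToyRDD, its methods, and the sample temperature readings are all hypothetical names and values introduced only for illustration. It assumes that the base partitions sit in stable storage and that each derived dataset remembers the transformation that produced it (its lineage), so a lost in-memory partition can be recomputed rather than restored from a replica.

```python
from typing import Dict, List


class ToyRDD:
    """Toy stand-in for an RDD (NOT Spark's API): partitions plus the lineage to rebuild them."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # base partitions, assumed to live in stable storage
        self.parent = parent  # dataset this one was derived from
        self.fn = fn          # per-partition transformation applied to the parent
        self.cache: Dict[int, List[float]] = {}  # in-memory partitions, keyed by index

    def map(self, f):
        # Record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, i):
        # Serve partition i from memory, or rebuild it by replaying the lineage.
        if i not in self.cache:
            if self.source is not None:
                self.cache[i] = list(self.source[i])
            else:
                self.cache[i] = self.fn(self.parent.compute(i))
        return self.cache[i]

    def lose_partition(self, i):
        # Simulate a node failure: drop partition i from memory along the whole chain.
        node = self
        while node is not None:
            node.cache.pop(i, None)
            node = node.parent

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


# Hypothetical sensor readings in Celsius, split into two partitions;
# -999.0 is an assumed sentinel for a failed reading.
readings = [[20.0, -999.0, 25.0], [15.0, 30.0]]
fahrenheit = (ToyRDD(source=readings)
              .filter(lambda c: c > -100.0)
              .map(lambda c: c * 9 / 5 + 32))

before = fahrenheit.collect()
fahrenheit.lose_partition(1)   # in-memory copy of partition 1 is gone
after = fahrenheit.collect()   # partition 1 is recomputed from lineage
print(before == after)
```

In this sketch only partition 1 is recomputed after the simulated failure, while partition 0 is still served from memory. Real RDDs follow the same principle at cluster scale, with scheduling, shuffles, and persistence levels that this toy model leaves out.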