I continue writing about my journey into the world of Big Data, and today I want to talk about one of the most popular frameworks in the Hadoop ecosystem: Spark. Spark has one of the largest armies of contributors and is currently the de facto standard tool for processing large volumes of data in Hadoop, yet it can still be quite challenging to find information about certain parts of the product. Over the last few months I had to do a deep dive into one of the key components of a Spark application, the Receiver. It is the starting point of any streaming flow implemented with this framework, so it is important to understand how it works. In this article I'd like to explain how the Spark receiver works and share what I have learned about configuring and tuning it properly.