Coordination of distributed applications through Zookeeper

When I started to work in scope of Big Data, one of the most challenging things for me was to understand the distributed nature of Hadoop applications. Usually most part of software developers think about their products in terms of single program components which are represented by standalone applications. Every program is an independent monolith unit which is located in own machine, runs in its own process and responsible for custom range of tasks. Big data introduces another level of composition, where every single program can be distributed across the nodes of the cluster as set of standalone services driven by some master service. The development of such model creates new challenges related to the synchronization these services and handling the consistent state of the overall application. This problem is common for every distributed system and instead of reinventing custom solution for each particular product of Hadoop family, community created a universal tool called Zookeeper. In this article I want to give you overview of this application and show some examples of working with it.

Continue reading “Coordination of distributed applications through Zookeeper”