Scheduling jobs in Hadoop through Oozie

A common problem that software engineers face at various stages of application development is scheduling jobs and processes to run periodically. For this purpose, the Windows OS family provides a dedicated component called Task Scheduler. The Linux world offers its own alternative: a built-in daemon called Cron. The distributed Hadoop ecosystem, which runs on top of these underlying operating systems, introduces a set of scheduling challenges that differ from those typical tasks. Here we have to deal with jobs that run across a number of physical machines and flow between Hadoop services. To simplify the implementation of such workflows, the Hadoop ecosystem includes a dedicated component called Oozie. In this article I would like to give you an overview of this product.
