
Friday 28 February 2014

10 Essential Hadoop Tools To Work With Big Data!

Hadoop is no longer just a teenie-weenie stack of code. The future, friends, is indeed 'big'!

Over the years, Hadoop has grown to be of prime importance, so useful that a humongous collection of projects now orbits around it. The Hadoop community is evolving rapidly, and new tools are added every now and then to round out the ecosystem. Here are ten of the essential ones.

1. Hadoop

-Java-based Hadoop coordinates worker nodes to execute a function on the data each node stores locally; the results are then aggregated and reported. Those two phases are the 'map' and the 'reduce'.

-While programmers concentrate on writing code for analysing data, Hadoop handles the rest by providing a thin abstraction over local data storage.

-Designed to work around failures of individual machines.

-The code is available at hadoop.apache.org.
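
To make the map and reduce phases concrete, here is a minimal sketch of the classic word-count job against the Hadoop 2.x Java MapReduce API; the input and output HDFS paths come from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs on each worker node against locally stored data,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: aggregates the counts for each word and reports the total.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}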

2. Hive

-Regularises the process of extracting bits from all of the files stored in HDFS or HBase.

-Offers an SQL-like language, HiveQL, for pulling snippets out of the files.

-Turns data in standard formats into a queryable stash.

-The code is available at hive.apache.org.
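
As a sketch of what that SQL-like layer looks like from Java, the snippet below runs a HiveQL query over a hypothetical access_logs table through the standard HiveServer2 JDBC driver; the connection URL, credentials, and table are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; assumes a server on localhost:10000.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = conn.createStatement();
    // HiveQL reads like SQL but compiles down to Hadoop jobs behind the scenes.
    ResultSet rs = stmt.executeQuery(
        "SELECT page, COUNT(*) AS hits FROM access_logs GROUP BY page");
    while (rs.next()) {
      System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
    }
    conn.close();
  }
}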

3. Sqoop

-Command-line tool that moves large tables full of information out of traditional relational databases and into the reach of Hive or HBase.

-Controls the mapping between the tables and the data storage layer, translating the tables into a configurable combination of HDFS, HBase, or Hive.

-The code is available at sqoop.apache.org.
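
Sqoop is normally driven from the shell, but Sqoop 1 also exposes its tool runner to Java. A minimal sketch that imports a hypothetical orders table from MySQL straight into Hive; the connection string, user, and table name are all illustrative.

import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
  public static void main(String[] args) {
    // Equivalent to running `sqoop import ...` on the command line.
    String[] importArgs = {
        "import",
        "--connect", "jdbc:mysql://dbhost/sales",
        "--username", "etl",
        "--table", "orders",
        "--hive-import"   // land the table in Hive rather than as raw HDFS files
    };
    int exitCode = Sqoop.runTool(importArgs);
    System.exit(exitCode);
  }
}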

4. Pig

-Running code written in its own language, called Pig Latin, Pig steers users toward algorithms that are easy to run in parallel across the cluster.

-Built-in functions include averaging data, working with dates, and finding differences between strings.

-The code is available at pig.apache.org.
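
Here is a minimal sketch of Pig Latin embedded in Java via PigServer, averaging session durations per user; the input path and field layout are assumptions for illustration.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
  public static void main(String[] args) throws Exception {
    // Run Pig Latin against the cluster from Java.
    PigServer pig = new PigServer(ExecType.MAPREDUCE);
    // Load tab-separated records (illustrative path and schema),
    // group by user, and average the durations per group.
    pig.registerQuery("logs = LOAD '/data/sessions' AS (user:chararray, duration:int);");
    pig.registerQuery("by_user = GROUP logs BY user;");
    pig.registerQuery("avg_time = FOREACH by_user GENERATE group, AVG(logs.duration);");
    pig.store("avg_time", "/data/avg_session_time");
  }
}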

5. Avro

-Avro bundles the data together with a schema for understanding it.

-The schema itself is a JSON document explaining how the data can be parsed.

-The code is available at avro.apache.org.
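
The sketch below shows that bundling in miniature with Avro's Java API: a JSON schema is parsed, a record is built against it, and both are written into a single container file. The record and field names are illustrative.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
  public static void main(String[] args) throws Exception {
    // The schema is plain JSON and travels with the data file.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}");

    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "Ada");
    user.put("age", 36);

    // Write a container file; any reader can parse it via the embedded schema.
    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, new File("users.avro"));
    writer.append(user);
    writer.close();
  }
}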

6. Oozie

-Starts multiple Hadoop jobs stemming from a single job definition, in the right sequence.

-Manages a workflow specified as a DAG (directed acyclic graph).
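
Assuming an Oozie server at the usual port and a workflow.xml already sitting in HDFS (both illustrative), submitting the DAG from Java looks roughly like this:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;

public class OozieSubmitExample {
  public static void main(String[] args) throws Exception {
    // Server URL and HDFS paths are illustrative.
    OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

    Properties conf = oozie.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/my-workflow");
    conf.setProperty("nameNode", "hdfs://namenode");
    conf.setProperty("jobTracker", "jobtracker:8032");

    // Oozie walks the DAG, starting each Hadoop job in the right sequence.
    String jobId = oozie.run(conf);
    System.out.println("Workflow job submitted: " + jobId);
  }
}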

7. GIS tools

-The GIS (Geographic Information Systems) tools for Hadoop are Java-based tools for working with geographic information at Hadoop scale.

-The code is available at github.com/Esri/gis-tools-for-hadoop.
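
The tools build on Esri's open-source Geometry API for Java. As a small sketch of the kind of predicate they push into Hadoop jobs, the snippet below runs a point-in-polygon test; the coordinates are made up for illustration.

import com.esri.core.geometry.GeometryEngine;
import com.esri.core.geometry.Point;
import com.esri.core.geometry.Polygon;
import com.esri.core.geometry.SpatialReference;

public class GisExample {
  public static void main(String[] args) {
    // WGS84 latitude/longitude; the triangle below is illustrative.
    SpatialReference wgs84 = SpatialReference.create(4326);

    Polygon area = new Polygon();
    area.startPath(-74.3, 40.5);
    area.lineTo(-73.7, 40.5);
    area.lineTo(-74.0, 40.9);
    area.closeAllPaths();

    Point p = new Point(-74.0, 40.7);

    // Point-in-polygon test, the sort of check the tools run at scale.
    System.out.println(GeometryEngine.contains(area, p, wgs84));
  }
}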

8. Flume

-Flume dispatches 'agents' to gather information to be stored in HDFS.

-These agents are triggered by events and can be chained together.

-The code is available at flume.apache.org.
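
Applications typically hand events to an agent over Avro RPC. A minimal sketch, assuming an agent with an Avro source listening on flume-agent:41414 (host, port, and event body are illustrative):

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeSendExample {
  public static void main(String[] args) throws EventDeliveryException {
    // Connect to the agent's Avro source.
    RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent", 41414);
    try {
      // One event; the agent's chained sinks forward it on, e.g. into HDFS.
      Event event = EventBuilder.withBody("user clicked checkout", StandardCharsets.UTF_8);
      client.append(event);
    } finally {
      client.close();
    }
  }
}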

9. SQL on Hadoop

-All of these offer a faster path to answers, for instance an ad-hoc query of data on a huge cluster. You could, of course, write a new Hadoop job for the same thing, but that can be time-consuming; with SQL, the question is posed in a simpler, more familiar language.

-Some of them include: HAWQ, Impala, Drill, Stinger, and Tajo.

10. Clouds

-Companies like Amazon are adding another layer of abstraction by accepting just a JAR file filled with software routines. Everything else, from provisioning the machines to running the job, is then done by the cloud.
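
As an illustration, here is roughly what handing a JAR to Amazon Elastic MapReduce looks like with the AWS SDK for Java; the bucket, JAR name, and instance types are all assumptions, and credentials are taken from the environment.

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;

public class EmrSubmitExample {
  public static void main(String[] args) {
    AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient();

    // Point the cluster at the JAR (illustrative S3 paths); everything
    // below the JAR is the cloud's problem.
    HadoopJarStepConfig jarStep = new HadoopJarStepConfig()
        .withJar("s3://my-bucket/wordcount.jar")
        .withArgs("s3://my-bucket/input", "s3://my-bucket/output");

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("wordcount")
        .withInstances(new JobFlowInstancesConfig()
            .withInstanceCount(3)
            .withMasterInstanceType("m1.medium")
            .withSlaveInstanceType("m1.medium"))
        .withSteps(new StepConfig("count words", jarStep));

    RunJobFlowResult result = emr.runJobFlow(request);
    System.out.println("Cluster started: " + result.getJobFlowId());
  }
}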
