
Friday 28 February 2014

10 Essential Hadoop Tools To Work With Big Data!

Hadoop is no longer just a teenie-weenie stack of code. The future, friends, is indeed 'big'!

Over the years, Hadoop has grown to be of prime importance, so useful that a humongous collection of projects now orbits around it. The Hadoop community is evolving rapidly, and new tools are added every now and then to round out the ecosystem. Here are ten of the essential ones.

1. Hadoop

-Java-based Hadoop coordinates worker nodes to execute a function on the data each node stores locally; the results are then aggregated and reported. Those two phases are the 'map' and the 'reduce'.

-While programmers concentrate on writing code for analysing data, Hadoop handles the rest by providing a thin abstraction over local data storage.

-Designed to work around failures of individual machines.

-The code is available at hadoop.apache.org.
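
To make the map and reduce phases concrete, here is a minimal sketch of the classic word-count job against the Hadoop 2.x Java MapReduce API; the input and output HDFS paths come from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs on each worker node against locally stored data,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: aggregates the counts for each word and reports the total.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}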

2. Hive

-Regularises the process of extracting bits from all of the files stored in HDFS or HBase.

-Offers an SQL-like language, HiveQL, for pulling snippets out of the files.

-Turns data in standard formats into a queryable stash.

-The code is available at hive.apache.org.
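
As a sketch of what that SQL-like layer looks like from Java, the snippet below runs a HiveQL query over a hypothetical access_logs table through the standard HiveServer2 JDBC driver; the connection URL, credentials, and table are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver; assumes a server on localhost:10000.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = conn.createStatement();
    // HiveQL reads like SQL but compiles down to Hadoop jobs behind the scenes.
    ResultSet rs = stmt.executeQuery(
        "SELECT page, COUNT(*) AS hits FROM access_logs GROUP BY page");
    while (rs.next()) {
      System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
    }
    conn.close();
  }
}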

3. Sqoop

-Command-line tool that moves large tables full of information out of traditional relational databases and into the reach of Hive or HBase.

-Controls the mapping between the tables and the data storage layer, translating the tables into a configurable combination of HDFS, HBase, or Hive.

-The code is available at sqoop.apache.org.
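
Sqoop is normally driven from the shell, but Sqoop 1 also exposes its tool runner to Java. A minimal sketch that imports a hypothetical orders table from MySQL straight into Hive; the connection string, user, and table name are all illustrative.

import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
  public static void main(String[] args) {
    // Equivalent to running `sqoop import ...` on the command line.
    String[] importArgs = {
        "import",
        "--connect", "jdbc:mysql://dbhost/sales",
        "--username", "etl",
        "--table", "orders",
        "--hive-import"   // land the table in Hive rather than as raw HDFS files
    };
    int exitCode = Sqoop.runTool(importArgs);
    System.exit(exitCode);
  }
}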

4. Pig

-Running code written in its own language, called Pig Latin, Pig steers users toward algorithms that are easy to run in parallel across the cluster.

-Built-in functions include averaging data, working with dates, and finding differences between strings.

-The code is available at pig.apache.org.
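
Here is a minimal sketch of Pig Latin embedded in Java via PigServer, averaging session durations per user; the input path and field layout are assumptions for illustration.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
  public static void main(String[] args) throws Exception {
    // Run Pig Latin against the cluster from Java.
    PigServer pig = new PigServer(ExecType.MAPREDUCE);
    // Load tab-separated records (illustrative path and schema),
    // group by user, and average the durations per group.
    pig.registerQuery("logs = LOAD '/data/sessions' AS (user:chararray, duration:int);");
    pig.registerQuery("by_user = GROUP logs BY user;");
    pig.registerQuery("avg_time = FOREACH by_user GENERATE group, AVG(logs.duration);");
    pig.store("avg_time", "/data/avg_session_time");
  }
}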

5. Avro

-Avro bundles the data together with a schema for understanding it.

-The schema itself is a JSON document explaining how the data can be parsed.

-The code is available at avro.apache.org.
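
The sketch below shows that bundling in miniature with Avro's Java API: a JSON schema is parsed, a record is built against it, and both are written into a single container file. The record and field names are illustrative.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
  public static void main(String[] args) throws Exception {
    // The schema is plain JSON and travels with the data file.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}");

    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "Ada");
    user.put("age", 36);

    // Write a container file; any reader can parse it via the embedded schema.
    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, new File("users.avro"));
    writer.append(user);
    writer.close();
  }
}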

6. Oozie

-Starts multiple Hadoop jobs stemming from a single job definition, in the right sequence.

-Manages a workflow specified as a DAG (directed acyclic graph).
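
Assuming an Oozie server at the usual port and a workflow.xml already sitting in HDFS (both illustrative), submitting the DAG from Java looks roughly like this:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;

public class OozieSubmitExample {
  public static void main(String[] args) throws Exception {
    // Server URL and HDFS paths are illustrative.
    OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

    Properties conf = oozie.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/my-workflow");
    conf.setProperty("nameNode", "hdfs://namenode");
    conf.setProperty("jobTracker", "jobtracker:8032");

    // Oozie walks the DAG, starting each Hadoop job in the right sequence.
    String jobId = oozie.run(conf);
    System.out.println("Workflow job submitted: " + jobId);
  }
}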

7. GIS tools

-The GIS (Geographic Information Systems) tools for Hadoop are Java-based tools for working with geographic information at Hadoop scale.

-The code is available at github.com/Esri/gis-tools-for-hadoop.
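
The tools build on Esri's open-source Geometry API for Java. As a small sketch of the kind of predicate they push into Hadoop jobs, the snippet below runs a point-in-polygon test; the coordinates are made up for illustration.

import com.esri.core.geometry.GeometryEngine;
import com.esri.core.geometry.Point;
import com.esri.core.geometry.Polygon;
import com.esri.core.geometry.SpatialReference;

public class GisExample {
  public static void main(String[] args) {
    // WGS84 latitude/longitude; the triangle below is illustrative.
    SpatialReference wgs84 = SpatialReference.create(4326);

    Polygon area = new Polygon();
    area.startPath(-74.3, 40.5);
    area.lineTo(-73.7, 40.5);
    area.lineTo(-74.0, 40.9);
    area.closeAllPaths();

    Point p = new Point(-74.0, 40.7);

    // Point-in-polygon test, the sort of check the tools run at scale.
    System.out.println(GeometryEngine.contains(area, p, wgs84));
  }
}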

8. Flume

-Flume dispatches 'agents' to gather information to be stored in HDFS.

-These agents are triggered by events and can be chained together.

-The code is available at flume.apache.org.
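
Applications typically hand events to an agent over Avro RPC. A minimal sketch, assuming an agent with an Avro source listening on flume-agent:41414 (host, port, and event body are illustrative):

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeSendExample {
  public static void main(String[] args) throws EventDeliveryException {
    // Connect to the agent's Avro source.
    RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent", 41414);
    try {
      // One event; the agent's chained sinks forward it on, e.g. into HDFS.
      Event event = EventBuilder.withBody("user clicked checkout", StandardCharsets.UTF_8);
      client.append(event);
    } finally {
      client.close();
    }
  }
}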

9. SQL on Hadoop

-All of these offer a faster path to answers, for instance an ad-hoc query of data on a huge cluster. You could, of course, write a new Hadoop job for the same thing, but that can be time-consuming; with SQL, the question is posed in a simpler, more familiar language.

-Some of them include: HAWQ, Impala, Drill, Stinger, and Tajo.

10. Clouds

-Companies like Amazon are adding another layer of abstraction by accepting just a JAR file filled with software routines. Everything else, from provisioning the machines to running the job, is then done by the cloud.
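
As an illustration, here is roughly what handing a JAR to Amazon Elastic MapReduce looks like with the AWS SDK for Java; the bucket, JAR name, and instance types are all assumptions, and credentials are taken from the environment.

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;

public class EmrSubmitExample {
  public static void main(String[] args) {
    AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient();

    // Point the cluster at the JAR (illustrative S3 paths); everything
    // below the JAR is the cloud's problem.
    HadoopJarStepConfig jarStep = new HadoopJarStepConfig()
        .withJar("s3://my-bucket/wordcount.jar")
        .withArgs("s3://my-bucket/input", "s3://my-bucket/output");

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("wordcount")
        .withInstances(new JobFlowInstancesConfig()
            .withInstanceCount(3)
            .withMasterInstanceType("m1.medium")
            .withSlaveInstanceType("m1.medium"))
        .withSteps(new StepConfig("count words", jarStep));

    RunJobFlowResult result = emr.runJobFlow(request);
    System.out.println("Cluster started: " + result.getJobFlowId());
  }
}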
