The question posed at this week’s San Francisco Hadoop User Group is a common one: “what factors justify the use of an Apache Hadoop cluster vs. traditional approaches?” The answer you receive depends on who you ask.
Relational database authors and advocates have two criticisms of Hadoop. First, that most users have little need for Big Data. Second, that MapReduce is more complex than traditional SQL queries.
Both of these criticisms are valid.
In a post entitled “Terabytes is not big data, petabytes is,” Henrik Ingo argued that the gigabytes and terabytes I referenced as Big Data did not justify that term. He is correct. Further, it is true that the number of enterprises worldwide with petabyte scale data management challenges is limited.
MapReduce, for its part, is in fact challenging. Challenging enough that there are two separate projects (Hive and Pig) that add SQL-like interfaces as a complement to the core Hadoop MapReduce functionality. Besides being more accessible, SQL skills are an order of magnitude more common from a resource availability standpoint.
Hadoop supporters, meanwhile, counter both of those concerns.
To Continue Reading: Click Here
-------------------------------------------
Source: RedMonk
By Stephen O'Grady
Tuesday, January 18, 2011
Subscribe to:
Post Comments (Atom)

0 comments:
Post a Comment