Groupon: The Latest Hot Company to Implement Hadoop/Cloudera for Big Data Tasks

by Sam Dean - Apr. 19, 2011

We've covered the open source Apache project Hadoop before from many angles, and it continues to make its way into enterprises and smaller businesses that want to sift through and analyze large data sets. We've also covered Cloudera, a startup that focuses on support and services surrounding Hadoop. Now, Cloudera has announced that Groupon--the hot daily deals site--is using its Cloudera Distribution for Apache Hadoop (CDH) to get more value out of the massive data sets it maintains. It's yet another sign of Hadoop's success as a cutting-edge, sophisticated open source phenomenon.

Groupon has more than 70 million registered users in more than 500 global markets, and its rapid rise has inspired a lot of copycats. According to a statement from Cloudera:

"Data is one of Groupon's most strategic assets; Groupon relies on information from both vendors and customers to make daily deal transactions run smoothly. Prior to deploying CDH, Groupon realized that they needed better ways to organize and make sense of the data generated by their massive user base for the long term."

Groupon is hardly the only company pointing Hadoop at Big Data tasks. Hadoop can be thought of as an open source implementation of MapReduce, Google's secret weapon for sifting and mining huge data sets. Businesses are interested in it for data mining and other applications, where it can turn up diamonds in the rough buried in massive amounts of data. Organizations ranging from Yahoo to The New York Times have found mission-critical tasks for it.
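
For readers new to the MapReduce idea, here is a minimal single-machine sketch in plain Python. It does not use Hadoop's actual API, and the records, field names, and function names are all hypothetical, invented for illustration rather than taken from Groupon or Cloudera. It only shows the shape of the model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group.

```python
from collections import defaultdict

# Hypothetical (city, deal category) purchase records; not real Groupon data.
RECORDS = [
    ("Chicago", "dining"),
    ("Boston", "spa"),
    ("Chicago", "spa"),
    ("Chicago", "dining"),
]


def map_phase(record):
    """Emit one ((city, category), 1) pair per input record."""
    city, category = record
    yield (city, category), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real Hadoop cluster the map and reduce steps would run in parallel across many machines over data stored in a distributed file system; this sketch only illustrates the division of labor between the two phases.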

"We were eager to try Hadoop based on the technology's promise to make sense of massive amounts of data, and it hasn't disappointed," said Mark Johnson, chief data officer, Groupon, in a statement. "Cloudera's distribution and support have been instrumental in helping Groupon deliver on our goal to be a technology leader." 

As Hadoop grows in significance, it is good to see a commercially viable company like Cloudera backing it with support and with its own Hadoop distribution. Cloudera has helped Groupon implement Hadoop so that data sets sifted in Hadoop can be fed into relational database frameworks designed to simplify access to key customer and business-focused data. "Groupon will use Hadoop as a staging area for all of their extreme data," according to the Cloudera/Hadoop announcement.

Only a few years ago, Hadoop was an open source obscurity that many organizations wouldn't have understood well, but it is quickly becoming mainstream. It's a true open source success story.

