
Use Cases for Hadoop

The Apache “Powered by Hadoop” page lists many companies that use Hadoop. Some give only the company name. Others add a sentence or two about what they’re using Hadoop for. And some, like LinkedIn, list the specs of the hardware their Hadoop clusters run on. There’s also a link to a great article about how the NYTimes used Hadoop (in 2007!) on Amazon’s AWS cloud to generate PDFs of 11 million articles in 24 hours on 100 machines. One of the things I find interesting about the NYTimes use case is that they used Hadoop for a one-time batch process. A lot of what we read about Hadoop assumes that the use case is an ongoing, maybe multi-year application.