
Discussion in LinkedIn group on load testing Hadoop with large public domain datasets

Discussion thread in the LinkedIn group:

Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO & Co.

Project Gutenberg (approximately 30,000 books)

Wikipedia (full download)

Datasets available through Amazon, such as the Human Genome Project and the US Census database