Hadoop Ecosystem

Guidance on choosing between HDInsight and third-party Hadoop distributions running on IaaS, along with other Hadoop ecosystem decisions.

Note: This guidance is a work in progress and still under development.

### Questions to think about:

  • When should you strongly consider using HDInsight instead of managing your own Hadoop cluster?

  • Are you planning to run your Hadoop cluster in the cloud, on-premises, or both/hybrid?

  • Will your Hadoop cluster be always-on or will you use it periodically?

  • Where will you store the data (e.g. on-premises, in Azure Blob Storage, in another cloud)?

  • What are your performance and scale targets (e.g. data volume, number of nodes)?

  • What are your security requirements?

  • Are you coming from a Windows or Linux background?

  • If you want to run and manage your own Hadoop cluster, how do you decide which Hadoop distribution to use (e.g. Cloudera, Hortonworks, or MapR)?

  • What are the key differentiators between different Hadoop distributions? For example, MapR offers its own unique file system.

  • When should you use Hive and when should you look into Spark?