Quickstart with Hadoop & Hadoop on Azure


There was an awesome Hadoop session last night (sponsored by Microsoft) that explained the basics of Hadoop and then went into Hadoop on Azure. If you are even remotely interested in cloud, big data, or Business Intelligence, it's an interesting watch.

Get the recording while it's hot: https://skydrive.live.com/redir.aspx?cid=9f92dce9238e14a8&resid=9F92DCE9238E14A8!154&parid=root

Some gems that I noted down from the session:

  • RDBMSs handle gigabytes to terabytes; Big Data means petabytes to exabytes
  • Typical DBA:RDBMS ratio 1:40,
    DBA:Hadoop ratio 1:3,000
  • BI pattern: extract, structure and store ‘important information’
    Big Data: ‘store it all’, since information may only become important ‘later’. Structure only on demand.
  • Common pattern: store all data in Hadoop, then structure it on demand to serve more traditional BI tools
  • Hadoop on Azure is "naturally" integrated with Windows Azure Blob Storage but can be configured to use other storage providers like Amazon S3 (for migration)
  • With PowerPivot, one can easily pull Hadoop/Hive MapReduce results into Excel
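
To make the ‘store it all, structure only on demand’ idea a bit more concrete, here is a tiny sketch in plain Python (not Hadoop or Hive code, and not something shown in the session). The sample log lines, the field names and the helpers `read_with_schema` and `actions_per_user` are all made up for illustration. The point is only that the raw data is kept untouched and a schema is applied when someone asks a question.

```python
import json
from collections import Counter

# Raw events kept exactly as they arrived ("store it all").
# Hypothetical sample data; in Hadoop these would be files in HDFS or blob storage.
RAW_LOG = [
    '{"user": "alice", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "search", "ms": 340, "query": "hadoop"}',
    'not valid json at all',
    '{"user": "alice", "action": "search", "ms": 95}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: yield only the requested fields per record."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed rows are skipped when reading, not rejected when storing
        if not isinstance(record, dict):
            continue  # ignore valid JSON that is not an object
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}


def actions_per_user(raw_lines):
    """One 'on demand' question: how many events did each user generate?"""
    counts = Counter()
    for row in read_with_schema(raw_lines, ("user", "action")):
        counts[row["user"]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(actions_per_user(RAW_LOG))
    # A different question later can use a different schema over the same raw data.
    print(list(read_with_schema(RAW_LOG, ("query",))))
```

Nothing about the stored lines had to change when the second question needed a `query` field that the first one ignored, which is roughly the contrast with the BI pattern of deciding the structure up front.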