I learned Amazon EMR (Elastic MapReduce), the AWS Hadoop-as-a-service offering for big data. As product manager, I rallied our development team to port our existing on-prem big data protector product to AWS. It was a good learning experience. You can find more details about EMR here. I came to appreciate AWS cloud features a lot and went on to prepare for the AWS certification exam. I am glad to let you know that I passed the exam with a score of 82%. Here is my AWS Certified Cloud Practitioner certificate: aws_certified_cloud_practitioner_certificate.pdf
Like AWS with EMR, Google Cloud also has a Hadoop offering, called Google Cloud Dataproc. Because of customer demand, I had to learn it and rallied my team again to port our on-prem product to Google Cloud. We will soon support Google Dataproc 1.2, which is built on Apache Hadoop 2.7. Dataproc is not yet as polished as AWS EMR, but I am confident the offering will catch up. You can find details here.
Last but not least, I learned Azure HDInsight, the Hadoop offering from Microsoft Azure. We are seeing rising demand for Azure and already have a few customer requests for it. Being a Microsoft Certified Professional (MCP), I signed up for the Azure free trial and logged in with the following account:
[
  {
    "cloudName": "AzureCloud",
    "id": "3056023d-31ed-43cd-b5d2-3dca30f14421",
    "isDefault": true,
    "name": "Free Trial",
    "state": "Enabled",
    "tenantId": "b53ed3af-54d8-4618-bdb2-ecd92fcfc552",
    "user": {
      "name": "[email protected]",
      "type": "user"
    }
  }
]
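The login output is a JSON array with one object per subscription, where "isDefault" marks the active one. As a minimal sketch (assuming Python 3 and only the standard json module; the helper name default_subscription is hypothetical and not part of the Azure CLI), a script could pick out the enabled default subscription from that output like this:

```python
import json

# Subscription list in the shape of the login output shown in this post
# (values copied from it; the user name is left exactly as it appears there).
LOGIN_OUTPUT = """
[
  {
    "cloudName": "AzureCloud",
    "id": "3056023d-31ed-43cd-b5d2-3dca30f14421",
    "isDefault": true,
    "name": "Free Trial",
    "state": "Enabled",
    "tenantId": "b53ed3af-54d8-4618-bdb2-ecd92fcfc552",
    "user": {"name": "[email protected]", "type": "user"}
  }
]
"""


def default_subscription(raw: str) -> dict:
    """Hypothetical helper: return the subscription flagged isDefault that is Enabled."""
    subscriptions = json.loads(raw)  # the output is a list, not a single object
    for sub in subscriptions:
        if sub.get("isDefault") and sub.get("state") == "Enabled":
            return sub
    raise LookupError("no enabled default subscription found")


sub = default_subscription(LOGIN_OUTPUT)
print(sub["name"], sub["id"])
```

In day-to-day use, `az account show` prints the current default subscription directly; the sketch only illustrates how the list returned at login is structured.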
I then built a cluster using Azure HDInsight (Linux), a fully managed cloud service that makes it easy, fast, and cost-effective to process massive amounts of data with popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, and R. HDInsight supports a broad range of scenarios, including ETL, data warehousing, machine learning, and IoT. I also created a SQL Data Warehouse cluster and an Azure Data Lake Store to share data among three clusters. The documentation guided me through the setup. You can find HDInsight details here.
In summary, this year has been a BIG DATA CLOUD year for me. All three major public cloud providers have built infrastructure to meet rising Hadoop workloads in the cloud. Which offering to choose is up to you. With learning and motivation, you can use any of them for big data and be successful. I would recommend starting with AWS and then moving on to Google Cloud and Azure.
Wishing you happy holidays and a very successful 2018.