1 BUILD BIG DATA ENTERPRISE SOLUTIONS FASTER ON AZURE HDINSIGHT
Pranav Rastogi, Microsoft, @rustd
2 Agenda
- Azure HDInsight overview
- HDInsight Application Platform
- Big data solutions
3 Big data is hard
Steps on the way to success: buy servers, install OSS, secure, configure, optimize, debug, scale up.
4 HDInsight makes it easy
- 100% open source
- Optimized
- Highly available
- Secure
- Scalable
- Dedicated
- Managed
- Certified ISVs
- Customizable
Getting a cluster: browse to the Azure Portal, provide cluster details, and receive an HDInsight cluster.
5 Open Source for the Enterprise
- Managed open source analytics for the cloud with a 99.9% SLA
- 100% open source
- Clusters up and running in minutes
- 63% lower TCO than deploying your own Hadoop on-premises
- Separation of compute and storage lets you scale clusters independently of your data to reduce costs
Microsoft Build 2017 12/9/ :20 PM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
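The cost effect of separating compute from storage can be illustrated with a small worked example. This is a minimal sketch: the `Rates` values, node counts, and hours are hypothetical placeholders chosen for easy arithmetic, not Azure prices, and the model ignores everything except per-node compute hours and stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices (placeholders, NOT Azure list prices)."""
    compute_per_node_hour: float
    storage_per_tb_month: float


def monthly_cost(rates: Rates, nodes: int, hours_running: float, data_tb: float) -> float:
    """Compute is billed only while the cluster runs; storage is billed regardless."""
    compute = rates.compute_per_node_hour * nodes * hours_running
    storage = rates.storage_per_tb_month * data_tb
    return compute + storage


rates = Rates(compute_per_node_hour=1.0, storage_per_tb_month=20.0)

# Cluster kept running all month (720 h) versus only during working hours (160 h).
# The data stays in external storage in both cases.
always_on = monthly_cost(rates, nodes=10, hours_running=720, data_tb=50)
on_demand = monthly_cost(rates, nodes=10, hours_running=160, data_tb=50)

print(f"always on: {always_on:.0f}, on demand: {on_demand:.0f}")
```

Because the data lives outside the cluster, only the compute term shrinks when the cluster is deleted or scaled down between jobs. The storage term is unchanged.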
6 Deploy Globally Within Minutes
Multi-region availability
- Available in more than 25 regions worldwide
- Launched most recently in the US West 2 and UK regions
- Available in the China, Europe, and US Government clouds
7 Security and Compliance to Enable OSS for Enterprises
Authentication
- Azure Active Directory
- Kerberos authentication
Perimeter-level security
- Virtual Networks
- Network Security Groups (firewalls)
Authorization
- Apache Ranger
- RBAC for admin
- POSIX ACLs for the data plane
Data security
- Server-side encryption at rest
- HTTPS/TLS in transit
8 Rich Developer Ecosystem
- HDInsight plugins are available for the most popular IDEs, for agile development and debugging
- Rich support for the powerful notebooks used by data scientists
- Develop in C# and deploy on Linux in Java via the HDInsight-developed SCP.NET technology
9 Recognized by Top Analysts
Forrester Wave for Big Data Hadoop Cloud
- Named an industry leader by Forrester, with the most comprehensive, scalable, and integrated platforms*
- Recognized for its cloud-first strategy that is paying off*
* The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q
10 HDInsight Application Platform
1-click deploy experience for apps
11 HDInsight Application Platform
- Discover and install apps from the ecosystem
- One-click deploy experience
- Access to the entire cluster, secure by default
- Install apps on new or existing clusters
- Ease of authoring and deployment for ISVs
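For ISVs authoring an app, one plausible packaging route is an Azure Resource Manager resource attached to an existing cluster. The sketch below builds such a resource as a Python dictionary. The resource type and property names follow the shape recalled from Azure's published HDInsight application templates and should be verified against current documentation. The cluster name, app name, script URL, port, and VM size are hypothetical placeholders.

```python
import json


def build_application_resource(cluster_name: str, app_name: str,
                               install_script_uri: str,
                               vm_size: str = "Standard_D3_v2") -> dict:
    """Return an ARM-style resource describing an app installed on an edge node.

    Field names are an assumption based on published templates; check them
    against the current Azure documentation before use.
    """
    return {
        "type": "Microsoft.HDInsight/clusters/applications",
        "name": f"{cluster_name}/{app_name}",
        "apiVersion": "2015-03-01-preview",  # assumed; may be outdated
        "properties": {
            "computeProfile": {
                "roles": [{
                    "name": "edgenode",
                    "targetInstanceCount": 1,
                    "hardwareProfile": {"vmSize": vm_size},
                }]
            },
            "installScriptActions": [{
                "name": f"{app_name}-install",
                "uri": install_script_uri,
                "roles": ["edgenode"],
            }],
            "uninstallScriptActions": [],
            "httpsEndpoints": [{
                "subDomainSuffix": app_name[:3],  # hypothetical short suffix
                "destinationPort": 8888,          # hypothetical app port
                "accessModes": ["webpage"],
            }],
            "applicationType": "CustomApplication",
        },
    }


resource = build_application_resource(
    cluster_name="mycluster",                                # hypothetical
    app_name="demoapp",                                      # hypothetical
    install_script_uri="https://example.com/install.sh",     # hypothetical
)
print(json.dumps(resource, indent=2))
```

The same resource can target a new or an existing cluster, since it only refers to the cluster by name.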
12 HDInsight architecture
[Diagram] Client machines connect to an HDInsight cluster made up of gateway nodes, head nodes, worker nodes, edge nodes, and Zookeeper nodes. Outside the cluster: the Hive metastore (Azure SQL database) and Azure Storage or Data Lake Store.
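The components named on the architecture slide can be written down as a small data model. This is an illustrative sketch only: the class and method names are invented for this example, and the node counts are hypothetical inputs, not values taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class NodeRole(Enum):
    """Node types listed on the architecture slide."""
    GATEWAY = "gateway"
    HEAD = "head"
    WORKER = "worker"
    EDGE = "edge"
    ZOOKEEPER = "zookeeper"


@dataclass
class ClusterTopology:
    """Hypothetical model: nodes inside the cluster plus services outside it."""
    node_counts: Dict[NodeRole, int]
    hive_metastore: str = "Azure SQL database"
    storage: str = "Azure Storage"  # the slide also names Data Lake Store

    def total_nodes(self) -> int:
        """Number of nodes that make up the cluster itself."""
        return sum(self.node_counts.values())

    def external_services(self) -> List[str]:
        """Services the slide draws outside the cluster boundary."""
        return [self.hive_metastore, self.storage]


# Node counts below are made up for illustration.
topology = ClusterTopology(
    node_counts={
        NodeRole.GATEWAY: 2,
        NodeRole.HEAD: 2,
        NodeRole.WORKER: 4,
        NodeRole.EDGE: 1,
        NodeRole.ZOOKEEPER: 3,
    },
    storage="Data Lake Store",
)
print(topology.total_nodes(), topology.external_services())
```

Keeping the metastore and storage as fields outside `node_counts` mirrors the slide, where both sit outside the cluster boundary.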
13 StreamSets – Conquer dataflow chaos
14 Why StreamSets
- Data movement from disparate sources
- KPIs on data transfer
- Manage data drift (unexpected changes) at scale
- Run at enterprise grade and scale on HDInsight
15 StreamSets Data Collector: Deep Integration with Microsoft Azure
[Diagram] Flow: Data, Intelligence, Action. Data sources: apps, sensors and devices. Layers: Information Management, Big Data Stores, Machine Learning and Analytics, Intelligence, Dashboards & Visualizations. Services shown: Data Catalog, Event Hubs, Kafka for HDInsight, Data Lake Store, SQL Data Warehouse, Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics, Cognitive Services, Bot Framework, Cortana, Power BI. Consumers: people, apps (web, mobile, bots), automated systems.
16 Datameer: Self-service data prep/analytics
17 Why Datameer
- Self-service data prep/analytics for non-Hadoop users
- No Hadoop expertise required
- Easy-to-use, Excel-like interface for analysts
- Run at enterprise grade and scale on HDInsight
18 AtScale (OLAP on Hadoop)
No data movement
19 Why AtScale
- No data movement for BI scenarios
- Use any BI tool
- Scale out OLAP with Hadoop and Spark
- No Hadoop/Spark expertise required
- Target audience: analysts
- Run at enterprise grade and scale on HDInsight
20 Dataiku – Data science at Enterprise grade scale
21 Why Dataiku
- Collaborative data science experience
- Single tool for data scientists and beginner analysts, covering data ingestion (all data types, sizes, and formats), data preparation, data processing, machine learning, visualization, and operationalizing the solution
- Run at enterprise grade and scale on HDInsight
22 H2O.ai – Deep learning at Enterprise grade scale
23 Why H2O.ai
- Open source deep learning platform
- Run at enterprise grade and scale on HDInsight
24 Cask – Unified Platform for Big data
25 Cask and Microsoft: Enabling Customers’ Journey from Data Ingestion to Action
- Ingest, explore, and serve data
- Enable governed self-service
- Capture, track, and analyze (meta)data
- Develop, test, debug, and deploy apps
[Diagram] Flow: Data, Intelligence, Action. Data sources: apps, sensors and devices. Layers: Information Management, Big Data Stores, Machine Learning and Analytics, Intelligence, Dashboards & Visualizations. Services shown: Event Hubs, Data Catalog, Data Factory, Data Lake Store, SQL Data Warehouse, Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics, Cortana, Bot Framework, Cognitive Services, Power BI. Consumers: people, apps (web, mobile, bots), automated systems.
26 Azure HDInsight: Big data made easy
- Analytics on any data, any size
- Easier and more productive for all users
- Enterprise-ready
27 Resources
- Azure HDInsight application platform
- Install custom HDInsight applications
- Try SDC from StreamSets on HDInsight
- Try Datameer on HDInsight
- Try AtScale on HDInsight
- Try DSS from Dataiku on HDInsight
- Try H2O.ai on HDInsight
- Try CDAP from Cask on HDInsight
28 Thank You
Pranav Rastogi, Microsoft, @rustd