DriveScale Technical Deep Dive Presentation

Author: Oliver McKenzie

1 DriveScale Technical Deep Dive Presentation
Software Defined Infrastructure for Hadoop and Big Data
April 24, 2017

2 Presentation Overview
DriveScale Solution Components
Solution Architecture
Minimal Proof of Concept Requirements
Performance Testing Results (SDI vs. DAS)

3 DriveScale Solution Overview and Components

4 DriveScale Components Shown in a Typical Rack Deployment
Cloud:
  DSC (DriveScale Central): 1 per customer; customer support, remote upgrade, remote licensing
Software:
  DMS (DriveScale Management System): 1-3 per customer; Linux RPM, runs on VMs; inventory, cluster config, node config
  DSN (DriveScale Server Node Agent): 1 on each node; inventory discovery
Hardware:
  Top of Rack switches: 10GbE ports (Cisco, Arista, HPE, Dell, Quanta, etc.)
  Out-of-band management switch: 1GbE
  Compute pool: rack servers, 2U, 1U, 1/2U or 1/4U (Dell, HPE, Cisco, SuperMicro, Quanta, Foxconn, etc.)
  DriveScale Adapter (Ethernet to SAS bridge): 1 per JBOD
  Storage pool: JBOD (Just a Bunch of Drives) (Dell, HPE, SuperMicro, Sanmina, Quanta, Foxconn, etc.)

5 The DriveScale System: Highly Automated Infrastructure Provisioning and Management
Components shown: DriveScale Adapter (DSA), DriveScale Management System (DMS), DriveScale Cloud Central (DSC), and DriveScale Server Nodes (DSN), each running the DriveScale Agent

6 DriveScale Solution Components Description
The 4 principal components:
1. DriveScale Management Server (DMS)
  Data repository consists of: Inventory (DMSs, DS Adapters, Switches, JBOD Chassis, Disks, Server Nodes); Configuration (Node Templates, Cluster Templates, Configured Clusters)
  Typical deployment consists of 3 DMS systems
  The DMS database is used as a message bus to communicate with the endpoints
2. DriveScale Adapter (DSA)
  DSA Agent discovery provides inventory for hardware
  Creates mappings for server nodes to consume disks
3. DriveScale Server Node (DSN)
  DS Server Agent provides inventory for server hardware
  Consumes mapped disks via the DSA
4. DriveScale Central (DSC)
  Cloud-based portal where DriveScale repos are stored for software distribution to subscribers

7 The DriveScale Adapter
Enables SAS-connected drives to be mounted over Ethernet
4 DriveScale Ethernet to SAS adapters in a 1U chassis
Dual redundant power supplies
2x 10GbE interfaces per adapter
2x 12Gb 4-lane SAS interfaces per adapter
With 80 Gb of throughput, a single chassis can comfortably support simultaneous access to 80 drives with performance equivalent to direct-attached storage
Speaker notes: The piece of the DriveScale system that performs the first part of the magic trick is a 1U hardware appliance containing four distinct and separate adapters that connect to servers on one side via 10GbE Ethernet ports, and to JBODs on the other side via standard SAS ports. The hardware is designed to be highly resilient, with dual power supplies and a passive backplane. The adapters also have out-of-band management ports and can comfortably support simultaneous access to 80 drives in JBODs with performance equivalent to direct-attached storage.
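As a back-of-the-envelope check on that 80-drive claim (assuming the 80 Gb figure is the chassis aggregate of 4 adapters x 2x 10GbE), the per-drive share works out to roughly one nearline HDD's worth of sequential bandwidth:

  # 4 adapters x 2 x 10GbE = 80 Gb/s of aggregate Ethernet bandwidth per chassis,
  # spread across 80 simultaneously active drives:
  echo "$(( 80 * 1000 / 8 / 80 )) MB/s per drive"   # prints: 125 MB/s per drive
  # ~125 MB/s is in the same range as the sustained sequential rate of a 7200 RPM
  # nearline SAS HDD, which is why 80 active drives can still look like DAS.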

8 DriveScale Adapter Component Functions
The DriveScale Adapter is broken into two major components:
DSA Dataplane Processor
  XLP-II 208 chipset
  2x 12Gb SAS interfaces (LSI SAS HBA): provides SAS connectivity between the JBOD(s) and the DSA
  2x 10GbE: provides the network conduit for the disks and compute elements of the composed servers
DSA Supervisor
  Provides baseboard management for each adapter within a DSA chassis
  Location where major firmware updates are staged for each DSA
  Based on a TI AM3352 chipset
Both the DSA Dataplane Processor and the Supervisor run a Debian 2.7 OS
The DMS provides the update mechanisms for DSA functions:
  The majority of software updates are Debian package updates via an apt-hosted software repository
  Major firmware upgrades require a reflash of the Debian OS hosted on the DSA
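To make the apt-based path concrete, here is a minimal sketch of what an update from a DMS-hosted repository could look like from the DSA's Debian shell; the repository URL and the package name are illustrative assumptions, not DriveScale's actual values:

  # Hypothetical apt source pointing at the DMS-hosted DriveScale repository.
  echo "deb https://dms.example.local/repo stable main" > /etc/apt/sources.list.d/drivescale.list
  apt-get update
  apt-get install --only-upgrade drivescale-dsa   # package name is an assumption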

9 DriveScale Management Server Components

10 DriveScale Inventory and Discovery
DriveScale Agent common inventory (DMS, Server Nodes):
  Server details: Manufacturer, Model, Serial No., UUID, BIOS
  Processor details: Make, Model, Architecture, Cores, Frequency
  Memory details: Total Memory, Total Swap, etc.
  Network: MAC, IP Address, MTU, Speed, BMC / IPMI details
  Connectivity details: port mappings, upstream switch information, network segments
  Operating system: OS version, kernel version, architecture
Server Node specific inventory:
  Software details: DS software version
  Ingested drive details
  iSCSI IQN details: UUID, target IP / port mapping for each ingested disk and the associated paths
DriveScale Adapter inventory:
  DSA details: Manufacturer, Model, Serial No., UUID, BIOS, plus the same processor, memory, network, connectivity, and operating system details as above, and the DSA software version
  Storage details: SCSI adapter / driver details, SAS connectivity details, destination IP / port mapping for each disk being served
JBOD details:
  JBOD: Manufacturer, Model, Serial No., UUID, BIOS
  SAS controller: SAS address, port details, port-to-slot mappings
  Disks: Vendor, Model, Serial No., Form Factor, Speed, Capacity

11 DriveScale Solution Architecture

12 DriveScale PoC Physical Network
[Diagram] PoC topology: a DSA chassis (Adapters 1-4) connects to the JBOD (Controllers 1 and 2) over SAS and to a pair of MLAG'd data switches (Switch 1, Switch 2) over 10GE; the Name Node, the Data Nodes, and the DMS & Ambari VM server attach to the same 10GE data fabric, with a separate 1GE management switch carrying the DSA management plane.

13 DriveScale API Communication Flow
[Diagram] Communication flow between DriveScale Central, the DMS cluster, Cloudera Director / Hortonworks Cloudbreak or a 3rd-party deployer, Kerberos/LDAP, the adapters, the server nodes, and the JBODs, annotated with the protocols and ports in use: HTTPS (80, 443, 8443, 8444), SSH (22), Zookeeper (2181, 2888, 3888 TCP), MongoDB (27017), HTTP (38202), iSCSI over Ethernet (wide port range), and SAS to the JBODs. The per-flow breakdown is on the next slide.

14 Communication Flow
User -> DMS: management interface access (HTTPS; HTTP on 80 redirects to HTTPS). If using haproxy for HA, the HTTPS of an individual server (not haproxy) is on 8443
User -> Adapter: network configuration (HTTPS)
Adapter -> DMS: software updates (HTTPS on 8444)
DMS -> DriveScale Central: software updates (HTTPS)
DMS -> DriveScale Central: statistics / logs upload (HTTPS)
DMS -> DMS: haproxy health check (HTTPS on 8443)
DMS -> DMS: VRRP for the HA setup
DMS -> DMS: Zookeeper replication (2888, 3888 TCP)
DMS -> DMS: MongoDB replication + access (27017 TCP)
Server/Adapter/DMS -> DMS: Zookeeper access (2181 TCP)
Server -> Adapter: iSCSI (wide port range, one iSCSI portal per drive used)
Server -> Adapter: HTTP on an alternate port for balancing (live data-load monitoring reported from the Adapter)
Server -> DriveScale Central: software install (HTTPS)
Server/Adapter -> DMS: log report (HTTPS)
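As an illustration of what this port matrix implies for host firewalls, a sketch of opening the DMS-side ports with firewalld (assuming a RHEL/CentOS DMS host running firewalld; zones and source restrictions should be adapted to your environment):

  # Open the DMS ports listed above: HTTP redirect and HTTPS, per-server HTTPS,
  # the adapter update channel, Zookeeper client and replication, and MongoDB.
  for port in 80 443 8443 8444 2181 2888 3888 27017; do
    firewall-cmd --permanent --add-port=${port}/tcp
  done
  firewall-cmd --reload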

15 DriveScale Adapter Management Network Topology
The DSA systems-management network is designed to be fully fault tolerant via onboard switching
All four DSA adapters can communicate with each other via embedded network switching within the DSA chassis
Only one systems-management port is required to be connected during deployment
Best practice is to cable multiple systems-management ports for management network redundancy
[Diagram] Management plane switch connected to the chassis, with internal switching between the XLP controllers (Controllers 1-4)

16 DSA / JBOD Connectivity Scenarios
The DSA provides flexible cabling of JBODs:
Base config: a single DSA dual-connected to a single controller or multiple controllers on the JBOD
Redundant config: two DSAs dual-connected to a single controller or multiple controllers on the JBOD
High IO / redundant config: multiple DSAs connected to multiple controllers on the JBOD
Daisy-chaining JBODs: DriveScale supports more than one JBOD daisy-chained behind a single DSA chassis
[Diagram] Base config: single device controller; redundant config: dual device controller; high IO, redundant config: quad device controller

17 DriveScale Disk to Server Mapping
What the DSA sees in its device tree (all of the entries for one served disk, keyed by its IQN):
  dsa# ls -al /dev/drivescale | grep iqn.com.drivescale:wwn:0x5000c50058d4d533
  ... sda -> iqn.com.drivescale:wwn:0x5000c50058d4d533
  ... sdb -> iqn.com.drivescale:wwn:0x5000c50058d4d533
  ... mpatha -> iqn.com.drivescale:wwn:0x5000c50058d4d533
  ... iqn.com.drivescale:wwn:0x5000c50058d4d533 -> /dev/mapper/mpatha
  ... info.iqn.com.drivescale:wwn:0x5000c50058d4d533
Physical disk device details for the served IQN iqn.com.drivescale:wwn:0x5000c50058d4d533 (a single disk in the JBOD):
  {"size": , "rotationRate": "7200", "product": "ST1000NM0023", "vendor": "SEAGATE", "uid": "iqn.com.drivescale:wwn:0x5000c50058d4d533", "formFactor": "3.5", "serial": "Z1W2AT5D0000C445GLG4", "wwn": "5000c50058d4d533", "isRotational": "1", "revision": "0003"}
How the DSA maps the IQN to a destination server node (one portal line per path; the port, destination IP, and DSA source IP columns did not survive in this transcript):
  # cat /var/lib/drivescale/portals
  iqn.com.drivescale:wwn:0x5000c50058d4d533 P ...
  iqn.com.drivescale:wwn:0x5000c50058d4d533 S ...
What the server node sees in its device tree:
  # ls -al /dev/drivescale | grep iqn.com.drivescale:wwn:0x5000c50058d4d533
  ... iqn.com.drivescale:wwn:0x5000c50058d4d533 -> /dev/mapper/mpatha
  # multipath -ll
  mpatha (35000c50058c915a3) dm-1 SEAGATE,ST1000NM0023
  size=932G features='1 queue_if_no_path' hwhandler='0' wp=rw
  |-+- policy='queue-length 0' prio= status=active
  | `- 14:0:0:0 sdc 8:80 active ready running
  |-+- policy='queue-length 0' prio= status=enabled
    `- 11:0:0:0 sdd 8:32 active ready running
[Diagram] A single disk in the JBOD is reached by Adapter 1 (a single adapter in the DSA chassis) through both JBOD controllers (Controller 1 and Controller 2), appearing on the DSA as sda and sdb and aggregated into multipath device mpatha; the DSA serves that device as one iSCSI IQN on both of its 10G ports (iSCSI target port 1001), and the server node logs in over eth1 and eth2 through the 10GbE data-plane switch, aggregating the two resulting SCSI devices back into its own multipath device.
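To trace this same mapping on a live server node, the standard open-iscsi and device-mapper-multipath tools are sufficient; a quick sketch (the IQN below is the one from the example above, substitute your own):

  # List active iSCSI sessions: each DriveScale-mapped drive shows up as one
  # session per portal (two when both DSA 10G ports are in use).
  iscsiadm -m session -P 1

  # Show how the per-path SCSI devices are grouped into multipath devices.
  multipath -ll

  # Follow one drive from its DriveScale symlink to the dm device backing it.
  ls -al /dev/drivescale | grep 0x5000c50058d4d533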

18 DriveScale Troubleshooting

19 Key Locations to Look for More Information on the DMS
DriveScale software base directory: /opt/drivescale
/var/log/drivescale: DriveScale log file directory; most logs are named with the DriveScale process name in the filename
/var/lib/drivescale: DMS database base directory
/var/run/drivescale: running processes directory; holds server inventory information in JSON format, including compute information (CPU, memory, manufacturer, model, etc.), networking information (NICs, IP addresses, link information, etc.), and the DriveScale Management Server software version
DMS database (Zookeeper):
  Database location: /var/lib/drivescale-zookeeper/data/version-2
  Tools location: /opt/drivescale/tools/zookeeper
  zk_ls.py: provides a path listing of all data objects stored in the DMS DB
  zk_cat.py: displays the stored JSON data for a particular data object in the DMS DB
(see the sketch below for a quick way to poke at these locations)
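A minimal sketch of examining these locations on a DMS host; the exact file layout under /var/run/drivescale and the arguments accepted by zk_ls.py and zk_cat.py are assumptions here, so treat this as illustrative rather than exact syntax:

  # Tail the most recently written DriveScale log (log names vary by process).
  tail -f "$(ls -t /var/log/drivescale/* | head -1)"

  # Pretty-print the JSON inventory dropped under /var/run/drivescale
  # (assumes plain JSON files; adjust the glob to what you actually find there).
  for f in /var/run/drivescale/*; do
    echo "== $f"; python -m json.tool "$f" 2>/dev/null || head -5 "$f"
  done

  # Browse the DMS database objects with the bundled Zookeeper tools
  # (shown without arguments; check each tool's help output for its real usage).
  cd /opt/drivescale/tools/zookeeper && ./zk_ls.py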

20 Key Locations to Look for More Information on Server Nodes
DriveScale software base directory: /opt/drivescale
/var/log/drivescale: DriveScale log file directory; most logs are named with the DriveScale process name in the filename
/var/lib/drivescale: information for mapped and ingested drives; details on the actively running configuration
/var/run/drivescale: running processes directory; stores information about filesystems mounted on server nodes that use disks mapped by DriveScale, plus server inventory information in JSON format, including compute information (CPU, memory, manufacturer, model, etc.), networking information (NICs, IP addresses, link information, etc.), and the DriveScale server software version
(a quick sketch of correlating mapped drives with mounted filesystems follows below)
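To correlate the drives DriveScale has mapped to a node with what the OS has actually mounted, the generic block-device tools are enough; a sketch, assuming nothing DriveScale-specific beyond the /dev/drivescale directory shown on the earlier mapping slide (the xfs/ext4 filter is just an example, match it to your data-directory filesystem):

  # List the DriveScale-mapped devices and the multipath devices they point at.
  ls -al /dev/drivescale

  # Show every block device with its multipath relationships and mountpoints.
  lsblk -o NAME,TYPE,SIZE,MOUNTPOINT

  # Confirm which /dev/mapper devices back the Hadoop data directories.
  findmnt -t xfs,ext4 | grep /dev/mapper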

21 Key DriveScale DMS Commands
Key commands: ds-cluster, ds-help, ds-list, ds-prepare, ds-reconnect, ds-release, ds-repair, ds-servers, ds-shrink, ds-speed

22 DriveScale Proof of Concept

23 POC Goals
The Proof of Concept (POC) testing is designed to demonstrate the key aspects of the solution's value proposition, namely:
Flexibility: define multiple forms of server nodes to include in a cluster, and compose servers into clusters as needed. Tear down and re-compose clusters on demand, and add server nodes to existing clusters.
Integration: demonstrate that DriveScale's solution works seamlessly with existing best-in-class server and JBOD technology.
Performance: demonstrate that the DriveScale technology works well under load.

24 Dell Lab Architecture with 1x DSA, 5x Data Nodes per Cluster
Speaker notes (Gene): Okay, so what is DriveScale? DriveScale is data center infrastructure designed for scale-out. DriveScale provides an integrated combination of software, hardware, and cloud-based services that gives scale-out server admins and operators flexible, responsive infrastructure. This allows those admins and operators to cope with rapidly changing software stacks and business needs, without requiring changes to the app stack and without resorting to overprovisioning.

25 Equipment
Equipment | Description | Quantity
DriveScale Adapter | | 2
Servers (1 Name Node + 5 Data Nodes) | DELL PowerEdge R730xd | 6
DriveScale Management Server (DMS) | | 1
JBOD | DELL EN-8435A-E6EBD |
HDD | SEAGATE Constellation ES3 | 60 (for JBOD) + 60 (for Direct Attached)
Switch | DELL S4048-ON, Dell Application SW Version 9.8(0.0P5) |
Speaker notes (Tina): This back-to-the-future situation is a problem! Operators and admins have lost the flexibility of independently scaling compute and storage that exists in the scale-up world, and that loss is painful. The scale-out ecosystem evolves so quickly that independent resource scaling can be even more important than before. The current rack-server-with-trapped-local-disks model pushes admins to choose between wasting resources (basically overprovisioning to be safe, or buying more servers with both compute and storage regardless of what is needed) or saying no to new projects and workloads that are important to the business.

26 Tests Performed
Logical Server and Cluster Creation
Physical Infrastructure Inventory
Resiliency of Solution (Cable Pull / Disk Removal)
Functional Tests with Hadoop
Performance Tests
Speaker notes (Tina): DriveScale's value propositions are: first and foremost, alleviating pain around rigid infrastructure; we provide flexible and responsive physical infrastructure so that admins can provision exactly what's needed and rebalance resources on demand. Second, we provide a solution that is simple to deploy whether small or large, and that is functionally and performance-wise equivalent to the standard rack server model, without changes required to the app stack. We provide a comprehensive set of REST APIs to make the system fully automatable, which is table stakes at scale. Finally, this has been designed from the start to be an enterprise-grade solution, from high availability throughout the system to investments in security and an acknowledgement that people will want to bring their own server and JBOD hardware to the table.

27 Performance Results – testDFSIO
Note: Hadoop TestDFSIO was run with 32 files of 100 GB per node over a non-blocking topology, with 5 Data Nodes each mounting 6 disks. Test results were collected for both the DriveScale (iSCSI) attached and the direct-attached disk clusters.
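For reference, a TestDFSIO invocation along these lines would reproduce that workload; the jar path and the choice of -nrFiles 160 (32 files for each of the 5 data nodes) are assumptions about how the deck's "100G*32 files per node" figure maps onto the tool's flags, and the -size syntax varies slightly between Hadoop releases:

  # Write 160 files of 100 GB each, read them back, then clean up the benchmark data.
  JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar
  hadoop jar $JAR TestDFSIO -write -nrFiles 160 -size 100GB
  hadoop jar $JAR TestDFSIO -read -nrFiles 160 -size 100GB
  hadoop jar $JAR TestDFSIO -clean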

28 Performance Results – MRBENCH
Note: Hadoop mrbench runs the job 10 times, each run with 335 maps and 335 reduces, generating 100 input lines, over 5 Data Nodes each mounting 6 disks. Test results were collected for both the DriveScale-attached and the direct-attached disk clusters.
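A matching mrbench invocation would look roughly like this (the jar path is an assumption; the flags are mrbench's standard options):

  # Run the small-job benchmark 10 times with 335 map and 335 reduce tasks per run,
  # generating 100 lines of input text.
  JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar
  hadoop jar $JAR mrbench -numRuns 10 -maps 335 -reduces 335 -inputLines 100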

29 Performance Results – FIO
Note: File system I/O tests were performed on the 6 disks mounted on Data Node 1, for both the DriveScale (iSCSI) attached and the direct-attached configurations. The tests were run against block devices (rather than files in a file system) for repeatability. The results show aggregate I/O bandwidth (data transfer rate).
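The deck does not include the exact fio job definition, but a raw block-device run of this shape matches the description; the device name, block size, and queue depth below are assumptions:

  # Sequential read directly against a multipath block device (no filesystem involved).
  fio --name=seqread --filename=/dev/mapper/mpatha \
      --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=32 \
      --runtime=60 --time_based --group_reporting
  # Repeat per disk (or pass a colon-separated --filename list) and compare the
  # aggregate bandwidth between the DriveScale-attached and DAS device sets.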

30 Conclusions – All Tests PASSED
Flexibility: define multiple forms of server nodes to include in a cluster, and compose servers into clusters as needed; tear down and re-compose clusters on demand, and add server nodes to existing clusters. PASS
Integration: demonstrate that DriveScale's solution works seamlessly with existing best-in-class server and JBOD technology. PASS
Performance: demonstrate that the DriveScale technology works well under load. PASS: all workload tests through the DriveScale Adapter completed in execution times very close to those of the DAS tests

31 PoC Minimal HW Requirements (with resiliency)
1x DSA chassis (includes 4x DS Adapters)
1x JBOD to be loaded with the 60 drives
3x (minimum) servers (Data Nodes) with 1 direct-attached drive each (4-12 drives will be remotely attached via the DSA)
1x server (Name Node) with 1 direct-attached drive
1x server (can be a VM) for DMS and Ambari
2x 10GE switches for data flow
1x management switch (1GE)