1 Next-Gen Decision Making in <2ms @ChanceFood
7 [Chart: X (predictor) = spend amount; Y (response) = likelihood of millionaire; comparing Simple, Velocity, and Advanced models]
12 Hard Metrics
Latency: < 40ms; ideally < 16ms
Throughput: goal of 2,000 events/second
Durability: no loss; every message gets exactly one response
Availability: 99.5% uptime (downtime of 1.83 days/year); ideally 99.999% uptime (downtime of 5.26 minutes/year) – see the arithmetic sketch below
Scalability: can add resources and still meet latency requirements
Integration: transparently connected to existing systems – hardware, messaging, HDFS
Soft Metrics
Open source: all components licensed as open source
Extensibility: rules can be updated; the model is regularly refreshed
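For reference, the downtime figures follow from the availability targets by simple arithmetic; a minimal Java sketch (not part of the original deck) reproducing the 1.83 days and 5.26 minutes:

```java
// Downtime budget implied by an availability target:
// downtime per year = (1 - availability) * one year
public class DowntimeBudget {
    public static void main(String[] args) {
        double daysPerYear = 365.25;
        double minutesPerYear = daysPerYear * 24 * 60; // 525,960 minutes

        // 99.5% uptime -> about 1.83 days of downtime per year
        System.out.printf("99.5%%   -> %.2f days/year%n", (1 - 0.995) * daysPerYear);

        // 99.999% uptime ("five nines") -> about 5.26 minutes per year
        System.out.printf("99.999%% -> %.2f minutes/year%n", (1 - 0.99999) * minutesPerYear);
    }
}
```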
14 Onyx
15 Evaluation criteria: Performance, Roadmap, Enterprise Readiness, Community
18 Spark Streaming is not evaluated further – a nonstarter due to micro-batching and the lack of dynamic DAG reconfiguration
19 Storm
Pros: fast; easy to use; mature
Cons: failures are not independent; Nimbus is a single point of failure; no dynamic topologies; failed tuples cannot be replayed faster than 1s; heavy resource usage
Community: stagnating; Hortonworks is the only active backer
Roadmap: still working toward enterprise readiness (integration with YARN, elastic topologies, high availability)
20 Flink
Pros: easy to use; fast; support for SQL-like queries
Cons: failures reset to the upstream data source; shared JVM; no dynamic topologies
Community: young
Roadmap: fine-grained fault tolerance, in-memory store integration, off-heap memory, full SQL
22 Apex – YARN veterans from Yahoo! Finance and Hadoop
Built for enterprise stability and durability before performance
Phu Hoang – CEO and co-founder; head of engineering at Yahoo!
Amol Kekre – led Yahoo! Finance and led YARN
Chetan – lead architect from Yahoo! Finance
Thomas Weise – Hadoop/YARN veteran from Yahoo!
23 Dynamic topologies; downstream components do not affect upstream
24 Fine-grained control of locality; no single point of failure; operators are independent
25 Independence of partitions; auto-scaling (throughput and latency)
26 Batch, micro-batch, and true streaming
27 Performance
Avg. 0.25ms @ 70k records/sec, with 600GB RAM; thread-local on ~54M events
Percentiles (in ms):
Throughput | Count      | Avg (ms) | 90% | 95% | 99% | 99.9% | 4 9's | 5 9's | 6 9's
70k/sec    | 54,126,122 | 0.19     | 1   | 2   | 5   | 6     |       |       |
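The percentile columns (90% through "six nines") are standard order statistics over per-event latencies. A minimal, illustrative Java sketch of the nearest-rank method on toy data; this is not the benchmark harness behind the numbers above:

```java
import java.util.Arrays;

// Nearest-rank percentile over recorded per-event latencies (ms).
public class LatencyPercentiles {
    static double percentile(double[] sortedLatencies, double q) {
        // q in (0, 1], e.g. 0.999999 for "six nines"
        int idx = (int) Math.ceil(q * sortedLatencies.length) - 1;
        return sortedLatencies[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        double[] latencies = {0.2, 0.1, 0.3, 5.0, 0.2, 6.0, 0.15, 0.25}; // toy data
        Arrays.sort(latencies);
        for (double q : new double[]{0.90, 0.95, 0.99, 0.999, 0.9999, 0.99999, 0.999999}) {
            System.out.printf("%.6f -> %.2f ms%n", q, percentile(latencies, q));
        }
    }
}
```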
28 Durability
Two physically independent pipelines on the same cluster processing identical data
For each tuple, we take the best-case time across the two pipelines
39 records out of 5.2M exceeded 16ms
173 out of 5.2M exceeded 16ms in one pipeline but succeeded in the other
99.999% success rate – "Five Nines"
Average latency of ms
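The methodology above (per tuple, take the better of the two pipelines, then count 16ms exceedances) can be written down directly. A hypothetical sketch assuming per-tuple latency maps keyed by tuple ID; the names and data are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// For each tuple processed by two independent pipelines, take the
// best-case latency, then measure the 16ms SLA success rate.
public class BestCaseLatency {
    public static void main(String[] args) {
        Map<Long, Double> pipelineA = new HashMap<>(); // tupleId -> latency (ms)
        Map<Long, Double> pipelineB = new HashMap<>();
        pipelineA.put(1L, 3.0);  pipelineB.put(1L, 20.0); // A rescues B
        pipelineA.put(2L, 18.0); pipelineB.put(2L, 17.5); // both miss

        long total = 0, missed = 0;
        for (Map.Entry<Long, Double> e : pipelineA.entrySet()) {
            Double b = pipelineB.get(e.getKey());
            double best = (b == null) ? e.getValue() : Math.min(e.getValue(), b);
            total++;
            if (best > 16.0) missed++; // exceeded SLA in both pipelines
        }
        System.out.printf("success rate: %.5f%%%n", 100.0 * (total - missed) / total);
    }
}
```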
30 @ChanceFood slideshare.net/ilganeli/
31 Appendix
32 Streaming Technologies Evaluated
Spark Streaming, Samza, Storm, Feedzai, InfoSphere Streams, Flink, Ignite, VoltDB, Cassandra, Apex
Of all evaluated technologies, Apache Apex is the only one ready to bring the decision-making solution to production, based on: maturity, fault tolerance, enterprise readiness, performance, and focus on open source
Driving the roadmap is a competitive advantage for C1
33 Stream Processing – Apache Storm
An open-source, distributed, real-time computation system
Logical operators (spouts and bolts) form statically parallelizable topologies
Very high throughput of messages with very low latency
Can provide <10ms latency end-to-end under normal operation
Basic abstractions provide an at-least-once processing guarantee
Limitations
Nimbus is a single point of failure
Rectified by Hortonworks, but not yet available to the public (no timeline for release)
Upstream bolt/spout failure triggers a re-compute on the entire tree
Can only create parallel independent streams by running separate redundant topologies
Bolts/spouts share a JVM – hard to debug
Failed tuples cannot be replayed quicker than 1s
No dynamic topologies – cannot add or remove applications without service interruption
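For concreteness, this is roughly what the spout/bolt model looks like in Storm's 0.x Java API (the line current when this comparison was made); the spout and bolt bodies are toy stand-ins, not production components:

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import java.util.Map;

// Spouts and bolts wired into a statically parallelized topology.
public class ScoringTopology {

    // Toy spout: emits a synthetic event per call.
    public static class EventSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector c) { collector = c; }
        public void nextTuple() { collector.emit(new Values("event-" + System.nanoTime())); }
        public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("event")); }
    }

    // Toy bolt: "scores" each event with a placeholder value.
    public static class ScoringBolt extends BaseBasicBolt {
        public void execute(Tuple t, BasicOutputCollector c) {
            c.emit(new Values(t.getString(0).hashCode() % 100));
        }
        public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("score")); }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new EventSpout(), 4);   // parallelism fixed at submit time
        builder.setBolt("score", new ScoringBolt(), 8).shuffleGrouping("events");
        Config conf = new Config();
        conf.setNumWorkers(2);
        new LocalCluster().submitTopology("scoring", conf, builder.createTopology());
    }
}
```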
34 Stream Processing – Apache Flink
An open-source, distributed, real-time computation system
Logical operators are compiled into a DAG of tasks executed by Task Managers
Supports streaming, micro-batch, and batch compute
Supports aggregate operations on streams (reduce, join, groupBy)
Capable of <10ms end-to-end latency with streaming under normal operation
Can provide exactly-once processing guarantees
Limitations
Failures trigger a reset of ALL operators to the last checkpoint
Depends on an upstream message broker to track state
Operators share a JVM – a failure in one brings down all tasks sharing that JVM; hard to debug
No dynamic topologies
Young community, young product
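And the corresponding flavor of Flink's DataStream API, showing the keyed aggregation ("groupBy"-style) mentioned above; the socket source and word counting are illustrative assumptions, not the evaluated workload:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// Streaming aggregation in Flink: key the stream, then keep a running sum per key.
public class EventCounts {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> lines = env.socketTextStream("localhost", 9999); // assumed source

        lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                 @Override
                 public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                     for (String w : line.split("\\s+")) out.collect(new Tuple2<>(w, 1));
                 }
             })
             .keyBy(0)   // the "groupBy" of the slide: partition by word
             .sum(1)     // running aggregate per key
             .print();

        env.execute("event-counts");
    }
}
```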
35 Stream Processing – Apache Apex
An open-source, distributed, real-time computation system on YARN
Apex is the core system powering DataTorrent, released under the ASF
Demonstrated high throughput with low latency running a next-generation C1 model (avg. 0.25ms, max 70k records/sec) with 600GB RAM
A true YARN application, developed from the principles of Hadoop and YARN at Yahoo!
Mature product (derived from proven solutions in Yahoo! Finance and Hadoop)
Built by the team under Phu Hoang (CEO of DataTorrent, former Head of Engineering at Yahoo!) that built Hadoop
Amol Kekre (CTO of DataTorrent) led the team that built YARN
DataTorrent (Apex) is executing on production clusters at Fortune 100 companies
36 Stream Processing – Apache Apex
Maturity
Designed to process and manage global data for Yahoo! Finance
Primary focus is on stability, fault tolerance, and data management
The only OSS streaming technology considered that was designed explicitly for the financial world
Data or computation could never be lost or replicated
The architecture had to never go down
The goal was to make it rock-solid and enterprise-ready before worrying about performance
Data flows across countries – a perfect fit for a use case that requires cross-cluster interaction
Enterprise Readiness
Advanced support for encryption, authentication, compression, administration, and monitoring
Deployment at scale in the cloud and on-prem – AWS, Google Cloud, Azure
Integrates with a huge set of existing tools: HDFS, Kafka, Cassandra, MongoDB, Redis, ElasticSearch, CouchDB, Splunk, etc.
37 Apex Platform – Summary
Apex Architecture
Networks of physically independent, parallelizable operators that scale dynamically
Dynamic topology modification and deployment
Self-healing, fault tolerant, and recoverable
Durable messaging queues between operators, checkpointed in memory and on disk
The resource manager is a replicated YARN process that monitors and restarts downed operators
No single point of failure; highly modular design
Can specify locality of execution (avoids network and inter-process latency; see the sketch below)
Guarantees at-least-once, at-most-once, or exactly-once processing
[Diagram: a Directed Acyclic Graph (DAG) of operators – tuples flow through filtered and enriched streams between operators to an output stream]
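A minimal sketch of such a DAG in Apex's Java API (package paths as in the Apex 3.x line and may differ by version; the three operators are toy stand-ins for real filter/enrich/output logic):

```java
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.DAG.Locality;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.common.util.BaseOperator;

// An Apex application is a DAG of operators connected by durable streams.
@ApplicationAnnotation(name = "DecisioningApp")
public class DecisioningApp implements StreamingApplication {

    // Toy source: emits a synthetic event per emitTuples() call.
    public static class EventSource extends BaseOperator implements InputOperator {
        public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
        @Override public void emitTuples() { output.emit("event-" + System.nanoTime()); }
    }

    // Toy enricher: tags each event (stands in for real enrichment logic).
    public static class EnrichOperator extends BaseOperator {
        public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
        public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
            @Override public void process(String t) { output.emit(t + "|enriched"); }
        };
    }

    // Toy sink: prints the enriched stream.
    public static class ConsoleSink extends BaseOperator {
        public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
            @Override public void process(String t) { System.out.println(t); }
        };
    }

    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        EventSource source  = dag.addOperator("source", new EventSource());
        EnrichOperator rich = dag.addOperator("enrich", new EnrichOperator());
        ConsoleSink sink    = dag.addOperator("sink", new ConsoleSink());

        // THREAD_LOCAL co-locates operators, avoiding network and inter-process hops.
        dag.addStream("events", source.output, rich.input).setLocality(Locality.THREAD_LOCAL);
        dag.addStream("enriched", rich.output, sink.input); // engine chooses placement
    }
}
```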
38 Apex Platform – Overview
39 Apex Platform – Malhar
40 Apex Platform – Cluster View
[Diagram: a Hadoop edge node hosts the DT RTS Management Server (REST API, CLI); Hadoop nodes run YARN containers holding the RTS App Master and streaming containers, each with operator threads (Thread 1…Thread N running Op1/Op2/Op3); part of Community Edition]
41 Apex Platform – Operators
Operators can be dynamically scaled
Flexible stream configuration
Parallel Redis / HDHT DAGs
Separate visualization DAG
Parallel partitioning
Durability of data
Scalability
Organization for the in-memory store
Unifiers: combine statistics from physical partitions (see the sketch below)
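A sketch of the unifier idea, assuming Apex's Operator.Unifier interface: each physical partition emits a partial statistic, and the unifier merges them once per window. This is an illustrative example, not a Malhar operator:

```java
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.Operator.Unifier;
import com.datatorrent.common.util.BaseOperator;

// Merges per-partition counts into a single global count per window.
public class CountUnifier extends BaseOperator implements Unifier<Long> {
    public final transient DefaultOutputPort<Long> mergedCount = new DefaultOutputPort<>();
    private long sum;

    @Override
    public void process(Long partialCount) {
        sum += partialCount;       // one partial result arrives from each physical partition
    }

    @Override
    public void endWindow() {
        mergedCount.emit(sum);     // emit the combined statistic once per window
        sum = 0;
    }
}
```

A partitioned operator's output port hands such a unifier to the engine by overriding DefaultOutputPort#getUnifier().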
42 Dynamic Topology Modification
Can redeploy new operators and models at run-time!
Can reconfigure settings on the fly
43 Apex Platform – Failure Recovery
Physical independence of partitions is critical
Redundant STRAMs
Configurable window size and heartbeat for low-latency recovery (see the attribute sketch below)
Downstream failures do not affect upstream components
Snapshotting only depends on the previous operator, not all previous operators
Can deploy parallel DAGs with the same point of origin (simpler from a hardware and deployment perspective)
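The window-size and heartbeat tunables referenced above are exposed as DAG and operator attributes; a sketch assuming the attribute names in com.datatorrent.api.Context as of the Apex 3.x line:

```java
import com.datatorrent.api.Context.DAGContext;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.api.Operator;

// Recovery-related attributes, set from inside StreamingApplication#populateDAG.
public class RecoveryTuning {
    static void tune(DAG dag, Operator scoringOp) {
        // Shorter streaming windows: finer-grained recovery points.
        dag.setAttribute(DAGContext.STREAMING_WINDOW_SIZE_MILLIS, 500);

        // Faster heartbeats: quicker detection of a downed container.
        dag.setAttribute(DAGContext.HEARTBEAT_INTERVAL_MILLIS, 500);

        // Checkpoint this operator every 2 streaming windows (~1s here).
        dag.setAttribute(scoringOp, OperatorContext.CHECKPOINT_WINDOW_COUNT, 2);
    }
}
```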
44 Apex Platform – Windowing
Sliding window and tumbling window
Window based on checkpoint
No artificial latency
Used for stats measurement (see the sketch below)
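Because windows align with the engine's own streaming/checkpoint windows, per-window stats cost no extra latency: tuples are processed on arrival, and the aggregate is flushed in endWindow(). An illustrative operator sketch using the standard callbacks:

```java
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.common.util.BaseOperator;

// Emits one count per streaming window - a tumbling window aligned with
// the engine's checkpoint boundaries, adding no artificial latency.
public class WindowedCounter extends BaseOperator {
    public final transient DefaultOutputPort<Long> countPerWindow = new DefaultOutputPort<>();
    private long count;

    public final transient DefaultInputPort<Object> input = new DefaultInputPort<Object>() {
        @Override public void process(Object tuple) { count++; } // no buffering, no delay
    };

    @Override
    public void beginWindow(long windowId) { count = 0; }

    @Override
    public void endWindow() { countPerWindow.emit(count); } // stats measurement point
}
```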
45 Enterprise Readiness – Apex
Great UI to monitor, debug, and control system performance
Fault tolerance and recovery out of the box – no additional setup or improvement needed
YARN is still a single point of failure; a NameNode failure can still impact the system
Built-in support for dynamic and automatic scaling to handle larger throughputs
Native integration with Hadoop, YARN, and Kafka – the next-gen standard at C1
Mature product
Apex is derived from the principles of Hadoop and YARN over the course of many years
Built and planned by chief Hadoop architects
Proven performance in production at Fortune 100 companies
46 Enterprise Readiness – Storm, Flink
Storm
Widely used, but abandoned by its creators at Twitter in favor of Heron in production
Debug-ability: topology components are bundled in one process
Resource demands: needs dedicated hardware; can't scale on demand or share usage
Topology creation/tear-down is expensive; topologies can't share cluster resources
Have to manually isolate and de-commission machines
Performance in failure scenarios is insufficient for this use case
Flink
Operational performance has not been proven
Only one company (ResearchGate) officially uses Flink in production
Architecture shares Storm's fundamental limitations around dynamically scaling operators and topologies, and around debug-ability
47 Performance – Storm, Flink, Apex
Storm
Meets latency and throughput requirements only when no failures occur
Resilience to failures only possible by running fully independent clusters
Difficult to debug and operationalize complex systems (due to shared JVM and poor resource management)
Flink
Broader toolset than Storm or Apex – ML, batch processing, and SQL-like queries
Failures reset ALL operators back to the source – resilience only possible across clusters
Difficult to debug and operationalize complex systems (due to shared JVM)
Apex
Supports redundant parallel pipelines within the same cluster
Outstanding latency and throughput even in failure scenarios
Self-healing independent operators (simple to isolate failures)
Only framework to provide fine-grained control over data and compute locality
48 Roadmap – Storm
Commercial support from Hortonworks, but limited code contributions
Twitter, Storm's largest user, has completely abandoned Storm for Heron
Business continuity
Enhance Storm's enterprise readiness with high availability (HA) and failover to standby clusters
Eliminate Nimbus as a single point of failure
Operations
Apache Ambari support for Nimbus HA node setup
Elastic topologies via YARN and Apache Slider
Incremental improvements to the Storm UI to easily deploy, manage, and monitor topologies
Enterprise readiness
Declarative wiring of spouts, bolts, and data sources into topologies
49 Roadmap – Flink
Fine-grained fault tolerance (avoid rollback to the data source) – Q2 2015
SQL on Flink – Q3/Q4 2015
Integrate with distributed memory storage – no ECD
Use off-heap memory – Q1 2015
Integration with Samoa, Tez, Mahout DSL – no ECD
50 Roadmap – Apex (next 6 months)
Support creation of reusable pluggable modules (topologies)
Add operators to connect to existing technology: databases, messaging, modeling systems
Add SQL-like operations: join, filter, groupBy
Caching: add the ability to create cycles in the graph, allowing re-use of data for ML algorithms (similar to Spark's caching)
51 Roadmap Comparison – Storm, Flink, Apex
Storm
The roadmap is intended to bring Storm to enterprise readiness
Storm is not enterprise-ready today, according to Hortonworks
Flink
The roadmap brings Flink up to par with Spark and Apex; it does not create new capabilities relative to either
Spark is more mature for batch processing and micro-batch; Apex is more mature from a streaming standpoint
Apex
No need to improve the core architecture; the focus is instead on adding functionality
Better support for ML
Better support for a wide variety of business use cases
Better integration with existing tools
Stated commitment to letting the community dictate direction. From the incubator proposal: "DataTorrent plans to develop new functionality in an open, community-driven way"
52 Community
Vendor and community involvement drive roadmap and project growth
Storm
Limited improvements to core components of Storm in recent months
Few focused and active committers
Actively promoted and supported in public by Hortonworks
Flink
Some adoption in Europe, growing response in the U.S.
11 active committers, 10 of whom are from Data Artisans (the company behind Flink)
The community is very young, but there is substantial interest
Apex
Wide support network around Apex due to its evolution from Hadoop and YARN
Young but actively growing community: an opportunity for C1 to drive growth and define the direction of the product
53 Streaming Solutions Comparison
Apex
Ideal for this use case; meets all performance requirements and is ready for out-of-the-box enterprise deployment
Committer status for C1 allows us to collaboratively drive the roadmap and product evolution to fit our business needs
Storm
Great for many streaming use cases, but not the right fit for this effort
Performance in failure scenarios does not meet our requirements
Community involvement is waning and there is a limited roadmap for substantial product growth
Flink
Poised to compete with Spark in the future, based on community activity and roadmap
Not ready for enterprise deployment:
Technical limitations around fault tolerance and failure recovery
Lack of broad community involvement
The roadmap only brings it up to par with existing frameworks
54 New Capabilities Provided by Proposed Architecture
Millisecond-level streaming solution
Fault tolerant and highly available
Parallel model scoring for an arbitrary number of models
Quick model generation and execution
Dynamic scalability based on latency or throughput
Live model refresh
A/B testing of models in production
System is self-healing upon failure of components
55 Decisioning System Architecture – Strengths
Internal Capital One software, running on Capital One hardware, designed by Capital One
Open source
Internally maintainable code
Living model
Can be re-trained on current data and updated in minutes, not years
Offline models can be expanded, re-developed, and deployed to production at will
Extensible
Modular architecture with swappable components
A/B model testing in production
Dynamic deployment / refresh of models
56 Hardware
MDC Hardware Specifications
Server quantity – 15
Server model – Supermicro
CPU – Intel Xeon E5-2695v2, 2.4GHz, 12 cores
Memory – 256GB
HDD – (5) 4TB Seagate SATA
Network switch – Cisco Nexus GB
NIC – 2-port SFP+ 10GbE
MDC Software Specifications
Hadoop – v2.6.0
YARN – v2.6.0
Apache Apex – v3.0
Linux OS – RHEL v6.7
Linux kernel – el6.x86_64
57 Performance Comparison – Redis vs. Apex-HDHT
Percentiles (in ms):
Configuration                        | Throughput | Count      | Avg (ms) | 90% | 95% | 99% | 99.9% | 4 9's | 5 9's | 6 9's
Apex-HDHT, thread-local, ~2M events  | 70k/sec    | 1,807,283  | 0.253    | 1   | 2   |     |       |       |       |
Apex-HDHT, thread-local, ~54M events | 70k/sec    | 54,126,122 | 0.19     | 1   | 2   | 5   | 6     |       |       |
Apex-HDHT, no locality, ~2M events   | 40k/sec    | 2,214,777  | 51.651   | 98  | 126 | 381 | 489   | 494   | 495   |
Redis, thread-local, ~2M events      | 8.5k/sec   | 2,018,057  | 13.654   | 16  | 18  | 20  | 21    | 22    |       |
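For context on where per-lookup numbers like the Redis row come from, an illustrative micro-benchmark sketch using the Jedis client against a local Redis (host, port, and key are assumptions; this is not the actual test harness):

```java
import java.util.Arrays;
import redis.clients.jedis.Jedis;

// Times N GETs against a local Redis and prints simple nearest-rank percentiles (ms).
public class RedisLookupBench {
    public static void main(String[] args) {
        final int n = 100_000;
        double[] ms = new double[n];
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("k", "v");
            for (int i = 0; i < n; i++) {
                long t0 = System.nanoTime();
                jedis.get("k");                          // one network round trip per lookup
                ms[i] = (System.nanoTime() - t0) / 1e6;
            }
        }
        Arrays.sort(ms);
        for (double q : new double[]{0.90, 0.95, 0.99, 0.999}) {
            System.out.printf("%.3f -> %.3f ms%n", q, ms[(int) Math.ceil(q * n) - 1]);
        }
    }
}
```

The contrast the table quantifies is largely architectural: each Redis GET pays a client-server round trip, while a thread-local Apex-HDHT lookup stays in-process.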