By Samiya Khan Research Scholar, Jamia Millia Islamia

Author: Winfred Harrington

1 Cloud-based Big Data Analytics – A Survey of Current Research and Future Directions
By Samiya Khan, Research Scholar, Jamia Millia Islamia

2 Data – An Ever-increasing Resource
2.5 quintillion bytes of data are created every day [38]
90% of all data was created in the last two years [38]
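To put the daily figure quoted above into more familiar units, the conversion can be sketched as follows. This is a back-of-the-envelope calculation, assuming decimal (SI) units and a perfectly even rate of creation across the day; the variable names are illustrative only.

```python
# Back-of-the-envelope conversion of the daily data volume quoted on the slide.
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes per day [38]
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: decimal (SI) units, so 1 EB = 1e18 bytes and 1 TB = 1e12 bytes.
exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the figure works out to 2.5 exabytes per day, or roughly 29 terabytes every second.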

3 Data Sources

4 Need for Big Data Analytics
How is this data important?
  Identify which data needs to be kept, discarded and further analyzed
  Facilitate decision making for organizations
  Provide insights, trends, patterns and associations
What is so challenging about this data?
  Generation, storage and sharing
  Irrelevance, insecurity, high complexity
Applications of Big Data Analytics
  Medical and scientific research
  Businesses
  Issue management and decision making in the socio-economic and environmental sectors

5 Big Data Definition
A collection of large, complex or required data
It is difficult or impossible to store, process or analyze this data using standard methodologies, data management solutions and analytical solutions.
Data in big data can be classified into three categories:
  Structured data
  Unstructured data
  Semi-structured data
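The three categories can be illustrated with a small sketch. The sample records below are invented for illustration, and CSV, JSON and free text are chosen here only as common representatives of each category, not as formats named in the source.

```python
import csv
import io
import json

# Structured: a fixed schema, every row has the same columns.
structured = io.StringIO("id,name,age\n1,Asha,34\n2,Ben,29\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing records whose fields may vary.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Asha called on Monday to say the delivery arrived late."
word_count = len(unstructured.split())

print(rows[0]["name"], "tags" in semi_structured[1], word_count)
```

The structured rows can be queried by column name directly, the second semi-structured record simply lacks the `tags` field, and the unstructured sentence offers nothing to query until some analysis (here, a trivial word count) is applied.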

6 Models For Big Data
Multi-V Model [34]
HACE Theorem [26]
  Large volume of complex data from different and heterogeneous sources
  Data is decentralized and distributed in nature.

7 Where Does Cloud Fit In?
By 2016, half of the data will be on the cloud [20].
Fundamental Requirements of Big Data Analytics
  High storage requirement
  High performance computing
Rise of the concept of Big Data-as-a-Service
  Cloud allows effective processing of large data sets
  Cloud offers a low-cost solution for storing large data sets
  One such platform is Google BigQuery [12]
  The Infrastructure-as-a-Service (IaaS) model provides computing and storage resources as a service
Issue Facing the Use of This Synergistic Model
  Information security and data privacy

8 Frameworks for Cloud-based Big Data Analytics
MapReduce for data processing on a cluster of computers
Hadoop - an open-source implementation of the MapReduce framework
Cloud-based big data analytics frameworks - Google MapReduce, Spark, HaLoop, Twister, Hadoop Reduce and Hadoop++
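The MapReduce programming model mentioned on this slide can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases using word count as the example job; it is not Hadoop's API, and the function names are hypothetical. A real framework runs the map and reduce calls in parallel across the cluster and handles the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on the cloud", "the cloud stores big data"])
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be distributed without coordination, which is what makes the model suitable for clusters.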

9 Cloud Computing Usage In Big Data

10 Recent Breakthroughs And Milestones
Lee, Lee, Choi, Chung and Moon [16] - Advantages and limitations of MapReduce in parallel data analytics.
Starfish [13] - Performance improvement of clusters throughout the data analytics cycle.
Borthakur et al. [5] - Optimization of the HBase and HDFS implementation for better responsiveness.
Strambei [23] - Viability evaluation of OLAP Web Services for cloud-based architecture.
Khan, Naqvi, Alam and Rizvi [35] - Proposed a data model and schema for big data in the cloud.
Ortiz, Oneto and Anguita [28] - Evaluated the use and efficiency of a proposed integrated Hadoop and MPI/OpenMP system.
Li et al. [17] - Analytical application used for modeling and predicting the outbreak of dengue in Singapore.
Chen, Hsu and Zeller [7] and Demirkan and Delen [10] - Investigated the concept of Continuous Analytics as a Service.
Real-time Big Data Analysis
  AWS Kinesis [2] - AWS-based solution for real-time stream processing.
  Frameworks and software systems - Apache S4 [3], IBM InfoSphere Streams [14] and Storm [22].
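One common building block of the real-time stream processing mentioned on this slide is windowed aggregation. The sketch below is a generic, library-free illustration of tumbling (fixed, non-overlapping) windows. It does not use the API of Kinesis, S4, InfoSphere Streams or Storm, the event data is invented, and it processes a finite list rather than an unbounded stream.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs, in any order.
    Returns a dict mapping each window's start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Hypothetical events: (seconds since start, event type).
events = [(0.5, "click"), (3.2, "view"), (9.9, "click"), (10.0, "click"), (14.1, "view")]
result = tumbling_window_counts(events, window_seconds=10)
for start in sorted(result):
    print(start, dict(result[start]))
```

A production stream processor would additionally emit each window's result as soon as the window closes and would need a policy for late-arriving events, both of which are omitted here.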

11 Issues/Challenges in Data Management
Handling ever-increasing volumes and varieties of data
Storing data more efficiently
Integration and porting of data between different data centers
Data integration for data coming from different sources and of diverse types
Optimization of energy consumption and resource usage

12 Issues/Challenges in Model Building and Scoring
Exploring the elasticity and scalability potential of the cloud.
Visualization and User Interaction
  Finding better data processing techniques for real-time visualization.
  Exploring options that can lead to more cost-effective devices, particularly for large-scale visualization.

13 Other Issues/Challenges
From the perspective of business-related applications, striking a balance between generality and usefulness is a challenge.
Finding techniques for better interactivity in the cloud to improve the usability of the solution from the data analyst's point of view.
Debugging and checking the validity of the developed solutions.
Other, non-technical challenges also exist, including a lack of staffing, skills and business support.

14 Future Research Directions
Evolution of analytics and information management with respect to cloud-based analytics.
Adaptation and evolution of techniques and strategies to improve efficiency and mitigate risks.
Formulation of strategies and techniques to deal with privacy and security concerns.
Analysis and adaptation of legal and ethical practices to suit the changing viewpoint, impact and effects of technological advances in this regard.

15 References
[1] Agarwal, D., Das, S. and Abbadi, A.: Big Data and Cloud Computing: Current State and Future Opportunities. ACM /11/0003. (2011).
[2] Amazon Kinesis: Developer Resources,
[3] Apache S4: Distributed Stream Computing Platform,
[4] Assuncao, M. D., Calheiros, R. N., Bianchi, S. and Netto, M. A. S.: Big Data Computing and Clouds: Trends and Future Directions. J. Parallel Distrib. Computing, pp , (2015).
[5] Borthakur, D., Gray, J., Sarma, J. S., Muthukkaruppan, K., Spiegelberg, N., Kuang, H., Ranganathan, K., Molkov, D., Menon, A., Rash, S., Schmidt, R. and Aiyer, A.: Apache Hadoop Goes Real-time at Facebook, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, ACM, New York, USA, pp. 1071–1080. (2011).
[6] Calheiros, R. N., Vecchiola, C., Karunamoorthy, D. and Buyya, R.: The Aneka platform and QoS-driven resource provisioning for elastic applications on hybrid Clouds, Future Gener. Comput. Syst. 28 (6), pp. 861–870. (2012).
[7] Chen, Q., Hsu, M. and Zeller, H.: Experience in Continuous analytics as a Service (CaaaS), in: Proceedings of the 14th International Conference on Extending Database Technology, ACM, New York, USA, pp. 509–514. (2011).
[8] Chen, H., Chiang, R. H. L. and Storey, V. C.: Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly. Special Issue: Business Intelligence Research. (2012).
[9] Dean, J. and Ghemawat, S.: OSDI 2004,
[10] Demirkan, H. and Delen, D.: Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decision Support Systems 55, pp (2013).
[11] GigaSpaces: Big Data Survey,
[12] Google Cloud Platform: Big Query, https://cloud.google.com/bigquery/
[13] Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F. B. and Babu, S.: Starfish: A Self-tuning System for Big Data Analytics, in: Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR 2011), pp. 261–272. (2011).
[14] IBM InfoSphere Streams: InfoSphere Streams, infosphere-streams.
[15] Kim, H., Abdelbaky, M. and Parashar, M.: CometPortal: A Portal for Online Risk Analytics Using CometCloud. 17th International Conference on Computer Theory and Applications (ICCTA2009). (2009).
[16] Lee, K. H., Lee, Y. J., Choi, H., Chung, Y. D. and Moon, B.: Parallel Data Processing with MapReduce: A Survey, SIGMOD Record 40 (4), pp. 11–20. (2011).
[17] Li, X., Calheiros, R. N., Lu, S., Wang, L., Palit, H., Zheng, Q. and Buyya, R.: Design and Development of an Adaptive Workflow-Enabled Spatial-Temporal Analytics Framework, in: Proceedings of the IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS 2012), IEEE Computer Society, Singapore, pp. 862–867. (2012).
[18] Manekar, A. and Pradeepini, G.: A Review on Cloud-based Big Data Analytics. ICSES Journal on Computer Networks and Communication (IJCNC), Vol. 1, No. 1. (2015).
[19] Neaga, I. and Hao, Y.: A Holistic Analysis of Cloud Based Big Data Mining. International Journal of Knowledge, Innovation and Entrepreneurship. Volume 2 No. 2, pp. 56–64. (2014).
[20] NESSI: Big Data: A New World of Opportunities, (2012).
[21] Schouten, E.: Big Data As A Service, (2012).
[22] Storm: Apache Storm: Distributed and fault-tolerant real-time computation,
[23] Strambei, C.: OLAP Services on Cloud Architecture. IBIMA Publishing. Journal of Software and Systems Development. DOI: / (2012).
[24] Talia, D.: Clouds for Scalable Big Data Analytics. Published by IEEE Computer Society, (2013).
[25] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G. J., Ng, A., Liu, B., Yu, P. S., Zhou, Z. H., Steinbach, M., Hand, D. J. and Steinberg, D.: Top 10 algorithms in Data Mining. Knowl Inf. Syst, 14:1–37. DOI: /s (2008).
[26] Wu, X., Zhu, X., Wu, G. and Ding, W.: Data Mining with Big Data. Retrieved from: (2013).
[27] Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A. and Khan, S. U.: The rise of “big data” on cloud computing: Review and open research issues. Information Systems 47, pp (2015).
[28] Ortiz, J. L. R., Oneto, L. and Anguita, D.: Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. 2015 INNS Conference on Big Data. Published in Procedia Computer Science. Volume 53, pp (2015).
[29] Baker, T., Al-Dawsari, B., Tawfik, H., Reid, D. and Nyogo, Y.: GreeDi: An energy efficient routing algorithm for big data on cloud. Ad Hoc Networks 000, pp (2015).
[30] Chen, C. L. P. and Zhang, C. Y.: Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences 275, pp (2014).
[31] Liu, C., Yang, C., Zhang, X. and Chen, J.: External Integrity Verification for Outsourced Big Data in cloud and IoT: A Big Picture. Future Generation Computer System 49, pp (2015).
[32] O’Driscoll, A., Daugelaite, J. and Sleator, R. D.: ‘Big Data’, Hadoop and Cloud Computing in Genomics. Journal of Biomedical Informatics. Volume 46, Issue 5, pp (2013).
[33] Jackson, J. C., Vijayakumar, V., Quadir, M. A. and Bharathi, C.: Survey on Programming Models and Environments for Cluster, Cloud and Grid Computing that defends Big Data. 2nd International Symposium on Big Data and Cloud Computing (ISBCC ’15). Procedia Computer Science 50 (2015), pp (2015).
[34] Elragal, A.: ERP and Big Data: The Inept Couple. Procedia Technology, 16, pp (2014).
[35] Khan, I., Naqvi, S. K., Alam, M. and Rizvi, S. N. A.: Data model for Big Data in cloud environment. Computing for Sustainable Global Development (INDIACom), 2nd International Conference, pp. 582–585. (2015).
[36] Zhao, J., Wang, L., Tao, J., Chen, J., Sun, W., Ranjan, R., Kołodziej, J., Streit, A. and Georgakopoulos, D.: A security framework in G-Hadoop for big data computing across distributed Cloud data centers. Journal of Computer and System Sciences 80, pp (2014).
[37] Shakil, K. A., Sethi, S. and Alam, M.: An effective framework for managing university data using a cloud based environment, Computing for Sustainable Global Development (INDIACom), 2nd International Conference, Vol. no. 1262, 1266, pp (2015).
[38] Alam, M. and Shakil, K. A.: Cloud Database Management System Architecture. UACEE International Journal of Computer Science and its Applications, 3 (1), pp (2013).
[39] IBM. (n.d.). Bringing Big Data to the Enterprise,

16 Thank You