With the rapid development of computing technologies, the volume of data is growing at an ever-increasing rate. Data scientists are overwhelmed by this large and ever-growing amount of data, which now requires more processing capacity. The central concern for large-scale data is how to support the decision-making process. In this study, the MapReduce programming model is applied, together with its associated implementation introduced by Google. The programming model involves the computation of two functions: Map and Reduce. The MapReduce library automatically parallelizes the computation and handles complex tasks, including data distribution, load balancing, and fault tolerance. Google's MapReduce implementation and its open-source counterpart, Hadoop, are designed to run such computations on large clusters of commodity machines. Our discussion of the MapReduce and Hadoop frameworks covers terabytes and petabytes of storage processed in parallel across thousands of machines at the same time. In this way, the processing and manipulation of big data are managed with effective results. This study presents the basics of MapReduce programming and the application of the open-source Hadoop framework. The Hadoop system can speed up the handling of big data and respond very quickly.
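To make the programming model concrete, the following is a minimal, self-contained Python sketch of the canonical word-count example. It simulates the Map, shuffle, and Reduce phases in a single process; it does not use the actual Hadoop or Google MapReduce APIs, and all function names here are illustrative assumptions, not part of either framework.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group intermediate values by key.

    In a real cluster the MapReduce framework performs this step,
    partitioning keys across reducer machines.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all intermediate values for one key."""
    return key, sum(values)

if __name__ == "__main__":
    documents = [
        "big data needs big clusters",
        "map reduce handles big data",
    ]
    # In Hadoop, map tasks would run in parallel across the cluster;
    # here they run sequentially for illustration.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    results = dict(reduce_phase(k, v) for k, v in grouped.items())
    print(results)  # e.g. {'big': 3, 'data': 2, 'needs': 1, ...}
```

The appeal of the model, as the study notes, is that the programmer writes only the two functions above, while the library handles parallelization, data distribution, and fault tolerance.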
Published in: Advances in Applied Sciences (Volume 6, Issue 3)
DOI: 10.11648/j.aas.20210603.11
Page(s): 43-48
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2021. Published by Science Publishing Group
Keywords: Google MapReduce Processes, Hadoop, Parallel Data Processing, HDFS, Cloud Computing, Large Cluster Data Processing
APA Style
Abdiaziz Omar Hassan, Abdulkadir Abdulahi Hasan. (2021). Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study. Advances in Applied Sciences, 6(3), 43-48. https://doi.org/10.11648/j.aas.20210603.11