Big Data Tools: Advantages and Disadvantages

Maria Ijaz Baig, Liyana Shuib, Elaheh Yadegaridehkordi


Big data tools have become crucial for managing complex and voluminous data. Selecting the right tool requires in-depth knowledge of existing big data tools and their potential. This paper reviews big data tools on the basis of 34 articles published from 2011 to 2018. It provides pertinent information about the most popular big data tools and discusses their advantages and disadvantages in detail. The findings categorize the tools according to their potential. The study helps researchers explore big data sets according to the tools' potential, offers insight into the applicability of big data tools in real-time environments, and helps practitioners select the right big data tool for their requirements.


Keywords: Big data tools, Big data advantages and disadvantages, Big data potentiality






Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.