Proteomics Data Classification Using Advanced Machine Learning Algorithm

Preethi Kolluru Ramanaiah

doi:10.11648/j.ajai.20240801.13

Research Article | Peer-Reviewed


Published in American Journal of Artificial Intelligence (Volume 8, Issue 1)

Received: 21 April 2024 | Accepted: 3 May 2024 | Published: 17 May 2024


Abstract

Proteomics, the study of proteins and their functions within biological systems, has become increasingly data-intensive, which presents both opportunities and challenges. This project addresses the need for advanced data analytics and data integrity in proteomics research by combining machine learning (ML) with blockchain technology. The work has three objectives: (1) to collect, clean, and integrate proteomics data from diverse sources while ensuring data quality and consistency; (2) to apply ML algorithms to these data to reveal insights, identify proteins, and predict their functions; and (3) to use blockchain technology to safeguard the authenticity and integrity of the proteomics data with an auditable, tamper-proof record. A user-friendly web interface was also implemented to support collaboration among researchers and scientists by giving them access to shared data and results. Five classification methods were investigated for protein classification: random forests, logistic regression, neural networks, support vector machines, and decision trees. By enhancing data analytics capabilities and securing data integrity, the proposed work aims to help scientists make more informed and confident discoveries in this field.
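The abstract names five classifier families but not how sequences are encoded or how the models are compared. A minimal sketch, assuming protein sequences are encoded as TF-IDF-weighted amino-acid k-mers and compared by cross-validated accuracy with scikit-learn, could look like this. The toy sequences, the two family labels, the k-mer range, and every hyperparameter are hypothetical placeholders, not the paper's data or settings. A linear SVM stands in for "support vector machines" and a small multilayer perceptron for "neural networks".

```python
# Minimal sketch: compare the five classifier families named in the abstract
# on protein sequences encoded as TF-IDF-weighted amino-acid k-mers.
# ASSUMPTIONS (not from the paper): the toy sequences and labels, the 1- to
# 3-mer feature range, and all hyperparameters are illustrative placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two made-up families with different residue biases.
sequences = [
    "MKKLLAAGLLAAKK", "MKKLAAGLAAKKLL", "MKLLKAAGLLKAAK", "MKKAALGLLAKKAA",
    "MKLAKGALLKKAAL", "MKKLLGAAKLLAAK",
    "MSTEDDEESTPEDE", "MSEDTEPDESTEDD", "MSDEETSPEDDTEE", "MTEDSEEPDDSTEE",
    "MSEETDDPSEETDE", "MSDTEEDSPEETDD",
]
labels = ["family_A"] * 6 + ["family_B"] * 6

# One estimator per family listed in the abstract (settings are assumptions).
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "neural_network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0),
    "svm": LinearSVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}


def compare_classifiers(seqs, y, folds=3):
    """Return the mean cross-validated accuracy of each model."""
    scores = {}
    for name, clf in models.items():
        # analyzer="char" splits each sequence into single residues, so
        # ngram_range=(1, 3) produces amino-acid 1-, 2- and 3-mers.
        pipe = make_pipeline(
            TfidfVectorizer(analyzer="char", ngram_range=(1, 3)), clf
        )
        # For a classifier with an integer cv, scikit-learn uses stratified folds.
        scores[name] = cross_val_score(pipe, seqs, y, cv=folds).mean()
    return scores


if __name__ == "__main__":
    for name, acc in compare_classifiers(sequences, labels).items():
        print(f"{name:20s} {acc:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline refits the TF-IDF vocabulary inside each training fold, so no information from a test fold leaks into the features. On a real labelled sequence table, `sequences` and `labels` would be replaced by the sequence column and its class column.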

Page(s)	13-21
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Proteomics, Computational Biology, Bioinformatics, Machine Learning, Blockchain


Cite This Article


APA Style

Ramanaiah, P. K. (2024). Proteomics Data Classification Using Advanced Machine Learning Algorithm. American Journal of Artificial Intelligence, 8(1), 13-21. https://doi.org/10.11648/j.ajai.20240801.13


ACS Style

Ramanaiah, P. K. Proteomics Data Classification Using Advanced Machine Learning Algorithm. Am. J. Artif. Intell. 2024, 8(1), 13-21. doi: 10.11648/j.ajai.20240801.13


AMA Style

Ramanaiah PK. Proteomics Data Classification Using Advanced Machine Learning Algorithm. Am J Artif Intell. 2024;8(1):13-21. doi: 10.11648/j.ajai.20240801.13


@article{10.11648/j.ajai.20240801.13,
  author = {Preethi Kolluru Ramanaiah},
  title = {Proteomics Data Classification Using Advanced Machine Learning Algorithm},
  journal = {American Journal of Artificial Intelligence},
  volume = {8},
  number = {1},
  pages = {13-21},
  doi = {10.11648/j.ajai.20240801.13},
  url = {https://doi.org/10.11648/j.ajai.20240801.13},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20240801.13},
  year = {2024}
}


TY  - JOUR
T1  - Proteomics Data Classification Using Advanced Machine Learning Algorithm
AU  - Preethi Kolluru Ramanaiah
Y1  - 2024/05/17
PY  - 2024
N1  - https://doi.org/10.11648/j.ajai.20240801.13
DO  - 10.11648/j.ajai.20240801.13
T2  - American Journal of Artificial Intelligence
JF  - American Journal of Artificial Intelligence
JO  - American Journal of Artificial Intelligence
SP  - 13
EP  - 21
PB  - Science Publishing Group
SN  - 2639-9733
UR  - https://doi.org/10.11648/j.ajai.20240801.13
VL  - 8
IS  - 1
ER  - 


Author Information

Preethi Kolluru Ramanaiah

Ernst & Young LLP, New York, USA


http://orcid.org/0009-0001-0570-4715
