DEA-Based Ensemble Learning for Breast Cancer Analysis
Abstract
Breast cancer remains one of the most prevalent and fatal malignancies among women worldwide, and timely, accurate diagnosis is critical to effective treatment. This study presents an ensemble learning framework that incorporates Data Envelopment Analysis (DEA) as an independent, active algorithm alongside conventional machine learning classifiers such as Random Forest and Support Vector Machine (SVM). Unlike previous approaches that used DEA merely for feature extraction, the proposed model integrates DEA directly into the collective decision-making process. The DEA component employs a radial, output-oriented BCC model under Variable Returns to Scale (VRS) to assess the efficiency of each patient, treated as a Decision-Making Unit (DMU). The efficiency scores are then treated as standalone classification outputs and enter a majority voting scheme together with the predictions of the other classifiers. Implemented on the Wisconsin Breast Cancer Dataset (WBCD), the framework demonstrates enhanced performance in detecting borderline and uncertain cases. The results suggest that integrating DEA as a decision-making agent improves both interpretability and diagnostic accuracy. This hybrid system bridges productivity analysis and ensemble learning, offering a novel and interpretable decision support approach for clinical breast cancer classification.
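The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how an efficiency score is mapped to a class label, so the cut-off `DEA_THRESHOLD`, its direction (a high score is read as malignant), and the function names `dea_vote`, `majority_vote` and `ensemble_predict` are all assumptions made for illustration. The efficiency score is taken as given, for example the reciprocal of the optimal expansion factor of an output-oriented BCC model, and the Random Forest and SVM labels are assumed to come from models fitted elsewhere.

```python
from collections import Counter

# Hypothetical cut-off: the abstract does not say how an efficiency score
# becomes a class label, so this value and its direction are assumptions.
DEA_THRESHOLD = 0.9


def dea_vote(efficiency, threshold=DEA_THRESHOLD):
    """Map a DEA efficiency score in (0, 1] to a class label (assumed rule)."""
    return "malignant" if efficiency >= threshold else "benign"


def majority_vote(votes):
    """Return the most frequent label; three voters over two classes cannot tie."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def ensemble_predict(rf_label, svm_label, efficiency):
    """Combine the two classifier labels with the DEA vote by simple majority."""
    votes = [rf_label, svm_label, dea_vote(efficiency)]
    return majority_vote(votes)


# Borderline case: the two classifiers disagree, so the DEA vote decides.
print(ensemble_predict("benign", "malignant", efficiency=0.95))  # prints "malignant"
```

With three voters, the DEA vote changes the outcome only when Random Forest and SVM disagree, which is consistent with the abstract's emphasis on borderline and uncertain cases.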
Keywords:
Amari error, Clustering, Cumulative distribution function, Dependence criteria, Independent components analysis