Improving Breast Cancer Classification using (SMOTE) Technique and Pectoral Muscle Removal in Mammographic Images

  • Srwa Hasan Abdulla, Sulaimani University
  • Ali Makki Sagheer, Computer Science and Information Technology, Al-Qalam University College, Iraq
  • Hadi Veisi, Computer Engineering, Tehran University, Iran
Keywords: Breast Cancer, Mammogram, Pectoral Muscle, K-means Clustering, SMOTE, Random Forest


Computer-aided diagnosis methods are being developed to help radiologists interpret mammograms for the detection and diagnosis of breast cancer and to reduce human error. They also provide a more accurate and reliable classification of benign and malignant abnormalities. In mammography, the pectoral muscle appears in mediolateral oblique (MLO) views of both the right and left breast. Because the pectoral muscle has a density similar to that of small suspicious masses, it can bias the results of image-processing methods. This paper presents a method for automatically detecting abnormalities in mammograms. Before abnormality identification, image-processing techniques are used to segment the suspicious region of interest (ROI) correctly. First, the background of the mammogram is darkened to separate the breast area from blemishes and written labels, which are then removed. The breast area is then extracted by discarding the empty regions around the breast. Next, the mammogram is inverted, and the inverted image is subtracted from the original image. The pectoral muscle is removed using a region-growing method based on K-means clustering. The suspicious ROI is then segmented using K-means combined with thresholding. To detect abnormalities, shape-based features, moment invariants, and fractal dimensions are extracted from the segmented ROI. The proposed method is evaluated on the Mini-MIAS dataset, which is composed predominantly of benign samples, with only a small percentage of malignant samples. To improve classifier performance, the SMOTE algorithm is used to generate new samples from the minority classes and obtain a balanced dataset. A random forest classifier is then used to classify the segmented region as benign or malignant. In the experiments, the method achieved an accuracy of 97.1%, a sensitivity of 95.1%, and a specificity of 98.5%.
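
The class-balancing step can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation. It is a simplified variant that picks minority samples at random and interpolates toward one of their nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    k: number of nearest minority neighbours considered per sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    # For every minority sample, find the indices of its k nearest
    # minority neighbours under Euclidean distance (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        # New point lies on the segment between the sample and its neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy data: two features per segmented ROI (assumed values).
benign = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
          [0.3, 0.2], [0.25, 0.3], [0.2, 0.2]]
malignant = [[0.8, 0.9], [0.9, 0.7]]

new_points = smote_oversample(malignant, n_new=len(benign) - len(malignant))
balanced_malignant = malignant + new_points
print(len(benign), len(balanced_malignant))
```

Because each synthetic point lies on a line segment between two real minority samples, the minority class is enlarged without duplicating existing samples. In practice, oversampling is usually applied only to the training split so that synthetic samples do not leak into the evaluation data.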


How to Cite
Abdulla, S., Sagheer, A. and Veisi, H. 2021. Improving Breast Cancer Classification using (SMOTE) Technique and Pectoral Muscle Removal in Mammographic Images. MENDEL. 27, 2 (Dec. 2021), 36-43. DOI: