Effort estimation for software products targeted at the manufacturing sector using machine learning algorithms

Diane Lenhart; Matheus Henrique Ribeiro; Flavio Trojan

doi:10.1590/0103-6513.20240092

Research Article

http://dx.doi.org/10.1590/0103-6513.20240092 Production, vol.35, e20240092, 2025

Abstract

Paper aims: This study investigates the accuracy of machine learning algorithms for estimating the effort required for software development in the manufacturing sector, in order to identify the most effective algorithms according to the nature and complexity of the data and the number of available attributes.

Originality: This work distinguishes itself from other studies in the field of effort prediction by utilizing a data repository that consists exclusively of projects from the manufacturing sector. This approach ensures that the specific characteristics of manufacturing projects are reflected in the predictions, addressing a gap in the existing literature. Another notable contribution of this study is the comparative analysis of various machine learning algorithms assessed under different dimensionality scenarios (three and five variables). Although this factor is crucial for enhancing effort estimation accuracy, it has received limited attention in the literature.

Research method: The techniques investigated in this work were (i) Support Vector Regression (SVR), (ii) Gradient Boosting Machines (GBM), (iii) eXtreme Gradient Boosting (XGBoost), (iv) Random Forest (RF), (v) Extreme Learning Machine (ELM), and (vi) Linear Regression (LR). Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²) were used to compare the results achieved by each model on a dataset of 230 records originating from various countries.
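
To make the three performance measures concrete, the following is a minimal sketch in plain Python, assuming the usual textbook definitions of RMSE, MAE, and R². The function names and the effort values are hypothetical and chosen only for illustration; they are not taken from the study's dataset or code.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute residuals."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual_effort = [100.0, 200.0, 300.0, 400.0]
predicted_effort = [110.0, 190.0, 310.0, 390.0]

scores = {
    "RMSE": rmse(actual_effort, predicted_effort),
    "MAE": mae(actual_effort, predicted_effort),
    "R2": r2(actual_effort, predicted_effort),
}
print(scores)
```

Lower RMSE and MAE indicate better predictions, while R² closer to 1 indicates that a model explains more of the variance in effort. Because RMSE squares the residuals, it penalizes large errors more heavily than MAE, so the two measures can rank models differently.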

Main findings: The comparison among machine learning models revealed significant performance variations depending on the number of variables and the evaluation metrics adopted. GBM stood out for its robustness in complex scenarios, while SVR achieved the lowest mean absolute error. ELM, in turn, proved effective with fewer variables but showed sensitivity to outliers and less stability in more complex contexts. Among all the techniques evaluated, XGBoost yielded the worst performance across all parameters.

Implications for theory and practice: This study contributes by applying these models to the manufacturing sector and comparing scenarios with three and five variables. The results support a more informed selection of models based on project complexity and data dimensionality. The more research is conducted in this area, the stronger the theoretical and practical conclusions that can be drawn.

Keywords

Software effort estimation, Software in the manufacturing sector, Software project management, Machine learning

Submitted date: 09/05/2024

Accepted date: 09/22/2025
