TY - GEN
T1 - ML-Based System Failure Prediction Using Resource Utilization
AU - Rassameeroj, Ittipon
AU - Khajohn-udomrith, Naphat
AU - Ngamjaruskotchakorn, Mangkhales
AU - Kirdsaeng, Teekawin
AU - Khongchuay, Piyorot
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Digital transactions are growing exponentially and show no sign of slowing. Unfortunately, most online service providers cannot operate around the clock because of the limitations of system components. Such providers would benefit greatly from accurate system failure prediction, since the adverse effects of a computer failure can be mitigated if the failure is predicted beforehand. In this paper, we propose a simple model training approach that detects failures that might arise in a system by parsing log files and conducting a probabilistic analysis of future performance values. We use a Recurrent Neural Network (RNN), namely Long Short-Term Memory (LSTM), as our solution for predicting system failure: it estimates hardware utilization values, and the predicted log data serve as the system benchmark. The main inputs considered in the calculation are the utilization of CPU, memory (MEM), disk (DISK), and network (NET). Apart from utilization, another essential input to examine is System Callout, an alert signal that indicates when the information system should ignore incoming transactions in order to maintain several server systems.
KW - Failure prediction
KW - Log files
KW - Machine learning
KW - NMON
KW - Recurrent neural networks
KW - Resources utilization
UR - http://www.scopus.com/inward/record.url?scp=85151065399&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-27470-1_5
DO - 10.1007/978-3-031-27470-1_5
M3 - Conference contribution
AN - SCOPUS:85151065399
SN - 9783031274695
T3 - Lecture Notes in Networks and Systems
SP - 40
EP - 50
BT - Applied Systemic Studies
A2 - Selvaraj, Henry
A2 - Fujimoto, Takayuki
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Systems Engineering, ICSEng 2022
Y2 - 23 August 2022 through 25 August 2022
ER -