ML-Based System Failure Prediction Using Resource Utilization

Ittipon Rassameeroj, Naphat Khajohn-udomrith, Mangkhales Ngamjaruskotchakorn, Teekawin Kirdsaeng, Piyorot Khongchuay

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (peer-reviewed)

1 Citation (Scopus)

Abstract

Digital transactions are growing exponentially, with no sign of slowing. Unfortunately, most online service providers cannot operate 24/7 because of limitations in their system components. These providers would benefit greatly from accurate system failure prediction, since a timely prediction could mitigate the adverse effects of a computer failure. In this paper, we propose a simple model training approach that detects potential system failures in advance by parsing log files and performing a probabilistic analysis of future performance values. We use a Recurrent Neural Network (RNN), namely Long Short-Term Memory (LSTM), to predict system failure by estimating hardware utilization values and using the predicted log data to represent the system benchmark. The main inputs to this calculation are CPU, memory (MEM), disk (DISK), and network (NET) utilization. Apart from utilization, another essential input is the System Callout, an alert signal that indicates when the information system should stop accepting incoming transactions in order to maintain the server systems.
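The abstract does not spell out the preprocessing or alerting logic. As a minimal sketch, assuming the NMON snapshots have already been flattened into a hypothetical CSV with columns `timestamp,cpu,mem,disk,net` (percent utilization), the Python code below frames the series as fixed-length windows of the kind an LSTM is typically trained on, and applies a threshold rule as a stand-in for the System Callout. The column names, thresholds, and all function names are assumptions for illustration, not the authors' implementation. The LSTM itself is not shown: `naive_persistence_forecast` (repeat the last observed snapshot) is a plain baseline used only as a placeholder for the model's prediction.

```python
import csv
import io

# Hypothetical metric order; mirrors the CPU, MEM, DISK, NET inputs in the abstract.
METRICS = ("cpu", "mem", "disk", "net")


def parse_utilization(csv_text):
    """Parse a hypothetical flattened log (timestamp,cpu,mem,disk,net)
    into one tuple of floats per snapshot, ordered as in METRICS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [tuple(float(row[m]) for m in METRICS) for row in reader]


def make_windows(series, window, horizon=1):
    """Frame a time series as supervised samples: each input is `window`
    consecutive snapshots; each target is the snapshot `horizon` steps
    after the end of that window."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window + horizon - 1]
        samples.append((x, y))
    return samples


def naive_persistence_forecast(window):
    """Placeholder for a trained LSTM: predict that the next snapshot
    equals the last observed one (a simple persistence baseline)."""
    return window[-1]


def system_callout(predicted, thresholds):
    """Hypothetical callout rule: return the metrics whose predicted
    utilization meets or exceeds its threshold. A non-empty result
    means the alert should be raised."""
    return [m for m, v in zip(METRICS, predicted) if v >= thresholds[m]]


# Illustrative data and thresholds (made up, not from the paper).
LOG = """timestamp,cpu,mem,disk,net
00:00,35.0,60.0,20.0,10.0
00:01,48.0,62.0,22.0,12.0
00:02,71.0,70.0,25.0,15.0
00:03,92.0,81.0,30.0,18.0
"""
THRESHOLDS = {"cpu": 90.0, "mem": 90.0, "disk": 95.0, "net": 95.0}

series = parse_utilization(LOG)
training_pairs = make_windows(series, window=2)  # 2 (input, target) samples
prediction = naive_persistence_forecast(series[-2:])
print(system_callout(prediction, THRESHOLDS))  # -> ['cpu']
```

In a real pipeline, `training_pairs` would feed the LSTM, and its forecast would replace the persistence placeholder before the callout check.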

Original language: English
Title of host publication: Applied Systemic Studies
Editors: Henry Selvaraj, Takayuki Fujimoto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 40-50
Number of pages: 11
ISBN (Print): 9783031274695
DOIs
Publication status: Published - 2023
Event: 29th International Conference on Systems Engineering, ICSEng 2022 - Tokyo, Japan
Duration: 23 Aug 2022 to 25 Aug 2022

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 611 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 29th International Conference on Systems Engineering, ICSEng 2022
Country/Territory: Japan
City: Tokyo
Period: 23/08/22 to 25/08/22

Keywords

  • Failure prediction
  • Log files
  • Machine learning
  • NMON
  • Recurrent neural networks
  • Resources utilization
