Abstract: | Telecommunication operators play a vital role in connecting individuals and businesses worldwide, facilitating seamless communication and fostering global connectivity. However, the telecommunications industry is vulnerable to a range of malicious activities by fraudsters seeking to exploit weaknesses in the system. One form of fraud that has emerged as a significant challenge for telecom operators is SIM-box fraud.
This thesis aims to develop a model that detects SIM-box fraudulent subscribers in near real time. To achieve this, we set up an API integration with the Ethio telecom CRM and CBS environments to retrieve call detail record (CDR) flat files (textual DB) on an hourly basis. We then developed a function in ASP.NET C# that preprocesses the raw textual data and stores it in a SQL Server database configured to hold call detail record, voice, SMS, and data tuples.
An SQL view was created that joins the CDR, Voice, SMS, and Data tables to combine all the required attributes in one place and facilitate further data analysis. Next, we aggregated the tuples using SQL queries and wrote a C# function that derives additional attributes to help track the behavior of the calls in the data. Once we had analyzed and compiled the call detail records, which incorporate the voice, SMS, and data usage of each subscriber, we split the dataset into 1_hour, 1_day, and 7_day datasets and fed them into the selected machine learning algorithms.
Finally, we ran experiments by feeding the preprocessed, aggregated, and analyzed datasets to the Random Forest (RF), Support Vector Machine (SVM), and Neural Network (NN) algorithms using the scikit-learn (sklearn) Python library. The RF and NN algorithms recorded 100% accuracy on all of the 1_Hour, 1_Day, and 7_Day datasets. We therefore conclude that, with a good CDR analysis engine or module, RF and NN can effectively identify possible fraudulent subscribers. |
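
The split into 1_hour, 1_day, and 7_day datasets described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, which uses SQL queries and a C# function. The record layout `(subscriber, timestamp, service, callee)`, the function name `aggregate`, and the derived attributes (per-service counts and number of distinct callees) are all assumptions chosen for illustration, and the records are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Window lengths mirroring the 1_hour, 1_day, and 7_day datasets.
WINDOWS = {
    "1_hour": timedelta(hours=1),
    "1_day": timedelta(days=1),
    "7_day": timedelta(days=7),
}


def aggregate(records, window, end):
    """Summarise each subscriber's usage in the interval (end - window, end].

    The attributes computed here are hypothetical examples, not the
    attribute set used in the thesis.
    """
    start = end - window
    stats = defaultdict(lambda: {"voice": 0, "sms": 0, "data": 0, "callees": set()})
    for subscriber, ts, service, callee in records:
        if not (start < ts <= end):
            continue  # record falls outside the window
        row = stats[subscriber]
        row[service] += 1
        if callee is not None:
            row["callees"].add(callee)
    # Replace the callee set with its size so each row is purely numeric.
    return {
        sub: {
            "voice": row["voice"],
            "sms": row["sms"],
            "data": row["data"],
            "distinct_callees": len(row["callees"]),
        }
        for sub, row in stats.items()
    }


# Made-up records: (subscriber, timestamp, service, callee)
records = [
    ("A", datetime(2023, 1, 7, 11, 30), "voice", "X"),
    ("A", datetime(2023, 1, 7, 11, 45), "voice", "Y"),
    ("A", datetime(2023, 1, 6, 9, 0), "sms", "X"),
    ("B", datetime(2023, 1, 2, 8, 0), "data", None),
    ("B", datetime(2023, 1, 7, 11, 50), "voice", "X"),
]
end = datetime(2023, 1, 7, 12, 0)
datasets = {name: aggregate(records, w, end) for name, w in WINDOWS.items()}
```

Each entry of `datasets` maps a subscriber to one numeric feature row, which is the shape a classifier such as RF, SVM, or NN would expect once labels are attached.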