Confusion Matrix and Cyber Crime

Divyansh garg
4 min read · Jun 4, 2021

💨What is a Confusion Matrix?

A confusion matrix is a technique for measuring the performance of a machine learning classifier. It is a table that summarizes how a classification model performs on a set of test data for which the true values are known. It is especially useful when the classification problem is highly imbalanced and one class dominates the others. In such scenarios, you may be surprised to see the model's accuracy peak at 99% when, in reality, the model is heavily biased towards the dominant class and will rarely, if ever, predict a minority class. Therefore, to evaluate a model on such an imbalanced dataset, we look at the confusion matrix.
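To make the 99% scenario concrete, here is a minimal sketch in plain Python. The data is made up for illustration (990 negative samples and 10 positive ones), and the "model" is a hypothetical one that always predicts the dominant class.

```python
# Made-up, highly imbalanced test set: 990 negatives (0) and 10 positives (1).
actual = [0] * 990 + [1] * 10

# Hypothetical "model" that ignores its input and always predicts the dominant class.
predicted = [0] * len(actual)

# Accuracy: fraction of predictions that match the true label.
correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)

# How many of the real positives did the model actually find?
positives_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

print(f"accuracy = {accuracy:.2%}")
print(f"positives found = {positives_found} of {sum(actual)}")
```

The accuracy looks excellent, yet not a single minority-class sample is detected, which is exactly the problem accuracy alone hides.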

💨FOUR OUTCOMES OF CONFUSION MATRIX

The confusion matrix visualizes the accuracy of a classifier by comparing the actual and predicted classes. The binary confusion matrix is composed of four cells:

  • TP: True Positive: Positive values correctly predicted as positive
  • FP: False Positive: Negative values incorrectly predicted as positive. Also known as the Type 1 error
  • FN: False Negative: Positive values incorrectly predicted as negative. Also known as the Type 2 error
  • TN: True Negative: Negative values correctly predicted as negative
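The four outcomes can be counted directly from a list of actual labels and a list of predicted labels. The sketch below is one possible way to do it; the function name `confusion_counts`, the `positive` parameter, and the sample labels are all illustrative assumptions, not part of any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem.

    `positive` is the label treated as the positive class (an assumption
    of this sketch); every other label counts as negative.
    """
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # positive correctly predicted as positive
        elif p == positive and a != positive:
            fp += 1  # negative predicted as positive (Type 1 error)
        elif p != positive and a == positive:
            fn += 1  # positive predicted as negative (Type 2 error)
        else:
            tn += 1  # negative correctly predicted as negative
    return tp, fp, fn, tn


# Made-up labels for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```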

The accuracy of a model can be calculated from the confusion matrix using the formula below.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

💨 WHAT CAN WE LEARN FROM THIS?

A natural question is what we can do with this matrix. Several important metrics are derived from it:

  1. Precision: It is the portion of the values predicted as positive by the model that are actually positive. In other words, out of all the positive results given by the model, it measures how many are correct. Its formula is TP / (TP + FP).
  2. Recall: It is the portion of the actual positive values that are correctly identified as positive by the model. It is also called the True Positive Rate or Sensitivity. Its formula is TP / (TP + FN).
  3. F-1 Score: It is the harmonic mean of Precision and Recall. Because the harmonic mean is pulled towards the smaller of the two values, a model cannot score well by being strong on only one of them, so this metric accounts for both False Positives and False Negatives at the same time. Its formula is 2 * Precision * Recall / (Precision + Recall).
  4. Accuracy: It is the portion of values that are identified correctly, irrespective of whether they are positives or negatives. It means that all True Positives and True Negatives are included in this. Its formula is (TP + TN) / (TP + TN + FP + FN).
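The four formulas above translate almost directly into code. The following is a minimal sketch: the function name `classification_metrics` and the sample counts are made up, and returning 0.0 when a denominator is zero is a convention chosen for this example, not a universal rule.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from the four counts.

    Assumption of this sketch: a metric whose denominator is zero is
    reported as 0.0 instead of raising an error.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for an imbalanced problem with 50 actual positives out of 1000.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=940)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is high while the recall is noticeably lower, which shows why the metrics are worth reading together.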

Of all these metrics, precision and recall are the most widely used. The tradeoff between them is a useful way to judge a classifier, because raising one typically lowers the other. The ideal model has both high precision and high recall, but this is achievable only when the classes are perfectly separable. In practical use cases, the data is usually messy and imbalanced.
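One way to see the precision and recall tradeoff is to vary the decision threshold of a classifier that outputs scores. The sketch below uses made-up scores and labels, and the helper name `precision_recall_at` is hypothetical; it only illustrates the idea under those assumptions.

```python
# Made-up data: each score is a model's estimated probability of the
# positive class, and each label is the true class (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]


def precision_recall_at(threshold):
    """Predict positive when score >= threshold, then return (precision, recall)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# A low threshold favours recall; a high threshold favours precision.
for threshold in (0.25, 0.50, 0.80):
    precision, recall = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, moving the threshold up increases precision and decreases recall, so the right setting depends on which type of error is more costly for the problem.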

💨Why do we need a Confusion Matrix?

Here are the benefits of using a confusion matrix.

  • It shows where a classification model gets confused when it makes predictions.
  • It gives you insight not only into the number of errors being made by your classifier but also into the types of errors being made.
  • This breakdown helps you overcome the limitation of using classification accuracy alone.
  • In a common layout, each column of the confusion matrix represents the instances of a predicted class, and each row represents the instances of an actual class.
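The same idea extends beyond two classes. The sketch below builds a small multi-class matrix, assuming the layout where rows are actual classes and columns are predicted classes. The class names, the sample labels, and the function name `build_confusion_matrix` are invented for illustration only.

```python
from collections import Counter


def build_confusion_matrix(actual, predicted, classes):
    """Return a matrix where rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical class names and made-up labels.
classes = ["benign", "phishing", "malware"]
actual = ["benign", "benign", "phishing", "phishing",
          "malware", "malware", "benign", "phishing"]
predicted = ["benign", "phishing", "phishing", "benign",
             "malware", "phishing", "benign", "phishing"]

matrix = build_confusion_matrix(actual, predicted, classes)

# Print the matrix with a header row of predicted classes.
corner = "actual / predicted"
print(f"{corner:<20}" + "".join(f"{c:>10}" for c in classes))
for name, row in zip(classes, matrix):
    print(f"{name:<20}" + "".join(f"{n:>10}" for n in row))

# Off-diagonal cells show which kinds of mistakes the model makes.
for i, a in enumerate(classes):
    for j, p in enumerate(classes):
        if i != j and matrix[i][j]:
            print(f"{matrix[i][j]} sample(s) of '{a}' predicted as '{p}'")
```

Reading the off-diagonal cells tells you which classes are being confused with each other, something a single accuracy number cannot show.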

💨CYBER CRIME CASES

The word crime, in its general meaning, is “a legal wrong that can be followed by criminal proceedings which may result into punishment”, whereas cyber crime may be described as “unlawful acts wherein the computer is either a tool or target or both”. The world's first computer-specific law was enacted in 1970 by the German State of Hesse in the form of the ‘Data Protection Act, 1970’, as cyber technology advanced. As technology has spread, its misuse has grown as well, creating a need for strict statutory laws to regulate criminal activities in the cyber world and to protect technological systems. It was under these circumstances that the Indian Parliament passed the “INFORMATION TECHNOLOGY ACT, 2000”, which came into force on 17th October, as a comprehensive law dealing with technology in the fields of e-commerce, e-governance, and e-banking, as well as with penalties and punishments for cyber crimes.

In the present world, cybercrime offenses are happening at an alarming rate. As the use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. The framework developed in our work is essential to the creation of a model that can support analytics regarding the identification, detection, and classification of integrated cybercrime offenses (structured and unstructured). The main focus of our work is to find the attacks that take advantage of security vulnerabilities and to analyze these attacks using machine learning techniques.

That’s all for this article.

Hope you like it.😀
