WizRule – Data Mining for Auditing
Abraham Meidan, Ph.D.
One of the main tasks of auditors, forensic investigators and data-quality managers is revealing fraudulent cases and errors in data. WizRule can help in carrying out this task. WizRule is a data-auditing tool based on data mining technology. It performs an analysis of the data revealing inconsistencies and “strange” cases to be investigated.
The standard method for revealing fraudulent cases and errors uses reports that filter and sort the data. For example, one such report may list all the transactions in which the discount percentage is above a certain threshold. There are several tools – mainly ACL and IDEA – that can be used in order to issue these kinds of reports.
WizRule does not compete with these tools. Rather it complements them.
By definition the above mentioned reports can only reveal frauds and errors that the report was designed to find. For example, if the investigators suspect that some transactions include fraudulent discounts, they may issue the above-mentioned report. But if they don’t suspect a problem related to the discount, the reports issued by tools such as Idea and ACL, will not discover a fraudulent discount. And since there is an enormous number of possible frauds it is impractical to generate a report for each of them.
This is where WizRule can help. WizRule works automatically – the user just selects the data and WizRule does the analysis. WizRule checks all the relationships among the values within the various fields and reports unexpected and unlikely cases. Therefore, WizRule reveals fraudulent cases missed by the standard auditing tools.
How does WizRule work?
WizRule is a data mining tool. Data mining programs reveal interesting patterns in data.
Usually data mining tools reveal the patterns in data registered in the past and use these patterns to issue predictions for new cases. For example, a bank may apply a data mining program in order to reveal the patterns of customers that did not pay their loans. Then, when a new customer asks for a loan these patterns are used to calculate the probability of default on this loan. This approach is also used for revealing frauds. For example, credit cards companies use data mining programs in order to discover the patterns of known fraudulent cases and apply these patterns when checking new transactions. But this approach cannot be used when one searches for fraudulent cases without having previous examples.
This is where WizRule is relevant. Instead of issuing predictions for new cases WizRule points out cases deviating from valid patterns. WizRule‘s approach is based on the following assumption: In many cases, frauds are exceptions to the rule. For example, if in all sales transactions to a certain +customer the salesperson is Dan, and there is a single transaction in which the salesperson is someone else, who is usually connected with other customers, then this is a suspected case that should be investigated.
In creating a software application that discovers exceptions to the rule, the program first needs to discover all the rules (patterns) in a given data set. In other words, the software should do a “reverse engineering” of the rules that created the data. This is precisely WizRule’s strong point. WizRule is based on a mathematical algorithm that is capable of revealing all the rules governing a data set within a very short span of time. The output of a WizRule analysis is a list of records that are unlikely in reference to the discovered rules. These records are suspected cases, or at least cases to be investigated.
When using WizRule, you simply select the data that you wish to analyze and the software does all the rest. Within a short time the analysis report is displayed on the screen.
When analyzing the data, WizRule performs the following operations:
It first reads the data. WizRule can read all the standard databases. You are then given the opportunity to “fine-tune” the analysis parameters such as “minimum probability of if-then rules” and “minimum number of cases of a rule.” You can also define exactly which types of rules WizRule should search for and whether some fields should be ignored.
Within a short time, WizRule reveals the rules governing the data and points out the cases deviating from the discovered rules. Each deviation is displayed along with the rules from which it deviates.
What kind of rules does WizRule reveal?
WizRule analyzes the data by revealing four types of rules:
- Formula rules
- If-then rules
- Outstanding rules
- Spelling rules
As mentioned, the user does not enter the rules. Rather all the rules are discovered automatically.
An example of a formula rule is:
A = B * C
Where: A = Total
B = Quantity
C = Unit Price
Rule’s Accuracy Level: 0.99
The rule exists in 1890 records
The “Accuracy Level” in formula rules indicates the ratio between the number of cases in which the formula holds and the total number of relevant cases. The cases in which the formula holds are those cases where the formula matches the data exactly except for deviations that may result from a rounding.
WizRule reveals arithmetical formulas with up to 5 variables that hold in the data. Formulas where A is 0 or 1 are ignored.
Obviously if a formula rule holds for all the records in the data except for just a few records, then these deviating records should be investigated.
An example of an if-then rule is:
If Customer is Summit
and Item is Computer type A
Price = 765
Rule’s probability: 0.998
The rule exists in 1002 records
Significance level: error probability < 0.001
The “Probability” in if-then rules designates the ratio between the number of records in which the condition(s) and the result hold, and the corresponding number of records in which the condition(s) hold with or without the result.
The “Significance Level” indicates the degree of the rule’s validity. It is equal to 1 minus the “error probability”, which quantifies the probability that the rule exists accidentally in the data under analysis.
WizRule reveals all the if-then rules with any number of conditions.
Once again, a deviation from a highly valid rule might point to a fraud.
An example of an outstanding rule is:
If Customer is Summit
% Discount = 20
The rule is unexpected since:
There are 100 values in the Customer field, each having no less than the minimum number of cases in a rule. Found no similar rules that relate between the other values in the customer field and the values in the % Discount field.
In other words, the rule is outstanding since it is the only rule that relates between a certain customer and a certain % discount (while all the customers have various discounts).
If the % discount of all the other customers were 10% for example, the above mentioned rule would still be outstanding (since the discount of this customer deviates from the discount of all the other customers).
An example of a spelling rule is:
The value Summit appears 2080 times
in the Customer field.
There are 2 case(s) containing similar value(s)
These rules are presented mainly in order to reveal cases of misspelled names. A name is suspected as misspelled if (a) it is similar to another name in this field, and (b) the frequency of the first name is very low, while the frequency of the second name is very high. For example, if the name Zummit appears only one time (in the Customer field) it will be presented as a deviation to be examined.
How does WizRule avoid False Alarms?
Following the discovery of the rules that govern the data, WizRule checks the deviations from these rules. However, not every deviation from a rule is a case to be examined. Suppose WizRule reveals the following if-then rule:
If Customer is Summit
Salesperson is Dan
Rule’s probability: 0.98
The rule exists in 1003 records
Significance level: error probability < 0.001
Since the rule’s probability is 0.98 and the rule exists in 1003 records, then there are about 20 records in which the salesperson deviates from this rule. Reviewing each of these 20 records is quite tedious and many of these deviations might be false alarms.
To avoid such false alarms WizRule checks whether the deviation is explainable by another rule that holds in the data. If the answer is positive the case is not a suspected one. For example, it may be the case that if the item is computer then the salesperson is John, and this rule may explain some of the above-mentioned deviations. Since these deviations are explainable they are not considered as cases to be investigated.
WizRule also checks the frequency of the then value in the deviated case. Its frequency under the rule conditions should be lower than its overall frequency in the data. If it does not, then once again the case is not considered as a case to be investigated. For example, if the salesperson in two of the deviated records is Frank and these are the only two cases in the entire data where Frank is the salesperson, then these cases are not considered as cases to be investigated. However, if Frank is usually the salesperson of other customers, then the above-mentioned deviations are indeed suspected cases.
When the then field is numeric, WizRule also lets you reduce false alarms by only displaying deviations where the then value deviates from the expected value by at least one standard deviation. Smaller deviations are ignored.
When viewing the suspected cases you can sort the cases by the then field, and sort the values within this field. This sorting is mainly relevant when the report lists many deviations. By sorting the deviations you can concentrate on the most interesting cases.
Applying all these methods WizRule avoids almost all the false alarms and points out only the cases that need to be investigated.