Data Matching Algorithm

From MTHWiki

Revision as of 17:59, 7 May 2008 by Wikiadmin (Talk | contribs)

Payee Name Resolution

Most banks generate the payee name from the Point Of Sale (POS) identifier. The exact method differs from bank to bank, but in most cases the generated name includes the name of the business that originated the transaction and some sort of POS identifier.

For example, transactions originating from our favorite coffee shop, Dunkin Donuts on Milk Street in Boston, are settled in the bank as originated by "DUNKIN #343418 Q35 ".

So how does My Money match a bank-originated payee with an internal payee, in this case Dunkin Donuts?

Synonyms

Almost every object within My Money can carry a list of synonyms. This is a simple comma-separated list of other names or aliases that the object is known under. So if we define a payee "Dunkin Donuts", we can attach as many synonyms to it as we please, for example "Dunkin Donuts Shop #35, Dunkin Coffee".

My Money's string matching algorithms always look for name matches first and then for synonym matches. If the downloaded payee name matches either an internal payee's name or one of its synonyms, then we have found our payee.
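The name-first, synonyms-second lookup can be sketched as follows. This is a minimal illustration, not My Money's actual code: the `Payee` class, the `find_payee` function, and the choice of case-insensitive exact comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Payee:
    """Hypothetical payee record; synonyms is a comma-separated string."""
    name: str
    synonyms: str = ""

    def aliases(self):
        # Split the comma-separated list and drop empty entries.
        return [s.strip() for s in self.synonyms.split(",") if s.strip()]


def normalize(text):
    # Assumption: compare case-insensitively with whitespace collapsed.
    return " ".join(text.lower().split())


def find_payee(downloaded, payees):
    """Return the first payee whose name matches, else the first
    payee with a matching synonym, else None."""
    target = normalize(downloaded)
    # Pass 1: payee names take priority.
    for payee in payees:
        if normalize(payee.name) == target:
            return payee
    # Pass 2: fall back to synonyms.
    for payee in payees:
        if any(normalize(alias) == target for alias in payee.aliases()):
            return payee
    return None


payees = [
    Payee("Dunkin Donuts", "Dunkin Donuts Shop #35, Dunkin Coffee"),
    Payee("Dunkin Coffee"),
]
match = find_payee("dunkin coffee", payees)
```

Here `match` is the second payee, because its name matches and names are checked before any synonym, even though the first payee lists the same text as a synonym.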


String Analysis

In earlier versions of My Money we mostly looked for a match between the words in the strings. This approach was deeply flawed: if a common word such as "bread" appeared in any internal payee name, then we would match Atlanta Bread, Panera Bread, and Au-Bon Pain Bread to the same payee.
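The flaw is easy to reproduce. The snippet below is a hypothetical reconstruction of a naive shared-word matcher (the function name `shares_word` is an assumption, not taken from My Money); it treats any common word as a match, so three different bakeries collapse into one payee.

```python
def shares_word(a, b):
    """Naive matcher: true if the two strings have any word in common."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


internal_payee = "Atlanta Bread"
downloaded = ["Atlanta Bread", "Panera Bread", "Au-Bon Pain Bread"]

# Every downloaded name "matches" because all of them contain "bread".
false_matches = [name for name in downloaded if shares_word(name, internal_payee)]
```

All three downloaded names end up in `false_matches`, although only the first one is really the internal payee.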

Obviously, some words are more important than others. This problem is well understood in Computer Science: the field of natural language processing (NLP) deals with the automated generation and understanding of natural human languages.

For our purposes we separate the strings into sequences (Markov sequences) and send the strings through multi-step N-Gram analysis to calculate how statistically important these sequences are. Then we rank our confidence level and see whether we can match an internal payee.
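The page does not spell out the exact analysis, so the following is only a rough sketch of the general idea, under stated assumptions: strings are broken into character trigrams, each trigram is weighted by how rare it is across the known payee names, and the confidence is a weighted overlap ratio. The functions `ngrams`, `document_frequencies`, and `confidence`, and the specific weighting formula, are illustrative choices rather than My Money's algorithm.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Character n-grams of the lowercased, space-padded string."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def document_frequencies(known_names, n=3):
    """Count in how many known names each n-gram appears."""
    freq = Counter()
    for name in known_names:
        freq.update(set(ngrams(name, n)))
    return freq


def confidence(a, b, freq, total, n=3):
    """Weighted overlap of n-grams, between 0.0 and 1.0.

    N-grams that occur in many known names (such as those in "bread")
    receive a lower weight than n-grams that occur in few.
    """
    def weight(gram):
        return math.log((1 + total) / (1 + freq.get(gram, 0))) + 1.0

    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    shared = sum(weight(g) for g in grams_a & grams_b)
    union = sum(weight(g) for g in grams_a | grams_b)
    return shared / union if union else 0.0


names = ["Atlanta Bread", "Panera Bread", "Dunkin Donuts"]
freq = document_frequencies(names)

score_dunkin = confidence("DUNKIN #343418 Q35", "Dunkin Donuts", freq, len(names))
score_panera = confidence("DUNKIN #343418 Q35", "Panera Bread", freq, len(names))
```

With these inputs `score_dunkin` is higher than `score_panera`, and the common "bread" trigrams contribute less to a score than they would if all trigrams were weighted equally.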

Here is a screenshot of the Import/Export Options page that controls this process.

Image:Stringmatcher.png

In this example the user has specified that strings should be matched only when the confidence level is 70% or better, but the two strings being compared match with a confidence level of only 58%.

My Money will ignore this potential match and will create a brand new Payee.

If the user drops the confidence level to 50% or better, then the strings are matched positively and an existing payee is reused.
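The threshold decision described above amounts to a single comparison. The sketch below assumes a hypothetical `resolve_payee` function and placeholder payee names; only the 58%, 70%, and 50% figures come from the example.

```python
def resolve_payee(downloaded_name, best_match, confidence, threshold):
    """Return (payee_name, created_new).

    Reuse the best-matching existing payee when the confidence meets
    the user's threshold ("or better" is read as >=); otherwise create
    a new payee named after the downloaded string.
    """
    if best_match is not None and confidence >= threshold:
        return best_match, False
    return downloaded_name, True


# Placeholder names; the screenshot's actual strings are not reproduced here.
strict = resolve_payee("DOWNLOADED NAME", "Existing Payee", 0.58, 0.70)
relaxed = resolve_payee("DOWNLOADED NAME", "Existing Payee", 0.58, 0.50)
```

With a 70% threshold the 58% match is rejected and a new payee is created; with a 50% threshold the existing payee is reused.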
