Fuzzy Duplicates Testing

The Challenge

A Nairobi-based Level 5 teaching and referral hospital was facing challenges with its Hospital Management and Integrated System (HMIS). The hospital suspected data corruption or mishandling and asked us to perform a data audit to confirm this.

Solution

Arbutus Analyzer has very powerful data staging capabilities. We put these to good use by extracting useful data from financial statements and ledgers that were exported from the client’s financial system. The statements were primarily PDF files, while the exported ledgers were Microsoft Excel files.

We also queried the databases of both systems directly: the HMIS and the finance system. This allowed us to give the lead auditor a qualified opinion on the integrity of the data in the client’s systems. We created a data extraction model in Arbutus that converted the PDF financial statements into analyzable rows and columns.
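To illustrate the idea of turning statement text into rows and columns, here is a minimal Python sketch. It is not the Arbutus extraction model; it assumes the PDF text has already been extracted, and the line layout (date, reference, description, amount), the function name parse_statement and the sample lines are all hypothetical.

```python
import re
from decimal import Decimal

# Assumed (hypothetical) line layout: date, reference, description, amount.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<ref>\S+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_statement(text):
    """Convert raw statement text into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip page headers, totals and blank lines
        row = match.groupdict()
        # Strip thousands separators so the amount can be used in arithmetic.
        row["amount"] = Decimal(row["amount"].replace(",", ""))
        rows.append(row)
    return rows


# Made-up sample text, for illustration only.
SAMPLE = """\
Statement of Account        Page 1
03/01/2020  INV-0012  Outpatient consultation   1,500.00
04/01/2020  INV-0013  Laboratory tests          -250.50
Total carried forward
"""

rows = parse_statement(SAMPLE)
for row in rows:
    print(row["date"], row["ref"], row["description"], row["amount"])
```

Lines that do not fit the pattern are skipped rather than guessed at, so headers and totals never end up in the analyzable data.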

Once the template was created, any similar PDF statement imported into Arbutus was automatically converted into this format. We then used fuzzy matching and comparison functions (which are standard functions in Arbutus Analyzer) to match data between the two systems.
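For readers who have not seen fuzzy matching before, the sketch below shows the general idea using Python's standard difflib module. It is not the Arbutus implementation; the function names, the 0.85 threshold and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(hmis_rows, finance_rows, threshold=0.85):
    """Pair each HMIS value with its closest finance value.

    Values whose best score falls below the (assumed) threshold are
    returned separately as unmatched.
    """
    matches, unmatched = [], []
    for left in hmis_rows:
        best, best_score = None, 0.0
        for right in finance_rows:
            score = similarity(left, right)
            if score > best_score:
                best, best_score = right, score
        if best_score >= threshold:
            matches.append((left, best, best_score))
        else:
            unmatched.append(left)
    return matches, unmatched


# Made-up sample values, for illustration only.
hmis = ["Jane Wanjiru Kamau", "Peter Otieno", "Mary Achieng"]
finance = ["JANE WANJIRU KAMAU", "Peter Otienno", "Samuel Mwangi"]

matches, unmatched = match_records(hmis, finance)
for left, right, score in matches:
    print(f"{left!r} ~ {right!r} (score {score:.2f})")
print("Unmatched in HMIS:", unmatched)
```

Records that differ only in case or by a small typing error still pair up, while records with no close counterpart in the other system are listed as unmatched for follow-up.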

For this project, we also used a couple of Python libraries to stage the results and present them more clearly.

Results

We were able to definitively qualify the data in the system as incomplete and corrupted.
The client was then able to use this finding to compel the vendors to rework their systems and address the issue. The trend analysis made it clear that the problem was getting worse.

  • HMIS Ledger
  • Financial System Statements

Start by doing what’s necessary; then do what’s possible; and suddenly you will have solved the problem and be doing the impossible.

St Francis of Assisi
1181 - 1226

Seeing is believing.

Get in touch with us today for a demonstration (using your own data) of what our analysis can do for you.

START YOUR BUSINESS ON THE ROAD TO DATA-DRIVEN PERFORMANCE