Fuzzy Duplicates Testing
A Nairobi-based Level 5 teaching and referral hospital was facing challenges with its Hospital Management and Integrated System (HMIS). The hospital suspected data corruption or mishandling and asked us to perform a data audit to confirm this.
Arbutus Analyzer has very powerful data staging capabilities. We put these to good use by extracting useful data from financial statements and ledgers exported from the client’s financial system. The statements were primarily in PDF format, while the exported ledgers were in Microsoft Excel format.
We also queried the datasets on both systems, the HMIS and the Finance System, directly at the database level. By doing so, we were able to give a qualified opinion to the lead auditor concerning the integrity of the data in the client’s systems. We created a data extraction model in Arbutus that enabled us to convert PDF financial statements into analyzable rows and columns.
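The idea of turning statement text into analyzable rows and columns can be illustrated outside Arbutus with a minimal Python sketch. It assumes the PDF text has already been extracted into plain lines. The line layout (date, reference, description, amount), the pattern, and the sample lines are all hypothetical. They are neither the client's actual format nor how Arbutus implements its extraction model.

```python
import re
from decimal import Decimal

# Hypothetical statement line layout: "DD/MM/YYYY  REF  description  amount"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<ref>\S+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_statement_lines(lines):
    """Turn raw statement text lines into a list of row dictionaries."""
    rows = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip titles, page footers and other non-data lines
        row = match.groupdict()
        # Remove thousands separators so the amount can be used in arithmetic
        row["amount"] = Decimal(row["amount"].replace(",", ""))
        rows.append(row)
    return rows


# Made-up sample lines, for illustration only
sample_lines = [
    "Statement of Account",
    "03/01/2020  INV-1001  Pharmacy supplies  12,500.00",
    "04/01/2020  INV-1002  Laboratory reagents  -3,250.50",
    "Page 1 of 1",
]
rows = parse_statement_lines(sample_lines)
for row in rows:
    print(row["date"], row["ref"], row["description"], row["amount"])
```

Once a pattern like this is defined, any statement with the same layout can be passed through the same function, which is the reuse a template is meant to provide.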
Once the template was created, any similar PDF statement imported into Arbutus was automatically converted into this format. We then employed fuzzy matching and comparison functions (standard functions in Arbutus Analyzer) to match data between the two systems.
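For readers unfamiliar with fuzzy matching, the sketch below shows the general technique using Python's standard `difflib` module. It is not the Arbutus Analyzer function set. The record fields (`id`, `name`), the sample records, and the 0.85 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case the name, drop basic punctuation and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_match(hmis_records, finance_records, threshold=0.85):
    """Pair each HMIS record with its closest finance record, if close enough."""
    matches, unmatched = [], []
    for hmis in hmis_records:
        best, best_score = None, 0.0
        for fin in finance_records:
            score = similarity(hmis["name"], fin["name"])
            if score > best_score:
                best, best_score = fin, score
        if best is not None and best_score >= threshold:
            matches.append((hmis, best, round(best_score, 2)))
        else:
            unmatched.append(hmis)  # no sufficiently similar counterpart
    return matches, unmatched


# Made-up records, for illustration only
hmis_records = [
    {"id": "H1", "name": "Nairobi Medical Supplies Ltd"},
    {"id": "H2", "name": "Kisumu Lab Reagents"},
]
finance_records = [
    {"id": "F1", "name": "Nairobi Medical Suplies Ltd."},
    {"id": "F2", "name": "Mombasa Office Stationers"},
]

matches, unmatched = fuzzy_match(hmis_records, finance_records)
for hmis, fin, score in matches:
    print(hmis["id"], "matches", fin["id"], "with score", score)
for hmis in unmatched:
    print(hmis["id"], "has no counterpart in the finance data")
```

Records that match only approximately point to inconsistent data entry, while records with no counterpart at all point to missing data. The threshold controls the trade-off between false matches and missed matches and would need tuning against real data.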
For this project, we also used a couple of Python libraries to stage and showcase the results for further clarity.
We were able to definitively qualify the data in the system as incomplete and corrupted.
The client was then able to leverage this finding to compel the vendors to rework their systems and address the issue. The trend analysis made it clear that the problem was getting worse.