1 Problem Statement
Consolidating multiple Government programs requires that each individual be uniquely identified across all of the programs' databases (DBs). To identify an individual, a location address is provided (e.g. a village name), followed by other details such as name, date of birth (DOB), mother’s name and father’s name. The same person's name may not be spelled identically in every DB and can appear as variants, e.g. (Name – Raam, Ram, Rama) or (Father’s Name – Shyam, Syam, Sham). An algorithm is required that reports, as a percentage, how closely a field value in one DB matches the corresponding entries in the other DBs. This will help estimate the overall percentage variation of field values across the DBs, and the accuracy level that can be expected if these programs are converged with each other.
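
As an illustration of what a per-field match percentage could look like, the sketch below scores two spellings of a name on a 0 to 100 scale. It is only one possible approach: the problem statement does not prescribe a similarity measure, so the use of Python's standard difflib.SequenceMatcher is an assumption, and the function names match_percent and best_match are hypothetical.

```python
from difflib import SequenceMatcher


def match_percent(a: str, b: str) -> float:
    """Return the similarity of two field values as a percentage (0-100).

    Assumption: difflib's SequenceMatcher ratio is used as the measure;
    an edit-distance or phonetic measure could be substituted here.
    """
    # Normalise case and surrounding whitespace so "RAM " and "ram" compare equal.
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    return 100.0 * SequenceMatcher(None, a_norm, b_norm).ratio()


def best_match(value: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate closest to `value` together with its match percentage.

    Raises ValueError if `candidates` is empty.
    """
    scored = ((c, match_percent(value, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Name variants taken from the problem statement.
    for variant in ["Ram", "Rama", "Shyam"]:
        print(f"Raam vs {variant}: {match_percent('Raam', variant):.1f}%")
    print(best_match("Syam", ["Ram", "Shyam", "Rama"]))
```

With this measure, close spellings such as Raam and Ram score well above unrelated names such as Raam and Shyam, so a threshold on the percentage could be used to decide whether two entries refer to the same person.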

Initially the matching can be done on a single field, and later extended to a combination of multiple fields. Because the volume of data is large, a sample of suitable size can be drawn first and the match calculated on that sample.
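
The extension from one field to several, and the use of a sample, could be sketched as follows. This is an assumption-laden illustration rather than a prescribed method: the weighted average, the field names (name, father_name, dob), the weights, and the helper names record_match_percent and sample_records are all hypothetical, and difflib.SequenceMatcher is again assumed as the per-field measure.

```python
import random
from difflib import SequenceMatcher


def field_match_percent(a: str, b: str) -> float:
    """Similarity of two field values as a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_match_percent(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average of per-field match percentages.

    `weights` maps field name -> relative importance (hypothetical values).
    A field missing from a record is compared as an empty string (assumption).
    """
    total = sum(weights.values())
    score = sum(
        w * field_match_percent(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def sample_records(records: list, k: int, seed: int = 0) -> list:
    """Draw a reproducible simple random sample of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


if __name__ == "__main__":
    weights = {"name": 5, "father_name": 3, "dob": 2}  # illustrative only
    a = {"name": "Raam", "father_name": "Shyam", "dob": "1990-01-01"}
    b = {"name": "Ram", "father_name": "Syam", "dob": "1990-01-01"}
    print(f"Record match: {record_match_percent(a, b, weights):.1f}%")
    print(sample_records(list(range(1000)), 5))
```

Starting with a single field corresponds to passing a weights dictionary with one entry; more fields can be added later without changing the functions.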

Sample data required: No