Record Linkage Experiments on the local SOEMPI

You should already have imported at least two datasets to work with. For privacy-preserving record linkage (PRL), a Key Server component is also needed, but the local machine itself can be configured to act as the Key Server. A wizard on the Data Providers page guides you through the configuration steps (see the import guide).
  1. First, log in to the data provider if you haven't already done so.
  2. Click on the Perform Match toolbar icon to get to the Match View.
  3. The Match View page is displayed. Here you can perform a record linkage (match) between two imported datasets using the controls in the header line of the view, and you can see all previously performed record linkages in the list view below it. In this guide we will perform a record linkage and look at the results that can be displayed for finished matches.
  4. First, specify the dataset to be used as the left side of the linkage. Selecting this dataset populates every drop-down control that offers fields of the left dataset: the left "original id field" selector on this Match View, and the field selectors in the various blocking and matching configuration dialogs.
  5. Next, specify the dataset to be used as the right side of the linkage. Selecting this dataset populates every drop-down control that offers fields of the right dataset: the right "original id field" selector on this Match View, and the field selectors in the various blocking and matching configuration dialogs.
  6. Specify a unique table name for persisting the record pair links in the database. The name is stored in a field of the PersonMatch entity associated with the match, and an actual database table is created under that name with the prefix "tbl_lnk_".
  7. Select the Blocking algorithm you want to use during the record linkage procedure from the drop-down list.
  8. Select the Matching algorithm you want to use during the record linkage procedure from the drop-down list.
  9. Select the "Check True Matches" checkbox if your datasets have their own inherited Id fields and you want to specify these "original id fields" in the drop-down boxes below.
  10. If your datasets have their own inherited Id fields, you can optionally select the left "original id field" in the drop-down box.
  11. If your datasets have their own inherited Id fields, you can optionally select the right "original id field" in the drop-down box.
  12. If you don't want to persist the record pairs and are only interested in the end result of the EM algorithm, check this box. This is highly advised for non-blocking record linkage on large datasets, because indicating that no persistence is needed can dramatically decrease runtime and memory usage.
  13. Click on the Match button to finally start the procedure.
  14. An AJAX wait icon indicates that the computation is in progress.
  15. When the computation finishes, the AJAX wait icon disappears and a new row appears in the list view.
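The blocking choice in step 7 and the persistence option in step 12 are closely related, because blocking determines how many record pairs are compared and, when persistence is enabled, stored. The following Python sketch illustrates generic standard blocking compared with a non-blocking cross product. It is not SOEMPI's implementation: the record layout, the toy data, and the blocking key (first letter of the last name plus birth year) are hypothetical assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records: (record_id, last_name, birth_year).
left = [("L1", "Smith", 1980), ("L2", "Smyth", 1980),
        ("L3", "Jones", 1975), ("L4", "Brown", 1990)]
right = [("R1", "Smith", 1980), ("R2", "Jones", 1975),
         ("R3", "Johnson", 1975), ("R4", "Black", 1985)]


def blocking_key(record):
    """Assumed blocking key: first letter of the last name plus birth year."""
    _, last_name, birth_year = record
    return (last_name[:1].upper(), birth_year)


def all_pairs(left, right):
    """No blocking: every left record is paired with every right record."""
    return [(lrec[0], rrec[0]) for lrec, rrec in product(left, right)]


def candidate_pairs(left, right, key):
    """Standard blocking: only pair records whose blocking keys are equal."""
    blocks = defaultdict(list)
    for rrec in right:
        blocks[key(rrec)].append(rrec)
    pairs = []
    for lrec in left:
        # .get avoids creating empty blocks for keys with no right records
        for rrec in blocks.get(key(lrec), []):
            pairs.append((lrec[0], rrec[0]))
    return pairs


pairs = candidate_pairs(left, right, blocking_key)
print(len(all_pairs(left, right)), len(pairs))  # 16 4
```

With n left and m right records, the non-blocking cross product grows as n × m, which is why step 12 recommends disabling persistence for non-blocking runs on large datasets. Blocking reduces that cost, at the risk of missing true matches whose blocking keys differ (for example, a typo in the first letter of a last name).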
You can examine several properties of the performed record linkages using the icons at the end of the rows of the list view.
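When the original id fields are set in steps 9 to 11, a finished linkage can be checked against them. The following Python sketch shows what such a check typically involves: computing precision, recall, and F1 for a set of linked pairs. It is not SOEMPI's evaluation code, and it does not describe what the list view icons actually display. The function `evaluate_links`, the dictionaries mapping record ids to original ids, and the rule that equal original ids mark a true match are hypothetical assumptions.

```python
from collections import defaultdict


def evaluate_links(predicted_links, left_orig_ids, right_orig_ids):
    """Score predicted (left_id, right_id) links against original-id ground truth.

    Assumption: two records refer to the same entity exactly when their
    original id field values are equal.
    """
    # Index right records by original id to avoid a full cross product.
    right_by_orig = defaultdict(list)
    for rec_id, orig in right_orig_ids.items():
        right_by_orig[orig].append(rec_id)
    true_links = {(lid, rid)
                  for lid, orig in left_orig_ids.items()
                  for rid in right_by_orig.get(orig, [])}

    tp = len(predicted_links & true_links)   # correctly linked pairs
    fp = len(predicted_links - true_links)   # linked, but not a true match
    fn = len(true_links - predicted_links)   # true match that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: record id -> value of the "original id field".
left_orig_ids = {"L1": "P100", "L2": "P101", "L3": "P102", "L4": "P103"}
right_orig_ids = {"R1": "P100", "R2": "P102", "R3": "P200", "R4": "P103"}
predicted = {("L1", "R1"), ("L2", "R1"), ("L3", "R2"), ("L3", "R3")}

result = evaluate_links(predicted, left_orig_ids, right_orig_ids)
print(result["tp"], result["fp"], result["fn"])  # 2 2 1
print(result["precision"])  # 0.5
```

Precision measures how many of the produced links are correct, while recall measures how many of the true matches were found; a looser blocking key or matching threshold usually trades precision for recall.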