This paper describes the Fuzzy Lookup and Fuzzy Grouping transformations in Data Transformation Services (DTS) for SQL Server 2005. Both transformations are useful for improving data quality in the destination database. Fuzzy Lookup matches "dirty" input (input containing misspellings, truncations, missing or inserted tokens, null fields, unexpected abbreviations, and other irregularities) with clean records in a reference table. Fuzzy Grouping compares the string values of input rows to detect similarities and determine which rows are duplicates. Together, they are useful primitives that simplify a variety of data cleaning and preparation tasks frequently encountered in data warehousing. By customizing them for your domain, you can apply general search and clustering algorithms inside the DTS Designer while avoiding complex custom code.