SF Bay ACM Data Mining SIG: Using Data Mining to Measure Similarity Between Words and Objects

3410 Hillview Avenue
Palo Alto, California 94304

Presented by Mehran Sahami, Senior Research Scientist, Google

The World Wide Web provides a wealth of data that can be harnessed to improve information retrieval and increase understanding of the relationships between different entities. We are often interested in determining how similar two entities are to each other, where the entities may be pieces of text or descriptions of some object. In this work, we examine multiple instances of this problem and show how they can be addressed by applying data mining techniques to large web-based data sets. Specifically, we examine the problems of determining the similarity of short texts (even those that share no terms in common) and of learning similarity functions for semi-structured data to address tasks such as record linkage between objects. While we present rather different techniques for each problem, we show how measuring similarity between entities in these domains applies directly to the overarching goal of improving information access for users of web-based systems.

Official Website: http://www.sfbayacm.org/events/2007-04-11.php

Date: April 11, 2007
