Matching of an input document to documents in a document collection. G Koutrika, D Papadimitriou, SJ Simske [PCT/US2013/073680]

Abstract. Matching of an input document to documents in a document collection is described herein. In an example, a similarity correspondence is established between an input document and one or more documents in a base document collection. A set of base document segments, derived from documents in the base document collection, is provided together with a set of message types associated with those segments. The input document is segmented into input document segments corresponding to message types. Segment similarity is computed between input document segments and base document segments that correspond to the same message type. The similarity correspondence between the input document and at least one document in the base document collection is based on the computed segment similarity.
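
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not specify how segments are produced, how message types are assigned, or which similarity measure is used. The sketch assumes that segmentation and message-type labeling have already been done, uses Jaccard overlap of word sets as a stand-in segment similarity, and aggregates per document by taking the best-matching segment per input segment and averaging. The names `jaccard` and `match_document`, and the message types `"definition"` and `"example"`, are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Assumed stand-in similarity: word-set overlap between two text segments."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_document(input_segments, base_segments):
    """Score base documents against an input document.

    input_segments: list of (message_type, text) for the input document.
    base_segments:  list of (doc_id, message_type, text) derived from the
                    base document collection.
    Returns a dict mapping doc_id to a score in [0, 1].
    """
    # Index base segments by message type so that only segments of the
    # same message type are ever compared.
    by_type = defaultdict(list)
    for doc_id, mtype, text in base_segments:
        by_type[mtype].append((doc_id, text))

    totals = defaultdict(float)
    for mtype, text in input_segments:
        # Best similarity per base document for this input segment.
        best = {}
        for doc_id, base_text in by_type.get(mtype, []):
            score = jaccard(text, base_text)
            best[doc_id] = max(best.get(doc_id, 0.0), score)
        for doc_id, score in best.items():
            totals[doc_id] += score

    # Assumed aggregation: average over the input document's segments.
    n = len(input_segments) or 1
    return {doc_id: total / n for doc_id, total in totals.items()}


if __name__ == "__main__":
    base = [
        ("A", "definition", "a graph is a set of nodes and edges"),
        ("A", "example", "a road map is a graph"),
        ("B", "definition", "a tree is a connected acyclic graph"),
    ]
    query = [("definition", "a graph is a set of nodes and edges")]
    scores = match_document(query, base)
    # Rank base documents by descending score.
    for doc_id in sorted(scores, key=scores.get, reverse=True):
        print(doc_id, round(scores[doc_id], 3))
```

Indexing base segments by message type keeps the comparison restricted to same-type pairs, which is the one constraint the abstract states explicitly; the similarity function and the aggregation rule can be swapped without changing that structure.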