concept Tanimoto similarity in category data
appears as: Tanimoto similarity, The Tanimoto similarity, Tanimoto similarities

This is an excerpt from Manning's book Data Science Bookcamp: Five Python Projects MEAP V04 livebook.
    We are comparing Normalized Query Vector and Normalized Title A vector
    The Tanimoto similarity between vectors is 1.0000
    The cosine similarity between vectors is 1.0000
    The Euclidean distance between vectors is 0.0000
    The angle between vectors is 0.0000 degrees

    We are comparing Normalized Query Vector and Title B Vector
    The Tanimoto similarity between vectors is 0.5469
    The cosine similarity between vectors is 0.7071
    The Euclidean distance between vectors is 0.7654
    The angle between vectors is 45.0000 degrees
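The four metrics in this output can be reproduced with a short sketch. It is an illustration rather than the book's own listing: `compare` is a hypothetical helper, and the two vectors are made-up unit vectors chosen to sit 45 degrees apart. The Tanimoto similarity is computed with the general formula a·b / (a·a + b·b - a·b).

```python
import numpy as np
from numpy.linalg import norm


def compare(vector_a, vector_b):
    """Return (Tanimoto, cosine, Euclidean distance, angle in degrees)."""
    dot = float(np.dot(vector_a, vector_b))
    # General Tanimoto similarity for real-valued vectors.
    tanimoto = dot / (np.dot(vector_a, vector_a) + np.dot(vector_b, vector_b) - dot)
    cosine = dot / (norm(vector_a) * norm(vector_b))
    distance = float(norm(vector_a - vector_b))
    # Clipping guards against rounding that nudges the cosine outside [-1, 1].
    angle = float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))
    return float(tanimoto), float(cosine), distance, angle


# Hypothetical unit vectors (assumption): 45 degrees apart.
query = np.array([1.0, 0.0])
title_b = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])

tanimoto, cosine, distance, angle = compare(query, title_b)
print(f"The Tanimoto similarity between vectors is {tanimoto:.4f}")
print(f"The cosine similarity between vectors is {cosine:.4f}")
print(f"The Euclidean distance between vectors is {distance:.4f}")
print(f"The angle between vectors is {angle:.4f} degrees")
```

Because both vectors have unit length, the Euclidean distance and the Tanimoto similarity are each determined by the cosine alone, which is why all four values move together.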
Figure 13.12. Transforming our three texts into a normalized matrix. The initial texts appear in the upper-left corner of the figure. These texts share a vocabulary of 15 unique words. We leverage the vocabulary to transform the texts into a matrix of word counts. This count matrix appears in the upper-right corner of the figure. Its three rows correspond to the three texts. Its 15 columns track the occurrence count of every word within each text. We’ll now normalize these counts by dividing each row by its magnitude. The normalization will produce the matrix in the lower-right corner of the figure. The dot product between any two rows in the normalized matrix will equal the cosine similarity between the corresponding texts. Subsequently, running cos / (2 - cos) will transform the cosine similarity into the Tanimoto similarity.
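The pipeline described in the caption can be sketched in a few lines of NumPy. The three texts below are hypothetical stand-ins (the book's actual texts and its 15-word vocabulary are not reproduced here), and all variable names are illustrative. The final step relies on the fact that a unit vector has a dot product of 1 with itself, so the general Tanimoto formula a·b / (a·a + b·b - a·b) reduces to cos / (2 - cos).

```python
import numpy as np

# Hypothetical texts (assumption): not the three texts shown in the figure.
texts = ['the cat sat on the mat',
         'the cat sat on the sofa',
         'a dog barked at the moon']

# Build the shared vocabulary and the word-count matrix (one row per text).
word_lists = [text.split() for text in texts]
vocabulary = sorted({word for words in word_lists for word in words})
count_matrix = np.array([[words.count(word) for word in vocabulary]
                         for words in word_lists], dtype=float)

# Divide each row by its magnitude to obtain unit-length rows.
magnitudes = np.linalg.norm(count_matrix, axis=1, keepdims=True)
normalized_matrix = count_matrix / magnitudes

# Dot products between unit rows are cosine similarities.
cosine_matrix = normalized_matrix @ normalized_matrix.T

# Convert every cosine similarity into a Tanimoto similarity.
tanimoto_matrix = cosine_matrix / (2 - cosine_matrix)
print(np.round(tanimoto_matrix, 4))
```

Computing the full matrix with a single matrix product avoids looping over every pair of texts, which matters once the number of texts grows.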
Listing 13.32. Computing a table of normalized Tanimoto similarities
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    from numpy.linalg import norm

    # tf_vectors and normalized_tanimoto are assumed to be defined earlier.
    num_texts = len(tf_vectors)
    # Create a 2D array of zeros; equivalent to
    # np.array([[0.0] * num_texts for _ in range(num_texts)]).
    similarities = np.zeros((num_texts, num_texts))
    unit_vectors = np.array([vector / norm(vector) for vector in tf_vectors])
    for i, vector_a in enumerate(unit_vectors):
        for j, vector_b in enumerate(unit_vectors):
            similarities[i][j] = normalized_tanimoto(vector_a, vector_b)

    labels = ['Text 1', 'Text 2', 'Text 3']
    sns.heatmap(similarities, cmap='YlGnBu', annot=True,
                xticklabels=labels, yticklabels=labels)
    plt.yticks(rotation=0)
    plt.show()

Figure 13.13. A table of normalized Tanimoto similarities across text pairs. The table’s diagonal represents the similarity between each text and itself. Not surprisingly, that similarity is 1. Ignoring the diagonal, we see that texts 1 and 2 share the highest similarity. Meanwhile, texts 2 and 3 share the lowest similarity.
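Reading the most and least similar pairs off the table can also be done programmatically. The sketch below is an illustration under stated assumptions: `normalized_tanimoto` is a stand-in written for this example (the book's own definition is not shown in this excerpt), and `tf_vectors` holds hypothetical term-frequency values rather than the data behind Figure 13.13. The upper triangle of the matrix is scanned so that the diagonal and duplicate pairs are ignored.

```python
import numpy as np
from numpy.linalg import norm


def normalized_tanimoto(unit_a, unit_b):
    # Stand-in helper (assumption): for unit vectors the dot product
    # equals the cosine similarity, so Tanimoto is cos / (2 - cos).
    cosine = float(unit_a @ unit_b)
    return cosine / (2 - cosine)


# Hypothetical term-frequency vectors (assumption), one row per text.
tf_vectors = np.array([[2, 1, 1, 0, 0],
                       [2, 1, 0, 1, 0],
                       [0, 0, 2, 1, 2]], dtype=float)
unit_vectors = tf_vectors / norm(tf_vectors, axis=1, keepdims=True)
num_texts = len(unit_vectors)

similarities = np.array([[normalized_tanimoto(a, b) for b in unit_vectors]
                         for a in unit_vectors])

# Index pairs above the diagonal (k=1 excludes the diagonal itself).
rows, cols = np.triu_indices(num_texts, k=1)
pair_values = similarities[rows, cols]
best = int(np.argmax(pair_values))
worst = int(np.argmin(pair_values))

labels = ['Text 1', 'Text 2', 'Text 3']
most_similar = (labels[rows[best]], labels[cols[best]])
least_similar = (labels[rows[worst]], labels[cols[worst]])
print('Most similar pair:', most_similar)
print('Least similar pair:', least_similar)
```

Restricting the search to the upper triangle works because the similarity matrix is symmetric, so each pair of texts appears exactly once there.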