concept cosine similarity in category machine learning
appears as: cosine similarity, cosine similarity

This is an excerpt from Manning's book Human-in-the-Loop Machine Learning MEAP V09.
Figure 4.5: An example of a clustering algorithm using cosine similarity. For each cluster, the center is defined as a vector from the origin, and membership in that cluster is determined by the angle between the vector representing the cluster center and the vector representing the item.
You can think of cosine similarity in terms of looking at stars in the night sky. If you drew a straight line from yourself toward each of two stars and measured the angle between those lines, that angle would give you the cosine similarity. In the night-sky example there are only three physical dimensions, but in your data there is one dimension for each feature. Cosine similarity is not immune to the problems of high dimensionality, but it tends to perform better than Euclidean distance, especially for sparse data like our text encodings.
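To make the angle intuition concrete, here is a minimal sketch of cosine similarity between two feature vectors, using only the standard library (the function name `cosine_similarity` is mine, not from the book's codebase):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0, regardless of length;
# orthogonal vectors score 0.0.
print(cosine_similarity([1, 0, 1], [2, 0, 2]))  # 1.0
print(cosine_similarity([1, 0], [0, 1]))        # 0.0
```

Note that length drops out entirely: only direction matters, which is why cosine similarity behaves well on sparse text encodings where document length varies widely.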
```python
def get_cluster_samples(self, data, num_clusters=5, max_epochs=5,
                        limit=5000, verbose=False):
    """Create clusters using cosine similarity

    Keyword arguments:
        data -- data to be clustered
        num_clusters -- the number of clusters to create
        max_epochs -- maximum number of epochs to create clusters
        limit -- sample only this many items for faster clustering (-1 = no limit)

    Creates clusters by the K-Means clustering algorithm,
    using cosine similarity instead of the more common Euclidean distance.

    Creates clusters until converged or max_epochs passes over the data.
    """
    if limit > 0:
        shuffle(data)
        data = data[:limit]

    cosine_clusters = CosineClusters(num_clusters)

    cosine_clusters.add_random_training_items(data)  #A

    for i in range(0, max_epochs):
        print("Epoch " + str(i))
        added = cosine_clusters.add_items_to_best_cluster(data)  #B
        if added == 0:
            break

    centroids = cosine_clusters.get_centroids()  #C
    outliers = cosine_clusters.get_outliers()  #D
    randoms = cosine_clusters.get_randoms(3, verbose)  #E

    return centroids + outliers + randoms
```

- A: Initialize clusters with random assignments
- B: Move each item to the cluster that it is the best fit for, and repeat
- C: Sample the best-fit item (centroid) from each cluster
- D: Sample the biggest outlier in each cluster
- E: Sample three random items from each cluster, and pass the `verbose` parameter to get an intuition for what is in each cluster
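The heart of step B is an assignment pass: each item moves to whichever cluster center it has the highest cosine similarity with. The book's `CosineClusters` class encapsulates this; the following is a simplified stand-in (the function names `cosine_similarity` and `assign_to_best_cluster` are mine) showing one such pass over fixed centers:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def assign_to_best_cluster(items, centers):
    """One assignment pass: each item joins the center it is most similar to."""
    clusters = [[] for _ in centers]
    for item in items:
        best = max(range(len(centers)),
                   key=lambda c: cosine_similarity(item, centers[c]))
        clusters[best].append(item)
    return clusters

items = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
clusters = assign_to_best_cluster(items, centers=[[1, 0], [0, 1]])
# Items near [1, 0] land in the first cluster, items near [0, 1] in the second.
```

In full K-Means the centers would then be recomputed from the new memberships and the pass repeated until no item moves or `max_epochs` is reached, which is exactly the loop in `get_cluster_samples` above.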