Introduction to K-means clustering
Today we will introduce K-means Clustering! K-means is one of the most popular and fundamental unsupervised learning algorithms in machine learning. T...
LLMs generate inconsistent lexical labels for identical inputs, but these labels are often semantically similar.
Clustering generated labels using vector embeddings and disjoint set union enables consistent classification, reducing unique labels from linear to sublinear growth as dataset size increases. This vectorization approach is initially slower and slightly more expensive, but scales to become significantly cheaper and faster than pure LLM-only classification at large volumes.