“…The works of Slonim & Tishby (2000) and El-Yaniv & Souroujon (2001) use heuristic procedures that cluster documents and features independently with an agglomerative algorithm. Dhillon et al. (2002, 2003b), on the other hand, propose an information-theoretic co-clustering algorithm that intertwines row (feature) and column (document) clustering. The algorithm starts with a random partition of the rows, X, and the columns, Y. It then computes an approximation q(X,Y) to the original distribution P(X,Y), together with a corresponding compressed distribution over row and column clusters, by clustering rows and columns in an intertwined manner, i.e.…”
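To make the intertwined row/column scheme concrete, here is a minimal pure-Python sketch. It assumes hard cluster labels and the usual information-theoretic co-clustering factorization q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ), whose loss KL(P || q) equals I(X;Y) − I(X̂;Ŷ). Each row x is reassigned to the cluster x̂ that minimizes KL(p(Y|x) || q(Y|x̂)), where q(y|x̂) = p(y|ŷ) p(ŷ|x̂). The column step is the same update applied to the transposed matrix. The function names (`itcc`, `loss`, `_row_step`), the shuffled round-robin initialization (a random partition with no empty clusters), and the tie-breaking in favor of the current cluster are illustrative assumptions, not details taken from the cited papers.

```python
import math
import random


def _kl(p, q):
    """KL(p || q) in nats; infinite if q has zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def _stats(P, rows, cols, k, l):
    """Marginals p(x), p(y) and the compressed joint p(xh, yh) for hard labels."""
    px = [sum(r) for r in P]
    py = [sum(c) for c in zip(*P)]
    pc = [[0.0] * l for _ in range(k)]
    for i, r in enumerate(P):
        for j, v in enumerate(r):
            pc[rows[i]][cols[j]] += v
    pxh = [sum(r) for r in pc]
    pyh = [sum(c) for c in zip(*pc)]
    return px, py, pc, pxh, pyh


def loss(P, rows, cols, k, l):
    """KL(P || q) with q(x, y) = p(xh, yh) p(x | xh) p(y | yh); P must sum to 1."""
    px, py, pc, pxh, pyh = _stats(P, rows, cols, k, l)
    total = 0.0
    for i, r in enumerate(P):
        for j, v in enumerate(r):
            if v > 0:
                a, b = rows[i], cols[j]
                q = pc[a][b] * (px[i] / pxh[a]) * (py[j] / pyh[b])
                total += v * math.log(v / q)
    return total


def _row_step(P, rows, cols, k, l):
    """Move each row x to argmin_xh KL(p(Y|x) || q(Y|xh)), q(y|xh) = p(y|yh) p(yh|xh)."""
    px, py, pc, pxh, pyh = _stats(P, rows, cols, k, l)
    n = len(py)
    p_y_given_yh = [py[j] / pyh[cols[j]] if pyh[cols[j]] > 0 else 0.0 for j in range(n)]

    def q_row(a):
        return [p_y_given_yh[j] * pc[a][cols[j]] / pxh[a] for j in range(n)]

    new_rows = list(rows)
    for i, r in enumerate(P):
        if px[i] == 0:
            continue  # a massless row does not affect the loss; keep its label
        p_row = [v / px[i] for v in r]
        best, best_d = rows[i], _kl(p_row, q_row(rows[i]))  # prefer current on ties
        for a in range(k):
            if a != rows[i] and pxh[a] > 0:  # skip empty clusters
                d = _kl(p_row, q_row(a))
                if d < best_d:
                    best, best_d = a, d
        new_rows[i] = best
    return new_rows


def itcc(P, k, l, iters=20, seed=0):
    """Alternate row and column steps; return labels and the loss after each step."""
    total = sum(map(sum, P))
    P = [[v / total for v in r] for r in P]  # normalize counts to a joint distribution
    Pt = [list(c) for c in zip(*P)]          # transpose: columns become rows
    rng = random.Random(seed)
    rows = [i % k for i in range(len(P))]    # assumes at least k rows and l columns
    cols = [j % l for j in range(len(Pt))]
    rng.shuffle(rows)
    rng.shuffle(cols)
    history = [loss(P, rows, cols, k, l)]
    for _ in range(iters):
        rows = _row_step(P, rows, cols, k, l)
        history.append(loss(P, rows, cols, k, l))
        cols = _row_step(Pt, cols, rows, l, k)  # column step = row step on the transpose
        history.append(loss(P, rows, cols, k, l))
        if history[-3] - history[-1] < 1e-10:
            break
    return rows, cols, history
```

Because each step either keeps or lowers KL(P || q), the recorded loss should be non-increasing. Like other alternating schemes, the procedure can still stop at a local optimum that depends on the random initial partition.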