We investigate the space of all protein sequences in search of clusters of related proteins. The goal is to automatically detect these sets and thus obtain a classification of all protein sequences. The graph-based approach uses a graph representation of the sequence space. Sequences are mapped to vertices, and edges between vertices are weighted in a manner that reflects the degree of similarity between the corresponding proteins. To assign weights to edges we start from an all-vs-all comparison of SWISSPROT using the standard measures for sequence similarity (SW, FASTA and BLAST). In a preprocessing step, edges are screened to maintain only those similarities which we believe to reflect true relationships. An effort was made to include as much as possible distant relationships (weak similarities) in the graph, while avoiding meaningless, random similarities. Clusters of related proteins correspond to strongly connected components of this graph. To identify these sets, we start from a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins, however, transitivity is applied {\em restrictively} in order to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. These methods are discussed in detail in this chapter. The chapter starts with the main guidelines of this approach and a short review of related works, followed by a description of the process of building the graph (especially the process of assigning weights to edges), and the two-phase clustering algorithm.