Clustering Implementation
Bunch
Bunch implementation and extension for the consensus-based decomposition approach: zip.
Alternatively, the Bunch clustering approach can be executed using a jar file: jar.
To run the jar file:
java -cp Bunch.jar bunch.RunBunch <relationship graph: file in .csv format> <consensus groups: file in .txt format> <population size: integer> <output dir: directory path>
Inputs:
- The
relationship graph
must be a CSV file. Specifically, each line represents a relationship and must be formatted as follows:[caller class],[callee class],[relationship weight]
. - The
consensus groups
in TXT format contains all the elements in the relationship graph that you wish to lock together during the clustering process. Each line represents a consensus group and must be formatted as follows:SS('GROUP_NAME'.ss) = entity0, entity1, entity2...
.- Note: if you wish to define NO consensus groups, then input an empty txt file.
- The
population size
is parameter used to optimize the clustering. The higher the population size, the better the clustering result. - The
output dir
is the directory path where the results will be saved.
Examples:
- To produce the by-static decomposition, execute with the static relationships graph and no consensus groups.
- To produce the consensus-based decomposition, execute with the combined weighted relationship graph with consensus groups.
Spectral Clustering
Spectral clustering implementation: zip.
Execute the run_consensus_spectral_clustering.py
script with the following arguments:
--relationship-graph
*: relationship graph in CSV format--output
*: output destination--num-of-cluster
*: number of clusters k--consensus-groups
: file containing the consensus groups (i.e. groups of entities that should be locked together during clustering) in TXT format. If this is not provided, then no entities are locked together.- Note: this argument is required to produce the consensus-based decomposition
--normalize
: boolean to indicate whether the graph should be normalized before clustering. If the graph is already normalized, set to false.--directed
: boolean to indicate whether the graph is directed. This should be set to true for static and combined weighted relationship graphs, and false for name relationship graphs.