Clustering Implementation


Bunch

Bunch implementation, extended to support the consensus-based decomposition approach: zip.
Alternatively, Bunch clustering can be executed directly from a jar file: jar.

To run the jar file:
java -cp Bunch.jar bunch.RunBunch <relationship graph: file in .csv format> <consensus groups: file in .txt format> <population size: integer> <output dir: directory path>
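If you drive Bunch from a script, it can help to assemble the command in one place so the four positional arguments always appear in the documented order. The sketch below does this with Python's standard subprocess module. The function names build_bunch_command and run_bunch are hypothetical helpers, and the only behavior assumed of Bunch is the command-line usage shown above.

```python
import subprocess


def build_bunch_command(graph_csv, consensus_txt, population_size, output_dir, jar="Bunch.jar"):
    """Return the argv list for bunch.RunBunch, in the documented argument order."""
    if int(population_size) <= 0:
        raise ValueError("population size must be a positive integer")
    return [
        "java", "-cp", str(jar), "bunch.RunBunch",
        str(graph_csv),             # relationship graph (.csv)
        str(consensus_txt),         # consensus groups (.txt); may be an empty file
        str(int(population_size)),  # population size (integer)
        str(output_dir),            # directory where results are saved
    ]


def run_bunch(*args, **kwargs):
    """Run Bunch; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(build_bunch_command(*args, **kwargs), check=True)
```

For example, run_bunch("static.csv", "empty.txt", 100, "out/static") would execute the same command as typing it by hand. The file names here are illustrative only.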

Inputs:

  • The relationship graph must be a CSV file. Specifically, each line represents a relationship and must be formatted as follows: [caller class],[callee class],[relationship weight].
  • The consensus-groups file (TXT format) contains all the elements of the relationship graph that you wish to lock together during the clustering process. Each line represents one consensus group and must be formatted as follows: SS('GROUP_NAME'.ss) = entity0, entity1, entity2, ...
    • Note: if you do not wish to define any consensus groups, provide an empty TXT file.
  • The population size is an integer parameter used to optimize the clustering search. Larger population sizes generally produce better clustering results, typically at the cost of a longer running time.
  • The output dir is the directory path where the results will be saved.
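The two input formats above can be produced programmatically. The sketch below writes a relationship graph as caller,callee,weight lines and consensus groups as SS('GROUP_NAME'.ss) = ... lines, with a small parser to check the round trip. It assumes Bunch accepts plain comma-separated lines with Unix line endings and entity names that contain no commas or quotes. All function names are hypothetical.

```python
import csv
import re
from pathlib import Path


def write_relationship_graph(path, edges):
    """Write (caller, callee, weight) triples as 'caller,callee,weight' lines."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")  # Unix line endings (assumption)
        for caller, callee, weight in edges:
            writer.writerow([caller, callee, weight])


def format_consensus_group(name, entities):
    """Format one consensus group as: SS('NAME'.ss) = e0, e1, e2"""
    return f"SS('{name}'.ss) = " + ", ".join(entities)


def write_consensus_groups(path, groups):
    """Write one group per line; an empty mapping yields an empty file (no locked groups)."""
    lines = [format_consensus_group(name, ents) for name, ents in groups.items()]
    Path(path).write_text("\n".join(lines) + ("\n" if lines else ""))


_GROUP_RE = re.compile(r"^SS\('(?P<name>[^']+)'\.ss\)\s*=\s*(?P<body>.*)$")


def parse_consensus_group(line):
    """Inverse of format_consensus_group: return (name, [entities])."""
    m = _GROUP_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a consensus-group line: {line!r}")
    entities = [e.strip() for e in m.group("body").split(",") if e.strip()]
    return m.group("name"), entities
```

Passing an empty dictionary to write_consensus_groups produces the empty TXT file that means "no consensus groups".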


Examples:

  • To produce the by-static decomposition, run Bunch with the static relationship graph and no consensus groups (i.e., an empty TXT file).
  • To produce the consensus-based decomposition, run Bunch with the combined weighted relationship graph and the consensus groups.
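The two example runs differ only in their graph and consensus-group inputs. The following sketch prepares both invocations: it creates the empty TXT file for the by-static run and puts each result in its own output directory. The file and directory names and the default population size of 100 are arbitrary placeholders, not values taken from the original work.

```python
from pathlib import Path


def bunch_cmd(graph_csv, consensus_txt, population_size, output_dir, jar="Bunch.jar"):
    # Argument order follows the documented bunch.RunBunch usage.
    return ["java", "-cp", jar, "bunch.RunBunch",
            str(graph_csv), str(consensus_txt), str(population_size), str(output_dir)]


def plan_decompositions(workdir, static_csv, combined_csv, consensus_txt, population_size=100):
    """Return the two Bunch invocations described in the Examples."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    empty = workdir / "no_consensus.txt"
    empty.write_text("")  # an empty file means: no consensus groups
    return {
        # static relationship graph, no consensus groups
        "by-static": bunch_cmd(static_csv, empty, population_size, workdir / "by_static"),
        # combined weighted graph, with consensus groups
        "consensus-based": bunch_cmd(combined_csv, consensus_txt, population_size,
                                     workdir / "consensus"),
    }
```

Each returned list can then be passed to subprocess.run(..., check=True).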

Spectral Clustering

Spectral clustering implementation: zip.

Execute the run_consensus_spectral_clustering.py script with the following arguments (arguments marked with * are required):

  • --relationship-graph*: relationship graph in CSV format
  • --output*: output destination
  • --num-of-cluster*: number of clusters k
  • --consensus-groups: file containing the consensus groups (i.e. groups of entities that should be locked together during clustering) in TXT format. If this is not provided, then no entities are locked together.
    • Note: this argument is required to produce the consensus-based decomposition.
  • --normalize: boolean to indicate whether the graph should be normalized before clustering. If the graph is already normalized, set to false.
  • --directed: boolean to indicate whether the graph is directed. This should be set to true for static and combined weighted relationship graphs, and false for name relationship graphs.
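A minimal sketch for assembling a call to the script with the arguments listed above. Optional flags are only emitted when set. Two points are assumptions not confirmed by this document and are labeled in the code: that the boolean flags take an explicit true/false value, and that the script is launched with the current Python interpreter. The helper names spectral_cmd and directed_for are hypothetical.

```python
import sys


def spectral_cmd(relationship_graph, output, num_clusters, consensus_groups=None,
                 normalize=None, directed=None, script="run_consensus_spectral_clustering.py"):
    """Build the argv list for the spectral clustering script."""
    if int(num_clusters) < 1:
        raise ValueError("--num-of-cluster must be >= 1")
    # Assumption: the script runs under the current Python interpreter.
    cmd = [sys.executable, script,
           "--relationship-graph", str(relationship_graph),
           "--output", str(output),
           "--num-of-cluster", str(int(num_clusters))]
    if consensus_groups is not None:  # needed for the consensus-based decomposition
        cmd += ["--consensus-groups", str(consensus_groups)]
    # Assumption: boolean flags take an explicit "true"/"false" value.
    if normalize is not None:
        cmd += ["--normalize", "true" if normalize else "false"]
    if directed is not None:
        cmd += ["--directed", "true" if directed else "false"]
    return cmd


def directed_for(graph_kind):
    # Static and combined weighted graphs are directed; name graphs are not.
    return {"static": True, "combined": True, "name": False}[graph_kind]
```

For example, spectral_cmd("combined.csv", "out", 8, consensus_groups="groups.txt", directed=directed_for("combined")) builds a consensus-based run. The resulting list can be passed to subprocess.run(..., check=True).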