Project summary

The overall goal of SCHISM is to develop a robust approach to interactive data mining (DM) and to deliver a prototype that allows users to launch pattern mining or clustering algorithms, visualize the results, give feedback, and rerun the mining operations so that they take this feedback into account. Our research hypothesis can be stated bluntly: putting the user back into the loop will lead to results that are better understood, and to mining processes that actually run faster than in the non-interactive case.
Research objectives:

  • Improved interpretability: place pattern visualizations into the context of the data or of other patterns, and develop new techniques for interactive clustering visualization. The visualization of pattern mining results often suffers from the very large number of patterns to consider. Moreover, patterns are local by definition, so on their own, without an idea of related patterns and/or of the data context such as instance coverage and (possibly partial) labels, they are hard to interpret.
  • Feedback: develop new feedback options for pattern mining, such as accepting or rejecting a pattern, rating its interestingness on a scale, stating which patterns are preferable to others, or actively suggesting which of its constituent elements should be changed. For clustering, the interactions to investigate range from rejecting a clustering outright or rejecting individual clusters to changing (parts of) a cluster description or reassigning instances between clusters.
  • Translation of feedback into constraints: rejection can take the form of either hard filtering or down-weighting of patterns. Reassigning instances can be translated into must-link/cannot-link constraints or into more general interpretations.
  • Enforcing constraints: derive properties of constraints that can be exploited during pattern mining, and use constraint-programming techniques to enforce complex constraints.
  • Efficiency: develop approximation and parallelization techniques to speed up existing algorithms, exploit sampling, and develop hybrid solutions that interleave pattern mining and clustering.
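The translation of feedback into constraints can be illustrated with a minimal Python sketch. It assumes that patterns are itemsets represented as frozensets with a numeric score, that a clustering is a dict mapping cluster ids to sets of instance ids, and that soft rejection multiplies a pattern's score by an arbitrary factor (0.5 here). All names (`Constraints`, `reject_pattern`, `reassign_instance`, `rerank`, `violations`) are hypothetical and not part of any SCHISM prototype. The sketch covers only the must-link/cannot-link reading of a reassignment, not the more general interpretations.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Hypothetical container for constraints derived from user feedback."""
    banned_patterns: set = field(default_factory=set)    # hard filters
    pattern_weights: dict = field(default_factory=dict)  # soft down-weights
    must_link: set = field(default_factory=set)          # pairs (i, j), i < j
    cannot_link: set = field(default_factory=set)        # pairs (i, j), i < j


def reject_pattern(c, pattern, hard=True, factor=0.5):
    """Turn 'reject this pattern' into a hard filter or a multiplicative down-weight."""
    p = frozenset(pattern)
    if hard:
        c.banned_patterns.add(p)
    else:
        c.pattern_weights[p] = c.pattern_weights.get(p, 1.0) * factor


def _pair(i, j):
    # Store each unordered pair in a canonical order.
    return (i, j) if i < j else (j, i)


def reassign_instance(c, instance, old_cluster, new_cluster, clustering):
    """Turn 'move instance to new_cluster' into must-link constraints with the
    members of new_cluster and cannot-link constraints with the other members
    of old_cluster."""
    for other in clustering[new_cluster]:
        if other != instance:
            c.must_link.add(_pair(instance, other))
    for other in clustering[old_cluster]:
        if other != instance:
            c.cannot_link.add(_pair(instance, other))


def rerank(results, c):
    """Drop banned patterns, apply down-weights, and sort by adjusted score."""
    kept = []
    for pattern, score in results:
        p = frozenset(pattern)
        if p in c.banned_patterns:
            continue
        kept.append((p, score * c.pattern_weights.get(p, 1.0)))
    return sorted(kept, key=lambda ps: ps[1], reverse=True)


def violations(labels, c):
    """Count violated constraints for an assignment labels: instance -> cluster."""
    bad = sum(labels[i] != labels[j] for i, j in c.must_link)
    bad += sum(labels[i] == labels[j] for i, j in c.cannot_link)
    return bad


if __name__ == "__main__":
    c = Constraints()
    reject_pattern(c, {"a", "b"})             # hard rejection
    reject_pattern(c, {"c"}, hard=False)      # soft rejection
    results = [({"a", "b"}, 0.9), ({"c"}, 0.8), ({"d"}, 0.6)]
    print(rerank(results, c))

    clustering = {0: {1, 2, 3}, 1: {4, 5}}
    reassign_instance(c, 3, old_cluster=0, new_cluster=1, clustering=clustering)
    print(sorted(c.must_link), sorted(c.cannot_link))
```

In a rerun, the must-link and cannot-link pairs could be handed to a constrained clustering algorithm such as COP-KMeans, which only accepts assignments that violate no constraint (a `violations` count of zero); a soft variant would instead add the count as a penalty to the clustering objective.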