Constraints have played an integral role in pattern mining since the birth of the field, mainly because exploiting constraints with particular characteristics, such as (anti-)monotonicity, convertible
monotonicity, or succinctness, make it possible to exhaustively explore the search space without being overwhelmed. The problem with constraints having those useful properties, however, is that
they are often badly aligned with what users need, and thresholds (which are involved in many of these constraints) cannot be easily chosen. The work that we will perform in this work package will
employ three strategies:
- Exploit user-feedback to change thresholds. Instead of demanding from the user that she set a (or several) threshold(s) herself, a first mining operation uses tight settings to produce a small result set, and adjusts thresholds based on user feedback. This can be viewed as a version of the MINE (a solution), INTERACT (with the user), LEARN (her preferences), REPEAT (with updated constraints) loop. This will require that relations between data characteristics and constraint behavior be learned so that user feedback such as “I’d like to see twice as many patterns” or “I’d like to see longer patterns, at least involving four atoms and three bonds” can be translated into threshold settings that will be pushed into the next iteration of the mining process. We will adapt knowledge developed in the field of meta-learning, which is mainly concerned with supervised settings, to our problem settings. The concrete result of this line of research will be approaches to iteratively learn this information.
- Use constraint relaxations. Even though many interesting measures do not have useful properties for mining, existing research has identified relaxations of constraints, constraint primitives, and piecewise anti-monotonic constraints. Following this direction, the first step will consist of collecting constraint proposals from the experts at CERMN, which will then be translated into primitives etc to make them enforceable in the mining process. This will not be a straightforward process for all constraints so that considerable work will be required to find (approximations) of the desired constraints. The concrete result of this line of research, apart from expressing new constraints in new languages, will be a framework laying out rules for general problem settings, not necessarily the one used as a test case in this project.
- Employ constraint programming (CP) solvers. CP languages allow for much richer constraint languages than dedicated pattern mining algorithms, in part because they enforce (or
propagate) constraints in a more flexible manner than dedicated algorithms. CP has seen increasing use in pattern mining in the last decade and is therefore well-suited to be used in the context of our project. In a somewhat similar vein as for constraint relaxations, how to translate an expert constraint into a CP language is not straight-forward, however,
even though the CP community has much greater experience with this task. The drawback of CP-based solutions is that solvers are typically slower than dedicated miners and are
ill-equipped to work on structured data. We will therefore develop hybrid approaches, in which rich constraints are used to partition the data before a dedicated mining approach mines the actual patterns.