An Empirical Study of Data Constraint Implementations in Java
Software systems are designed according to guidelines and constraints defined by business rules. Some of these constraints define the allowable or required values for data handled by the systems. These data constraints usually originate from the problem domain (e.g., regulations), and developers must write code that enforces them. Understanding how data constraints are implemented is essential for testing, debugging, and software change. Unfortunately, there are no widely-accepted guidelines or best practices on how to implement data constraints.
This paper presents an empirical study that investigates how data constraints are implemented in Java. We study the implementation of 187 data constraints extracted from the documentation of eight real-world Java software systems. First, we perform a qualitative analysis of the textual description of data constraints and identify four data constraint types. Second, we manually identify the implementations of these data constraints and reveal that they can be grouped into 31 implementation patterns. The analysis of these implementation patterns indicates that developers prefer a handful of patterns when implementing data constraints. We also found evidence suggesting that deviations from these patterns are associated with unusual implementation decisions or code smells. Third, we develop a tool-assisted protocol that allows us to identify 256 additional trace links for the data constraints implemented using the 13 most common patterns. We find that almost half of these data constraints have multiple enforcing statements, which are code clones of different types. Finally, a study with 16 professional developers indicates that the patterns we describe can be easily and accurately recognized in Java code.