Our internal discussion board brought up the topic of input validation last week. The discussion was around the regex for validating an email address. The message was that what seems like a very simple input validation can get complicated if the full standard is supported. As I read the discussion I started thinking about Data Dictionaries and their failed attempt to standardize data definitions within an enterprise.
The history lesson here is that around the time that database management systems were becoming commercialized (1980s) there were products called data dictionaries or data repositories. The idea was that metadata should be kept outside of the DBMS and shared by all of the application code. In the data dictionary nirvana, there would be ONE definition of EmployeeID for the entire enterprise and all applications and database would share this one definition.
Fast forward to 2010. Where are these products today? Just about every client I walk into has a commercial DBMS (Oracle, Microsoft or IBM), but nowhere do we encounter data dictionaries. Why? Because they never got deployed. What enterprises found out is that no one could agree on what an EmployeeID (or any other piece of data) looked like. Sometime it was politics; sometimes it was acquisitions; sometimes it was I18n. Whatever the reason, the bottom line is the same – there are precious few common data definitions in any enterprise.
So, how in the world does one expect to do input validation against data when there are only a precious few data definition that accurately describe data? This seems like a path that will only lead to the same quagmire that data dictionaries fell into.
For the curious, the regex for email addresses is:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|”(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*”)@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[x01-x08x0bx0cx0e-x1fx21-x5ax53-x7f]|\[x01-x09x0bx0cx0e-x7f])+)])
(from [www.regular-expressions.info])
Thanks to David Lindsay for the reference.