Wednesday, February 25, 2009

Create usable error messages with XmlValidate

My first venture into XSD validation with ColdFusion has been an "exciting" one, to say the least. Teaching myself XSD wasn't nearly as difficult as I'd expected, but I'm sure I'm just scratching the surface of what I could be doing with it.

My current project allows users to upload a comma-delimited file for batch submissions of form data. My application converts that CSV file to XML using a CFC based on Ben Nadel's fantastic CSV to Array function. Before I pass the data to our internal systems, I want to make sure it is valid according to our data requirements. So I take the XML object that I've created, along with my XSD file, and run the ColdFusion XmlValidate() function.

My XML object looks something like this:

<batch>
<item>
<child1></child1>
<child2></child2>
etc....
</item>
</batch>

As expected, I receive errors in the data validation, but boy are they NOT useful:

[Error] :2:56506: cvc-pattern-valid: Value '4112' is not facet-valid with respect to pattern '[0-9]{5}(-[0-9]{4})?' for type 'null'.
[Error] :2:56506: cvc-type.3.1.3: The value '4112' of element 'Emp_Zip' is not valid.

Let's break this down. The error message is delimited by colons (":"). The first position is the severity, which here tells us a validation error occurred ("[Error]"). The second position is the line number where the error occurred, and the third is the character position on that line. The fourth is the code of the validation rule that failed (for example, cvc-pattern-valid) and the fifth is the error message itself.
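To make that breakdown concrete, here is a minimal sketch that pulls the pieces out of one of the messages above with ListGetAt(). The variable names (sampleError, errorLine, errorColumn, errorCode, errorText) are my own, chosen just for this illustration.

```cfml
<!--- Sample message copied from the validator output above --->
<cfset sampleError = "[Error] :2:56506: cvc-type.3.1.3: The value '4112' of element 'Emp_Zip' is not valid." />

<!--- Treat the message as a colon-delimited list; ColdFusion list functions skip empty elements --->
<cfset errorLine = Trim(ListGetAt(sampleError, 2, ":")) />
<cfset errorColumn = Trim(ListGetAt(sampleError, 3, ":")) />
<cfset errorCode = Trim(ListGetAt(sampleError, 4, ":")) />
<cfset errorText = Trim(ListGetAt(sampleError, 5, ":")) />

<cfoutput>
<p>Line #errorLine#, character #errorColumn# (#errorCode#): #errorText#</p>
</cfoutput>
```

One caveat with this approach: if the message text itself contains a colon, the fifth list element holds only the part before that colon.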

The biggest issue with this is the line number. This indicates that the error occurred on line 2, but it actually occurred in the 49th "item" element of my XML object. The XML function that creates the XML object creates two lines: the first is the XML declaration and the second is the 99 XML items and all of their children run together on a single line. This is why it also indicates that the error occurred at character 56,506!

In my case, I'm interested in returning the entire "record" or element so I can give the user context of where the error is occurring. Here's the plan of attack:

  1. Put each "item" element on its own line by delimiting the XML with linefeed characters, so the line number in each validation error points to the item where it occurred.

  2. Run XmlValidate() on the delimited XML object.

  3. Return a view to the user showing where the data errors are occurring.

Sounds easy, right? Well, it is after you figure out how the heck to do it! Here we go:

Where myXmlObj is the run-together XML returned from my CSV conversion function and myXSDObj is a reference to the XSD file...

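<!--- Delimit the XML object with linefeeds so each <item> sits on its own line --->
<cfset myXmlObj = Replace(ToString(myXmlObj),"<item>",Chr(10) & "<item>", "all") />
<!--- Move the closing root tag to its own line too, so the last <item> line can be parsed by itself --->
<cfset myXmlObj = Replace(myXmlObj,"</batch>",Chr(10) & "</batch>", "all") />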

<!--- Validate the data using XmlValidate --->
<cfset myResults = XmlValidate(myXmlObj, myXSDObj) />

<cfoutput>

<!--- Give the user something useful --->
<p>Errors occurred validating submission:</p>
<ul>
<!--- Loop through the error messages --->
<cfloop index="i" from="1" to="#ArrayLen(myResults.errors)#">
<!--- Ignore the ugly error messages (the RegEx stuff) --->
<cfif NOT myResults.errors[i] contains "facet-valid">
<!--- Get the user-friendly error text from the error message --->
<li>#ListGetAt(myResults.errors[i],5,":")#</li>
</cfif>
</cfloop>
</ul>

<!--- Display the entire record to give context of where error occurred --->
<table class="errorTable">
<tr id="header">
<!--- Get the column headers from the first <item>, parsing the XML string only once --->
<cfset firstItem = XmlParse(myXmlObj).batch.item[1] />
<cfloop index="i" from="1" to="#ArrayLen(firstItem.XmlChildren)#">
<th>#firstItem.XmlChildren[i].XmlName#</th>
</cfloop>
</tr>
<!--- Loop through the errors to display every row where an error occurred --->
<cfloop index="i" from="1" to="#ArrayLen(myResults.errors)#">
<!--- Again, ignoring the ugly error messages in favor of the nice ones --->
<cfif NOT myResults.errors[i] contains "facet-valid">
<!--- Get the current error message --->
<cfset errorMessage = myResults.errors[i] />
<!--- Get the line number where the error occurred --->
<cfset errorLine = ListGetAt(myResults.errors[i],2,":") />
<!--- Create a new XML object with just the <item> where the error occurred --->
<cfset xmlSnip = XmlParse(ListGetAt(myXmlObj,errorLine,Chr(10))) />
<tr>
<!--- Output the children of xmlSnip --->
<!--- Use a separate index (j) so the inner loop doesn't reuse the outer loop's i --->
<cfloop index="j" from="1" to="#ArrayLen(xmlSnip.item.XmlChildren)#">
<!--- The fun stuff... if the error occurred in the current child element, style it --->
<cfif errorMessage contains "#xmlSnip.item.XmlChildren[j].XmlName#">
<cfset styleMe="color:red; border-color:red;" />
<cfelse>
<cfset styleMe="" />
</cfif>
<!--- Escape the value, since it comes from the user's uploaded file --->
<td nowrap="nowrap" style="#styleMe#">#HTMLEditFormat(xmlSnip.item.XmlChildren[j].XmlText)#</td>
</cfloop>
</tr>
</cfif>
</cfloop>
</table>
</cfoutput>

The result:

Only the errors are displayed, they're highlighted in red, and the user has context of where the error occurred so they can go back to their original document and fix it before re-submitting.
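If you'd like to reproduce this kind of error without the CSV step, here is a small self-contained sketch. The one-item document and the schema are made up for illustration (they are not my real files), and the names sampleXml, sampleXsd and sampleResults are hypothetical. It also checks the status key of the structure that XmlValidate() returns before deciding whether there is anything to report.

```cfml
<!--- Hypothetical one-item document and schema, made up for this sketch --->
<cfsavecontent variable="sampleXml"><batch><item><Emp_Zip>4112</Emp_Zip></item></batch></cfsavecontent>
<cfsavecontent variable="sampleXsd"><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="batch">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Emp_Zip">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema></cfsavecontent>

<!--- Validate the XML string against the schema string --->
<cfset sampleResults = XmlValidate(Trim(sampleXml), Trim(sampleXsd)) />

<cfoutput>
<cfif sampleResults.status>
<p>No validation errors.</p>
<cfelse>
<p>#ArrayLen(sampleResults.errors)# validation error(s) found.</p>
</cfif>
</cfoutput>
```

Checking status first means the error list and table only need to be rendered when validation actually failed.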

Thanks to Adam Cameron on the Adobe Forums for helping me pick the right path.


UPDATED 2/27/2009: The above solution outputs a separate table row for each error. What if an item has multiple errors? I'd rather show all of the errors that occur for the same item in a single row.

Conceptually, I created a 2-D array in which each entry holds a line number and its error text. I then get the unique line numbers from the errors, loop through them to output one row per item, and, for each child element, check the 2-D errors array for an entry matching the current line number and element. It's a little loop-heavy, but it gives the expected results.
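Here is a rough sketch of the grouping idea. Note that it swaps the 2-D array described above for a struct keyed by line number, which is just one way to get a similar result and not the exact code I ended up with. The sample messages (including their line and character numbers) are invented for the example, and names like sampleErrors and errorsByLine are hypothetical; in the real page the input would be myResults.errors.

```cfml
<!--- Hypothetical sample input; in the post this would be myResults.errors --->
<cfset sampleErrors = ArrayNew(1) />
<cfset ArrayAppend(sampleErrors, "[Error] :3:120: cvc-type.3.1.3: The value '4112' of element 'Emp_Zip' is not valid.") />
<cfset ArrayAppend(sampleErrors, "[Error] :3:180: cvc-type.3.1.3: The value 'abc' of element 'Emp_Phone' is not valid.") />
<cfset ArrayAppend(sampleErrors, "[Error] :7:95: cvc-type.3.1.3: The value '' of element 'Emp_Name' is not valid.") />

<!--- Group the message text by line number: key = line number, value = array of messages --->
<cfset errorsByLine = StructNew() />
<cfloop index="e" from="1" to="#ArrayLen(sampleErrors)#">
    <cfset lineNo = Trim(ListGetAt(sampleErrors[e], 2, ":")) />
    <cfif NOT StructKeyExists(errorsByLine, lineNo)>
        <cfset errorsByLine[lineNo] = ArrayNew(1) />
    </cfif>
    <cfset ArrayAppend(errorsByLine[lineNo], Trim(ListGetAt(sampleErrors[e], 5, ":"))) />
</cfloop>

<!--- Struct keys are unordered, so sort the unique line numbers before output --->
<cfset lineNumbers = StructKeyArray(errorsByLine) />
<cfset ArraySort(lineNumbers, "numeric") />

<cfoutput>
<ul>
<cfloop index="n" from="1" to="#ArrayLen(lineNumbers)#">
    <cfset lineErrors = errorsByLine[lineNumbers[n]] />
    <li>Line #lineNumbers[n]#
        <ul>
        <cfloop index="m" from="1" to="#ArrayLen(lineErrors)#">
            <li>#lineErrors[m]#</li>
        </cfloop>
        </ul>
    </li>
</cfloop>
</ul>
</cfoutput>
```

With the errors grouped this way, each unique line number maps to one table row, and every child element named in that line's messages can be highlighted in the same pass.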