JARV is an implementation-independent interface set for validators developed by the RELAX community. There are several implementations available that support this interface.
Although it originally came from the RELAX community, JARV is not limited to RELAX; it can be used with many other schema languages. One of the advantages of JARV is that it lets you use multiple schema languages with minimal changes to your code.
First, you need the latest isorelax.jar file, which is available here.
Then, you need an actual implementation. Currently, the following implementations are available:
You need to set up those jars so that the class loader can find them.
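One quick way to confirm that the class loader can see the jars is to try loading one class from each of them. The following is a minimal sketch using only the JDK; the helper class ClasspathCheck is a made-up name, and the two class names it probes are the ones this guide uses for the JARV interfaces and for the MSV factory.

```java
/** Hypothetical helper: checks whether classes are visible to the current class loader. */
final class ClasspathCheck {
    /** Returns true if the named class can be located (it is not initialized). */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.iso_relax.verifier.VerifierFactory",   // JARV interfaces
            "com.sun.msv.verifier.jarv.TheFactoryImpl"  // MSV factory, if you use MSV
        };
        for (String name : names) {
            System.out.println(name + ": " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

If a line is reported as MISSING, the corresponding jar is most likely not on the class path.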
JARV consists of three components: VerifierFactory, Schema, and Verifier.
The VerifierFactory interface is the main interface between the implementation and your application. It has a method to compile a schema into a Schema object.
The Schema interface is the internal representation of the schema. This interface is thread-safe, so you can have multiple threads access one Schema object concurrently. Also, this interface has a method to create a new Verifier object.
The Verifier interface represents a so-called "validator"; it has a schema object in it and it validates documents by using that schema.
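The split between a shareable compiled object and a per-thread worker is the same one found in the JDK's TrAX API (javax.xml.transform), where a compiled Templates object is thread-safe and each Transformer is not. The sketch below uses only that JDK API, as an analogy for Schema and Verifier; it is not JARV code, and the class name SharedCompiledObjectDemo and the tiny stylesheet are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SharedCompiledObjectDemo {
    // Illustrative stylesheet: outputs the number of <item> elements as text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> countItemsConcurrently(List<String> documents) throws Exception {
        // Compile once: Templates plays the role of Schema (thread-safe, shareable).
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> {
                    // One worker per task: Transformer plays the role of Verifier.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(doc)),
                                          new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With JARV the shape would be the same: compile the Schema once, share it, and call newVerifier() inside each task.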
Step 1: create VerifierFactory
The first thing you would do is to create an instance of VerifierFactory. To do that, simply create an instance of a VerifierFactory implementation. In the case of MSV, it will be:

    VerifierFactory factory = new com.sun.msv.verifier.jarv.TheFactoryImpl();

To use Swift RELAX Verifier for Java:

    VerifierFactory factory = new jp.xml.gr.relax.swift.SwiftVerifierFactory();

JARV is also capable of finding an implementation that supports a particular schema language at run time. To learn more about this discovery mechanism, see "Finding implementation at Run-time" under Advanced Topics.

Step 2: compile a schema
Once you have a factory, you can use it to compile a schema. To compile a schema, call the compileSchema method of the factory.

    Schema schema = factory.compileSchema("http://www.example.org/test.xsd");

This method can accept many types of input. For example, you can pass an InputSource, a File, an InputStream, etc.
Schema objects are thread-safe. So even if you have more than one thread, you only need one instance of Schema; you can share that one instance with as many threads as you want.

Step 3: create a verifier
A Schema is just a compiled schema, so it cannot do anything by itself. A Verifier object is the object that performs the actual validation. To create a Verifier object, do as follows:

    Verifier verifier = schema.newVerifier();

In this way, you can create a Verifier that checks documents against a particular schema. Verifier is not thread-safe, so typically you want to create one instance per validation (or per thread).

Step 4-1: perform validation
Verifier has several methods to validate documents. One way is to call the verify method, which accepts a DOM tree, File, URL, etc. and returns the validity. For example, to validate a DOM document, simply pass it as an argument:

    if (verifier.verify(domDocument)) {
        // the document is valid
    } else {
        // the document is invalid (wrong)
    }

This method will only give you a yes/no answer, but you can get more detailed error information by setting an error handler through the setErrorHandler method.
Just as a parser reports well-formedness errors through org.xml.sax.ErrorHandler, JARV implementations (like MSV) report validity errors through the same interface. In this way, you can get the error message, the line number that caused the error, etc. For example, in the following code, a custom error handler is set to report error messages to the client.

    verifier.setErrorHandler(new MyErrorHandler());
    try {
        if (verifier.verify(new File("abc.xml"))) {
            // the document is valid
        } else {
            // the execution will never reach here because
            // if the document is invalid, then an exception should be thrown.
        }
    } catch (SAXParseException e) {
        // if the document is invalid, then the execution will reach here
        // because we throw an exception for an error.
    }
    ...
    class MyErrorHandler implements ErrorHandler {
        public void fatalError(SAXParseException e) throws SAXException {
            error(e);
        }
        public void error(SAXParseException e) throws SAXException {
            System.out.println(e);
            throw e;
        }
        public void warning(SAXParseException e) {
            // ignore warnings
        }
    }

If you throw an exception from the error handler, that exception will not be caught by the verify method, so the validation is effectively aborted there. If you return from the error handler normally, then MSV will try to recover from the error and find as many errors as possible.

Step 4-2: perform validation via SAX
Every JARV implementation supports validation via SAX2 in two ways. The first one is a validator implemented as a ContentHandler, which can be obtained by calling the getVerifierHandler method. This content handler will validate incoming SAX2 events, and you can obtain the validity through the isValid method. For example,

    XMLReader reader = ... ; // get XML reader from somewhere
    VerifierHandler handler = verifier.getVerifierHandler();
    reader.setContentHandler(handler);
    reader.parse("http://www.mydomain.com/some/file.xml");
    if (handler.isValid()) {
        // the document is correct
    } else {
        // the document is incorrect
    }

The second one is a validator implemented as an XMLFilter, which can be obtained by calling the getVerifierFilter method.
A verifier implemented as a filter, VerifierFilter, is particularly useful because you can plug it right in the middle of any SAX event pipeline. Not only can you validate documents before you process them, you can also validate them after your application has processed them. In the following example, a verifier filter is used to validate documents before your own handler processes them.

    VerifierFilter filter = verifier.getVerifierFilter();
    // create a new XML reader and set up the pipeline
    filter.setParent(getNewXMLReader());
    filter.setContentHandler(new MyApplicationHandler());
    // parse the document
    filter.parse("http://www.mydomain.com/some/file.xml");
    if (filter.isValid()) {
        // the parsed document was valid
    } else {
        // invalid
    }

SAX-based validation will not make much sense unless you set an error handler, because learning that the document was invalid after you have processed it is too late. To set an error handler, call the setErrorHandler method just as you did with the verify method.

    VerifierFilter filter = verifier.getVerifierFilter();
    verifier.setErrorHandler(new MyErrorHandler());
    ...
    filter.parse(...);

In this way, you can abort the processing by throwing an exception in case of an error. If you are using VerifierFilter, you can also set an error handler by calling the setErrorHandler method of the VerifierFilter interface.
Some JARV implementations (e.g., MSV, Jing, RELAX Verifier for Java) always run in the fail-fast manner. So as long as you set an error handler, it is guaranteed that your application will never see an incorrect document at all.

Advanced Topics

Finding implementation at Run-time
A simple, obvious way to create a VerifierFactory is to create a new instance of the appropriate implementation class (like com.sun.msv.verifier.jarv.TheFactoryImpl).
In this way, you decide the JARV implementation at compile time. Especially in the case of MSV, it is advantageous to do so because of its "multi-schema" capability. The MSV factory will accept any schema written in any of the supported languages. Thus you can instantly change the schema language without changing your code at all.
However, there is one problem with this approach. Specifically, it locks you into a particular JARV implementation, so you need to change your code to use other JARV implementations.
For this reason, you may want to "discover" an implementation at run time (just like you usually do with JAXP) by calling the static newInstance method of the VerifierFactory class. To do that, you need to pass the name of the schema language you want to use. This method will find an implementation that supports the given schema language on the class path and return its VerifierFactory.

    VerifierFactory factory = VerifierFactory.newInstance("http://relaxng.org/ns/structure/1.0");

Usually, the namespace URI of the schema language is used as the name. For the complete list, please consult the javadoc.

Fail-Fast Design
One of the problems of some validators (like the DTD validator in Xerces) is that they don't work in the fail-fast manner. This problem is unique to SAX.
What is "fail-fast"? A fail-fast validator is a validator that flags an error as soon as the error is found. A non-fail-fast validator may let some part of the wrong document slip in (it will flag the error at a later moment). When you are using a non-fail-fast validator, you need to take extra care in writing your code, because your code may be exposed to bad documents.
For example, imagine the following simple DTD and a bad document:

    <!ELEMENT root (a,b)*>
    <!ELEMENT a EMPTY>
    <!ELEMENT b EMPTY>

    <root>
      <b/> <!-- error -->
      <b/>
    </root>

Surprisingly, in a typical non-fail-fast validator, the error will be signaled as late as the end-element event of the root element. So you have to make sure that your application behaves gracefully when it sees the wrong 'b'. Typically, this robs validation of its merit, because you validate in order to protect your application code from unexpected inputs.
Many JARV implementations (including MSV, Jing, RELAX Verifier for Java) are fail-fast validators, so they will signal an error at the start-element event of the first 'b'. This guarantees that the application will never see a wrong document. Note that some other JARV implementations may be non-fail-fast validators.

Creating Verifier directly from VerifierFactory
The VerifierFactory class has the newVerifier method as a short-cut. It is a short-cut in the sense that the following two code fragments have exactly the same meaning:

    Verifier v = factory.compileSchema(x).newVerifier();

    Verifier v = factory.newVerifier(x);

This is sometimes useful when you are using only one thread.

JAXP Masquerading
The JAXP masquerading feature is a wrapper implementation of JAXP. This wrapper enhances another JAXP implementation (such as Aelfred or Crimson) by adding JARV-based validation capability to it. Parsing is done by the wrapped JAXP implementation, and the JARV implementation adds advanced validation capability to it. This is often the easiest way to incorporate validation into your application, since it is so easy to use.
To create a wrapped SAXParserFactory, do as follows:

    Schema schema = /* compile schema */;
    SAXParserFactory parserFactory = new org.iso_relax.jaxp.ValidatingSAXParserFactory(schema);

This will create a JAXP SAXParserFactory that validates every parsed document against the specified schema. Similarly, to create a wrapped DocumentBuilderFactory, do as follows:

    Schema schema = /* compile schema */;
    DocumentBuilderFactory dbf = new org.iso_relax.jaxp.ValidatingDocumentBuilderFactory(schema);

Once those instances are created, just use them as you would use a normal JAXP implementation.

Thread Affinity
The VerifierFactory interface is not thread-safe. This basically means that you cannot use one object from two threads concurrently.
The Schema interface is thread-safe. So once you compile a schema file into a Schema object, it can be shared by multiple threads and accessed concurrently. This is useful on the server side, where multiple threads process client requests simultaneously.
The Verifier interface is again not thread-safe. Each thread needs its own copy of Verifier. Verifier objects are still re-usable, as you can use the same object to validate multiple documents one by one. What you cannot do is validate multiple documents simultaneously.
The thread affinity of JARV is modeled after that of the TrAX API (the javax.xml.transform package). Familiarity with TrAX will help you understand JARV better.

MSV and Schema Language Auto Detection
com.sun.msv.verifier.jarv.TheFactoryImpl automatically detects the schema language from the schema file. However, there is one important limitation. Currently, the detection of XML DTDs is based on the file extension. Specifically, if the schema name has the ".dtd" extension, it is treated as an XML DTD; otherwise it is treated as one of the other schema languages.
This causes a problem when you are passing an InputStream as the parameter to the compileSchema method. Since InputStreams do not have names, they are always treated as non-DTD schemas.
To avoid this problem, wrap it in an InputSource and call the setSystemId method to set the system id. The following example shows how to do that:

    InputSource is = new InputSource(
        MyClass.class.getResourceAsStream("abc.dtd"));
    is.setSystemId("abc.dtd");
    verifierFactory.compileSchema(is);

This ugly limitation comes from the difficulty of correctly distinguishing XML DTDs, which are written in a non-XML syntax, from other schema languages, which are written in XML syntax. Any input on this restriction is very welcome.

Examples
If you need an example that is not listed here, please let me know so that I can add it in the next release.

Validating bunch of files
Have a look at the SingleThreadDriver.java example in this zip file. It compiles a schema and obtains a verifier object, then uses the same verifier to validate multiple documents.

Multi-threaded example
Have a look at the MultiThreadDriver.java example in this zip file. This example first compiles a schema, then launches a lot of threads and lets them share one schema object.
This example shows you how to use JARV in a multi-threaded environment and how you can cache a compiled schema in memory.

DOM validation
The following code shows how you can validate DOM by using JARV.

    import java.io.File;
    import org.iso_relax.verifier.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    void f(Document dom, Document anotherDom) throws Exception {
        // create a VerifierFactory
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://relaxng.org/ns/structure/1.0");

        // compile a RELAX NG schema
        Schema schema = factory.compileSchema(new File("foo.rng"));

        // obtain a verifier
        Verifier verifier = schema.newVerifier();

        // check the validity of a DOM.
        if (verifier.verify(dom)) {
            // the document is valid
        } else {
            // the document is not valid
        }

        // you can use the same verifier object to test multiple DOMs
        // as long as you don't use it concurrently.
        if (verifier.verify(anotherDom))
            ...

        // or you can pass an Element to validate that subtree.
        // (this assumes the first child of the root is an element)
        Element e = (Element) dom.getDocumentElement().getFirstChild();
        if (verifier.verify(e))
            ...
    }

SAX validation
The following code shows how you can use JARV together with SAX.

    import java.io.File;
    import org.iso_relax.verifier.*;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    void f(javax.xml.parsers.SAXParserFactory parserFactory) throws Exception {
        // create a VerifierFactory with the default SAX parser
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://www.xml.gr.jp/xmlns/relaxCore");

        // compile a RELAX schema
        Schema schema = factory.compileSchema(new File("foo.rxg"));

        // obtain a verifier
        Verifier verifier = schema.newVerifier();

        // set an error handler
        // this error handler will throw an exception if there is an error
        verifier.setErrorHandler(new MyErrorHandler());

        // get an XMLFilter
        VerifierFilter filter = verifier.getVerifierFilter();

        // set up the pipeline
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        filter.setParent(reader);
        filter.setContentHandler(new MyContentHandler());

        // parse the document
        try {
            filter.parse("MyInstance.xml");
            // if the execution reaches here, the document was valid and
            // there was nothing wrong.
        } catch (SAXException e) {
            // error.
            // maybe the document is not well-formed, or it's not valid,
            // or some other reason.
        }
    }

JAXP Masquerading
The following code shows how you can use JARV via JAXP masquerading.

    import java.io.File;
    import javax.xml.parsers.SAXParserFactory;
    import org.iso_relax.verifier.*;
    import org.iso_relax.jaxp.*;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    void f() throws Exception {
        // create a RELAX NG validator
        VerifierFactory factory = VerifierFactory.newInstance(
            "http://relaxng.org/ns/structure/1.0");

        // compile a schema
        Schema schema = factory.compileSchema(new File("myschema.rng"));

        // wrap it into a JAXP SAXParserFactory
        SAXParserFactory parserFactory = new ValidatingSAXParserFactory(schema);

        // create a new XMLReader from it
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // set an error handler
        // this error handler will throw an exception if there is a
        // well-formedness error or a validation error.
        reader.setErrorHandler(new MyErrorHandler());

        // set the content handler
        reader.setContentHandler(new MyContentHandler());

        // parse the document
        try {
            reader.parse("MyInstance.xml");
            // if the execution reaches here, the document was valid and
            // there was nothing wrong.
        } catch (SAXException e) {
            // error.
            // maybe the document is not well-formed, or it's not valid,
            // or some other reason.
        }
    }
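The SAX and JAXP examples above pass a MyContentHandler and a MyErrorHandler without showing the content handler at all. The following is a minimal sketch of what such classes might look like, using only the SAX classes that ship with the JDK. The names CollectingErrorHandler, ElementCountingHandler, and HandlerDemo are made up for illustration. Unlike the MyErrorHandler from Step 4-1, which throws on the first error, this error handler records each error and returns normally, which is the case where a validator such as MSV tries to recover and report further errors. The demo method uses a plain JDK parser rather than a JARV verifier, so it only exercises the handlers themselves.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical error handler: records every error and returns normally. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // ignore warnings
    }

    @Override
    public void error(SAXParseException e) {
        messages.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // fatal errors cannot be recovered from, so abort the parse
        throw e;
    }
}

/** Hypothetical application handler: counts the elements it receives. */
final class ElementCountingHandler extends DefaultHandler {
    int elementCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementCount++;
    }
}

final class HandlerDemo {
    /** Parses XML text with the JDK's default SAX parser and returns the element count. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        ElementCountingHandler handler = new ElementCountingHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(new CollectingErrorHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elementCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<root><a/><b/></root>")); // prints 3
    }
}
```

To use these with JARV, you would pass the same handler objects to the VerifierFilter or to the XMLReader obtained from ValidatingSAXParserFactory, and inspect the collected messages after parsing.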