XML from a Data Store

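Mr. XML Publisher's data-pulling features can be accessed by users or administrators. Each group uses a different mechanism:

  • Users place an IncludeMap.xml file in an uploaded project.

  • Administrators set PULL_XXX_ON_STARTUP[…] <context-param> elements in the web.xml file.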

Both mechanisms use the same intermediate Java class files, data pullers, and drivers. The differences are in how Mr. XML Publisher gathers the information necessary to pull the data and when the data is pulled.

IncludeMap.xml Files in User Projects

Using an IncludeMap.xml file in an uploaded project, a user instructs Mr. XML Publisher to:

  1. Pull XML from a data store.

  2. Write the XML to a file on the server.

  3. Include the XML file in the project.

An IncludeMap.xml file provides usernames, passwords, query strings, etc. Mr. XML Publisher pulls the XML from the data store and writes it to a file in the temporary directory created when it unpacked the user's uploaded project. The project's main XML file includes the XML file created from the pulled data via <xi:include> elements, for example:

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Chap_1.xml"/>

The IncludeMap.xml file can instruct Mr. XML Publisher to simultaneously pull data from multiple data sources.[18] Its constraints are defined in its schema. Details of the IncludeMap.xml file are discussed in IncludeMap.xml. An entire example IncludeMap.xml file is provided in Example IncludeMap.xml and a copy of its schema is provided in IncludeMap.xml Schema.
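The relationship between a pulled file and the <xi:include> element that references it can be sketched in a few lines of Python. This is only an illustration using Python's standard-library XInclude support, not Mr. XML Publisher's own processing. The file name book.xml and the function name expand_xincludes are assumptions made for the example, and Chap_1.xml stands in for a file written from a data pull.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementInclude, ElementTree as ET


def expand_xincludes(main_file):
    """Parse main_file and replace each xi:include with the file it names."""
    main_file = Path(main_file)
    root = ET.parse(main_file).getroot()

    def loader(href, parse, encoding=None):
        # Assumption: hrefs are plain paths relative to the main XML file.
        target = main_file.parent / href
        if parse == "xml":
            return ET.parse(target).getroot()
        return target.read_text(encoding=encoding or "utf-8")

    ElementInclude.include(root, loader=loader)
    return root


# Usage: a throwaway "project" holding a main file and one pulled chapter.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "Chap_1.xml").write_text(
        "<chapter id='chap1'><title>One</title></chapter>", encoding="utf-8")
    (project / "book.xml").write_text(
        '<book><xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
        'href="Chap_1.xml"/></book>', encoding="utf-8")
    expanded = expand_xincludes(project / "book.xml")
    chapter_ids = [chapter.get("id") for chapter in expanded.findall("chapter")]
```

After expansion, the <book> element contains the <chapter> element read from Chap_1.xml in place of the <xi:include> element.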

PULL_XXX_ON_STARTUP[…] <context-param>s in the web.xml File

Using PULL_XXX_ON_STARTUP[…][19] <context-param> elements in the web.xml file, administrators instruct Mr. XML Publisher to:

  1. Pull XML from a data store upon loading of the Mr. XML Publisher servlet context.

  2. Write the XML to a file on the server.

PULL_XXX_ON_STARTUP <context-param> element values are delimited strings with each member corresponding to a value required by the specified data puller. Creation of the file on the server happens just once, upon loading of the servlet context.

Use this mechanism to avoid repeatedly performing the same data pull. It is also useful when command arrays reference XSL files and you want those files to match a specific version held in a data store, for example when server-side XSL is shared by multiple formats via <xsl:import href="…"> elements. PULL_XXX_ON_STARTUP <context-param> elements are discussed in detail in PULL_XXX_ON_STARTUP.

Data Stores

Mr. XML Publisher supports pulling data from the following data stores:

  • IBM DB2 9.1

  • Oracle 10g 10.2

  • MS SQL Server 2005

  • Sybase 15.0.2

  • mySQL 5.0.41

  • Tamino XML Server 4.4.1

  • X-Hive/DB 7.5.6

  • MarkLogic 3.2-1

  • TigerLogic XDMS 2.6.4

  • XStreamDB 3.2

  • eXist 1.1.1

  • Xindice 1.1

  • Sedna 2.0

Mr. XML Publisher's documentation does not provide advice or instructions on how to import data into your data store or manage it once it's there. Data management policies are, of course, specific to each site and each organization. However, you must carefully consider the following:

  • character encoding

    Upon import, export, or just for internal storage, a data store might use a character encoding you don't expect. In all cases, Mr. XML Publisher writes pulled data to disk using UTF-8 character encoding. If data pulled from your data store cannot be encoded using UTF-8, you will get a java.io.UnsupportedEncodingException.

  • character entities

    Data stores handle character entities differently. Upon import or export, a data store might translate character entities to or from their numeric character references. If character entities are not represented in the way you expect, pulled data might be written to disk successfully using UTF-8 character encoding, yet you could still get a validity error upon formatting. Worse, the data store might silently replace character entities, so that no error occurs upon formatting but the formatted output is not what you expected. Make absolutely certain that you know what your data store is doing with character entities upon import and export. In some cases, representing character entities as hexadecimal numeric character references in the XML before importing it into the data store can be a solution to otherwise difficult character encoding problems.

  • validation

    Mr. XML Publisher does not validate XML pulled from a data store and does not require the XML it pulls to be valid. Your data store might validate XML upon import or export, depending on its features and how it's configured. If the XML in your data store is not validated, keep in mind that individual commands in a format's command array, such as those that use an XSL transformer or an FO processor, are likely to require valid XML.
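As an illustration of the character entities advice above, the following sketch rewrites named entities as hexadecimal numeric character references before the XML is imported into a data store. It is a minimal example, not part of Mr. XML Publisher. It assumes the entity names in your documents match the HTML 4 set that Python's standard library knows about. Entities defined only in your own DTD are left unchanged, as are the five entities predefined by XML. The function name entities_to_hex is hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_hex(text):
    """Replace known named entities with hexadecimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return "&#x{:X};".format(name2codepoint[name])
    return ENTITY_REF.sub(replace, text)


converted = entities_to_hex("caf&eacute; &mdash; R&amp;D &unknown;")
```

The sketch works on raw text, so it would also rewrite entity-like strings inside CDATA sections or comments. Check a sample of your own content before relying on an approach like this.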

XIncludes

Modularity is facilitated by using XIncludes (http://www.w3.org/TR/xinclude/). In projects uploaded for formatting, end users are free to use XIncludes, or not. An uploaded project might contain all its referenced XML inclusions, or it may need to pull the XML from a data store. As the Mr. XML Publisher administrator, you must make sure that in each command array the command that performs the actual transformation uses an appropriate flag or option. For example, when using xsltproc from the libxml2 package (http://xmlsoft.org/):

xsltproc --xinclude […]

If an uploaded project contains an IncludeMap.xml file, Mr. XML Publisher pulls data and creates files according to the rules described in IncludeMap.xml. If a query executes successfully but returns an empty result set, no file is created, so the corresponding XML inclusion cannot be performed because its resource is missing. A resource expected to be in an uploaded project may also simply be missing. In neither case is an exception thrown on account of the missing XML inclusion. Instead, Mr. XML Publisher lets the XInclude fallback mechanism take over and provides the user with an appropriate message, available from the Server Processing Messages popup (Server Processing Messages).
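Before uploading a project, a user may want to see which inclusions would depend on a data pull or on a fallback. The following sketch lists the <xi:include> targets that are not present next to the main XML file. It is an illustration only. It assumes that href values are plain relative file paths with no fragment identifiers, and the names missing_includes and book.xml are hypothetical.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def missing_includes(main_file):
    """Return the href of every xi:include whose target file is absent."""
    main_file = Path(main_file)
    root = ET.parse(main_file).getroot()
    missing = []
    for include in root.iter(XI_INCLUDE):
        href = include.get("href")
        if href and not (main_file.parent / href).is_file():
            missing.append(href)
    return missing


# Usage: Chap_1.xml is present, Chap_2.xml is not and relies on its fallback.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "Chap_1.xml").write_text("<chapter/>", encoding="utf-8")
    (project / "book.xml").write_text(
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="Chap_1.xml"/>'
        '<xi:include href="Chap_2.xml"><xi:fallback>'
        '<para>Chapter 2 is unavailable.</para>'
        '</xi:fallback></xi:include>'
        '</book>', encoding="utf-8")
    absent = missing_includes(project / "book.xml")
```

Every name reported this way must either be produced by a data pull described in the project's IncludeMap.xml file or be covered by an <xi:fallback> element.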

IncludeMap.xml

If a project contains an IncludeMap.xml file, Mr. XML Publisher pulls data from a data store and creates files from that data according to the instructions it takes from the IncludeMap.xml file's element values. The files are written to the temporary directory Mr. XML Publisher created when it unpacked the uploaded project.

Example 38 shows the structure of an IncludeMap.xml file using just its outer elements.

Example 38. IncludeMap.xml Outer Elements


The root element of an IncludeMap.xml file is always an <XMLP_Includes> element. It must always exist exactly once, it must always contain exactly one <Text> element, and it must always contain exactly one <Binary> element. The <Binary> element is not used. It is a placeholder, but it must exist in all IncludeMap.xml files as an <XMLP_Includes> child element. An example IncludeMap.xml file showing all elements is provided in Example IncludeMap.xml. The W3C XML Schema version of the IncludeMap.xml schema is provided in IncludeMap.xml Schema.
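The structural rules just described can be checked mechanically. The sketch below is a rough pre-flight check, not a substitute for validating against the IncludeMap.xml schema. It assumes the elements are in no namespace, which the text above does not confirm, and the function name check_outer_elements is hypothetical.

```python
from xml.etree import ElementTree as ET


def check_outer_elements(xml_text):
    """Return a list of problems found in the outer IncludeMap.xml structure."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "XMLP_Includes":
        problems.append(
            "root element is <%s>, expected <XMLP_Includes>" % root.tag)
    # Exactly one <Text> and exactly one <Binary> child are required.
    for name in ("Text", "Binary"):
        count = len(root.findall(name))
        if count != 1:
            problems.append(
                "expected exactly one <%s> child, found %d" % (name, count))
    return problems


ok = check_outer_elements("<XMLP_Includes><Text/><Binary/></XMLP_Includes>")
bad = check_outer_elements("<XMLP_Includes><Text/></XMLP_Includes>")
```

An empty list means the outer elements follow the rules. The second call reports the missing <Binary> placeholder.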

As an administrator, or as someone who advises end users on how to construct IncludeMap.xml files, you are mostly interested in the <Text> element's child elements. You instruct Mr. XML Publisher to pull data via the various <XXX_Pull> elements and their <XXX_Resource> child elements. Replace "XXX" with the appropriate data store name. For example, for Oracle, replace "XXX" with "Oracle" to get "<Oracle_Pull>" and "<Oracle_Resource>". Values of <XXX_Resource> child elements provide Mr. XML Publisher with the information it needs to pull the data and create its file.

For each data puller, its <XXX_Pull> element must contain one or more <XXX_Resource> elements. Each <XXX_Resource> element represents a single data pull.
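The pairing of <XXX_Pull> and <XXX_Resource> elements lends itself to a simple count. The following sketch tallies the data pulls requested under <Text> and flags any <XXX_Pull> element that has no matching <XXX_Resource> child. It is an illustration under assumptions: the elements are taken to be in no namespace, the function name count_pulls is hypothetical, and comparing the total with a limit such as MAX_PULLS_PER_REQUEST only approximates the server's per-request file limit.

```python
from xml.etree import ElementTree as ET


def count_pulls(xml_text, max_pulls=None):
    """Count <XXX_Resource> elements per <XXX_Pull> and report problems."""
    root = ET.fromstring(xml_text)
    counts, problems = {}, []
    text = root.find("Text")
    for pull in (text if text is not None else []):
        if not pull.tag.endswith("_Pull"):
            continue
        store = pull.tag[: -len("_Pull")]
        found = len(pull.findall(store + "_Resource"))
        counts[pull.tag] = counts.get(pull.tag, 0) + found
        if found == 0:
            problems.append(
                "<%s> contains no <%s_Resource>" % (pull.tag, store))
    total = sum(counts.values())
    if max_pulls is not None and total > max_pulls:
        problems.append(
            "%d pulls requested, limit is %d" % (total, max_pulls))
    return counts, problems


sample = (
    "<XMLP_Includes><Text>"
    "<Oracle_Pull><Oracle_Resource/><Oracle_Resource/></Oracle_Pull>"
    "<Sedna_Pull/>"
    "</Text><Binary/></XMLP_Includes>"
)
counts, problems = count_pulls(sample, max_pulls=1)
```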

In all cases, the filename Mr. XML Publisher uses when creating the file is the textual value taken from a child element of the <XXX_Resource> element. That element is the <XInclude_FileName> element. An <XInclude_FileName> element value can include one or more directories in a path, but those directories must exist in the uploaded project. For example, any of the following are acceptable values for an <XInclude_FileName> element:

  • fileName.xml

  • refDir/fileName.xml

  • refDir/XML/fileName.xml

Directories that do not already exist in the uploaded project are, by default, NOT created. If the value of an <XInclude_FileName> element uses a directory in its path that does not exist in the uploaded project, the user gets a FileNotFoundException and is sent to an error page. You can change this behavior so that missing directories are created. For security reasons, how to do that is not explained here. Contact Mr. XML Publisher support.

If a project's IncludeMap.xml file provides instructions to pull data and create a file that already exists in a project, the file is not overwritten. Mr. XML Publisher logs the event and continues servicing the request.
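The default rules for <XInclude_FileName> values can be sketched as a small classification function. It is a local check a user might run against an unpacked copy of a project, not the server's implementation. The function name classify_target and the result labels are hypothetical, and the "outside-project" check is an extra precaution added in this sketch rather than documented server behavior.

```python
import tempfile
from pathlib import Path


def classify_target(project_dir, xinclude_filename):
    """Predict how an <XInclude_FileName> value fares under the default rules."""
    project_dir = Path(project_dir).resolve()
    target = (project_dir / xinclude_filename).resolve()
    if project_dir not in target.parents:
        return "outside-project"    # extra precaution, see the note above
    if not target.parent.is_dir():
        return "missing-directory"  # would raise FileNotFoundException
    if target.exists():
        return "already-exists"     # the existing file is not overwritten
    return "ok"


# Usage: a throwaway project containing refDir/XML and one existing file.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "refDir" / "XML").mkdir(parents=True)
    (project / "refDir" / "existing.xml").write_text(
        "<chapter/>", encoding="utf-8")
    results = {
        name: classify_target(project, name)
        for name in ("fileName.xml", "refDir/XML/fileName.xml",
                     "newDir/fileName.xml", "refDir/existing.xml")
    }
```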

Caution

Because files created from data pulls are owned by the owner of your server's web container process, you should prevent the creation of files in unwanted places in the same way you would prevent any user on the server from doing that. You do not want your server's web container process owned by a privileged user. For suggestions on server security, see Security.

Two <XXX_Pull> elements are used for multiple data stores. The <JDBC_Pull> element is used for pulling data from:

  • Microsoft SQL Server

  • IBM DB2

  • Sun mySQL

  • Sybase

The <XMLDB_Pull> element is used for pulling data from:

  • eXist

  • xindice

Each <XXX_Pull> element is discussed in detail in the following sections.

Caution

You must not allow unintended spaces or line breaks to occur within the textual values of any elements in an IncludeMap.xml file.
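Stray whitespace is easy to introduce when an editor reindents an XML file. The sketch below flags leaf elements whose values begin or end with whitespace or contain a line break. It is a rough check: it cannot tell intended from unintended spaces inside a value, so interior spaces (as in a query) are not flagged. The function name suspicious_values is hypothetical.

```python
from xml.etree import ElementTree as ET


def suspicious_values(xml_text):
    """Return tags of leaf elements with leading/trailing space or newlines."""
    flagged = []
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) > 0:
            continue  # container element; whitespace between children is fine
        value = elem.text or ""
        if value != value.strip() or "\n" in value or "\r" in value:
            flagged.append(elem.tag)
    return flagged


sample = (
    "<XMLP_Includes><Text><XHive_Pull><XHive_Resource>"
    "<XInclude_FileName> Chap_1.xml</XInclude_FileName>"
    "<Query_String>/chapter[@id='chap1']</Query_String>"
    "</XHive_Resource></XHive_Pull></Text><Binary/></XMLP_Includes>"
)
flagged = suspicious_values(sample)
```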

<JDBC_Pull>

Within a <JDBC_Pull> element, <JDBC_Resource> elements may pull data from any supported JDBC data store and may be arranged in any order.[20] Example IncludeMap.xml shows a <JDBC_Pull> element using multiple <JDBC_Resource> elements, each pulling data from a different JDBC data store.

The required syntax for some element values varies with the data store, but all <JDBC_Resource> elements use exactly the same child elements. An example for each supported JDBC data store is provided in the following sections.

Microsoft SQL Server

When pulling XML from Microsoft SQL Server, the column type must be XML, and you can use either any valid Microsoft SQL statement or any valid Microsoft XQuery statement. For example, if the column type is XML, the table name is "tableName", and the column name is "xsl":

  • Microsoft XQuery

    select xsl.query('/*') from tableName

  • Microsoft SQL

    select xsl from tableName

Example 39 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for SQL Server.

Example 39. <JDBC_Pull> Element in IncludeMap.xml for SQL Server


IBM DB2

When pulling XML from IBM DB2, the column type must be XML, and you can use either any valid IBM DB2 SQL statement or any valid IBM DB2 XQuery statement. For example, if the column type is XML, the table name is "TABLENAME", and the column name is "XSL":

  • IBM DB2 XQuery

    XQUERY db2-fn:xmlcolumn ('TABLENAME.XSL')

  • IBM DB2 SQL

    select XSL from TABLENAME

Example 40 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for DB2.

Example 40. <JDBC_Pull> Element in IncludeMap.xml for DB2


Sun mySQL

When pulling XML from mySQL, you can use only SQL queries. Example 41 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for mySQL. The table name is "tableName", the "XML_Source" column datatype is MEDIUMTEXT, and the "fileName" column datatype is VARCHAR(80).

Example 41. <JDBC_Pull> Element in IncludeMap.xml for mySQL


Caution

If you are using mySQL, you know about the required ampersand in the connection string. When using an IncludeMap.xml file to pull data from mySQL, you must code the ampersand ("&") as a character entity ("&amp;") in the <Connection_String> element value.
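If IncludeMap.xml files are generated by a script, the ampersand can be escaped automatically instead of by hand. The following sketch shows two standard-library ways to do so in Python. The connection string is a made-up placeholder (host, database, and credentials are hypothetical), and only the <Connection_String> element name comes from this documentation.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# Placeholder connection string; host, database, and credentials are made up.
raw = "jdbc:mysql://dbhost:3306/docs?user=reader&password=secret"

# Option 1: escape the value yourself before pasting it into the file.
escaped = escape(raw)

# Option 2: let an XML library escape it while serializing the element.
element = ET.Element("Connection_String")
element.text = raw
serialized = ET.tostring(element, encoding="unicode")
```

Either way, the value stored in the file contains "&amp;", and an XML parser reading the file gives back the original "&".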

Sybase

When pulling XML from Sybase, the column type must be TEXT. You can use Sybase SQL queries, and if your Sybase system has ASE XML Services installed, you can use Sybase XML query functions. For example, if the "XML_Source" column datatype is TEXT, the table name is "tableName", and the "fileName" column datatype is VARCHAR(80):

  • Sybase XML Query Function

    select xmlextract('/chapter[@id="chap1"]', XML_Source) from tableName where fileName='Chap_1.xml'

  • Sybase SQL

    select XML_Source from tableName where fileName='Chap_1.xml'

Example 42 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for Sybase.

Example 42. <JDBC_Pull> Element in IncludeMap.xml for Sybase


Caution

If you are using Sybase, you know about the required ampersand in the connection string. When using an IncludeMap.xml file to pull data from Sybase, you must code the ampersand ("&") as a character entity ("&amp;") in the <Connection_String> element value.

<XMLDB_Pull>

The <XMLDB_Pull> element uses <XMLDB_Resource> child elements to pull XML data using the XML:DB API (http://xmldb-org.sourceforge.net/xapi/). <XMLDB_Resource> elements may pull data from any supported XMLDB data store and may be arranged in any order within an <XMLDB_Pull> element.

Pulling XML data from a native XML database using the XML:DB API requires a service name and a service version. You supply the necessary values in the <Service_Name> and <Service_Version> child elements of the <XMLDB_Resource> element. The current XML:DB API supports only service version "1.0". Thus, for now, you should always use a value of "1.0" in the <Service_Version> element. Mr. XML Publisher allows only the XML:DB API services "XQueryService" and "XPathQueryService", but you may be limited further by your data store. For example, xindice supports only "XPathQueryService". You must use an appropriate value in the <Service_Name> element.
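The service rules above amount to a small lookup table. The sketch below encodes them for anyone who reviews IncludeMap.xml files with a script. It is illustrative only: the function name check_service and the way the data store is identified (a plain string argument) are assumptions, since a data store is not necessarily named this way inside an <XMLDB_Resource> element.

```python
# Services Mr. XML Publisher allows, and the narrower set each store supports.
ALLOWED_SERVICES = {"XQueryService", "XPathQueryService"}
STORE_SERVICES = {
    "eXist": {"XQueryService", "XPathQueryService"},
    "xindice": {"XPathQueryService"},
}
SUPPORTED_VERSION = "1.0"


def check_service(store, service_name, service_version):
    """Return a list of problems with a <Service_Name>/<Service_Version> pair."""
    problems = []
    if service_version != SUPPORTED_VERSION:
        problems.append("service version must be %s" % SUPPORTED_VERSION)
    if service_name not in ALLOWED_SERVICES:
        problems.append("service %s is not allowed" % service_name)
    elif service_name not in STORE_SERVICES.get(store, ALLOWED_SERVICES):
        problems.append("%s does not support %s" % (store, service_name))
    return problems


exist_ok = check_service("eXist", "XQueryService", "1.0")
xindice_bad = check_service("xindice", "XQueryService", "1.0")
```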

Example IncludeMap.xml shows an <XMLDB_Pull> element using multiple <XMLDB_Resource> elements, each pulling data from a different XMLDB data store. The required syntax for some element values varies with the data store, but all <XMLDB_Resource> elements use exactly the same child elements. The following XML:DB data stores are supported:

  • eXist

  • xindice

eXist

When pulling XML from eXist, you can use as the XML:DB service name either "XQueryService" or "XPathQueryService". Example 43 shows an <XMLDB_Pull> element using a single <XMLDB_Resource> element for pulling XML data from eXist. It uses an XQuery and a value of "XQueryService" in the <Service_Name> element.

Example 43. <XMLDB_Pull> Element in IncludeMap.xml for eXist


xindice

The only XML:DB service supported by xindice is "XPathQueryService"; thus, when using xindice, the only acceptable value for the <Service_Name> element is "XPathQueryService". Example 44 shows an <XMLDB_Pull> element using a single <XMLDB_Resource> element for xindice.[21]

Example 44. <XMLDB_Pull> Element in IncludeMap.xml for xindice


<TGL_Pull>

The <TGL_Pull> element and its <TGL_Resource> child elements are used for pulling XML data from TigerLogic XDMS[22] (http://www.tigerlogic.com/). Example IncludeMap.xml shows a <TGL_Pull> element using multiple <TGL_Resource> elements. You can use any XQuery statement supported by TigerLogic.

Example 45 shows a <TGL_Pull> element and a single <TGL_Resource> element. Notice the parameters being passed in the <URI> element: "RETRY_COUNT=1" and "RETRY_DELAY=0". Mr. XML Publisher allows you to specify a timeout for all data pulls by using the DEFAULT_QUERY_TIMEOUT <context-param> in Mr. XML Publisher's web.xml file (see DEFAULT_QUERY_TIMEOUT). However, it is recommended that you also consider passing these parameters to TigerLogic.

Example 45. <TGL_Pull> Element in IncludeMap.xml


Caution

Parameters passed in TigerLogic's <URI> element must be separated by an ampersand. When using an IncludeMap.xml file to pull data from TigerLogic, you must code the ampersand ("&") as a character entity ("&amp;").

<Oracle_Pull>

The <Oracle_Pull> element and its <Oracle_Resource> child elements are used for pulling XML data from Oracle. Example IncludeMap.xml shows an <Oracle_Pull> element using multiple <Oracle_Resource> elements.

When pulling XML data from Oracle, you can use Oracle SQL queries and Oracle SQL functions. If your Oracle system has Oracle XML DB installed, you can also use any Oracle XML DB XMLType operation. The column datatype must be XMLType.

For example, if the "xml_source" column datatype is XMLType, the table name is "tableName", and the "fileName" column datatype is VARCHAR2(10):

  • Oracle SQL

    SELECT * from tableName WHERE fileName = 'Chap_1.xml'

  • Oracle SQL Function

    SELECT extract(xml_source, '/*') FROM tableName WHERE fileName = 'Chap_1.xml'

  • Oracle XMLQuery

    SELECT XMLQuery('/*' PASSING xml_source RETURNING CONTENT) FROM tableName WHERE fileName='Chap_1.xml'

Example 46 shows an <Oracle_Pull> element and a single <Oracle_Resource> element.

Example 46. <Oracle_Pull> Element in IncludeMap.xml


Caution

Oracle 10g ships with a set of parser-related class files in its xmlparserv2.jar file. Do not use that jar file as it is. If you do, Mr. XML Publisher will be unable to properly perform XML parsing on the server.

In the xmlparserv2.jar file that ships with Oracle 10g, inside the META-INF/services directory, you will find a file named "javax.xml.parsers.SAXParserFactory". That file contains the text "oracle.xml.jaxp.JXSAXParserFactory", and this forces Mr. XML Publisher to use Oracle's XML parsers instead of those that ship with Java 6. You must prevent that.

Mr. XML Publisher must be allowed to use the parsing classes provided in Java 6, or at least parser classes that support the features in Java 6. Some essential features specified with javax.xml.parsers.SAXParserFactory.setFeature("…") are unavailable in Oracle's parser classes.

Mr. XML Publisher ships with an xmlparserv2.jar file in its /lib directory that is a modified version of the one that ships with Oracle 10g. If that file somehow gets replaced with the one that ships with Oracle 10g, the easiest fix is to un-jar the library, delete the entire services directory, and re-jar the library.
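Because a jar file is a zip archive, the un-jar, delete, and re-jar steps can also be done in one pass by copying every entry except those under META-INF/services. The sketch below shows the idea using Python's zipfile module. It is not a supported Mr. XML Publisher tool. The function name strip_services and the output file name are hypothetical, and the usage section builds a tiny stand-in archive rather than using the real Oracle library.

```python
import tempfile
import zipfile
from pathlib import Path

SERVICES_PREFIX = "META-INF/services/"


def strip_services(src_jar, dst_jar):
    """Copy src_jar to dst_jar, omitting everything under META-INF/services."""
    removed = []
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w") as dst:
        for info in src.infolist():
            if info.filename.startswith(SERVICES_PREFIX):
                removed.append(info.filename)
                continue
            # Reusing the ZipInfo keeps each entry's name and compression.
            dst.writestr(info, src.read(info.filename))
    return removed


# Usage: build a small stand-in jar, then strip its services directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "xmlparserv2.jar"
    cleaned = Path(tmp) / "xmlparserv2-clean.jar"
    with zipfile.ZipFile(original, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr(SERVICES_PREFIX + "javax.xml.parsers.SAXParserFactory",
                     "oracle.xml.jaxp.JXSAXParserFactory\n")
        jar.writestr("oracle/placeholder.class", b"")
    removed = strip_services(original, cleaned)
    with zipfile.ZipFile(cleaned) as jar:
        remaining = jar.namelist()
```

Keep a backup of the original file, and confirm that the cleaned archive no longer lists anything under META-INF/services before deploying it.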

<MKL_Pull>

The <MKL_Pull> element and its <MKL_Resource> child elements are used for pulling XML data from MarkLogic Server. Example IncludeMap.xml shows an <MKL_Pull> element using multiple <MKL_Resource> elements.

When pulling XML data from MarkLogic Server, Mr. XML Publisher uses the MarkLogic XML Contentbase Connector API for Java (XCC/J).[23] It does not use the older MarkLogic connector solution, XDBC. The values of the <URI> and <Query_String> elements within an <MKL_Resource> element must therefore use syntax appropriate for XCC/J.

Example 47 shows an <MKL_Pull> element and a single <MKL_Resource> element for MarkLogic.

Example 47. <MKL_Pull> Element in IncludeMap.xml


<XHive_Pull>

The <XHive_Pull> element and its <XHive_Resource> child elements are used for pulling XML data from X-Hive/DB. Example IncludeMap.xml shows an <XHive_Pull> element using multiple <XHive_Resource> elements.

Example 48 shows an <XHive_Pull> element using a single <XHive_Resource> element. Mr. XML Publisher sessions with X-Hive/DB are created in read-only mode. You may use as the value in the <Query_String> element any XQuery statement supported by X-Hive/DB for reading XML data.[24]

Example 48. <XHive_Pull> Element in IncludeMap.xml


<Tamino_Pull>

The <Tamino_Pull> element and its <Tamino_Resource> child elements are used for pulling XML data from Tamino XML Server. Example IncludeMap.xml shows a <Tamino_Pull> element using multiple <Tamino_Resource> elements.

When pulling XML from Tamino, you can use either Tamino TXQuery or Tamino TQuery.[25]

  • Tamino TXQuery

    collection("collectionName")/chapter[@id="chap1"]

  • Tamino TQuery

    Greeting[@ino:id=1]

Example 49 shows a <Tamino_Pull> element using a single <Tamino_Resource> element to perform a data pull using a Tamino TXQuery.

Example 49. <Tamino_Pull> Element in IncludeMap.xml


<Sedna_Pull>

The <Sedna_Pull> element and its <Sedna_Resource> child elements are used for pulling XML data from the Sedna XML Database Management System. Example IncludeMap.xml shows a <Sedna_Pull> element using multiple <Sedna_Resource> elements.

When pulling XML from Sedna, you can use any XQuery statement supported by your Sedna server. Example 50 shows a <Sedna_Pull> element using a single <Sedna_Resource> element.[26]

Example 50. <Sedna_Pull> Element in IncludeMap.xml


<XStreamDB_Pull>

The <XStreamDB_Pull> element and its <XStreamDB_Resource> child elements are used for pulling XML data from XStreamDB Server. Example IncludeMap.xml shows an <XStreamDB_Pull> element using multiple <XStreamDB_Resource> elements.

When pulling XML from XStreamDB, you can use any XQuery statement it supports. XStreamDB XQuery statements require that a query flag be set before execution. The query flag is always automatically set to zero when pulling XML data from XStreamDB using an IncludeMap.xml file. Zero is the constant field value for com.bluestream.xdb.XStatement.SF_DEFAULT.[27]

Example 51 shows an <XStreamDB_Pull> element using a single <XStreamDB_Resource> element.[28]

Example 51. <XStreamDB_Pull> Element in IncludeMap.xml




[18] The number of files that can be created from data pulls is limited per formatting request to the value specified in the web.xml file's MAX_PULLS_PER_REQUEST <context-param>.

[19] The <context-param> name, "PULL_XXX_ON_STARTUP[…]", refers to any <context-param> whose name begins with "PULL_XXX_ON_STARTUP", where "XXX" is the name of a specific data puller and "[…]" may be anything. Such <context-param>s include "PULL_JDBC_ON_STARTUP", "PULL_XMLDB_ON_STARTUP", etc.

[20] The supported JDBC data stores are SQL Server, DB2, mySQL, and Sybase.

[21] Connection pooling is unavailable with xindice.

[22] Connection pooling for TigerLogic is controlled on the TigerLogic server through the TigerLogic XDMS Data Source Connectivity Manager. Mr. XML Publisher does nothing to take advantage of connection pooling in TigerLogic and, as the Mr. XML Publisher administrator, neither can you.

[23] The MarkLogic XML Contentbase Connector interface provides automatic connection pooling.

[24] X-Hive/DB provides nothing in its API that explicitly enables connection/session pooling. Mr. XML Publisher's data puller for X-Hive/DB provides some automatic optimization, and tests show that its speed is approximately the same as the speed of the data pullers that use Apache's DBCP connection pooling.

[25] Mr. XML Publisher's data puller for Tamino XML Server uses Tamino's TConnectionPoolManager class and related Tamino classes to pool connections to Tamino. If your Tamino XML Server prevents the use of pooled connections, Mr. XML Publisher's data puller for Tamino falls back to the use of unpooled connections.

[26] The Sedna XML Database Management System does not provide any connection pooling features.

[27] When pulling XML data from XStreamDB using PULL_XSTREAMDB_ON_STARTUP[…], you must explicitly supply a query flag.

[28] The XStreamDB client API is not suited for connection pooling. Thus, by default, Mr. XML Publisher provides no connection pooling in its data puller for XStreamDB. Contact Mr. XML Publisher technical support if you feel you must have connection pooling in XStreamDB.