Mr. XML Publisher's data pulling features can be accessed by users or administrators. Each group uses a different mechanism:
Users provide an IncludeMap.xml file within an uploaded project.
Administrators specify PULL_XXX_ON_STARTUP[…] <context-param>s within Mr. XML Publisher's web.xml file.
Both mechanisms use the same intermediate Java class files, data pullers, and drivers. The differences are in how Mr. XML Publisher gathers the information necessary to pull the data and when the data is pulled.
Using an IncludeMap.xml file in an uploaded project, a user instructs Mr. XML Publisher to:
Pull XML from a data store.
Write the XML to a file on the server.
Include the XML file in the project.
An IncludeMap.xml file provides usernames, passwords, query strings, and so on. Mr. XML Publisher pulls the XML from the data store and writes it to a file in the temporary directory it created when it unpacked the user's uploaded project. The project's main XML file includes each file created from pulled data by means of an <xi:include> element, for example:
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Chap_1.xml"/>
The IncludeMap.xml file can instruct Mr. XML Publisher to pull data from multiple data sources simultaneously.[18] Its constraints are defined in its schema. Details of the IncludeMap.xml file are discussed in IncludeMap.xml. A complete example IncludeMap.xml file is provided in Example IncludeMap.xml, and a copy of its schema is provided in IncludeMap.xml Schema.
Using PULL_XXX_ON_STARTUP[…][19] <context-param> elements in the web.xml file, administrators instruct Mr. XML Publisher to:
Pull XML from a data store upon loading of the Mr. XML Publisher servlet context.
Write the XML to a file on the server.
PULL_XXX_ON_STARTUP <context-param> element values are delimited strings in which each member corresponds to a value required by the specified data puller. The file is created on the server only once, when the servlet context is loaded.
Use this mechanism to avoid performing the same data pull repeatedly. Use it also when command arrays reference XSL files that must be consistent with a specific version held in a data store, for example when several formats share server-side XSL through <xsl:import href="…"> elements. PULL_XXX_ON_STARTUP <context-param> elements are discussed in detail in PULL_XXX_ON_STARTUP.
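Footnote [19] defines which <context-param> names count as PULL_XXX_ON_STARTUP[…]. If you audit web.xml files with your own tooling, a pattern like the one in this Java sketch can pick out those names and the data puller each one refers to. The class and method names are hypothetical and are not part of Mr. XML Publisher. The sketch also assumes that puller names contain no underscores, which holds for the names that appear in this document ("JDBC", "XMLDB", "XSTREAMDB").

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for auditing web.xml <context-param> names. */
final class PullParamNames {
    // "PULL_", a puller name without underscores (an assumption), "_ON_STARTUP",
    // then anything at all (the "[...]" part of the name)
    private static final Pattern NAME =
            Pattern.compile("PULL_([A-Za-z0-9]+)_ON_STARTUP.*", Pattern.DOTALL);

    private PullParamNames() {}

    /** Returns the data puller named in a PULL_XXX_ON_STARTUP param name, e.g. "JDBC". */
    static Optional<String> pullerOf(String contextParamName) {
        Matcher m = NAME.matcher(contextParamName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For example, pullerOf("PULL_XMLDB_ON_STARTUP_chapters") yields "XMLDB", while pullerOf("DEFAULT_QUERY_TIMEOUT") yields an empty Optional.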
Mr. XML Publisher supports pulling data from the following data stores:
Mr. XML Publisher's documentation does not provide advice or instructions on how to import data into your data store or manage it once it's there. Data management policies are, of course, specific to each site and each organization. However, you must carefully consider the following:
character encoding
Upon import, export, or just for internal storage, a data store might use a character encoding you don't expect. In all cases, Mr. XML Publisher writes pulled data to disk using UTF-8 character encoding. If data pulled from your data store cannot be encoded using UTF-8, you will get a java.io.UnsupportedEncodingException.
character entities
Data stores handle character entities differently. On import or export, a data store might translate character entities to or from their numeric character references. If character entities are not represented the way you expect, pulled data might be written to disk successfully in UTF-8, yet you could still get a validity error during formatting. Worse, the data store might silently replace character entities: no error occurs during formatting, but the formatted output is not what you expected. Make absolutely certain that you know what your data store does with character entities on import and export. In some cases, representing such characters as hexadecimal numeric character references (for example, &#xE9; for "é") in the XML before importing it into the data store can solve otherwise difficult character encoding problems.
validation
Mr. XML Publisher does not validate XML pulled from a data store and does not require the XML it pulls to be valid. Your data store might validate XML upon import or export, depending on its features and how it's configured. If the XML in your data store is not validated, keep in mind that individual commands in a format's command array, such as those that use an XSL transformer or an FO processor, are likely to require valid XML.
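The three concerns above can be partly checked before XML is imported into a data store. The following minimal Java sketch assumes that you control the XML before import. Its class and method names are hypothetical, and none of it is part of Mr. XML Publisher.

It provides three checks:
- The first method flags text that UTF-8 cannot represent.
- The second rewrites non-ASCII characters as hexadecimal character references, as suggested above.
- The third checks well-formedness, which is a lower bar than validity.

toHexReferences assumes that element and attribute names are ASCII, because character references are allowed in content and attribute values but not in names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-import checks; not part of Mr. XML Publisher. */
final class PreImportChecks {
    private PreImportChecks() {}

    /** Character encoding: false if UTF-8 cannot represent the text (e.g. an unpaired surrogate). */
    static boolean encodableAsUtf8(String xml) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(xml);
    }

    /** Character entities: replaces every non-ASCII character with a hexadecimal character reference. */
    static String toHexReferences(String xml) {
        StringBuilder sb = new StringBuilder(xml.length());
        // codePoints() keeps surrogate pairs together, so U+1D11E becomes a single reference
        xml.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);                  // ASCII passes through unchanged
            } else {
                sb.append(String.format("&#x%X;", cp));  // e.g. U+00E9 -> "&#xE9;"
            }
        });
        return sb.toString();
    }

    /** Validation (lower bar): returns the first well-formedness error, or empty if none. */
    static Optional<String> wellFormednessError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler ignores content and throws on fatal (well-formedness) errors
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```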
Modularity is facilitated by using XIncludes (http://www.w3.org/TR/xinclude/). In projects uploaded for formatting, end users are free to use XIncludes, or not. An uploaded project might contain all its referenced XML inclusions, or it may need to pull the XML from a data store. As the Mr. XML Publisher administrator, you must make sure that in each command array the command that performs the actual transformation uses an appropriate flag or option to enable XInclude processing. For example, when using xsltproc from the libxml2 package (http://xmlsoft.org/):
xsltproc --xinclude […]
If an uploaded project contains an IncludeMap.xml file, Mr. XML Publisher pulls data and creates files according to the rules described in IncludeMap.xml. If a query executes successfully but returns an empty result set, no file is created, so the XML inclusion cannot be performed because the resource is missing. Alternatively, a resource expected to be in an uploaded project might simply be absent. In neither case is an exception thrown on account of the missing XML inclusion. Instead, Mr. XML Publisher lets the XInclude fallback mechanism take over and provides the user with an appropriate message in the Server Processing Messages popup (Server Processing Messages).
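Some commands may be implemented in Java rather than with xsltproc. For those, JAXP offers an equivalent switch, DocumentBuilderFactory.setXIncludeAware(true).

The following sketch resolves xi:include elements while parsing. It collects warnings and recoverable errors, such as a missing include target, instead of printing them, and lets the xi:fallback content stand in for the missing file. The class name is hypothetical, and the sketch is an illustration only; it does not describe how Mr. XML Publisher itself processes inclusions.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: parses a project's main file with XInclude processing enabled. */
final class XIncludeParser {
    private XIncludeParser() {}

    static Document parse(File mainXml, List<String> messages)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // XInclude processing requires namespace awareness
        factory.setXIncludeAware(true);   // resolve xi:include elements while parsing
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            // A missing include target is recoverable: record it and let xi:fallback apply
            @Override public void warning(SAXParseException e) { messages.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { messages.add(e.getMessage()); }
            // Anything fatal (including a missing target with no xi:fallback) stops parsing
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        // parse(File) records the file's URI, so relative href values resolve against its directory
        return builder.parse(mainXml);
    }
}
```

Given a main file that contains <xi:include href="Chap_2.xml"><xi:fallback><para>Chapter 2 unavailable</para></xi:fallback></xi:include>, a missing Chap_2.xml leaves the <para> element in the parsed tree, and the collected messages explain why.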
If a project contains an IncludeMap.xml file, Mr. XML Publisher pulls data from a data store and creates files from that data according to the instructions it takes from the IncludeMap.xml file's element values. The files are written to the temporary directory Mr. XML Publisher created when it unpacked the uploaded project.
Example 38 shows the structure of an IncludeMap.xml file using just its outer elements.
The root element of an IncludeMap.xml file is always an <XMLP_Includes> element. It must always exist exactly once, it must always contain exactly one <Text> element, and it must always contain exactly one <Binary> element. The <Binary> element is not used. It is a placeholder, but it must exist in all IncludeMap.xml files as an <XMLP_Includes> child element. An example IncludeMap.xml file showing all elements is provided in Example IncludeMap.xml. The W3C XML Schema version of the IncludeMap.xml schema is provided in IncludeMap.xml Schema.
As an administrator or as one who advises end users on how to construct IncludeMap.xml files, you are mostly interested in the <Text> element's child elements. You instruct Mr. XML Publisher to pull data via the various <XXX_Pull> elements and their <XXX_Resource> child elements. Replace "XXX" with the appropriate data store name. For example, for Oracle, replace "XXX" with "Oracle" to get "<Oracle_Pull>" and "<Oracle_Resource>". Values of <XXX_Resource> child elements give Mr. XML Publisher the information it needs to pull the data and create its file.
For each data puller, its <XXX_Pull> element must contain one or more <XXX_Resource> elements. Each <XXX_Resource> element represents a single data pull.
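The structural rules above can be checked mechanically. This Java sketch verifies that <XMLP_Includes> has exactly one <Text> child and exactly one <Binary> child. It then reports how many <XXX_Resource> elements each <XXX_Pull> element contains, deriving the resource name from the pull name as described above.

The class name is hypothetical. The sketch relies only on the element names given in this section and is no substitute for validating against the IncludeMap.xml schema, which remains the authority.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical outline check for IncludeMap.xml files. */
final class IncludeMapOutline {
    private IncludeMapOutline() {}

    /** Element children of parent, optionally filtered by local name; text and comments are skipped. */
    private static List<Element> children(Element parent, String localName) {
        List<Element> found = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && (localName == null || localName.equals(n.getLocalName()))) {
                found.add((Element) n);
            }
        }
        return found;
    }

    /** Maps each XXX_Pull element name to the number of XXX_Resource children it contains. */
    static Map<String, Integer> outline(String includeMapXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // so getLocalName() works with or without a namespace
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(includeMapXml))).getDocumentElement();
        if (!"XMLP_Includes".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root element must be <XMLP_Includes>");
        }
        List<Element> text = children(root, "Text");
        if (text.size() != 1 || children(root, "Binary").size() != 1) {
            throw new IllegalArgumentException("<XMLP_Includes> needs exactly one <Text> and one <Binary>");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Element pull : children(text.get(0), null)) {
            String name = pull.getLocalName();
            if (!name.endsWith("_Pull")) continue;
            String prefix = name.substring(0, name.length() - "_Pull".length());  // e.g. "Oracle"
            int resources = children(pull, prefix + "_Resource").size();
            if (resources == 0) {
                throw new IllegalArgumentException("<" + name + "> needs at least one <" + prefix + "_Resource>");
            }
            counts.merge(name, resources, Integer::sum);
        }
        return counts;
    }
}
```

Because each <XXX_Resource> element represents a single data pull, the sum of the returned counts is the number of pulls the file requests. You can compare it with the MAX_PULLS_PER_REQUEST limit described in footnote [18].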
In all cases, Mr. XML Publisher names the file it creates using the text value of the <XInclude_FileName> child element of the <XXX_Resource> element. An <XInclude_FileName> element value can include one or more directories in a path, but those directories must exist in the uploaded project. For example, any of the following are acceptable values for an <XInclude_FileName> element:
fileName.xml
refDir/fileName.xml
refDir/XML/fileName.xml
By default, directories that do not already exist in the uploaded project are NOT created. If the value of an <XInclude_FileName> element uses a directory in its path that did not exist in the uploaded project, the user gets a FileNotFoundException and is transferred to an error page. You can change this behavior so that missing directories are created. For security reasons, how to do that is not explained here; contact Mr. XML Publisher support.
If a project's IncludeMap.xml file provides instructions to pull data and create a file that already exists in a project, the file is not overwritten. Mr. XML Publisher logs the event and continues servicing the request.
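The file-creation rules above can be sketched in Java:
- The file name comes from <XInclude_FileName>.
- Missing directories are not created.
- Existing files are not overwritten.
- An empty result set produces no file.

This is an illustration under assumptions, not Mr. XML Publisher's code. The class and enum names are hypothetical, and an empty or null string stands in for an empty result set. The check that rejects names such as "../x.xml" is an extra safeguard in the spirit of the caution that follows, not documented product behavior.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the rules for writing a pulled file into the unpacked project. */
final class PulledFileWriter {
    enum Outcome { WRITTEN, NO_FILE_EMPTY_RESULT, KEPT_EXISTING_FILE }

    private PulledFileWriter() {}

    static Outcome write(Path projectDir, String xincludeFileName, String pulledXml) throws IOException {
        if (pulledXml == null || pulledXml.isEmpty()) {
            return Outcome.NO_FILE_EMPTY_RESULT;  // no file; XInclude fallback handles the gap
        }
        Path root = projectDir.toAbsolutePath().normalize();
        Path target = root.resolve(xincludeFileName).normalize();
        // Extra safeguard (assumption): refuse absolute paths and "../" escapes
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IOException("XInclude_FileName outside project: " + xincludeFileName);
        }
        if (!Files.isDirectory(target.getParent())) {
            // Directories missing from the uploaded project are not created
            throw new FileNotFoundException("No such directory in project: " + root.relativize(target.getParent()));
        }
        try {
            // CREATE_NEW fails if the file exists, so a project file is never overwritten
            Files.write(target, pulledXml.getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
            return Outcome.WRITTEN;
        } catch (FileAlreadyExistsException e) {
            return Outcome.KEPT_EXISTING_FILE;  // Mr. XML Publisher logs this case; the sketch just reports it
        }
    }
}
```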
| Caution |
|---|
| Because files created from data pulls are owned by the owner of your server's web container process, you should prevent the creation of files in unwanted places in the same way you would prevent any user on the server from doing that. You do not want your server's web container process owned by a privileged user. For suggestions on server security, see Security. |
Two <XXX_Pull> elements are each used for multiple data stores. The <JDBC_Pull> element is used for pulling data from SQL Server, DB2, mySQL, and Sybase.
The <XMLDB_Pull> element is used for pulling data from eXist and xindice.
Each <XXX_Pull> element is discussed in detail in the following sections.
| Caution |
|---|
| You must not allow unintended spaces or line breaks to occur within the textual values of any elements in an IncludeMap.xml file. |
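Stray whitespace is easy to miss when an IncludeMap.xml file is edited by hand. A hypothetical Java check such as the following lists every leaf element (one with no child elements) whose value has leading or trailing whitespace or contains a line break. It only flags candidates; you still have to decide whether a given space was intended.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical lint for unintended whitespace in IncludeMap.xml element values. */
final class IncludeMapWhitespaceLint {
    private IncludeMapWhitespaceLint() {}

    /** Names of leaf elements whose value has leading/trailing whitespace or a line break. */
    static List<String> suspectElements(String includeMapXml)
            throws ParserConfigurationException, SAXException, IOException {
        NodeList all = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(includeMapXml)))
                .getElementsByTagName("*");  // every element, in document order
        List<String> suspects = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (hasChildElements(e)) continue;  // only leaf elements carry values
            String value = e.getTextContent();
            if (!value.equals(value.trim()) || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
                suspects.add(e.getTagName());
            }
        }
        return suspects;
    }

    private static boolean hasChildElements(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }
}
```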
Within a <JDBC_Pull> element, <JDBC_Resource> elements may pull data from any supported JDBC data store and may be arranged in any order.[20] Example IncludeMap.xml shows a <JDBC_Pull> element using multiple <JDBC_Resource> elements, each pulling data from a different JDBC data store.
The required syntax for some element values varies with the data store, but all <JDBC_Resource> elements use exactly the same child elements. An example for each supported JDBC data store is provided in the following sections.
When pulling XML from Microsoft SQL Server, the column type must be XML, and you can use either any valid Microsoft SQL statement or any valid Microsoft XQuery statement. For example, if the column type is XML, the table name is "tableName", and the column name is "xsl":
Microsoft XQuery
select xsl.query('/*') from tableName
Microsoft SQL
select xsl from tableName
Example 39 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for SQL Server.
When pulling XML from IBM DB2, the column type must be XML, and you can use either any valid IBM DB2 SQL statement or any valid IBM DB2 XQuery statement. For example, if the column type is XML, the table name is "TABLENAME", and the column name is "XSL":
IBM DB2 XQuery
XQUERY db2-fn:xmlcolumn ('TABLENAME.XSL')
IBM DB2 SQL
select XSL from TABLENAME
Example 40 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for DB2.
When pulling XML from mySQL, you can use only SQL queries. Example 41 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for mySQL. The table name is "tableName", the "XML_Source" column datatype is MEDIUMTEXT, and the "fileName" column datatype is VARCHAR(80).
| Caution |
|---|
| If you are using mySQL, you are probably familiar with the ampersands required in its connection string. When using an IncludeMap.xml file to pull data from mySQL, you must code each ampersand ("&") as the character entity reference "&amp;" in the <Connection_String> element value. |
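If you generate IncludeMap.xml files with a program rather than by hand, escape each value before writing it into an element. The following is a minimal Java sketch. The class name is hypothetical, and the sample connection string below uses made-up host and database names.

```java
/** Hypothetical helper: escapes a raw string for use as the text of an XML element. */
final class XmlValue {
    private XmlValue() {}

    static String escape(String raw) {
        // Replace '&' first; otherwise the '&' introduced by "&lt;" would be escaped again
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }
}
```

For example, escape("jdbc:mysql://dbhost:3306/dbName?useUnicode=true&characterEncoding=UTF-8") returns "jdbc:mysql://dbhost:3306/dbName?useUnicode=true&amp;characterEncoding=UTF-8", which is safe to place in a <Connection_String> element. The input must be the raw value: passing an already-escaped string escapes its ampersands a second time. The same escaping applies to Sybase connection strings and to the ampersand-separated parameters in a TigerLogic <URI> element.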
When pulling XML from Sybase, the column type must be TEXT. You can use Sybase SQL queries, and if your Sybase system has ASE XML Services installed, you can use Sybase XML query functions. For example, if the "XML_Source" column datatype is TEXT, the table name is "tableName", and the "fileName" column datatype is VARCHAR(80):
Sybase XML Query Function
select xmlextract('/chapter[@id="chap1"]', XML_Source) from tableName where fileName='Chap_1.xml'
Sybase SQL
select XML_Source from tableName where fileName='Chap_1.xml'
Example 42 shows a <JDBC_Pull> element using a single <JDBC_Resource> element for Sybase.
| Caution |
|---|
| If you are using Sybase, you are probably familiar with the ampersands required in its connection string. When using an IncludeMap.xml file to pull data from Sybase, you must code each ampersand ("&") as the character entity reference "&amp;" in the <Connection_String> element value. |
The <XMLDB_Pull> element uses <XMLDB_Resource> child elements to pull XML data using the XML:DB API (http://xmldb-org.sourceforge.net/xapi/). <XMLDB_Resource> elements may pull data from any supported XMLDB data store and may be arranged in any order within an <XMLDB_Pull> element.
Pulling XML data from a native XML database using the XML:DB API requires a service name and a service version. You supply the necessary values in the <Service_Name> and <Service_Version> child elements of the <XMLDB_Resource> element. The current XML:DB API supports only service version "1.0", so for now you should always use a value of "1.0" in the <Service_Version> element. Mr. XML Publisher allows only the XML:DB API services "XQueryService" and "XPathQueryService", but your data store might limit you further. For example, xindice supports only "XPathQueryService". You must use an appropriate value in the <Service_Name> element.
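The rules in the preceding paragraph can be checked before a request is submitted. The following Java sketch is hypothetical. The store label ("eXist" or "xindice") is a parameter of the sketch, not an IncludeMap.xml element, and the check covers only the constraints stated in that paragraph.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical pre-check of <Service_Name>/<Service_Version> values for XML:DB pulls. */
final class XmlDbServiceCheck {
    private static final Set<String> SERVICES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("XQueryService", "XPathQueryService")));

    private XmlDbServiceCheck() {}

    /** Throws IllegalArgumentException if the service name/version pair is not usable. */
    static void check(String store, String serviceName, String serviceVersion) {
        if (!"1.0".equals(serviceVersion)) {
            throw new IllegalArgumentException("<Service_Version> must be 1.0, not " + serviceVersion);
        }
        if (!SERVICES.contains(serviceName)) {
            throw new IllegalArgumentException("unsupported <Service_Name>: " + serviceName);
        }
        // xindice supports only XPathQueryService
        if ("xindice".equalsIgnoreCase(store) && !"XPathQueryService".equals(serviceName)) {
            throw new IllegalArgumentException("xindice supports only XPathQueryService");
        }
    }
}
```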
Example IncludeMap.xml shows an <XMLDB_Pull> element using multiple <XMLDB_Resource> elements, each pulling data from a different XMLDB data store. The required syntax for some element values varies with the data store, but all <XMLDB_Resource> elements use exactly the same child elements. The following XML:DB data stores are supported:
eXist
xindice
When pulling XML from eXist, you can use as the XML:DB service name either "XQueryService" or "XPathQueryService". Example 43 shows an <XMLDB_Pull> element using a single <XMLDB_Resource> element for pulling XML data from eXist. It uses an XQuery and a value of "XQueryService" in the <Service_Name> element.
The only XML:DB service supported by xindice is "XPathQueryService"; thus, when using xindice, the only acceptable value for the <Service_Name> element is "XPathQueryService". Example 44 shows an <XMLDB_Pull> element using a single <XMLDB_Resource> element for xindice.[21]
The <TGL_Pull> element and its <TGL_Resource> child elements are used for pulling XML data from TigerLogic XDMS[22] (http://www.tigerlogic.com/). Example IncludeMap.xml shows a <TGL_Pull> element using multiple <TGL_Resource> elements. You can use any XQuery statement supported by TigerLogic.
Example 45 shows a <TGL_Pull> element and a single <TGL_Resource> element. Notice the parameters passed in the <URI> element: "RETRY_COUNT=1" and "RETRY_DELAY=0". Mr. XML Publisher lets you specify a timeout for all data pulls with the DEFAULT_QUERY_TIMEOUT <context-param> in Mr. XML Publisher's web.xml file (see DEFAULT_QUERY_TIMEOUT). However, it is recommended that you also consider passing these parameters to TigerLogic.
| Caution |
|---|
| Parameters passed in TigerLogic's <URI> element must be separated by ampersands. When using an IncludeMap.xml file to pull data from TigerLogic, you must code each ampersand ("&") as the character entity reference "&amp;". |
The <Oracle_Pull> element and its <Oracle_Resource> child elements are used for pulling XML data from Oracle. Example IncludeMap.xml shows an <Oracle_Pull> element using multiple <Oracle_Resource> elements.
When pulling XML data from Oracle, you can use Oracle SQL queries and Oracle SQL functions, and, if your Oracle system has Oracle XML DB installed, any Oracle XML DB XMLType operation. The column datatype must be XMLType.
For example, if the "xml_source" column datatype is XMLType, the table name is "tableName", and the "fileName" column datatype is VARCHAR2(10):
Oracle SQL
SELECT * from tableName WHERE fileName = 'Chap_1.xml'
Oracle SQL Function
SELECT extract(xml_source, '/*') FROM tableName WHERE fileName = 'Chap_1.xml'
Oracle XMLQuery
SELECT XMLQuery('/*' PASSING xml_source RETURNING CONTENT) FROM tableName WHERE fileName='Chap_1.xml'
Example 46 shows an <Oracle_Pull> element and a single <Oracle_Resource> element.
The <MKL_Pull> element and its <MKL_Resource> child elements are used for pulling XML data from MarkLogic Server. Example IncludeMap.xml shows a <MKL_Pull> element using multiple <MKL_Resource> elements.
When pulling XML data from MarkLogic Server, Mr. XML Publisher uses the MarkLogic XML Contentbase Connector API for Java (XCC/J).[23] It does not use XDBC, MarkLogic's older connector solution. The values of the <URI> and <Query_String> elements within an <MKL_Resource> element must therefore use syntax appropriate to XCC/J.
Example 47 shows an <MKL_Pull> element and a single <MKL_Resource> element for MarkLogic.
The <XHive_Pull> element and its <XHive_Resource> child elements are used for pulling XML data from X-Hive/DB. Example IncludeMap.xml shows an <XHive_Pull> element using multiple <XHive_Resource> elements.
Example 48 shows an <XHive_Pull> element using a single <XHive_Resource> element. Mr. XML Publisher sessions with X-Hive/DB are created in read-only mode. You may use as the value in the <Query_String> element any XQuery statement supported by X-Hive/DB for reading XML data.[24]
The <Tamino_Pull> element and its <Tamino_Resource> child elements are used for pulling XML data from Tamino XML Server. Example IncludeMap.xml shows a <Tamino_Pull> element using multiple <Tamino_Resource> elements.
When pulling XML from Tamino, you can use either Tamino TXQuery or Tamino TQuery.[25]
Tamino TXQuery
collection("collectionName")/chapter[@id="chap1"]
Tamino TQuery
Greeting[@ino:id=1]
Example 49 shows a <Tamino_Pull> element using a single <Tamino_Resource> element to perform a data pull using a Tamino TXQuery.
The <Sedna_Pull> element and its <Sedna_Resource> child elements are used for pulling XML data from the Sedna XML Database Management System. Example IncludeMap.xml shows a <Sedna_Pull> element using multiple <Sedna_Resource> elements.
When pulling XML from Sedna, you can use any XQuery statement supported by your Sedna server. Example 50 shows a <Sedna_Pull> element using a single <Sedna_Resource> element.[26]
The <XStreamDB_Pull> element and its <XStreamDB_Resource> child elements are used for pulling XML data from XStreamDB Server. Example IncludeMap.xml shows an <XStreamDB_Pull> element using multiple <XStreamDB_Resource> elements.
When pulling XML from XStreamDB, you can use any XQuery statement it supports. XStreamDB XQuery statements require that a query flag be set before execution. The query flag is always automatically set to zero when pulling XML data from XStreamDB using an IncludeMap.xml file. Zero is the constant field value for com.bluestream.xdb.XStatement.SF_DEFAULT.[27]
Example 51 shows an <XStreamDB_Pull> element using a single <XStreamDB_Resource> element.[28]
[18] The number of files that can be created from data pulls is limited per formatting request to the value specified in the web.xml file's MAX_PULLS_PER_REQUEST <context-param>.
[19] The <context-param> name, "PULL_XXX_ON_STARTUP[…]", refers to any <context-param> whose name begins with "PULL_XXX_ON_STARTUP", where "XXX" is the name of a specific data puller and "[…]" may be anything. Such <context-param>s include "PULL_JDBC_ON_STARTUP", "PULL_XMLDB_ON_STARTUP", etc.
[20] The supported JDBC data stores are SQL Server, DB2, mySQL, and Sybase.
[21] Connection pooling is unavailable with xindice.
[22] Connection pooling for TigerLogic is controlled on the TigerLogic server through the TigerLogic XDMS Data Source Connectivity Manager. Mr. XML Publisher plays no part in it, and there is nothing you can do as the Mr. XML Publisher administrator to take advantage of connection pooling in TigerLogic.
[23] The MarkLogic XML Contentbase Connector interface provides automatic connection pooling.
[24] X-Hive/DB provides nothing in its API that explicitly enables connection/session pooling. Mr. XML Publisher's data puller for X-Hive/DB provides some automatic optimization, and tests show that its speed is approximately the same as the speed of the data pullers that use Apache's DBCP connection pooling.
[25] Mr. XML Publisher's data puller for Tamino XML Server uses Tamino's TConnectionPoolManager class and related Tamino classes to pool connections to Tamino. If your Tamino XML Server prevents the use of pooled connections, Mr. XML Publisher's data puller for Tamino falls back to the use of unpooled connections.
[26] The Sedna XML Database Management System does not provide any connection pooling features.
[27] When pulling XML data from XStreamDB using PULL_XSTREAMDB_ON_STARTUP[…], you must explicitly supply a query flag.
[28] The XStreamDB client API is not suited for connection pooling. Thus, by default, Mr. XML Publisher provides no connection pooling in its data puller for XStreamDB. Contact Mr. XML Publisher technical support if you feel you must have connection pooling in XStreamDB.