Perform Documents

Perform documents are used by clients to instruct data service resources to perform activities. These activities may include data resource queries and updates, data transformations or data delivery operations. A simple perform document might specify a single database query activity, while a more complicated example could connect several activities into a pipeline. For example, the results of a database query could be filtered, transformed and then delivered by FTP using a pipeline of activities.

The advantage of pipelining activities in this way is the elimination of unnecessary data movement. This is achieved by moving the computation closer to the data source and allowing data to stream efficiently between the activities. If each activity were instead performed by a separate request or service, the overhead of serializing, deserializing and transporting data over the network would be much larger.

Perform documents are also used to specify the session requirements for a request. For instance, a perform document can instruct the data service resource to create a new session with a specified lifetime. The activities described within the perform document will have access to this session for storing state that will be available to any subsequent request that joins the same session.

The Client Toolkit APIs provided with OGSA-DAI enable Java developers to build applications that generate perform documents and interpret response documents programmatically. This is generally the most straightforward way to use the software. This page, however, describes the underlying structure and content of perform documents. Knowing this gives a deeper understanding of how OGSA-DAI operates, and all users are recommended to read it.

Document Structure

Figure: A simple perform document containing a session element, an activity element and an end-point.

Perform documents are expressed using XML. The root element of a perform document is a perform element, which can contain an optional session element and a collection of zero or more activity elements. The session element is used to specify the session requirements for the request and is described in more detail in the Session Requirements section below. An activity element is an element corresponding to an activity for the data service resource to perform. All activity elements contained within a single perform document must have unique name attribute values. Activities can be connected to form pipelines by ensuring that the named output stream of one activity element is referenced by the input stream of another. Any output streams that are left unconnected are known as the end-points of a perform document. Data that is written into these end-points will be delivered back to the client within a response document.
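The rules above (unique activity names, pipelines formed by matching stream names, end-points as unconnected outputs) can be sketched with a small in-memory model. This is an illustration only: the Activity class, its fields and the activity names query and toXml are hypothetical and are not part of OGSA-DAI or its Client Toolkit. Only the stream names are taken from the examples on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Simplified stand-in for an activity element (not an OGSA-DAI class)."""
    name: str                                     # must be unique in the document
    inputs: list = field(default_factory=list)    # names of streams this activity reads
    outputs: list = field(default_factory=list)   # names of streams this activity writes


def check_unique_names(activities):
    """Raise ValueError if two activities share a name."""
    seen = set()
    for activity in activities:
        if activity.name in seen:
            raise ValueError(f"duplicate activity name: {activity.name}")
        seen.add(activity.name)


def end_points(activities):
    """Return the output streams that no activity reads, in document order."""
    consumed = {stream for a in activities for stream in a.inputs}
    return [s for a in activities for s in a.outputs if s not in consumed]


# A query feeding a transformation; activity names are hypothetical.
pipeline = [
    Activity("query", outputs=["statementOutput"]),
    Activity("toXml", inputs=["statementOutput"], outputs=["webRowSetOutput"]),
]
check_unique_names(pipeline)
print(end_points(pipeline))  # ['webRowSetOutput']
```

In this model the pipeline has a single end-point, webRowSetOutput, which is the stream whose data would be returned in the response document.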

Synchronous and Asynchronous Requests

If a perform document contains any end-points then the data service resource will process it synchronously. This means that result data will be returned in the response document to the client when the activity processing is complete. If, however, a perform document contains no end-points then there will be no result data to embed within the response document. In this case the data service resource is able to return the response document to the client immediately and then begin processing the perform document asynchronously. The response document will contain only the current execution state of the request and the identity of the session that the request was joined to. Processing of the perform document will then continue to completion within the data service resource.
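The choice between the two modes depends only on whether the perform document has end-points. A minimal sketch of that rule follows. The function name and the dictionary keys are hypothetical, chosen for illustration, and do not mirror the real response document format.

```python
def plan_response(end_points):
    """Describe how a request would be handled, given its end-point names."""
    if end_points:
        # Unconnected outputs exist, so the client must wait for their data.
        return {
            "mode": "synchronous",
            "response_sent": "after activity processing completes",
            "result_data": list(end_points),
        }
    # Nothing to embed in the response, so it can be returned early.
    return {
        "mode": "asynchronous",
        "response_sent": "immediately",
        "result_data": [],
    }


print(plan_response(["webRowSetOutput"])["mode"])  # synchronous
print(plan_response([])["mode"])                   # asynchronous
```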

Session Requirements

The session element is optional and may be used to specify the session requirements for a request. If it is omitted then the request is joined to an implicit session which exists only for the duration of the request processing and expires automatically afterwards. There are four valid forms of the session element which instruct the data service resource to behave differently:

- An empty session element: create a new session with default settings, then join the request to that session. The ID of the new session will be returned to the client in the response document.
- A session element that specifies a timeout: create a new session with the specified timeout. The timeout setting is the lifetime in milliseconds of the session from when it was last accessed to when it will expire. A session will never expire while a request is joined to it, but as soon as a session is left unused, a timer will start and if the timeout duration is exceeded the session will expire. The ID of the new session will be returned to the client in the response document.
- A session element that specifies a session ID: join the request to an existing session with the specified ID. If this session does not exist then an error will be returned to the user. If it does exist, then activities within the request will be able to interact with the state previously stored in the session.
- A session element that specifies a session ID and requests termination: join the request to an existing session with the specified ID and then terminate that session as soon as request processing completes. This form can be used to explicitly expire a session when it is no longer needed. The expiry occurs after the request processing completes, so it is still possible for the request to contain activities that access and interact with the session.
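The timeout rule described in the list above (a session never expires while a request is joined, and the idle timer runs from the last access) can be modelled in a few lines. This is a toy model with an explicit clock passed in by the caller. The class and method names are hypothetical and this is not how OGSA-DAI implements sessions.

```python
class Session:
    """Toy model of the idle-timeout rule; not OGSA-DAI's implementation."""

    def __init__(self, timeout_ms, now_ms=0):
        self.timeout_ms = timeout_ms
        self.last_access_ms = now_ms
        self.joined_requests = 0

    def is_expired(self, now_ms):
        if self.joined_requests > 0:
            return False                      # never expires while in use
        return now_ms - self.last_access_ms > self.timeout_ms

    def join(self, now_ms):
        if self.is_expired(now_ms):
            raise LookupError("session has expired")
        self.joined_requests += 1
        self.last_access_ms = now_ms

    def leave(self, now_ms):
        self.joined_requests -= 1
        self.last_access_ms = now_ms          # idle timer restarts here


session = Session(timeout_ms=1000)
session.join(now_ms=100)
print(session.is_expired(now_ms=5000))   # False: a request is still joined
session.leave(now_ms=5000)
print(session.is_expired(now_ms=5500))   # False: idle for only 500 ms
print(session.is_expired(now_ms=6001))   # True: idle for more than 1000 ms
```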

Activity Control Flow

Perform documents also allow the expression of simple control-flow logic. This provides the client with more precise control over the order in which an OGSA-DAI service will process the activities. By default any unconnected activities or distinct activity pipelines contained in a perform document are processed concurrently. OGSA-DAI control-flow supports this behaviour but also allows unconnected activities and distinct activity pipelines to be processed in sequence. Two different control flow elements, which can be nested within one another, are defined to achieve this:

- sequence: any activities, or activity pipelines, contained within the sequence element will be processed sequentially, one after the other.
- flow: any activities, or activity pipelines, contained within the flow element will be processed concurrently using different processing threads.

It is important to note that an individual activity is not always the smallest component of a control-flow. When activities are connected into pipelines, the entire pipeline will form a single component of control-flow. This is because OGSA-DAI processes pipelines by streaming data from one end to the other, so all the connected activities are processing synchronously. Note also that although control-flow elements can be nested within one another, an activity pipeline cannot span multiple control-flow elements.

In most interactions with OGSA-DAI services these control-flow elements are not required. For some advanced scenarios, however, it is necessary to ensure strict ordering of activity execution. For example, consider creating a temporary table, populating it with data, querying that data, and then dropping the temporary table. This could be achieved with a single perform document by using a sequence element to order these four activities.
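The semantics of nested sequence and flow elements can be sketched as a small interpreter. This is an illustration under stated assumptions: the tuple-based tree, the run function and the step labels are hypothetical, each string stands for one activity or one whole pipeline, and appending to a log stands in for the real work. It is not OGSA-DAI's execution engine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(node, log, lock):
    """Execute a nested control-flow tree.

    A node is either a string (one activity or one whole pipeline) or a
    tuple of ("sequence" or "flow", [child nodes]).
    """
    if isinstance(node, str):
        with lock:
            log.append(node)              # stand-in for doing the real work
        return
    kind, children = node
    if kind == "sequence":
        for child in children:            # strictly one after the other
            run(child, log, lock)
    elif kind == "flow":
        if children:
            with ThreadPoolExecutor(max_workers=len(children)) as pool:
                # list() waits for every child and re-raises worker errors
                list(pool.map(lambda c: run(c, log, lock), children))
    else:
        raise ValueError(f"unknown control-flow element: {kind}")


# The temporary-table scenario: four steps in strict order.
steps = ("sequence", ["create table", "populate", "query", "drop table"])
log, lock = [], threading.Lock()
run(steps, log, lock)
print(log)  # ['create table', 'populate', 'query', 'drop table']
```

A flow nested inside a sequence, such as ("sequence", ["update", ("flow", ["query 1", "query 2"])]), always logs the update first, while the relative order of the two queries is not defined.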

Examples

Simple Synchronous Query

The following perform document describes an SQL query on a relational database, the results of which will be transformed into WebRowSet XML. There is one end-point, so this perform document will be processed synchronously and the query results will be delivered within the response document. No session requirements are specified, so the request will be joined to an implicit session.



  
    Perform a simple SELECT statement and transform the
    results into WebRowSet XML.
  
  
    select * from littleblackbook where id=10
    statementOutput"/>
  
  
    statementOutput"/>
    webRowSetOutput"/>

An activity pipeline is formed by connecting the sqlQueryStatement and sqlResultsToXML activities. The query results are then transformed into WebRowSet XML and delivered back to the client in the response document. They will be inserted into a CDATA child node of a result element within the response document. The name attribute of the result element will be webRowSetOutput, corresponding to the end-point (or unconnected output stream) in the perform document.

Control-Flow Example

The following perform document specifies an SQL update and then two SQL queries on a relational database. The results of both queries are transformed into WebRowSet XML. A sequence element is used to ensure that the queries are not performed until the update is complete. A flow element is used to allow both queries to process simultaneously. No session requirements are specified so the request will be joined to an implicit session.



  
    Perform an UPDATE followed by a SELECT statement.
  
  
    
      
        update littleblackbook set address='13 Cod Road' where id=10
      
      
    
    
      
        
          select * from LittleBlackBook where id = 10
        
        
      
      
        
        
      
      
        
          select * from BigRedBook where id < 100

The query results will be transformed into WebRowSet XML and inserted into two result elements of the response document named webRowSet1Output and webRowSet2Output.

Asynchronous Data Transport Query

Data can be transported between data service resources using the OGSA-DAI data transport functionality. All OGSA-DAI data services implement a number of data transport operations for reading data from and writing data to session streams. A session stream is an input or output that is stored in a session and is exposed for access and interaction via the data transport operations. Various activities are provided for creating session streams and for reading and writing their data. As usual, these can be connected into pipelines. The example below demonstrates how the results of an SQL query can be transported between two data service resources via their data services using this data transport functionality.

Figure: Data delivery between two data service resources requires two perform documents.

A data delivery between two data service resources requires two perform document interactions. In this example, the first perform document is sent to the source data service resource and the second to the sink. Data is then pulled between the data service resources using the data transport functionality. Note that the Client Toolkit APIs provide a simple interface for assembling and processing perform documents, and for data delivery scenarios using the Client Toolkit is often more straightforward than writing perform documents by hand. However, this example provides a useful explanation of what takes place beneath the Client Toolkit.

The first perform document defines an activity pipeline connecting an SQL query to a WebRowSet transformation and then on to an outputStream activity. The outputStream activity exposes the output of the activity pipeline as a session stream. An empty session element is specified to instruct the data service resource to create a new session with default lifetime settings. Because there are no unconnected outputs (end-points) in this perform document, the request will be processed asynchronously and the response document will be returned to the client immediately. Remember that this does not mean that processing is actually complete, but merely that there is no data to be delivered in the response document so the response document can be returned early. Only after the result data has been consumed by the sink data service resource will the request complete.



  
  
    
      select * from littleblackbook where id<100

The response to this request will contain a session element giving the ID of the new session that contains the session stream. Session IDs are auto-generated and guaranteed to be unique within the scope of the data service resource. For example, the session ID that appears in the streamId value used later in this example is session-ogsadai-106efa15ca3.

The second perform document below specifies a deliverFromGDT activity which will pull the result data from the session stream at the source data service resource. The handle to the source data service and the value of the streamId attribute identify the session stream. The value of the streamId attribute is a compound identifier made by appending the session stream name to the ID of the session containing the stream. The Client Toolkit provides convenient classes for assembling these identifiers.
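The compound identifier can be assembled and taken apart as sketched below. This is not the Client Toolkit's API: the function names are hypothetical, and the colon separator is an assumption inferred from the streamId value shown in this example (session-ogsadai-106efa15ca3:myOutputStream). The sketch also assumes that stream names never contain a colon.

```python
SEPARATOR = ":"  # assumption: inferred from the example streamId on this page


def make_stream_id(session_id, stream_name):
    """Append the session stream name to the ID of its session."""
    if not session_id or not stream_name:
        raise ValueError("session ID and stream name must be non-empty")
    if SEPARATOR in stream_name:
        raise ValueError("stream name must not contain the separator")
    return f"{session_id}{SEPARATOR}{stream_name}"


def split_stream_id(stream_id):
    """Return (session_id, stream_name) for a compound identifier."""
    session_id, sep, stream_name = stream_id.rpartition(SEPARATOR)
    if not sep or not session_id or not stream_name:
        raise ValueError(f"not a compound stream identifier: {stream_id!r}")
    return session_id, stream_name


stream_id = make_stream_id("session-ogsadai-106efa15ca3", "myOutputStream")
print(stream_id)                   # session-ogsadai-106efa15ca3:myOutputStream
print(split_stream_id(stream_id))  # ('session-ogsadai-106efa15ca3', 'myOutputStream')
```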

The implementation of the deliverFromGDT activity uses the data transport GetBlock operation to perform the data transfer. When the data arrives at the sink data service it becomes available through the output stream named myDeliveryOutput. This output stream could then be connected to another activity or simply delivered back in the response document.



  
    streamId="session-ogsadai-106efa15ca3:myOutputStream"
      mode="block">
      http://handle/to/SourceDataService

More example perform documents are located in: OGSA-DAI/examples/Perform.

Specification

A perform document consists of a root perform element belonging to the namespace https://ogsadai.org.uk/namespaces/2005/10/types. This element can contain:

- Zero or more documentation elements containing human-readable descriptions of what the perform document is designed to do.
- Zero or one session element describing the session requirements for the request. See above for more details.
- Zero or more activities or control-flow elements.
  - Each activity must be one of those supported by the data service resource.
  - Each sequence or flow control-flow element may contain zero or more activities or nested control-flow elements.
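The structural rules above can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree module and is an approximation of the rules as stated in prose, not a substitute for validating against the real schema. The function names are hypothetical, the set of supported activities is supplied by the caller, and the activity elements in the sample are shown without the child elements a real activity would need.

```python
import xml.etree.ElementTree as ET

NS = "https://ogsadai.org.uk/namespaces/2005/10/types"  # namespace as given on this page
CONTROL_FLOW = {"sequence", "flow"}


def local(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def check_children(parent, supported, top_level):
    """Apply the containment rules to the children of parent."""
    sessions = 0
    for child in parent:
        name = local(child)
        if top_level and name == "documentation":
            continue                          # zero or more are allowed
        if top_level and name == "session":
            sessions += 1
            if sessions > 1:
                raise ValueError("more than one session element")
        elif name in CONTROL_FLOW:
            check_children(child, supported, top_level=False)
        elif name not in supported:
            raise ValueError(f"unsupported activity: {name}")


def check_perform(xml_text, supported):
    """Return True if xml_text follows the rules, else raise ValueError."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != f"{{{NS}}}perform":
        raise ValueError("root must be a perform element in the types namespace")
    check_children(root, supported, top_level=True)
    return True


sample = f"""
<perform xmlns="{NS}">
  <documentation>Skeleton only; activity contents omitted.</documentation>
  <session/>
  <sequence>
    <sqlQueryStatement name="q1"/>
    <flow>
      <sqlResultsToXML name="x1"/>
    </flow>
  </sequence>
</perform>
"""
print(check_perform(sample, {"sqlQueryStatement", "sqlResultsToXML"}))  # True
```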

Data service resources also offer the following properties of relevance to perform documents:

{https://ogsadai.org.uk/namespaces/2005/10}activityTypes

This property provides a list of the activities supported by the data service resource.

{https://ogsadai.org.uk/namespaces/2005/10}performDocumentSchema

This property provides the XML-Schema of perform documents accepted by the data service resource.

XML Schema

OGSA-DAI/schema/ogsadai/xsd/perform.xsd