|
home > documentation > scenarios > working with blobs >
Using OGSA-DAI with binary large objects (BLOBs)OGSA-DAI has several activities that are useful when reading BLOBS from, and writing BLOBs to, relational databases. The page presents the recommended way of using these activities and also raises some database specific issues that can be important when manipulating BLOBs. Reading BLOBsTo read BLOBs from relational data service resources using OGSA-DAI the sqlQueryStatement activity must be used to specify an SQL query that selects the BLOBs from the database. Having obtained the BLOB from the database there are various ways of delivering it back to the client. This section discusses the various delivery options. Return in the response document using base 64 encodingThe simplest way to deliver BLOBs back to the client is to pass the output from the sqlQueryStatement activity to either the sqlResultsToXML or sqlResultsToCSV activities as shown: The sqlResultsToXML and sqlResultsToCSV activities convert an SQL result set into a textual format that can be returned to the client within the response document. Within these textual formats binary data will be represented using base64 encoding. Base 64 encoding is a way of encoding arbitrary binary data using only printable ASCII characters. The size in bytes of encoded data will be approximately 135% of the original data. The advantages of this approach are:
The disadvantages of this approach are:
Return as an attachment built using byte arraysThe expense of base 64 encoding can be avoided by returning the binary data to the client as an attachment to the response message. In order to deliver the binary data as an attachment we must first extract the binary data from the SQL result set. This can be done using the sqlResultToBytes activity the output of which can then be sent to the deliverToAttachment activity as shown: The sqlResultToBytes activity can extract a single BLOB from the result set. It does not support multiple BLOBs within a row nor multiple rows. The sqlResultToBytes allows the client to specify the number of bytes in each block and hence allows the binary data to be streamed. The deliverToAttachment activity reassembles these blocks into a single big byte array that is then passed to the SOAP engine for delivery as an attachment. Thus the deliverToAttachment activity does not support streaming and must hold the whole BLOB in memory at one time. For large BLOBs this is an undesirable approach to take. The advantages of this approach are:
The disadvantages of this approach are:
Return as an attachment built using a fileThe deliverToAttachment activity also supports inputs of type java.io.File. When the activity gets such an input it does not need to hold the whole contents of the file in memory in order to send it as an attachment. It therefore has a much smaller memory requirement than if the same data was passed as a sequence of byte array blocks. In order to make use of this functionality the output from the sqlResultToBytes activity can be passed to a bytesToTempFile activity and then onto the deliverToAttachment as shown: The advantages of this approach are:
The disadvantages of this approach are:
Return using FTP or GridFTPIf the client wishes the BLOB data to appear as a file on their system then the you can deliver the file via FTP with the deliverToURL activity or via GridFTP using the deliverToGFTP activity. Both of these approachs require the intermediate sqlResultToBytes activity as show: Both of these delivery activities can stream the data to the destination in blocks. It is therefore good practice to keep the memory usage small by selecting a suitable block size to use with the sqlResultToBytes activity. The advantages of this approach are:
The disadvantages of this approach are:
Streaming using Data Transport functionalityAnother delivery mechanism that may be useful when moving BLOBs is the OGSA-DAI Data Transport functionality. This has the advantage of allowing data be to delivered from one OGSA-DAI perform document to another. The chain of activities is shown: OGSA-DAI's Data Transport functionality also allows clients to obtain the data in streamed blocks so it can be useful for more than service to service data transport. The advantages of this approach are:
The disadvantages of this approach are:
IssuesAll of the recommended approaches for handling all but the smallest of BLOBs recommend using the sqlResultToBytes. This activity can only handle a single BLOB in the result set. It should be easy to extend the activity so that it can have multiple outputs and so can handle multiple BLOBs in each row. Handling multiple rows is a bit more tricky. Doing so would require outputting special markers to separate one BLOBs data stream from the next. The various other activities to which this output is delivered would also be required to handle these new markers and behave accordingly. This is something the OGSA-DAI team hope to look into for future releases. Writing BLOBsTo write BLOBs to relational data service resources using OGSA-DAI the sqlUpdateStatement activity must be used. This activity must specify a parameterized query and also specify inputs to which the parameter values must be sent. The sqlUpdateStatement expects each input block to contain a whole column value. To pass BLOB data you have two options:
There are various options for creating the byte stream that is piped into the bytesToTempFile activity. This section discusses these options. Get data using FTP or GridFTPRemote files can be delivered as blocks of byte arrays into OGSA-DAI using the deliverFromURL or deliverFromGFTP activities. The output of these activities can be piped into the bytesToTempFile activity and on to the sqlUpdateStatement activity to insert the data in the remove file into the database as show: The advantages of this approach are:
The disadvantages of this approach are:
Get data using Data Transport functionalityOGSA-DAI's Data Transport functionality can be used to provide the input to the bytesToTempFile activity. This has the advantage of allowing data be to delivered from one OGSA-DAI perform document to another. The chain of activities is shown: OGSA-DAI's Data Transport functionality also allows clients to send the data in streamed blocks so it can be useful for more than service to service data transport. The advantages of this approach are:
The disadvantages of this approach are:
IssuesUnfortunately OGSA-DAI does not yet provide a deliverFromAttachment activity. Such an activity would allow BLOBs to be delivered to the OGSA-DAI server as an attachment. We hope to provide this in the future. Similarly we do not provide a mechanism for BLOB data to be sent to OGSA-DAI within the perform document using base 64 encoding. We will like to provide this functionality in the future. Database and operating system specific issuesProcessing large BLOBs can require that service administrators and client have a good understanding of the database and operating system the are using. Any database specific issues we have discovered are logged here. MySQLMySQL has a variable called max_allowed_packet that controls the maximum size of the communication buffer between the database and a client. This variable limits the size of a row in a result set. In order to return rows that contain large BLOBs this variable may have to be set to a value higher than the default. Windows operating systemToo many repeated calls to the server within a short period of time can cause clients running on windows to run out of reserved ports. This can happen down using the Data Transport functionality to transfer a large BLOB in lots of small chunks. When this happens clients will see the following error: java.net.BuildException: Address already in use: connectA detailed explanation of the problem and suggested solutions can be found here. In many cases the easiest solution is simply to transfer the data in a smaller number of larger chunks. |
Copyright ? The University of Edinburgh, 2005-2007. Please send comments to . |