
What is OGSA-DQP?

OGSA-DAI components are either data access components or data integration components. A Distributed Query Processing (DQP) system is an example of a data integration component and can potentially provide effective declarative support for service orchestration as well as data integration. The service-based DQP framework described in [1],[2] provides an approach that:

  • supports queries over OGSA-DAI data services and over other services available on the Grid, thereby combining data access with analysis;
  • adapts techniques from parallel databases to provide implicit parallelism for complex data-intensive requests; and
  • uses the emerging standard for Grid data services to provide consistent access to database metadata and to interact with databases on the Grid.

The service-based DQP framework consists of the following two services:

  • Grid Distributed Query Service (Coordinator). The Grid Distributed Query Service (GDQS), or coordinator, is the main interaction point for clients. When a coordinator is set up, it obtains the metadata and computational resource information that it needs to compile, optimise, partition and schedule distributed query execution plans over multiple execution nodes in the Grid. The implementation of the coordinator builds on previous work on the Polar* distributed query processor for the Grid [3],[4] by encapsulating its compilation and optimisation functionality. The coordinator is currently implemented as a set of OGSA-DAI data service resources and activities.

  • Query Evaluation Service (Evaluator). The Query Evaluation Service (QES), or evaluator, is used by the coordinator to execute query plans generated by the query compiler, optimiser and scheduler. Each evaluator evaluates the partition of the query execution plan assigned to it by a coordinator. The evaluators participating in a query form a tree: data flows from the leaf evaluators, which interact with Grid data services, up the tree to its destination.
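
The tree of evaluators described above can be illustrated with a small, self-contained Python model. This is only a sketch of the idea, not OGSA-DQP code: the Evaluator class, the scan and hash_join helpers and the sample rows are all hypothetical names invented for the illustration, and real evaluators are remote services rather than in-process objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Operator = Callable[[List[Iterator[Row]]], Iterator[Row]]


@dataclass
class Evaluator:
    """Hypothetical stand-in for one evaluator and its plan partition."""
    name: str
    operator: Operator
    children: List["Evaluator"] = field(default_factory=list)

    def evaluate(self) -> Iterator[Row]:
        # Pull the outputs of the child evaluators, then apply this
        # evaluator's own partition of the plan to them.
        inputs = [child.evaluate() for child in self.children]
        return self.operator(inputs)


def scan(rows: List[Row]) -> Operator:
    """Leaf partition: models reading rows from a wrapped data service."""
    def op(_inputs: List[Iterator[Row]]) -> Iterator[Row]:
        return iter(rows)
    return op


def hash_join(key: str) -> Operator:
    """Inner partition: joins the outputs of exactly two children."""
    def op(inputs: List[Iterator[Row]]) -> Iterator[Row]:
        left, right = inputs
        table: Dict[object, List[Row]] = {}
        for row in right:
            table.setdefault(row[key], []).append(row)
        for row in left:
            for match in table.get(row[key], []):
                yield {**row, **match}
    return op


# Assumed sample data, standing in for two databases on the Grid.
proteins = [{"id": 1, "name": "P1"}, {"id": 2, "name": "P2"}]
terms = [{"id": 1, "term": "kinase"}, {"id": 3, "term": "ligase"}]

root = Evaluator(
    "root",
    hash_join("id"),
    [Evaluator("leaf-a", scan(proteins)), Evaluator("leaf-b", scan(terms))],
)
result = list(root.evaluate())
```

Here the two leaf evaluators each read from one source and the root evaluator joins their outputs, so rows move from the leaves up to the root as in the description above.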

As well as using the services provided by OGSA-DAI data services, the coordinator is itself implemented as an OGSA-DAI data service, and thus can be discovered and invoked in the same way as other OGSA-DAI data services. Consequently, the Grid stands to benefit from OGSA-DQP, through the provision of facilities for declarative request formulation that complement existing approaches to service orchestration, via uniform interfaces and interaction semantics.

Figure 1 provides an overview of the interactions that take place during the instantiation and set-up of an OGSA-DQP coordinator, as well as those that take place when a query is received and processed via a set of evaluators. The components in this figure and the numbered interactions between them are described below. The three-dot sequence in the figure can, as usual, be read as "and so on, up to". This description of OGSA-DQP is intended to give a high-level overview of the system.

Figure 1: Setting up and executing queries using OGSA-DQP

1: An OGSA-DQP coordinator consists of two types of OGSA-DAI data service resources: GDQS factory data service resources and GDQS data service resources. Initially, an installed coordinator service will expose only a GDQS factory data service resource. This data service resource is then used to create GDQS data service resources which can be used by a client to execute queries.

In this first step in the interaction between a client and OGSA-DQP, the client uses a deployed GDQS factory data service resource to create a configured GDQS data service resource. The client interacts with the GDQS factory data service resource by sending an OGSA-DAI perform document which specifies that a DQPFactory activity should be executed. The DQPFactory activity is able to interact with a GDQS factory data service resource in order to dynamically deploy a GDQS data service resource. The DQPFactory activity is parameterised by an XML document which specifies exactly how the deployed GDQS data service resource should be configured. Configuration parameters include the databases and evaluators which can be utilised by the data service resource which is to be created. The result of this interaction is that a GDQS data service resource is created and initialised. The coordinator service now exposes this dynamically deployed GDQS data service resource and it is automatically assigned a resource ID by OGSA-DAI.

2: During the initialisation of the GDQS data service resource, the schemas of the databases it will use are imported by contacting the OGSA-DAI data services which wrap these databases.

3: The client receives the result of the perform document submitted in step 1. This result contains the resource ID needed by the client to identify the created GDQS data service resource in subsequent interactions with this data service resource.

[Note] Steps 1-3 need not take place if a GDQS data service resource already exists that imports the databases and analysis services required by a client; in that case, the client should contact the existing GDQS data service resource directly. Each GDQS data service resource is able to process multiple concurrent queries, and it is not terminated by a client following a query session. Steps 1-3 represent a setup process which is necessary to configure a GDQS data service resource for use by one or more clients.
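
The set-up logic in steps 1-3 and the note above can be modelled as "reuse a suitable resource if one exists, otherwise create one and return its ID". The Python sketch below is a toy model only: GdqsConfig, CoordinatorModel and the generated ID format are hypothetical, it matches on databases alone for brevity, and it does not use or imitate the OGSA-DAI perform document or the DQPFactory activity interface.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class GdqsConfig:
    """Hypothetical summary of what a GDQS resource is configured with."""
    databases: FrozenSet[str]
    evaluators: FrozenSet[str]


@dataclass
class CoordinatorModel:
    """Toy model of a coordinator service and the resources it exposes."""
    resources: Dict[str, GdqsConfig] = field(default_factory=dict)

    def find_existing(self, needed: Iterable[str]) -> Optional[str]:
        # The note above: reuse a resource that already imports
        # everything the client needs.
        wanted = set(needed)
        for resource_id, config in self.resources.items():
            if wanted <= config.databases:
                return resource_id
        return None

    def create(self, config: GdqsConfig) -> str:
        # Steps 1-3: create the resource and hand its ID back.
        # The ID format here is invented for the illustration.
        resource_id = f"gdqs-{uuid.uuid4().hex[:8]}"
        self.resources[resource_id] = config
        return resource_id

    def get_or_create(self, config: GdqsConfig) -> str:
        existing = self.find_existing(config.databases)
        return existing if existing is not None else self.create(config)


coordinator = CoordinatorModel()
first_id = coordinator.get_or_create(
    GdqsConfig(frozenset({"db_a", "db_b"}), frozenset({"eval-1", "eval-2"}))
)
# A second client needing only db_a is served by the same resource.
second_id = coordinator.get_or_create(
    GdqsConfig(frozenset({"db_a"}), frozenset({"eval-1"}))
)
```

In this model the second client skips the set-up entirely, which mirrors the point that a configured GDQS data service resource outlives a single query session and can serve several clients.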

4: The client submits a perform document containing a query. Queries are written in OQL and are executed by the OQLQueryStatement activity. The GDQS data service resource uses the Polar* query compiler to parse, optimise and schedule the query. A query plan is created, consisting of a number of partitions. Each partition specifies an individual evaluator's role in the query plan.

5: Query partitions are sent to the relevant evaluator services.

6: Some evaluators interact directly with OGSA-DAI data services to obtain data.

7: Other evaluators may interact with other evaluators to implement their role in the execution of the query.

8 - 9: Results propagate back from the evaluators to the coordinator and eventually back to the client.
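
Steps 4-9 can be summarised in a short Python sketch: a query is turned into partitions, each partition is handed to an evaluator, the partitions run in parallel, and the results are gathered for the client. Everything here is an assumption made for illustration: Partition, compile_query and execute are hypothetical names, a thread pool stands in for remote evaluator services, and a simple filter predicate stands in for an OQL query compiled by Polar*.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Partition:
    """Hypothetical plan partition: one evaluator's share of the work."""
    evaluator: str
    work: Callable[[], List[int]]


def compile_query(
    data_by_source: Dict[str, List[int]],
    predicate: Callable[[int], bool],
) -> List[Partition]:
    # Stand-in for step 4: one scan-and-filter partition per data source.
    partitions = []
    for index, rows in enumerate(data_by_source.values()):
        partitions.append(
            Partition(
                evaluator=f"evaluator-{index}",
                # Bind rows as a default argument so each partition
                # keeps its own data source.
                work=lambda rows=rows: [r for r in rows if predicate(r)],
            )
        )
    return partitions


def execute(partitions: List[Partition]) -> List[int]:
    # Steps 5-7: dispatch the partitions so that they run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(p.work) for p in partitions]
        # Steps 8-9: gather results in partition order for the client.
        results: List[int] = []
        for future in futures:
            results.extend(future.result())
    return results


# Assumed sample data standing in for two wrapped databases.
sources = {"db_a": [1, 5, 9], "db_b": [2, 6, 10]}
plan = compile_query(sources, lambda value: value > 4)
answer = execute(plan)
```

The client only supplies the declarative condition; the split into partitions and their parallel execution happen behind the scenes, which is the sense in which the parallelism is implicit.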

[Note] OGSA-DQP is also able to invoke Web services from within queries. This is not illustrated in Figure 1 in order to preserve the clarity of the figure and its associated description. Also omitted from the figure are the resource properties made available by the GDQS data service resource. Following initialisation, the GDQS data service resource provides a resource property enabling the client to obtain a description of the database schemas imported by OGSA-DQP.

References

[1] M. N. Alpdemir, A. Mukherjee, N. W. Paton, P. Watson, A. A. A. Fernandes, A. Gounaris, and J. Smith. Service-based distributed querying on the grid. In Proceedings of the First International Conference on Service Oriented Computing, pages 467-482. Springer, 15-18 December 2003.

[2] M. Nedim Alpdemir, Arijit Mukherjee, Norman W. Paton, Paul Watson, Alvaro A. A. Fernandes, Anastasios Gounaris, and Jim Smith. OGSA-DQP: A service-based distributed query processor for the Grid. In Simon J. Cox, editor, Proceedings of the UK e-Science All Hands Meeting, Nottingham. EPSRC, 24 September 2003.

[3] J. Smith, A. Gounaris, P. Watson, N. W. Paton, A. A. A. Fernandes, and R. Sakellariou. Distributed Query Processing on the Grid. In Proc. Grid Computing 2002, pages 279-290. Springer, LNCS 2536, 2002.

[4] J. Smith, A. Gounaris, P. Watson, N. W. Paton, A. A. A. Fernandes, and R. Sakellariou. Distributed Query Processing on the Grid. Intl. J. High Performance Computing Applications, Vol 17, No 4, pages 353-368, 2003 (extended version of the Grid 2002 paper, selected for publication in a special issue).