gov.ornl.csed.csiir.idr.kdd.dataservice.client
Class DataServiceClient

java.lang.Object
  extended by gov.ornl.csed.csiir.idr.kdd.dataservice.client.DataServiceClient

public class DataServiceClient
extends java.lang.Object

Provides an API for accessing the KDD data service. The client maintains a thread pool that services asynchronous requests: retrieving data set samples, processing queries and query record counts, and retrieving documents.
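The thread-pool bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the client's actual implementation: the class `AsyncRequestRegistry` and its methods are hypothetical, query IDs are assumed to be random UUID strings, and `submit`, `cancel`, `activeIds`, and `shutdown` only loosely mirror what the asynchronous request methods, `requestCancel`, `getActiveProcessIdSet`, and `shutdown` appear to do.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch of the bookkeeping described above; not DataServiceClient's source.
class AsyncRequestRegistry {
    private final ExecutorService pool;
    // Query ID -> pending or running task. Entries leave the map on completion or cancellation.
    private final Map<String, Future<?>> active = new ConcurrentHashMap<>();

    AsyncRequestRegistry(int threadPoolSize) {
        this.pool = Executors.newFixedThreadPool(threadPoolSize);
    }

    /** Queues work on the pool and returns the unique query ID assigned to it. */
    String submit(Runnable work) {
        String queryId = UUID.randomUUID().toString(); // assumption: IDs are UUID strings
        FutureTask<Void> task = new FutureTask<>(() -> {
            try {
                work.run();
            } finally {
                active.remove(queryId); // finished (normally or not): no longer active
            }
        }, null);
        active.put(queryId, task);      // register before the task can start running
        try {
            pool.execute(task);
        } catch (RejectedExecutionException e) {
            active.remove(queryId);     // pool already shut down
            throw e;
        }
        return queryId;
    }

    /** Cancels a pending or running task; false if the ID is unknown or already finished. */
    boolean cancel(String queryId) {
        Future<?> task = active.remove(queryId);
        return task != null && task.cancel(true); // true = interrupt the worker thread
    }

    /** Snapshot of the IDs that are still pending or running. */
    Set<String> activeIds() {
        return Collections.unmodifiableSet(new HashSet<>(active.keySet()));
    }

    /** Interrupts running tasks, discards queued ones, and forgets every ID. */
    void shutdown() {
        pool.shutdownNow();
        active.clear();
    }
}
```

Registering the `FutureTask` in the map before handing it to the pool avoids a race in which a very short task removes its ID before the ID was ever recorded.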


Constructor Summary
DataServiceClient(java.net.URI uri)
          Constructs a data service client object with the supplied URI and the default thread pool size of 16.
DataServiceClient(java.net.URI uri, int threadPoolSize)
          Constructs a data service client object with the supplied URI and the supplied thread pool size.
 
Method Summary
 java.util.Set<java.lang.String> getActiveProcessIdSet()
          Retrieves the unique query IDs of all active asynchronous processes.
 java.net.URL getDocumentURL(java.lang.String dataSetId, java.lang.String documentId, Format format, java.lang.String logId)
          Retrieves the URL for a source document.
 boolean requestCancel(java.lang.String queryId)
          Requests cancellation of an asynchronous sample set, query, document, or query record count request.
 DataSetInformationResponse requestDataSetInformation(java.lang.String dataSetId, java.lang.String logId)
          Retrieves information about a single data set from the server.
 DataSetListResponse requestDataSetList(java.lang.String logId)
          Retrieves a list of the available data sets from the server.
 java.lang.String requestDataSetQuery(java.lang.String dataSetId, DataSetQueryRequest dataSetQueryRequest, DataConsumer consumer, java.lang.String logId)
          Retrieves the records resulting from the provided query.
 java.lang.String requestDataSetQueryRecordCount(java.lang.String dataSetId, DataSetQueryRequest dataSetQueryRequest, RecordCountConsumer consumer, java.lang.String logId)
          Retrieves the total number of records that match the query, along with the time taken to execute the query.
 java.lang.String requestDataSetSamples(java.lang.String dataSetId, DataSetSamplesRequest dataSetSamplesRequest, DataConsumer consumer, java.lang.String logId)
          Retrieves the sample records from the data set.
 DataSetSchemaResponse requestDataSetSchema(java.lang.String dataSetId, java.lang.String logId)
          Retrieves the schema for a data set.
 java.lang.String requestDocument(java.lang.String dataSetId, java.lang.String documentId, Format format, DocumentConsumer consumer, java.lang.String logId)
          Retrieves the source document asynchronously.
 void shutdown()
          Closes the connection to the data service and shuts down the thread pool.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataServiceClient

public DataServiceClient(java.net.URI uri)
Constructs a data service client object with the supplied URI and the default thread pool size of 16.

Parameters:
uri - the URI for the data service

DataServiceClient

public DataServiceClient(java.net.URI uri,
                         int threadPoolSize)
Constructs a data service client object with the supplied URI and the supplied thread pool size.

Parameters:
uri - the URI for the data service
threadPoolSize - the size of the thread pool that services asynchronous requests
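The one-argument constructor presumably delegates to the two-argument one with the documented default of 16. The sketch below shows that delegation pattern on a hypothetical `ClientConfigSketch` class; rejecting non-positive pool sizes is an assumption, since the documentation does not say how invalid sizes are handled, and the URI used in the example is a placeholder.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch of how the two constructors could relate; not the library's source.
class ClientConfigSketch {
    static final int DEFAULT_THREAD_POOL_SIZE = 16; // default stated in the documentation

    final URI uri;
    final int threadPoolSize;

    ClientConfigSketch(URI uri) {
        this(uri, DEFAULT_THREAD_POOL_SIZE); // delegate, so validation lives in one place
    }

    ClientConfigSketch(URI uri, int threadPoolSize) {
        this.uri = Objects.requireNonNull(uri, "uri");
        if (threadPoolSize < 1) {
            // Assumption: the real client's handling of invalid sizes is not documented.
            throw new IllegalArgumentException("threadPoolSize must be positive: " + threadPoolSize);
        }
        this.threadPoolSize = threadPoolSize;
    }
}
```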
Method Detail

requestDataSetList

public DataSetListResponse requestDataSetList(java.lang.String logId)
                                       throws KDDDataServiceClientException
Retrieves a list of the available data sets from the server.

Parameters:
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
a list of the available data sets, each with its data set ID.
Throws:
KDDDataServiceClientException

requestDataSetInformation

public DataSetInformationResponse requestDataSetInformation(java.lang.String dataSetId,
                                                            java.lang.String logId)
                                                     throws KDDDataServiceClientException
Retrieves information about a single data set from the server.

Parameters:
dataSetId - the data set ID for the request
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
all available information about a data set.
Throws:
KDDDataServiceClientException

requestDataSetSchema

public DataSetSchemaResponse requestDataSetSchema(java.lang.String dataSetId,
                                                  java.lang.String logId)
                                           throws KDDDataServiceClientException
Retrieves the schema for a data set.

Parameters:
dataSetId - the data set ID for the request
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the schema
Throws:
KDDDataServiceClientException

getDocumentURL

public java.net.URL getDocumentURL(java.lang.String dataSetId,
                                   java.lang.String documentId,
                                   Format format,
                                   java.lang.String logId)
                            throws KDDDataServiceClientException
Retrieves the URL for a source document.

Parameters:
dataSetId - the data set ID for the request
documentId - the document ID for the request
format - the format for the returned document
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the URL
Throws:
KDDDataServiceClientException

requestDocument

public java.lang.String requestDocument(java.lang.String dataSetId,
                                        java.lang.String documentId,
                                        Format format,
                                        DocumentConsumer consumer,
                                        java.lang.String logId)
                                 throws KDDDataServiceClientException
Retrieves the source document asynchronously. For unstructured data, requesting RAW format retrieves the document in its native format, and requesting TEXT format retrieves the extracted plain-text content of the document. For structured data, requesting either RAW or TEXT format retrieves the data record in comma-separated values (CSV) format. The provided consumer receives the HTTP response code, the MIME type, the bytes of the document, and a done event. If the document is available, the response code is 200; otherwise it is 404. This process can be cancelled by passing the returned query ID to the requestCancel method.

Parameters:
dataSetId - the data set ID
documentId - the document ID
format - the requested format
consumer - a class instance that implements the DocumentConsumer interface
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the unique query ID
Throws:
KDDDataServiceClientException
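Because requestDocument returns immediately with a query ID and delivers the result through callbacks, a caller that wants the whole document typically buffers the bytes and waits for the done event. The sketch assumes a stand-in interface, `DocumentCallbacks`, with hypothetical method names; the real DocumentConsumer methods are not listed on this page, so an adapter would be needed to use this with the actual client.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for DocumentConsumer: the real interface's method names are not
// documented on this page, so these four callbacks are assumptions.
interface DocumentCallbacks {
    void responseCode(int code);
    void mimeType(String mimeType);
    void bytes(byte[] chunk);
    void done();
}

// Buffers the document and lets the caller block until the done event arrives.
class BufferingDocumentConsumer implements DocumentCallbacks {
    // ByteArrayOutputStream synchronizes its own write/toByteArray methods.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile int code = -1;
    private volatile String type;

    @Override public void responseCode(int code) { this.code = code; }
    @Override public void mimeType(String mimeType) { this.type = mimeType; }
    @Override public void bytes(byte[] chunk) { buffer.write(chunk, 0, chunk.length); }
    @Override public void done() { finished.countDown(); }

    /** Waits for the done event; returns false if the timeout elapses first. */
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return finished.await(timeout, unit);
    }

    int code() { return code; }
    String mimeType() { return type; }
    boolean found() { return code == 200; } // 200 = available, 404 = not found, per the text above
    byte[] content() { return buffer.toByteArray(); }
}
```

A caller would wrap `BufferingDocumentConsumer` in an adapter passed to requestDocument, keep the returned query ID in case it needs requestCancel, then call `await` and check `found()` before reading `content()`.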

requestDataSetSamples

public java.lang.String requestDataSetSamples(java.lang.String dataSetId,
                                              DataSetSamplesRequest dataSetSamplesRequest,
                                              DataConsumer consumer,
                                              java.lang.String logId)
                                       throws KDDDataServiceClientException
Retrieves sample records from the data set asynchronously. The provided consumer receives the HTTP response code, possibly multiple ResultSetResponse instances, and a done event. If the requested data is available, the response code is 200; otherwise it is 404. This process can be cancelled by passing the returned query ID to the requestCancel method.

Parameters:
dataSetId - the data set ID
dataSetSamplesRequest - the sample request object
consumer - a class instance that implements the DataConsumer interface
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the unique query ID
Throws:
KDDDataServiceClientException
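Samples (and query results, which use the same DataConsumer type) may arrive as several ResultSetResponse chunks before the done event. A minimal sketch of collecting them into a `CompletableFuture` follows; `DataCallbacks` and its methods are hypothetical stand-ins for DataConsumer, and the type parameter `T` stands in for ResultSetResponse, whose contents are not documented here. Treating any code other than 200 as a failure follows the 200/404 description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for DataConsumer. T stands in for ResultSetResponse; the
// callback names are assumptions.
interface DataCallbacks<T> {
    void responseCode(int code);
    void resultSet(T chunk);
    void done();
}

// Accumulates every chunk and completes a future when the done event arrives.
class CollectingDataConsumer<T> implements DataCallbacks<T> {
    private final List<T> chunks = Collections.synchronizedList(new ArrayList<>());
    private final CompletableFuture<List<T>> result = new CompletableFuture<>();
    private volatile int code = -1;

    @Override public void responseCode(int code) { this.code = code; }

    @Override public void resultSet(T chunk) { chunks.add(chunk); }

    @Override public void done() {
        if (code == 200) {
            synchronized (chunks) { // copying a synchronized list requires holding its lock
                result.complete(new ArrayList<>(chunks));
            }
        } else {
            result.completeExceptionally(
                new IllegalStateException("data service returned HTTP " + code));
        }
    }

    /** Completes with all chunks in arrival order, or exceptionally on a non-200 code. */
    CompletableFuture<List<T>> result() { return result; }
}
```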

requestDataSetQuery

public java.lang.String requestDataSetQuery(java.lang.String dataSetId,
                                            DataSetQueryRequest dataSetQueryRequest,
                                            DataConsumer consumer,
                                            java.lang.String logId)
                                     throws KDDDataServiceClientException
Retrieves the records resulting from the provided query asynchronously. The provided consumer receives the HTTP response code, possibly multiple ResultSetResponse instances, and a done event. If the requested data is available, the response code is 200; otherwise it is 404. This process can be cancelled by passing the returned query ID to the requestCancel method.

Parameters:
dataSetId - the data set ID
dataSetQueryRequest - the query
consumer - a class instance that implements the DataConsumer interface
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the unique query ID
Throws:
KDDDataServiceClientException

requestDataSetQueryRecordCount

public java.lang.String requestDataSetQueryRecordCount(java.lang.String dataSetId,
                                                       DataSetQueryRequest dataSetQueryRequest,
                                                       RecordCountConsumer consumer,
                                                       java.lang.String logId)
                                                throws KDDDataServiceClientException
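Retrieves, asynchronously, the total number of records that match the query, along with the time taken to execute the query. Both values are delivered to the provided consumer. This process can be cancelled by passing the returned query ID to the requestCancel method.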

Parameters:
dataSetId - the data set ID
dataSetQueryRequest - the query
consumer - a class instance that implements the RecordCountConsumer interface
logId - a value provided from the Result Set Management System to facilitate log file reconciliation
Returns:
the unique query ID
Throws:
KDDDataServiceClientException
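The record-count callback can likewise be bridged to a `CompletableFuture` so that callers can block with `join()` or compose further steps. This is a sketch under assumptions: `RecordCountCallbacks`, `RecordCount`, and the `failed` callback are hypothetical, and the query time is assumed to be in milliseconds because the unit is not documented.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for RecordCountConsumer; the real callback names, and the unit of
// the query time, are not documented here (milliseconds is an assumption).
interface RecordCountCallbacks {
    void recordCount(long count, long queryTimeMillis);
    void failed(int responseCode); // assumption: some error path exists
}

// Immutable result pairing the match count with the query execution time.
final class RecordCount {
    final long count;
    final long queryTimeMillis;

    RecordCount(long count, long queryTimeMillis) {
        this.count = count;
        this.queryTimeMillis = queryTimeMillis;
    }
}

// Adapts the callback style to a CompletableFuture.
class FutureRecordCountConsumer implements RecordCountCallbacks {
    private final CompletableFuture<RecordCount> future = new CompletableFuture<>();

    @Override public void recordCount(long count, long queryTimeMillis) {
        future.complete(new RecordCount(count, queryTimeMillis));
    }

    @Override public void failed(int responseCode) {
        future.completeExceptionally(new IllegalStateException("HTTP " + responseCode));
    }

    CompletableFuture<RecordCount> future() { return future; }
}
```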

requestCancel

public boolean requestCancel(java.lang.String queryId)
Requests cancellation of an asynchronous sample set, query, document, or query record count request.

Parameters:
queryId - the unique query ID
Returns:
true if the cancellation succeeds, otherwise false
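A common use of requestCancel is to give up on a request that has not finished within a deadline. The hypothetical helper below waits on a latch that a consumer counts down on its done event and, on timeout, invokes a cancel function; in real use that function could be the method reference `client::requestCancel`, since requestCancel takes a query ID and returns a boolean. Because a false result means cancellation did not succeed, the helper re-checks the latch in case the request finished in the meantime.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper, not part of the client: `cancel` stands in for client::requestCancel.
final class CancelOnTimeout {
    private CancelOnTimeout() {}

    /**
     * Returns true if the request finished in time. Otherwise requests cancellation and
     * returns false, unless the request turns out to have completed after all.
     */
    static boolean awaitOrCancel(CountDownLatch done, String queryId, Predicate<String> cancel,
                                 long timeout, TimeUnit unit) throws InterruptedException {
        if (done.await(timeout, unit)) {
            return true;              // done event arrived before the deadline
        }
        if (cancel.test(queryId)) {
            return false;             // cancelled before completion
        }
        // Cancellation failed: the request may have completed between the timeout and now.
        return done.getCount() == 0;
    }
}
```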

getActiveProcessIdSet

public java.util.Set<java.lang.String> getActiveProcessIdSet()
Retrieves the unique query IDs of all active asynchronous processes.

Returns:
the set of unique query IDs

shutdown

public void shutdown()
Closes the connection to the data service and shuts down the thread pool.
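The documentation does not say whether shutdown waits for in-flight requests. For comparison, the standard two-phase pattern for stopping a `java.util.concurrent.ExecutorService` (graceful `shutdown()`, a bounded `awaitTermination`, then `shutdownNow()`) is sketched below; `PoolShutdown` is a hypothetical helper, not part of the client.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the usual two-phase ExecutorService shutdown. Whether DataServiceClient.shutdown()
// behaves this way is not documented; this is an assumption for illustration only.
final class PoolShutdown {
    private PoolShutdown() {}

    /** Stops accepting work, waits briefly, then interrupts anything still running. */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown();                        // reject new tasks; let queued ones run
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow();             // interrupt running tasks, drop queued ones
                return pool.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }
}
```

Whatever the client does internally, calling shutdown in a `finally` block after constructing a DataServiceClient keeps its pool threads from outliving the work that needed them.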