Module stampede_statistics

Library to generate statistics from the new Stampede 3.1 backend.

Usage:

stats = StampedeStatistics(connString='sqlite:///montage.db')
stats.initialize('unique_wf_uuid')
stats.set_job_filter('dax')
print stats.get_total_jobs_status()
print stats.get_total_jobs_statistics()
stats.set_job_filter('dag')
print stats.get_total_jobs_status()
print stats.get_total_jobs_statistics()
etc.

Constructor and initialize methods:

The constructor takes a required SQLAlchemy connection string as its first argument. By default, the stats class returns data in "expanded workflow" mode. To change this behavior and analyze only a single workflow, set the optional argument:

expand_workflow = False

along with the connection string argument.

The initialize method is called with a single argument: the wf_uuid of the desired "root workflow", whether or not data is returned in expanded mode. The method returns True on success, or False if a query exception is raised, so the programmer can test for success before calling the subsequent query methods. This method is intended to be called once per object.
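The success check described above can be sketched as follows. This is only an illustration: FakeStats is a hypothetical stand-in for StampedeStatistics so that the snippet runs without a database, its failure rule (unknown wf_uuid) and its return value of 42 are made up, and total_jobs_or_none is a helper name invented for this example.

```python
class FakeStats(object):
    """Hypothetical stand-in for StampedeStatistics (no database involved)."""

    def __init__(self, known_uuids):
        self._known = set(known_uuids)

    def initialize(self, wf_uuid):
        # Assumption: fail for unknown uuids, purely to exercise the
        # False branch. The real method returns False when a query
        # exception is raised.
        return wf_uuid in self._known

    def get_total_jobs_status(self):
        return 42  # placeholder value, not real data


def total_jobs_or_none(stats, wf_uuid):
    """Run a query only if initialize() reports success."""
    if not stats.initialize(wf_uuid):
        return None
    return stats.get_total_jobs_status()


ok = total_jobs_or_none(FakeStats(["unique_wf_uuid"]), "unique_wf_uuid")
failed = total_jobs_or_none(FakeStats(["unique_wf_uuid"]), "missing_uuid")
```

With the real class, the same guard would wrap the call to stats.initialize('unique_wf_uuid') from the usage section.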

Job filtering:

Jobs can be filtered using any of the strings in the jobtype ENUM, plus the values 'all' and 'nonsub', which return all jobs and non-subworkflow jobs respectively. If the filter is not explicitly set, it defaults to 'all'.

The desired filter is set with the set_job_filter() method. Once the filter is set, all subsequent calls to the query methods return results according to it, so the same query method can return different values after the filter changes. The filter can be set and reset as many times as needed; the usage section above shows an example.
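One way to compare results across filters is to loop over the filter names and collect one value per filter. The sketch below assumes only the documented set_job_filter() and get_total_jobs_status() methods. FakeStats is a hypothetical stand-in with made-up counts, and totals_by_filter is a helper name invented for this example.

```python
class FakeStats(object):
    """Hypothetical stand-in for StampedeStatistics (no database involved)."""

    # Made-up counts, for illustration only.
    _COUNTS = {"all": 10, "nonsub": 8, "dax": 6, "dag": 4}

    def __init__(self):
        self._filter = "all"  # documented default when no filter is set

    def set_job_filter(self, name):
        self._filter = name

    def get_total_jobs_status(self):
        return self._COUNTS[self._filter]


def totals_by_filter(stats, filters):
    """Set each filter in turn and record the query result under its name."""
    results = {}
    for name in filters:
        stats.set_job_filter(name)
        results[name] = stats.get_total_jobs_status()
    return results


totals = totals_by_filter(FakeStats(), ["dax", "dag"])
```

Note that the object keeps the last filter set by the loop, so a caller that wants the default behavior afterwards should set the filter back to 'all'.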

Return values from methods:

The return value types will vary from method to method. Most of the methods will return a single integer or floating point number.

Methods that return rows from the DB (rather than just a number) return a list whose rows can be accessed in one of two ways: by index (as a list of tuples) or by named attribute (as a list of objects). The following two ways of accessing the same query results produce the same output:

Example:

for row in s.get_job_kickstart():
    print row[0], row[1], row[2]
    print row.job_id, row.job_name, row.kickstart

Either syntax will work. When using the named attribute method, the attributes are the names of the columns/aliases in the SELECT stanza of the query. If the row returned by the method is printed, it will display as a tuple of results per row.
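The dual access style can be tried without a database by using collections.namedtuple as a rough stand-in for a result row. This is an analogy only: the real rows come from the query layer, JobRow is a name invented here, and the sample values are made up. The field names are taken from the example above.

```python
from __future__ import print_function

from collections import namedtuple

# Hypothetical stand-in for one row returned by get_job_kickstart().
JobRow = namedtuple("JobRow", ["job_id", "job_name", "kickstart"])

# Made-up sample rows, for illustration only.
rows = [
    JobRow(1, "job_a", 0.5),
    JobRow(2, "job_b", 12.25),
]


def by_index(result):
    """Read each row positionally, as a list of tuples."""
    return [(row[0], row[1], row[2]) for row in result]


def by_name(result):
    """Read each row through its named attributes."""
    return [(row.job_id, row.job_name, row.kickstart) for row in result]


for row in rows:
    print(row[0], row[1], row[2])
    print(row.job_id, row.job_name, row.kickstart)
```

Named access tends to be easier to maintain, because it keeps working if the column order in the SELECT changes.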

Methods:

get_sub_workflow_ids
get_descendant_workflow_ids
get_total_jobs_status
get_total_succeeded_jobs_status
get_total_failed_jobs_status
get_total_unknown_jobs_status
get_total_tasks_status
get_total_succeeded_tasks_status
get_total_failed_tasks_status
get_total_jobs_statistics
get_total_succeeded_jobs_statistics
get_total_failed_jobs_statistics
get_total_tasks_statistics
get_total_succeeded_tasks_statistics
get_total_failed_tasks_statistics
get_workflow_wall_time
get_workflow_cum_job_wall_time
get_submit_side_job_wall_time
get_job_name
get_job_site
get_job_kickstart
get_job_runtime
get_job_seqexec
get_job_seqexec_delay
get_condor_q_time
get_resource_delay
get_dagman_delay
get_post_time
get_transformation_statistics

Methods are listed in the order of the query list on the wiki:

https://confluence.pegasus.isi.edu/display/pegasus/Pegasus+statistics+python+version

Author: Monte Goode

Classes

StampedeStatistics

Variables

__rcsid__ = '$Id: stampede_statistics.py 28074 2011-06-09 15:5...

__package__ = 'netlogger.analysis.workflow'

Variables Details

__rcsid__

Value:

'$Id: stampede_statistics.py 28074 2011-06-09 15:50:35Z mgoode $'