High Level API

The high level API contains all the routines necessary for calculating POP metrics via the Jupyter notebook interface.

Trace Data Summary Statistics Calculator Classes

PyPOP provides classes for importing summary statistics from different profiling tools.

Currently the following tools are supported:

Extrae

class pypop.traceset.RunData(metadata, stats, tracefile=None, chopped=False)

metadata: TraceMetadata – Trace metadata information

stats: pd.DataFrame – Trace statistics

tracefile: str – Tracefile path

chopped: bool – Was tracefile chopped to ROI before analysis?

class pypop.traceset.TraceSet(path_list=None, cache_stats=True, ignore_cache=False, chop_to_roi=False, no_progress=False)

A set of tracefiles for collective analysis

Collect statistics from the provided trace files. Currently only Extrae .prv files are supported.

These are the statistics needed to calculate the POP metrics (including hybrid metrics) for the provided application traces. Data caching is used to improve the performance of subsequent analysis runs, with MD5 checksumming used to detect tracefile changes. This is particularly useful for large trace files, where calculating the statistics can take a long time.

Parameters:

path_list (str or iterable of str) – String or iterable of strings providing path(s) to the tracefiles of interest.
cache_stats (bool) – Cache the calculated statistics as a pickled data file in the trace directory, along with the checksum of the source trace file. (Default True)
ignore_cache (bool) – By default, if a cache file is present for a given trace it will be used. This behaviour can be overridden by setting ignore_cache=True.
chop_to_roi (bool) – If true, cut trace down to the section bracketed by the first pair of Extrae_startup and Extrae_shutdown commands. Default false.
no_progress (bool) – If true, disable the tqdm progress bar.
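The interaction between cache_stats, ignore_cache and the MD5 checksum can be pictured with a small standalone sketch. This is not PyPOP's implementation: `file_md5`, `load_or_compute`, the `compute` callback and the `.stats.pkl` cache file name are all hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import hashlib
import pickle
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_or_compute(trace_path, compute, cache_stats=True, ignore_cache=False):
    """Return statistics for a trace, reusing a checksum-validated cache.

    `compute` is a hypothetical callable that derives statistics from the
    trace path; the cache file name below is an assumption, not PyPOP's.
    """
    trace_path = Path(trace_path)
    cache_path = trace_path.with_name(trace_path.name + ".stats.pkl")
    checksum = file_md5(trace_path)

    # Reuse the cache only if it exists, is not overridden, and the
    # stored checksum still matches the trace file on disk.
    if cache_path.exists() and not ignore_cache:
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
        if cached["md5"] == checksum:
            return cached["stats"]

    stats = compute(trace_path)
    if cache_stats:
        # Store the statistics next to the trace, together with its checksum.
        with open(cache_path, "wb") as fh:
            pickle.dump({"md5": checksum, "stats": stats}, fh)
    return stats
```

In this sketch a changed trace file produces a different checksum, so the stale cache entry is ignored and then overwritten on the next run.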

add_traces(path_list=None, cache_stats=True, ignore_cache=False, chop_to_roi=False, no_progress=False)

Collect statistics from the provided trace files. Currently only Extrae .prv files are supported.

These are the statistics needed to calculate the POP metrics (including hybrid metrics) for the provided application traces. Data caching is used to improve the performance of subsequent analysis runs, with MD5 checksumming used to detect tracefile changes. This is particularly useful for large trace files, where calculating the statistics can take a long time.

Parameters:

path_list (str or iterable of str) – String or iterable of strings providing path(s) to the tracefiles of interest.
cache_stats (bool) – Cache the calculated statistics as a pickled data file in the trace directory, along with the checksum of the source trace file. (Default True)
ignore_cache (bool) – By default, if a cache file is present for a given trace it will be used. This behaviour can be overridden by setting ignore_cache=True.
chop_to_roi (bool) – If true, cut trace down to the section bracketed by the first pair of Extrae_startup and Extrae_shutdown commands. Default false.
no_progress (bool) – If true, disable the tqdm progress bar.

by_commsize()

Return a dictionary of traces keyed by commsize

This is a helper function equivalent to

by_key(lambda x: x.metadata.application_layout.commsize)

Returns:	traces_by_key – A dictionary of traces organised by the requested key
Return type:	dict

by_key(key)

Return a dictionary of traces with given key

Parameters:	key (func) – A function which takes a RunData object and returns a valid dictionary key
Returns:	traces_by_key – A dictionary of traces organised by the requested key
Return type:	dict
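The relationship between by_key and by_commsize can be shown with stand-in objects. The sketch below does not import PyPOP: the module-level `by_key` function, `make_run` and the file names are hypothetical, and `SimpleNamespace` objects merely mimic the `metadata.application_layout.commsize` attribute path of a RunData object. It also assumes one trace per key; here a later trace with the same key would replace an earlier one.

```python
from types import SimpleNamespace


def by_key(traces, key):
    """Organise traces into a dict using a caller-supplied key function."""
    return {key(run): run for run in traces}


def make_run(tracefile, commsize):
    """Build a stand-in for a RunData object (illustration only)."""
    layout = SimpleNamespace(commsize=commsize)
    return SimpleNamespace(
        tracefile=tracefile,
        metadata=SimpleNamespace(application_layout=layout),
    )


runs = [make_run("run_4.prv", 4), make_run("run_2.prv", 2)]

# Same key function that by_commsize() is documented as equivalent to.
traces_by_commsize = by_key(runs, lambda x: x.metadata.application_layout.commsize)

for commsize in sorted(traces_by_commsize):
    print(commsize, traces_by_commsize[commsize].tracefile)
```

Any function that returns a hashable value can serve as the key, for example one that returns the tracefile path.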

Metric Calculator Classes

PyPOP provides calculator classes for generating the POP Metrics for different application types.

Currently PyPOP supports calculation of the following Metric types:

`MPI_Metrics`	Pure MPI Metrics.
`MPI_OpenMP_Metrics`	Proposed Hybrid MPI+OpenMP Metrics.
`MPI_OpenMP_Multiplicative_Metrics`	Proposed Hybrid MPI+OpenMP Metrics (multiplicative version).

class pypop.metrics.MPI_Metrics(stats_dict, ref_key=None, sort_keys=True)

Pure MPI Metrics.

Parameters:

stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). Dictionary keys will be used as the dataframe index. If a list, a dict will be constructed by enumeration.
ref_key (scalar) – Key of stats_dict that should be used as the reference for calculation of scaling values. If not specified, the lexical minimum key will be used (i.e. min(stats_dict.keys())).
sort_keys (bool) – If true (default), lexically sort the keys in the returned DataFrame.
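How the three parameters interact can be illustrated without PyPOP. The following is a minimal sketch under assumptions: `normalise_stats` is a hypothetical helper rather than a PyPOP function, and plain strings stand in for the statistics DataFrames.

```python
def normalise_stats(stats, ref_key=None, sort_keys=True):
    """Mimic the documented handling of stats_dict, ref_key and sort_keys."""
    # A list is turned into a dict by enumeration: keys 0, 1, 2, ...
    if not isinstance(stats, dict):
        stats = dict(enumerate(stats))

    # Without an explicit reference, fall back to the minimum key.
    if ref_key is None:
        ref_key = min(stats.keys())

    # Optionally order the entries by sorted key.
    keys = sorted(stats) if sort_keys else list(stats)
    return {k: stats[k] for k in keys}, ref_key


ordered, ref = normalise_stats({8: "stats for 8 ranks", 2: "stats for 2 ranks"})
print(list(ordered), ref)

ordered, ref = normalise_stats(["first trace", "second trace"])
print(list(ordered), ref)
```

One consequence of taking the minimum key is that string keys compare character by character, so `min(["2", "16"])` is `"16"`. Integer keys such as process counts avoid that surprise.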

metric_data: pandas.DataFrame – Calculated metric data.

metrics: List of pypop.metrics.Metric – List of metrics that will be calculated.

plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)

Plot scaling graph with region shading.

Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.

Parameters:

x_key (scalar) – Key of DataFrame column to use as x-axis.
y_key (scalar) – Key of DataFrame column to use as y-axis.
label (str or None) – Label to be used for y-axis and data series. Defaults to y_key.
title (str or None) – Optional title for plot.

Returns:	figure – Figure containing complete scaling plot.
Return type:	matplotlib.figure.Figure
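The two shaded bands can be read as efficiency ranges. The sketch below rests on an assumption not stated in this reference, namely that the percentages are measured against ideal linear speedup from the reference run. `scaling_region` is a hypothetical helper and does no plotting.

```python
def scaling_region(n_procs, speedup, ref_procs):
    """Classify a measured speedup against ideal linear scaling.

    Assumes ideal speedup is n_procs / ref_procs (an illustration-only
    assumption about what the shaded regions represent).
    """
    ideal = n_procs / ref_procs
    efficiency = speedup / ideal
    if efficiency > 1.0:
        return "above ideal"
    if efficiency >= 0.8:
        return "80-100%"
    return "0-80%"


# Reference run on 2 processes, measured speedups on 8 processes.
print(scaling_region(8, 3.6, ref_procs=2))
print(scaling_region(8, 2.0, ref_procs=2))
```

Under this reading, a point in the 80-100% band has lost at most a fifth of the ideal speedup at that process count.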

plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)

Plot metrics in a colour-coded table.

Parameters:

columns_key (str or None) – Key to pandas dataframe column containing column heading data (default “Number of Processes”). If None then the index will be used.
title (str or None) – Title for table.
columns_label (str or None) – Label to apply to column heading data (defaults to value of columns_key).
good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green.
bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red.
skipfirst (int) – Skip output of first N columns of metric data (default 0).
bwfirst (int) – Skip coloring of first N columns of metric data (default 0).

Returns:	figure – Figure containing the metrics table.
Return type:	matplotlib.figure.Figure
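The effect of good_thres, bad_thres and bwfirst can be sketched as a simple classification. This is illustrative only: `cell_shade` and `shade_metrics` are hypothetical helpers, the "intermediate" label for values between the two thresholds is an assumption (this reference does not say how such cells are shaded), and no table is drawn.

```python
def cell_shade(value, good_thres=0.8, bad_thres=0.5):
    """Return a shade label for one metric value."""
    if value > good_thres:
        return "green"
    if value < bad_thres:
        return "red"
    return "intermediate"  # assumed label for the band between thresholds


def shade_metrics(values, good_thres=0.8, bad_thres=0.5, bwfirst=0):
    """Shade a sequence of metric values, leaving the first bwfirst uncoloured."""
    return [
        None if index < bwfirst else cell_shade(value, good_thres, bad_thres)
        for index, value in enumerate(values)
    ]


# The first value is left uncoloured; the rest are compared to the thresholds.
print(shade_metrics([1.0, 0.9, 0.6, 0.3], bwfirst=1))
```

Raising good_thres or lowering bad_thres widens the intermediate band in this sketch.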

class pypop.metrics.MPI_OpenMP_Metrics(stats_dict, ref_key=None, sort_keys=True)

Proposed Hybrid MPI+OpenMP Metrics.

Parameters:

stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). Dictionary keys will be used as the dataframe index. If a list, a dict will be constructed by enumeration.
ref_key (scalar) – Key of stats_dict that should be used as the reference for calculation of scaling values. If not specified, the lexical minimum key will be used (i.e. min(stats_dict.keys())).
sort_keys (bool) – If true (default), lexically sort the keys in the returned DataFrame.

metric_data: pandas.DataFrame – Calculated metric data.

metrics: List of pypop.metrics.Metric – List of metrics that will be calculated.

plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)

Plot scaling graph with region shading.

Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.

Parameters:

x_key (scalar) – Key of DataFrame column to use as x-axis.
y_key (scalar) – Key of DataFrame column to use as y-axis.
label (str or None) – Label to be used for y-axis and data series. Defaults to y_key.
title (str or None) – Optional title for plot.

Returns:	figure – Figure containing complete scaling plot.
Return type:	matplotlib.figure.Figure

plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)

Plot metrics in a colour-coded table.

Parameters:

columns_key (str or None) – Key to pandas dataframe column containing column heading data (default “Number of Processes”). If None then the index will be used.
title (str or None) – Title for table.
columns_label (str or None) – Label to apply to column heading data (defaults to value of columns_key).
good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green.
bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red.
skipfirst (int) – Skip output of first N columns of metric data (default 0).
bwfirst (int) – Skip coloring of first N columns of metric data (default 0).

Returns:	figure – Figure containing the metrics table.
Return type:	matplotlib.figure.Figure

class pypop.metrics.MPI_OpenMP_Multiplicative_Metrics(stats_dict, ref_key=None, sort_keys=True)

Proposed Hybrid MPI+OpenMP Metrics (multiplicative version).

Parameters:

stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). Dictionary keys will be used as the dataframe index. If a list, a dict will be constructed by enumeration.
ref_key (scalar) – Key of stats_dict that should be used as the reference for calculation of scaling values. If not specified, the lexical minimum key will be used (i.e. min(stats_dict.keys())).
sort_keys (bool) – If true (default), lexically sort the keys in the returned DataFrame.

metric_data: pandas.DataFrame – Calculated metric data.

metrics: List of pypop.metrics.Metric – List of metrics that will be calculated.

plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)

Plot scaling graph with region shading.

Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.

Parameters:

x_key (scalar) – Key of DataFrame column to use as x-axis.
y_key (scalar) – Key of DataFrame column to use as y-axis.
label (str or None) – Label to be used for y-axis and data series. Defaults to y_key.
title (str or None) – Optional title for plot.

Returns:	figure – Figure containing complete scaling plot.
Return type:	matplotlib.figure.Figure

plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)

Plot metrics in a colour-coded table.

Parameters:

columns_key (str or None) – Key to pandas dataframe column containing column heading data (default “Number of Processes”). If None then the index will be used.
title (str or None) – Title for table.
columns_label (str or None) – Label to apply to column heading data (defaults to value of columns_key).
good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green.
bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red.
skipfirst (int) – Skip output of first N columns of metric data (default 0).
bwfirst (int) – Skip coloring of first N columns of metric data (default 0).

Returns:	figure – Figure containing the metrics table.
Return type:	matplotlib.figure.Figure