High Level API
The high-level API contains all the routines needed to calculate POP metrics through the Jupyter notebook interface.
Trace Data Summary Statistics Calculator Classes
PyPop provides classes for importing summary statistics from different profiling tools.
Currently the following tools are supported:
- Extrae
class pypop.traceset.RunData(metadata, stats, tracefile=None, chopped=False)
Attributes:
- metadata (TraceMetadata) – Trace metadata information.
- stats (pd.DataFrame) – Trace statistics.
- tracefile (str) – Tracefile path.
- chopped (bool) – Whether the tracefile was chopped to the ROI before analysis.
class pypop.traceset.TraceSet(path_list=None, cache_stats=True, ignore_cache=False, chop_to_roi=False, no_progress=False)
A set of tracefiles for collective analysis.
Collects statistics from the provided trace files; currently only Extrae .prv files are supported.
These are the statistics needed to calculate the POP metrics (including the hybrid metrics) for the provided application traces. The statistics are cached to speed up subsequent analysis runs, and md5 checksums are used to detect changes to the tracefiles. Caching is particularly useful for large trace files, where calculating the statistics can take a long time.
Parameters:
- path_list (str or iterable of str) – Path, or iterable of paths, to the tracefiles of interest.
- cache_stats (bool) – Cache the calculated statistics as a pickled data file in the trace directory, along with the checksum of the source trace file. Default True.
- ignore_cache (bool) – By default, if a cache file is present for a given trace it is used. Set ignore_cache=True to override this behaviour.
- chop_to_roi (bool) – If True, cut the trace down to the section bracketed by the first pair of Extrae_startup and Extrae_shutdown commands. Default False.
- no_progress (bool) – If True, disable the tqdm progress bar. Default False.
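The checksum-based caching described above can be sketched as follows. This is a minimal illustration, not PyPop's implementation: it assumes a hypothetical cache file named `<tracefile>.statscache` that stores the md5 digest alongside the pickled statistics, and `load_or_compute` and `compute_stats` are hypothetical names.

```python
import hashlib
import os
import pickle


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_or_compute(tracefile, compute_stats, cache_stats=True, ignore_cache=False):
    """Return (stats, from_cache), reusing pickled stats if the trace checksum matches.

    The cache file name below is hypothetical, chosen for illustration only.
    """
    cache_path = tracefile + ".statscache"
    checksum = md5sum(tracefile)
    if not ignore_cache and os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            cached = pickle.load(handle)
        if cached.get("md5") == checksum:
            return cached["stats"], True  # trace unchanged: reuse cached statistics
    stats = compute_stats(tracefile)  # expensive step, e.g. parsing a large .prv file
    if cache_stats:
        with open(cache_path, "wb") as handle:
            pickle.dump({"md5": checksum, "stats": stats}, handle)
    return stats, False
```

Any edit to the trace changes its md5 digest, so a stale cache is recomputed automatically, while `ignore_cache=True` forces recomputation even when the digest matches.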
add_traces(path_list=None, cache_stats=True, ignore_cache=False, chop_to_roi=False, no_progress=False)
Collect statistics from the provided trace files; currently only Extrae .prv files are supported.
These are the statistics needed to calculate the POP metrics (including the hybrid metrics) for the provided application traces. The statistics are cached to speed up subsequent analysis runs, and md5 checksums are used to detect changes to the tracefiles. Caching is particularly useful for large trace files, where calculating the statistics can take a long time.
Parameters:
- path_list (str or iterable of str) – Path, or iterable of paths, to the tracefiles of interest.
- cache_stats (bool) – Cache the calculated statistics as a pickled data file in the trace directory, along with the checksum of the source trace file. Default True.
- ignore_cache (bool) – By default, if a cache file is present for a given trace it is used. Set ignore_cache=True to override this behaviour.
- chop_to_roi (bool) – If True, cut the trace down to the section bracketed by the first pair of Extrae_startup and Extrae_shutdown commands. Default False.
- no_progress (bool) – If True, disable the tqdm progress bar. Default False.
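The selection rule behind chop_to_roi can be illustrated on a simplified event list. This sketch assumes a hypothetical list of (timestamp, event name) tuples; PyPop itself works on the Paraver trace, and `roi_bounds` and `select_roi_events` are illustrative names only. Returning the whole trace when no complete startup/shutdown pair exists is also an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified event record: (timestamp, event name).
Event = Tuple[float, str]


def roi_bounds(events: List[Event]) -> Optional[Tuple[float, float]]:
    """Return the timestamps of the first Extrae_startup and the next Extrae_shutdown."""
    start = None
    for time, name in sorted(events):
        if start is None and name == "Extrae_startup":
            start = time
        elif start is not None and name == "Extrae_shutdown":
            return start, time
    return None  # no complete startup/shutdown pair


def select_roi_events(events: List[Event]) -> List[Event]:
    """Keep only events inside the first bracketed region (markers included)."""
    bounds = roi_bounds(events)
    if bounds is None:
        return list(events)  # assumption: leave the trace unchopped
    start, end = bounds
    return [event for event in events if start <= event[0] <= end]
```

Only the first pair is used, so any later startup/shutdown brackets in the same trace are discarded.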
Metric Calculator Classes
PyPOP provides calculator classes for generating the POP Metrics for different application types.
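Currently PyPOP supports calculation of the following metric types:
| MPI_Metrics | Pure MPI Metrics. |
| MPI_OpenMP_Metrics | Proposed Hybrid MPI+OpenMP Metrics. |
| MPI_OpenMP_Multiplicative_Metrics | Proposed Hybrid MPI+OpenMP Metrics (multiplicative version). |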
class pypop.metrics.MPI_Metrics(stats_dict, ref_key=None, sort_keys=True)
Pure MPI Metrics.
Parameters:
- stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). The dictionary keys are used as the DataFrame index. If a list is given, a dict is constructed by enumeration (keys 0, 1, 2, ...).
- ref_key (scalar) – Key of stats_dict to use as the reference for calculating scaling values. If not specified, the smallest key is used (i.e. min(stats_dict.keys())).
- sort_keys (bool) – If True (default), sort the keys in the returned DataFrame.
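The documented handling of stats_dict, ref_key and sort_keys can be mimicked in plain Python. This is a sketch, not PyPop's code: plain strings stand in for the per-run pd.DataFrame objects, and `normalise_stats` is a hypothetical helper. Note that min() compares string keys character by character, so with keys "8" and "16" the reference would be "16"; integer keys avoid that surprise.

```python
def normalise_stats(stats_dict, ref_key=None, sort_keys=True):
    """Sketch of the documented input handling; returns (ordered stats, reference key)."""
    # A list is turned into a dict by enumeration: {0: first, 1: second, ...}.
    if isinstance(stats_dict, list):
        stats_dict = dict(enumerate(stats_dict))
    # Without an explicit ref_key, the smallest key is the reference run.
    if ref_key is None:
        ref_key = min(stats_dict.keys())
    # Optionally sort the keys; otherwise keep insertion order.
    keys = sorted(stats_dict) if sort_keys else list(stats_dict)
    return {key: stats_dict[key] for key in keys}, ref_key
```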
Attributes:
- metric_data (pandas.DataFrame) – Calculated metric data.
- metrics (list of pypop.metrics.Metric) – List of metrics that will be calculated.
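For orientation, the core POP MPI efficiencies can be computed from per-rank useful computation time and the total runtime using the standard POP definitions: load balance is mean(useful) / max(useful), communication efficiency is max(useful) / runtime, and parallel efficiency is their product. This is an illustrative sketch only; `pop_mpi_efficiencies` is a hypothetical helper, and the exact columns and names in metric_data may differ.

```python
from statistics import mean


def pop_mpi_efficiencies(useful_times, runtime):
    """Standard POP MPI efficiencies (illustrative sketch).

    useful_times: per-rank time spent in useful computation (i.e. outside MPI).
    runtime: total elapsed time of the analysed region.
    """
    load_balance = mean(useful_times) / max(useful_times)
    communication_efficiency = max(useful_times) / runtime
    # Parallel efficiency is the product, equivalently mean(useful_times) / runtime.
    parallel_efficiency = load_balance * communication_efficiency
    return {
        "Load balance": load_balance,
        "Communication efficiency": communication_efficiency,
        "Parallel efficiency": parallel_efficiency,
    }
```

For example, four ranks with useful times of 8, 6, 10 and 8 s in a 12.5 s run give a load balance of 0.8, a communication efficiency of 0.8 and a parallel efficiency of 0.64.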
plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)
Plot a scaling graph with region shading.
Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.
Returns: figure – Figure containing the complete scaling plot.
Return type: matplotlib.figure.Figure
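One way to read the shaded regions of plot_scaling is as bands of efficiency relative to ideal linear scaling from the reference run. The sketch below assumes that interpretation (measured speedup divided by ideal speedup); `scaling_region` is a hypothetical helper, not part of PyPop.

```python
def scaling_region(n_procs, speedup, ref_procs, ref_speedup=1.0):
    """Classify a measured speedup against ideal linear scaling from the reference run."""
    ideal_speedup = ref_speedup * n_procs / ref_procs
    efficiency = speedup / ideal_speedup
    if efficiency > 1.0:
        return ">100%"    # super-linear: above the ideal line
    if efficiency >= 0.8:
        return "80-100%"  # between 80% of ideal and ideal
    return "0-80%"        # below 80% of ideal
```

For instance, a 16-process run with a speedup of 7.0 relative to a 2-process reference reaches 87.5% of the ideal speedup of 8.0, so it would sit in the 80-100% band.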
plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)
Plot the metrics in a colour-coded table.
Parameters:
- columns_key (str or None) – Key of the pandas dataframe column containing the column heading data (default “Number of Processes”). If None, the index is used.
- title (str or None) – Title for the table.
- columns_label (str or None) – Label to apply to the column heading data (defaults to the value of columns_key).
- good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green (default 0.8).
- bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red (default 0.5).
- skipfirst (int) – Skip output of the first N columns of metric data (default 0).
- bwfirst (int) – Skip colouring of the first N columns of metric data (default 0).
Returns: figure – Figure containing the metrics table.
Return type: matplotlib.figure.Figure
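The effect of good_thres, bad_thres, skipfirst and bwfirst on a single row of the table can be sketched as below. Assumptions: the shade used between the two thresholds is not documented, so it is labelled "intermediate"; bwfirst is assumed to count columns after skipfirst has been applied; and `cell_colours` is a hypothetical helper.

```python
def cell_colours(row, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0):
    """Assign a shade to each displayed cell of one metric row (illustrative sketch)."""
    shown = row[skipfirst:]  # skipfirst: the first N columns are not output at all
    colours = []
    for index, value in enumerate(shown):
        if index < bwfirst:
            colours.append("none")          # bwfirst: left uncoloured
        elif value > good_thres:
            colours.append("green")
        elif value < bad_thres:
            colours.append("red")
        else:
            colours.append("intermediate")  # shade between thresholds: undocumented
    return colours
```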
class pypop.metrics.MPI_OpenMP_Metrics(stats_dict, ref_key=None, sort_keys=True)
Proposed Hybrid MPI+OpenMP Metrics.
Parameters:
- stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). The dictionary keys are used as the DataFrame index. If a list is given, a dict is constructed by enumeration (keys 0, 1, 2, ...).
- ref_key (scalar) – Key of stats_dict to use as the reference for calculating scaling values. If not specified, the smallest key is used (i.e. min(stats_dict.keys())).
- sort_keys (bool) – If True (default), sort the keys in the returned DataFrame.
Attributes:
- metric_data (pandas.DataFrame) – Calculated metric data.
- metrics (list of pypop.metrics.Metric) – List of metrics that will be calculated.
plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)
Plot a scaling graph with region shading.
Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.
Returns: figure – Figure containing the complete scaling plot.
Return type: matplotlib.figure.Figure
plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)
Plot the metrics in a colour-coded table.
Parameters:
- columns_key (str or None) – Key of the pandas dataframe column containing the column heading data (default “Number of Processes”). If None, the index is used.
- title (str or None) – Title for the table.
- columns_label (str or None) – Label to apply to the column heading data (defaults to the value of columns_key).
- good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green (default 0.8).
- bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red (default 0.5).
- skipfirst (int) – Skip output of the first N columns of metric data (default 0).
- bwfirst (int) – Skip colouring of the first N columns of metric data (default 0).
Returns: figure – Figure containing the metrics table.
Return type: matplotlib.figure.Figure
class pypop.metrics.MPI_OpenMP_Multiplicative_Metrics(stats_dict, ref_key=None, sort_keys=True)
Proposed Hybrid MPI+OpenMP Metrics (multiplicative version).
Parameters:
- stats_dict (dict or list of pd.DataFrame) – Statistics as collected with collect_statistics(). The dictionary keys are used as the DataFrame index. If a list is given, a dict is constructed by enumeration (keys 0, 1, 2, ...).
- ref_key (scalar) – Key of stats_dict to use as the reference for calculating scaling values. If not specified, the smallest key is used (i.e. min(stats_dict.keys())).
- sort_keys (bool) – If True (default), sort the keys in the returned DataFrame.
Attributes:
- metric_data (pandas.DataFrame) – Calculated metric data.
- metrics (list of pypop.metrics.Metric) – List of metrics that will be calculated.
plot_scaling(x_key='Number of Processes', y_key='Speedup', label=None, title=None)
Plot a scaling graph with region shading.
Plots scaling data from pandas dataframe(s). The 0-80% and 80-100% scaling regions are shaded for visual identification. Multiple scaling lines may be plotted by passing a dict of dataframes.
Returns: figure – Figure containing the complete scaling plot.
Return type: matplotlib.figure.Figure
plot_table(columns_key='Number of Processes', title=None, columns_label=None, good_thres=0.8, bad_thres=0.5, skipfirst=0, bwfirst=0)
Plot the metrics in a colour-coded table.
Parameters:
- columns_key (str or None) – Key of the pandas dataframe column containing the column heading data (default “Number of Processes”). If None, the index is used.
- title (str or None) – Title for the table.
- columns_label (str or None) – Label to apply to the column heading data (defaults to the value of columns_key).
- good_thres (float [0.0 - 1.0]) – Threshold above which cells are shaded green (default 0.8).
- bad_thres (float [0.0 - 1.0]) – Threshold below which cells are shaded red (default 0.5).
- skipfirst (int) – Skip output of the first N columns of metric data (default 0).
- bwfirst (int) – Skip colouring of the first N columns of metric data (default 0).
Returns: figure – Figure containing the metrics table.
Return type: matplotlib.figure.Figure