GTPin: Perftrace Sample Tool

The Perftrace tool generates a dynamic trace of execution cycles for each invocation of a kernel SW thread on each HW thread.

The trace is provided separately for each kernel, for each Draw/Enqueue command, and for each HW thread.

Running the Perftrace tool

The Perftrace tool (like all GTPin tracing tools) works in two phases, which should be run separately: a pre-processing phase (phase 1) and a trace gathering phase (phase 2).

To run the pre-processing phase of the Perftrace tool in its default configuration, use the following command:

Profilers/Bin/gtpin -t perftrace --phase 1 -- app

Note:
You need to run this phase only once per application.

To run the trace gathering phase of the Perftrace tool in its default configuration, use the following command:

Profilers/Bin/gtpin -t perftrace --phase 2 -- app
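Phase 2 sizes each kernel's trace buffer using the phase-1 results. The sketch below restates that sizing rule from the PerfTraceKernel constructor in perftrace.cpp as a standalone function, so the effect of the max_buffer_mb knob (300 MB by default) is easier to see. ComputeTraceCapacity is a hypothetical name and is not part of the GTPin API. Its input is assumed to be the per-kernel size in bytes returned by PerfTracePreProcessor::TraceSize().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that mirrors the capacity logic of the PerfTraceKernel
// constructor in perftrace.cpp (not a GTPin API).
// preprocessedBytes: per-kernel trace size from phase 1 (0 if unknown).
// maxBufferMb:       value of the max_buffer_mb knob (300 by default).
uint64_t ComputeTraceCapacity(uint64_t preprocessedBytes, uint64_t maxBufferMb)
{
    const uint64_t kSlack = 0x2000;  // 8 KB
    uint64_t capacity;
    if (preprocessedBytes == 0)
    {
        // Unknown size: the kernel was presumably filtered out in phase 1.
        capacity = kSlack;
    }
    else
    {
        // Headroom for small trace-size differences between the two phases.
        capacity = preprocessedBytes + kSlack;
        if (capacity > UINT32_MAX)
        {
            capacity = UINT32_MAX;
        }
    }
    const uint64_t maxBytes = maxBufferMb * 0x100000;  // MB -> bytes
    if (capacity > maxBytes)
    {
        capacity = maxBytes;  // the final trace will contain partial data
    }
    return capacity;
}
```

In this sketch, a kernel that needs more than max_buffer_mb megabytes is cut to that limit. Raising the knob is the only way to keep such a kernel's trace complete.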

Modes of execution

The Perftrace tool supports these execution modes:

0 - Funtime (default)
1 - Memory latency (instruments memory read instructions)
2 - Occupancy

To run Perftrace in a specific mode, specify it on the command line with the "--mode n" argument, where n is 0, 1, or 2. If you omit the argument, Perftrace uses the default mode 0 (Funtime).
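The mode numbers correspond to the PERFTRACE_MODE enum in perftrace.cpp. The sketch below shows that mapping and also pairs each mode with the uncompress_perftrace.py flag that presumably matches it. That pairing is an assumption based on the names alone. ParseMode and UncompressFlag are hypothetical helpers, not GTPin APIs. The sketch rejects values outside 0 to 2, and how Perftrace itself handles such values is not covered here.

```cpp
#include <cassert>
#include <string>

// Mirrors the PERFTRACE_MODE enum in perftrace.cpp.
enum PerftraceMode { kFuntime = 0, kMemLatency = 1, kOccupancy = 2 };

// Hypothetical helper: maps the n of "--mode n" to a mode.
// Returns false for values outside 0..2.
bool ParseMode(int n, PerftraceMode& mode)
{
    switch (n)
    {
    case 0: mode = kFuntime;    return true;
    case 1: mode = kMemLatency; return true;
    case 2: mode = kOccupancy;  return true;
    default:                    return false;
    }
}

// Hypothetical helper: the uncompress_perftrace.py flag assumed to match a mode.
std::string UncompressFlag(PerftraceMode mode)
{
    switch (mode)
    {
    case kFuntime:    return "--funtime";
    case kMemLatency: return "--memlatency";
    case kOccupancy:  return "--occupancy";
    }
    return "";
}
```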

How to understand Perftrace results

When you run the GTPin Perftrace tool in its default configuration for the pre-processing phase (phase 1), the tool generates the directory GTPIN_PROFILE_PERFTRACE0. In addition, the following two files are created in the current directory:

The first file holds per-kernel data and is the input to the trace gathering phase. It has the following format:

BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144

where each line gives the kernel name followed by the maximum number of trace records that the kernel requires.
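Below is a minimal sketch for loading this file. It assumes only the two whitespace-separated fields shown above and relies on kernel names containing no spaces, which holds for the sample. ParseKernelFile is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical reader for the per-kernel pre-processing file:
// one "<extended kernel name> <value>" pair per line.
// Empty or malformed lines are skipped.
std::map<std::string, uint64_t> ParseKernelFile(std::istream& in)
{
    std::map<std::string, uint64_t> result;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::string name;
        uint64_t value = 0;
        if (fields >> name >> value)
        {
            result[name] = value;
        }
    }
    return result;
}
```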

The second file holds per-Draw/Enqueue data. It is informational only and has the following format:

BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     262144  OpenCL 0  0
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  1
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     262144  OpenCL 0  2
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  3
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  4
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     262144  OpenCL 0  5
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  6
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  7
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  8
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     262144  OpenCL 0  9
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1     131072  OpenCL 0  10

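where each line corresponds to a single kernel for a single Draw/Enqueue command. From left to right, the fields are:

- the kernel name
- the number of trace records required for that Draw/Enqueue command
- the API (OpenCL in this example)
- two indices that identify the device and the Draw/Enqueue command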
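In the sample above, the kernel's value in the per-kernel file (262144) equals its largest per-Draw/Enqueue value, which is consistent with a per-kernel maximum. The following sketch recomputes that maximum from the per-Draw/Enqueue lines as a consistency check. It reads only the first two fields. MaxRecordsPerKernel is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads per-Draw/Enqueue lines and keeps, for every
// kernel, the largest record count. Only the first two fields are used.
std::map<std::string, uint64_t> MaxRecordsPerKernel(std::istream& in)
{
    std::map<std::string, uint64_t> maxRecords;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::string kernel;
        uint64_t records = 0;
        if (!(fields >> kernel >> records))
        {
            continue;  // skip empty or malformed lines
        }
        uint64_t& current = maxRecords[kernel];  // value-initialized to 0 on first use
        current = std::max(current, records);
    }
    return maxRecords;
}
```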

When Perftrace is run for trace gathering (phase 2), the tool generates the directory GTPIN_PROFILE_PERFTRACE1. GTPin saves the profiling results in the folder GTPIN_PROFILE_PERFTRACE1\Session_Final. The traces for each kernel are saved in a separate sub-folder that has the same name as the kernel. Each Draw/Enqueue command has a separate trace, which is saved in a corresponding sub-directory, as shown in the following screenshot:

[Screenshot: Perftrace results directory structure (perftrace_res_dir_structure.jpg)]
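A post-processing tool can locate every trace by searching Session_Final for perftrace_compressed.bin files. This avoids depending on a particular sub-directory naming scheme. The sketch below uses C++17 std::filesystem. FindTraceDirs is a hypothetical helper. Each directory it returns is the kind of path passed to --input_dir of uncompress_perftrace.py. If sessionDir cannot be opened, the function throws std::filesystem::filesystem_error.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns every directory under sessionDir (for example
// GTPIN_PROFILE_PERFTRACE1/Session_Final) that contains a perftrace_compressed.bin
// file, sorted so the order is stable.
std::vector<fs::path> FindTraceDirs(const fs::path& sessionDir)
{
    std::vector<fs::path> dirs;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(sessionDir))
    {
        if (entry.is_regular_file() && entry.path().filename().string() == "perftrace_compressed.bin")
        {
            dirs.push_back(entry.path().parent_path());
        }
    }
    std::sort(dirs.begin(), dirs.end());
    return dirs;
}
```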

How to uncompress Perftrace and read the trace

Each trace is saved in a compressed binary format, in a file called perftrace_compressed.bin, as shown above. To uncompress the trace, run the Profilers\Scripts\uncompress_perftrace.py Python script as follows (Python 3.5 or later is required):

python3 Profilers\Scripts\uncompress_perftrace.py --input_dir GTPIN_PROFILE_PERFTRACE1\Session_Final\BitonicSort\device_0__enqueue_0 --occupancy --gen 9

Use the --funtime, --memlatency, or --occupancy flag to specify the mode of the trace being uncompressed.

Running the script unpacks the compressed trace into separate traces, one per HW thread, as shown in the following screenshot:

[Screenshot: uncompressed per-HW-thread trace files (perftrace_uncompressed_res.jpg)]

where the trace generated on each HW thread is saved in its own text file, for example occupancy___s_0_ss_0_eu_1_tid_5.out. The file name encodes the HW thread topology ID: S means Slice, DSS means DualSubSlice, SS means SubSlice, EU means Execution Unit, and TID is the HW thread ID. The resulting trace (such as the occupancy trace below) has the following format:

 Invocation   Start          End
================================
       0        0x00000000007692      0x0000000000899a         duration = 4872
       1        0x00000000008ff0      0x00000000009962         duration = 2418
       2        0x00000000009d84      0x0000000000a614         duration = 2192
       3        0x0000000000aae4      0x0000000000b40c         duration = 2344
       4        0x0000000000b818      0x0000000000c0e2         duration = 2250
       5        0x0000000000c6a8      0x0000000000d096         duration = 2542
       6        0x0000000000d4fe      0x0000000000df18         duration = 2586
       7        0x0000000000e420      0x0000000000ed78         duration = 2392
       8        0x00000000000f8a      0x0000000000223a         duration = 4784
       9        0x00000000002d32      0x00000000003e7a         duration = 4424
      10        0x000000000047de      0x000000000058ec         duration = 4366
      11        0x00000000006040      0x0000000000729a         duration = 4698
      12        0x00000000007c56      0x000000000089e2         duration = 3468
      13        0x000000000099d0      0x0000000000abba         duration = 4586
      14        0x0000000000b52e      0x0000000000ca84         duration = 5462
      15        0x0000000000d478      0x0000000000e62e         duration = 4534
      16        0x0000000000ef04      0x00000000010430         duration = 5420
      17        0x00000000010970      0x00000000011e2a         duration = 5306
      18        0x00000000012408      0x000000000137fc         duration = 5108
      19        0x00000000014008      0x00000000015748         duration = 5952
      20        0x000000000160d2      0x00000000017974         duration = 6306
      21        0x00000000017f66      0x0000000001962a         duration = 5828
      22        0x00000000019936      0x0000000001b30e         duration = 6616
      23        0x0000000001b886      0x0000000001d4c0         duration = 7226

where each line corresponds to a single invocation of the kernel (a single kernel SW thread) on a specific HW thread during the execution of a Draw/Enqueue command. The left column is the invocation number. It is followed by the start and end timestamp counter values and, finally, their difference (that is, the duration of the SW thread).
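To post-process these files, you can parse each data row and each file name as sketched below. The sketch assumes the whitespace-separated row layout and the file-name pattern shown above. ParseTraceRow and ParseThreadFileName are hypothetical helpers. Hexadecimal parsing relies on std::stoull with base 16, which accepts the 0x prefix.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

// One data row of an uncompressed per-HW-thread trace file.
struct TraceRow
{
    uint32_t invocation;  // SW thread invocation number on this HW thread
    uint64_t start;       // timestamp counter value at the start
    uint64_t end;         // timestamp counter value at the end
    uint64_t duration;    // value printed after "duration ="
};

// Parses a row such as "0  0x00000000007692  0x0000000000899a  duration = 4872".
// Returns std::nullopt for the header, the "====" separator, or malformed rows.
std::optional<TraceRow> ParseTraceRow(const std::string& line)
{
    std::istringstream fields(line);
    std::string inv, start, end, key, eq, dur;
    if (!(fields >> inv >> start >> end >> key >> eq >> dur) || key != "duration" || eq != "=")
    {
        return std::nullopt;
    }
    try
    {
        TraceRow row;
        row.invocation = static_cast<uint32_t>(std::stoul(inv));
        row.start      = std::stoull(start, nullptr, 16);  // base 16 accepts the 0x prefix
        row.end        = std::stoull(end, nullptr, 16);
        row.duration   = std::stoull(dur);
        return row;
    }
    catch (const std::logic_error&)  // std::invalid_argument or std::out_of_range
    {
        return std::nullopt;
    }
}

// Extracts topology fields (s, dss, ss, eu, tid) from a file name such as
// "occupancy___s_0_ss_0_eu_1_tid_5.out".
std::map<std::string, int> ParseThreadFileName(const std::string& fileName)
{
    static const std::regex kField("_(dss|ss|s|eu|tid)_([0-9]+)");
    std::map<std::string, int> topo;
    for (std::sregex_iterator it(fileName.begin(), fileName.end(), kField), last; it != last; ++it)
    {
        topo[(*it)[1].str()] = std::stoi((*it)[2].str());
    }
    return topo;
}
```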

This information allows you to create, for example, an occupancy graph of the workload:

[Figure: example occupancy graph (perftrace_occupancy_example.jpg)]
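One way to derive occupancy from the per-HW-thread traces is a sweep over all [start, end) intervals that counts how many SW threads are active at each timestamp. This is one possible approach and is not necessarily how the graph above was produced. Interval and OccupancySteps are hypothetical names. The sketch assumes timestamps do not wrap around, so check your data first. In the sample trace above, for example, the start value drops between invocations 7 and 8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A SW thread's lifetime [start, end) in timestamp counter ticks.
struct Interval
{
    uint64_t start;
    uint64_t end;
};

// Returns (timestamp, number of active SW threads from that timestamp on) for each
// timestamp at which the count changes. Assumes all intervals share one timestamp
// counter, that end >= start, and that the counter does not wrap around.
std::vector<std::pair<uint64_t, int>> OccupancySteps(const std::vector<Interval>& intervals)
{
    std::vector<std::pair<uint64_t, int>> events;
    for (const Interval& iv : intervals)
    {
        events.emplace_back(iv.start, +1);
        events.emplace_back(iv.end, -1);
    }
    // Sorts by timestamp; at equal timestamps, -1 comes before +1.
    std::sort(events.begin(), events.end());

    std::vector<std::pair<uint64_t, int>> steps;
    int active = 0;
    for (std::size_t i = 0; i < events.size(); ++i)
    {
        active += events[i].second;
        const bool lastAtThisTime = (i + 1 == events.size()) || (events[i + 1].first != events[i].first);
        if (!lastAtThisTime)
        {
            continue;  // apply every event at this timestamp before emitting a step
        }
        const int previous = steps.empty() ? 0 : steps.back().second;
        if (active != previous)
        {
            steps.emplace_back(events[i].first, active);
        }
    }
    return steps;
}
```

Plotting the returned steps as a step function gives the number of busy HW threads over time.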


perftrace.h

/*========================== begin_copyright_notice ============================
Copyright (C) 2021-2023 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Perftrace tool definitions
 */

#ifndef PERFTRACE_H_
#define PERFTRACE_H_

#include <fstream>
#include <list>
#include <map>
#include <string>
#include <vector>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "kernel_weight.h"

using namespace gtpin;

#pragma pack(push, 1)

/* ============================================================================================= */
// Structs PerfTraceFuntimeRecord and PerfTraceMemoryLatencyRecord
/* ============================================================================================= */
/*!
 * Structures of the trace records.
 */
struct alignas(16) PerfTraceFuntimeRecord
{
    uint32_t sr0;       ///< State register sr0.0:ud
    uint32_t tileId;    ///< Tile ID
    uint64_t timeStart; ///< Time stamp counter before the first instruction of the kernel
    uint64_t timeEnd;   ///< Time stamp counter before the last (EOT) instruction of the kernel
};

struct alignas(16) PerfTraceMemoryLatencyRecord
{
    uint32_t sr0;       ///< State register sr0.0:ud
    uint32_t tileId;    ///< Tile ID
    uint32_t cycles;    ///< Accumulated cycles
};

/* ============================================================================================= */
// Class PerfTraceDispatch
/* ============================================================================================= */
/*!
 * Class that holds the performance trace collected during a single kernel dispatch
 */
class PerfTraceDispatch
{
public:
    /// Construct a PerfTraceDispatch object with the empty trace
    explicit PerfTraceDispatch(const IGtKernelDispatch& dispatch) : _isTrimmed(false) { dispatch.GetExecDescriptor(_kernelExecDesc); }

    /// Read the entire trace from the specified profile buffer into this object
    bool ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer);

    const GtKernelExecDesc& KernelExecDesc() const { return _kernelExecDesc; } ///< @return Descriptor of this kernel dispatch
    uint32_t        Size()      const { return (uint32_t)_rawTrace.size(); }   ///< @return Trace size in bytes
    const uint8_t*  Data()      const { return _rawTrace.data(); }             ///< @return Trace data collected in this dispatch
    uint8_t*        Data()            { return _rawTrace.data(); }             ///< @return Trace data collected in this dispatch
    bool            IsEmpty()   const;                                         ///< @return true if the trace is empty
    bool            IsTrimmed() const { return _isTrimmed; }                   ///< @return true if the trace has been trimmed

private:
    GtKernelExecDesc        _kernelExecDesc; ///< Kernel execution descriptor
    std::vector<uint8_t>    _rawTrace;       ///< Trace data collected in this kernel dispatch
    bool                    _isTrimmed;      ///< true if the trace has been trimmed to avoid buffer overflow
};

/* ============================================================================================= */
// Class PerfTraceKernel
/* ============================================================================================= */
class PerfTraceKernel
{
public:
    PerfTraceKernel() = default;

    explicit PerfTraceKernel(const IGtKernelInstrument& kernelInstrument, const uint32_t recordSize, uint32_t numTiles);

    /*!
     * Read a trace recorded by the specified kernel dispatch. Create and add the corresponding PerfTraceDispatch
     * instance to this object
     */
    PerfTraceDispatch& AddPerfTrace(IGtKernelDispatch& kernelDispatch);

    std::string           Name()            const { return _name; }               ///< @return Kernel's name
    std::string           ExtendedName()    const { return _extName; }            ///< @return Kernel's extended name
    std::string           UniqueName()      const { return _uniqueName; }         ///< @return Kernel's unique name
    const GtGpuPlatform   Platform()        const { return _platform; }           ///< @return Kernel's platform
    const IGtGenModel&    GenModel()        const { return GetGenModel(_genId); } ///< @return Kernel's GEN model
    const GtProfileTrace& TraceAccessor()   const { return _traceAccessor; }      ///< @return Trace accessor
    void  DumpAsm()                         const;                                ///< Dump kernel's assembly text to file

    /// @return true if tracing of this kernel is enabled
    bool IsEnabled() const { return (_traceAccessor.MaxTraceSize() != 0); }

    /// @return Traces collected in kernel's dispatches
    typedef std::list<PerfTraceDispatch>  Traces;
    const Traces& GetTraces() const { return _traces; }

    /// @return Number of tiles
    uint32_t NumTiles() const { return _numTiles; }

private:
    std::string         _name;              ///< Kernel's name
    std::string         _uniqueName;        ///< Kernel's unique name
    std::string         _extName;           ///< Kernel's extended name
    GtGpuPlatform       _platform;          ///< Kernel's platform
    GtGenModelId        _genId;             ///< Identifier of the GEN model the kernel is compiled for
    std::string         _asmText;           ///< Kernel's assembly text
    GtProfileTrace      _traceAccessor;     ///< Trace accessor
    Traces              _traces;            ///< Traces collected in kernel's dispatches
    uint32_t            _numTiles;          ///< The number of supported tiles
};

/* ============================================================================================= */
// Class PerfTrace
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the Perftrace tool
 */
class PerfTrace : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "perftrace"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:
    virtual uint32_t RecordSize() const = 0; ///< @return Record size aligned to OWORD

protected:
    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] perfTraceKernel   Object that holds information about the kernel
     */
    virtual void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel) = 0;

protected:
    PerfTrace() = default;
    PerfTrace(const PerfTrace&) = delete;
    PerfTrace& operator = (const PerfTrace&) = delete;
    ~PerfTrace() = default;

protected:
    std::map<GtKernelId, PerfTraceKernel>   _kernels;  ///< Collection of traces per kernel
};

/* ============================================================================================= */
// Class PerfTraceFuntime
/* ============================================================================================= */
class PerfTraceFuntime : public PerfTrace
{
public:
    static PerfTraceFuntime* Instance();
    static void OnFini(void);

    inline uint32_t RecordSize() const { return sizeof(PerfTraceFuntimeRecord); }

private:

    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] perfTraceKernel   Object that holds information about the kernel
     */
    void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel);

    /*!
     * Generate instrumentation for the specified basic block
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] bbl               Basic block to be instrumented
     * @param[in] perfTraceKernel   Object that holds information about the kernel
     */
    void InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel);

    void GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel);

private:
    PerfTraceFuntime() : PerfTrace() {}
    PerfTraceFuntime(const PerfTraceFuntime&) = delete;
    PerfTraceFuntime& operator = (const PerfTraceFuntime&) = delete;
    ~PerfTraceFuntime() {}

private:

    GtReg _timeReg; ///< Virtual timer register
};

/* ============================================================================================= */
// Class PerfTraceMemoryLatency
/* ============================================================================================= */
class PerfTraceMemoryLatency : public PerfTrace
{
public:
    static PerfTraceMemoryLatency* Instance();

    inline uint32_t RecordSize() const { return sizeof(PerfTraceMemoryLatencyRecord); }

    static void OnFini(void);

private:

    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] perfTraceKernel   Object that holds information about the kernel
     */
    void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel);

    /*!
     * Generate instrumentation for the specified basic block
     * @param[in] instrumentor      Interface of the kernel being instrumented
     * @param[in] bbl               Basic block to be instrumented
     * @param[in] perfTraceKernel   Object that holds information about the kernel
     */
    void InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel);

    void GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GenerateFiniCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel);

private:
    PerfTraceMemoryLatency() : PerfTrace() {}
    PerfTraceMemoryLatency(const PerfTraceMemoryLatency&) = delete;
    PerfTraceMemoryLatency& operator = (const PerfTraceMemoryLatency&) = delete;
    ~PerfTraceMemoryLatency() {}

private:

    GtReg _timeReg;         ///< Virtual register to read time register
    GtReg _cyclesAccumReg;  ///< Virtual register to accumulate cycles values
};

/* ============================================================================================= */
// Class PerfTracePreProcessor
/* ============================================================================================= */
/*!
 * Class that computes per-kernel trace sizes in the preprocessing phase, and provides access to
 * this data in the trace gathering phase
 */
class PerfTracePreProcessor : public KernelWeight
{
public:
    uint64_t TraceSize(const std::string& extKernelName) const; ///< Given extended kernel name, return the trace size in bytes
    static void OnFini();                                       ///< Callback function registered with atexit()

    static PerfTracePreProcessor* Instance();

    /*!
     * @return Weight of the specified basic block in the kernel.
     * By default this function returns the bbl.NumIns() value, which means that the tool counts the number of executed
     * instructions.
     * Derived classes may give a different interpretation of the basic block's "weight" by overriding this function.
     * For example, in order to count the number of executed basic blocks, the override function may return 1 (one).
     */
    uint32_t GetBblWeight(IGtKernelInstrument& ki, const IGtBbl& bbl) const;

private:
    PerfTracePreProcessor();
    PerfTracePreProcessor(const PerfTracePreProcessor&) = delete;
    PerfTracePreProcessor& operator = (const PerfTracePreProcessor&) = delete;

private:
    void AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const;

protected:
    KernelWeightProfileData _kernelCounters; ///< Per-kernel counters of required trace records; collected in preprocessing phase

    static const char* _kernelPreProcessFileName;   ///< Name of the file that contains preprocessing data per kernel
    static const char* _dispatchPreProcessFileName; ///< Name of the file that contains preprocessing data per kernel dispatch

    PerfTrace*   _perfTrace;      ///< Corresponding PerfTrace object
};

/* ============================================================================================= */
// Class PerfTracePostProcessor
/* ============================================================================================= */
/*!
 * Function object that processes kernel traces and stores them in files within the profile directory:
 *
 *    kernel_name
 *    |- kernel_dispatch_1
 *    |  |- perftrace_compressed.bin
 *    |- kernel_dispatch_2
 *       |- perftrace_compressed.bin
 *
 * The .bin trace files can be uncompressed by the uncompress_perftrace.py script.
 *
 * Format of .bin trace files:
 * - Static information:
 *     - Number of BBLs that access memory
 *     - For each BBL that accesses memory:
 *        - BBL ID
 *        - Number of SEND instructions in this basic block
 *        - For each SEND instruction:
 *            - Decoded information including offset, address model, address type, address payload size, etc
 *
 * - Dynamic trace data:
 *      - Number of HW threads in which the trace was collected
 *      - For each HW thread:
 *          - HW Thread ID (in the format of sr0.0)
 *          - Number of records collected for this HW thread
 *          - All the records collected for this HW thread
 */
class PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessor object for the specified collection of kernel traces
    PerfTracePostProcessor(const IGtCore& gtpinCore, const PerfTraceKernel&  perfTraceKernel, const PerfTrace* perfTrace);

    /// Process all kernel traces associated with this object and store them in files within the profile directory
    bool operator()();

protected:
    /// Derives global tid from the trace record
    virtual uint32_t GetGlobalTid(const uint8_t* traceRecord) const = 0;

    /// Derives the tile ID from the trace record
    virtual uint32_t GetTileId(const uint8_t* traceRecord) const = 0;

    /// Stores data from the trace record
    virtual void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const = 0;

    /// Store the specified trace in the specified file stream
    void StoreTrace(const PerfTraceDispatch& trace, std::ofstream& fs);

    /// Store Global Thread Identifier
    void StoreGlobalTid(uint32_t gtid, std::ofstream& fs);

    /// Store the specified value in the specified file stream in the binary format
    template <typename T> void Store(const T& val, std::ofstream& fs) { fs.write((const char*)&val, sizeof(val)); }

protected:
    struct TraceRecord       ///< Reference to the trace record
    {
        const void* record;  ///< Pointer to the record
        uint32_t    size;    ///< Size of the record in bytes, including header
    };
    using TraceRecordList     = std::list<TraceRecord>;       ///< List of references to trace records
    using PerTileTraceRecords = std::vector<TraceRecordList>; ///< Per tile trace records

protected:
    const PerfTraceKernel*           _kernel;             ///< Kernel and traces to be processed
    std::string                      _kernelDir;          ///< Directory to store kernel's trace files
    std::vector<PerTileTraceRecords> _threadTraceRecords; ///< Trace records indexed by thread ID, with one list per tile

    static const char*               _traceFileName;      ///< Name of the file to store trace in

    const PerfTrace*                 _perfTrace;          ///< Pointer to a corresponding instance of PerfTrace
};

/* ============================================================================================= */
// Class PerfTracePostProcessorFuntime
/* ============================================================================================= */
class PerfTracePostProcessorFuntime : public PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessorFuntime object for the specified collection of kernel traces
    PerfTracePostProcessorFuntime(const IGtCore& gtpinCore, const PerfTraceKernel&  perfTraceKernel) :
        PerfTracePostProcessor(gtpinCore, perfTraceKernel, PerfTraceFuntime::Instance()) {}

private:

    /// Derives global tid from the trace record
    uint32_t GetGlobalTid(const uint8_t* traceRecord) const;

    /// Derives the tile ID from the trace record
    uint32_t GetTileId(const uint8_t* traceRecord) const;

    /// Stores data from the trace record
    void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const;
};

/* ============================================================================================= */
// Class PerfTracePostProcessorMemoryLatency
/* ============================================================================================= */
class PerfTracePostProcessorMemoryLatency : public PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessorMemoryLatency object for the specified collection of kernel traces
    PerfTracePostProcessorMemoryLatency(const IGtCore& gtpinCore, const PerfTraceKernel&  perfTraceKernel) :
        PerfTracePostProcessor(gtpinCore, perfTraceKernel, PerfTraceMemoryLatency::Instance()) {}

private:

    /// Derives global tid from the trace record
    uint32_t GetGlobalTid(const uint8_t* traceRecord) const;

    /// Derives the tile ID from the trace record
    uint32_t GetTileId(const uint8_t* traceRecord) const;

    /// Stores data from the trace record
    void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const;
};

#pragma pack(pop)

#endif
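PerfTraceDispatch::Data() and Size() expose a dispatch's raw trace bytes, and the tool's RecordSize() gives the record size for the active mode (for example, PerfTraceFuntime::RecordSize()). The hypothetical decoder below reads funtime records from such a buffer. It assumes the buffer is a plain array of records with no extra header, which the header file does not confirm. The field offsets follow the member order of PerfTraceFuntimeRecord (sr0, tileId, timeStart, timeEnd). The decoder uses memcpy to avoid unaligned reads, and it interprets bytes in host byte order, which is assumed to match the GPU's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Field values of one funtime record, in the member order of PerfTraceFuntimeRecord.
struct FuntimeFields
{
    uint32_t sr0;        // state register sr0.0
    uint32_t tileId;     // tile ID
    uint64_t timeStart;  // timestamp before the first instruction
    uint64_t timeEnd;    // timestamp before the EOT instruction
};

// Hypothetical decoder: assumes `data` holds back-to-back records of `recordSize`
// bytes with no extra header, with the members at byte offsets 0, 4, 8 and 16.
std::vector<FuntimeFields> DecodeFuntimeRecords(const uint8_t* data, std::size_t size, std::size_t recordSize)
{
    std::vector<FuntimeFields> records;
    if (recordSize < 24)
    {
        return records;  // too small to hold the four members
    }
    for (std::size_t off = 0; off + recordSize <= size; off += recordSize)
    {
        FuntimeFields r;
        // memcpy avoids unaligned reads; bytes are interpreted in host byte order.
        std::memcpy(&r.sr0,       data + off + 0,  sizeof r.sr0);
        std::memcpy(&r.tileId,    data + off + 4,  sizeof r.tileId);
        std::memcpy(&r.timeStart, data + off + 8,  sizeof r.timeStart);
        std::memcpy(&r.timeEnd,   data + off + 16, sizeof r.timeEnd);
        records.push_back(r);
    }
    return records;
}
```

Subtracting timeStart from timeEnd for each record gives the same kind of per-invocation duration that the uncompressed trace files report.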

perftrace.cpp

/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the Perftrace tool
 */

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "perftrace.h"

using std::list;
using std::vector;
using std::string;
using std::map;
using std::ofstream;

// Profiling Mode enum
enum PERFTRACE_MODE
{
    PERFTRACE_FUNTIME,
    PERFTRACE_MEMLATENCY,
    PERFTRACE_OCCUPANCY,
};

// globals
Knob<int>  gKnobMode("mode", 0, "Trace instrumentation scope\n { 0 - funtime, 1 - memory read instructions, 2 - occupancy} ");
Knob<int>  gKnobPhase("phase", 0, "tracing tool - processing phase\n { 1 - pre-processing, 2 - processing - trace gathering} ");
Knob<int>  gKnobMaxTraceBufferInMB("max_buffer_mb", 300, "perftrace - the max allowed size of the trace buffer per kernel in MB\n");

/* ============================================================================================= */
// PerfTraceDispatch implementation
/* ============================================================================================= */
bool PerfTraceDispatch::ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer)
{
    uint32_t traceSize = traceAccessor.Size(profileBuffer);
    _rawTrace.resize(traceSize);
    _isTrimmed = traceAccessor.IsTruncated(profileBuffer);
    return traceAccessor.Read(profileBuffer, _rawTrace.data(), 0, traceSize);
}

bool PerfTraceDispatch::IsEmpty() const
{
    return _rawTrace.size() == 0;
}

/* ============================================================================================= */
// PerfTraceKernel implementation
/* ============================================================================================= */
PerfTraceKernel::PerfTraceKernel(const IGtKernelInstrument& kernelInstrument, const uint32_t recordSize, uint32_t numTiles) : _numTiles(numTiles)
{
    const IGtKernel& kernel = kernelInstrument.Kernel();
    const IGtCfg&    cfg    = kernelInstrument.Cfg();

    _name       = GlueString(kernel.Name());
    _extName    = ExtendedKernelName(kernel);
    _platform   = kernel.GpuPlatform();
    _genId      = kernel.GenModel().Id();
    _asmText    = CfgAsmText(cfg);
    _uniqueName = kernel.UniqueName();

    // Initialize trace accessor. The trace capacity is expected to be computed during the preprocessing phase.
    uint64_t traceCapacity = PerfTracePreProcessor::Instance()->TraceSize(_extName);
    if (traceCapacity == 0)
    {
        // Unknown trace capacity
        GTPIN_WARNING("PERFTRACE: unknown trace capacity for kernel " + _name + ". Assuming the kernel is filtered out. "
            "Allocating a buffer of 8KB size. If the kernel is supposed to run, expect buffer overflow. "
            "In this case, please re-run phase 1 and make sure the kernel is not filtered out.");
        traceCapacity = 0x2000;
    }
    else
    {
        traceCapacity += 0x2000; // Add some space to account for possible fluctuation of trace sizes between phases
        if (traceCapacity > UINT32_MAX)
        {
            GTPIN_WARNING("PERFTRACE: The kernel " + _name + " exceeded maximum trace capacity.");
            traceCapacity = UINT32_MAX;
        }
    }
    if (traceCapacity > (uint64_t(gKnobMaxTraceBufferInMB) * 0x100000))
    {
        GTPIN_WARNING("PERFTRACE: required capacity (" + DecStr(traceCapacity) + ") for kernel " + _name + " is too big - cut to " + DecStr(gKnobMaxTraceBufferInMB) + "MB. "
                      "Expect the final trace to contain partial data.");
        traceCapacity = uint64_t(gKnobMaxTraceBufferInMB) * 0x100000;
    }
    _traceAccessor = GtProfileTrace((uint32_t)traceCapacity, recordSize);
    _traceAccessor.Allocate(kernelInstrument.ProfileBufferAllocator());
}

PerfTraceDispatch& PerfTraceKernel::AddPerfTrace(IGtKernelDispatch& kernelDispatch)
{
    // Create a new PerfTraceDispatch object and store the entire trace within this object
    _traces.emplace_back(kernelDispatch);
    PerfTraceDispatch& perfTraceDispatch = _traces.back();
    if (!perfTraceDispatch.ReadTrace(_traceAccessor, *kernelDispatch.GetProfileBuffer()))
    {
        GTPIN_ERROR_MSG("PERFTRACE: Failed to read profile buffer for kernel " + _name);
    }
    return perfTraceDispatch;
}

void PerfTraceKernel::DumpAsm() const
{
    DumpKernelAsmText(_name, _uniqueName, _asmText);
}

00112 /* ============================================================================================= */
00113 // PerfTrace implementation
00114 /* ============================================================================================= */
00115 void PerfTrace::OnKernelBuild(IGtKernelInstrument& instrumentor)
00116 {
00117     const IGtKernel& kernel = instrumentor.Kernel();
00118     uint32_t numTiles = (instrumentor.Coder().IsTileIdSupported()) ? GTPin_GetCore()->GenArch().MaxTiles(kernel.GpuPlatform()) : 1;
00119 
00120     // Create a new PerfTraceKernel object and add it to the kernel database
00121     auto ret = _kernels.emplace(std::piecewise_construct, std::forward_as_tuple(kernel.Id()), std::forward_as_tuple(instrumentor, RecordSize(), numTiles));
00122     if (ret.second)
00123     {
00124         PerfTraceKernel& perfTraceKernel = (*ret.first).second;
00125         if (!perfTraceKernel.IsEnabled())
00126         {
00127             GTPIN_WARNING("PERFTRACE: The trace won't be generated for kernel " + perfTraceKernel.Name());
00128             return;
00129         }
00130         Instrument(instrumentor, perfTraceKernel);
00131     }
00132 }
00133 
00134 void PerfTrace::OnKernelRun(IGtKernelDispatch& dispatcher)
00135 {
00136     bool isProfileEnabled = false;
00137 
00138     const IGtKernel& kernel = dispatcher.Kernel();
00139     GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
00140     if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
00141     {
00142         auto it = _kernels.find(kernel.Id());
00143         if (it != _kernels.end())
00144         {
00145             const PerfTraceKernel&  perfTraceKernel = it->second;
00146             if (perfTraceKernel.IsEnabled())
00147             {
00148                 IGtProfileBuffer*     buffer        = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer);
00149                 const GtProfileTrace& traceAccessor = perfTraceKernel.TraceAccessor();
00150                 if (traceAccessor.Initialize(*buffer))
00151                 {
00152                     isProfileEnabled = true;
00153                 }
00154                 else
00155                 {
00156                     GTPIN_ERROR_MSG("PERFTRACE: Failed to write into memory buffer for kernel " + string(kernel.Name()));
00157                 }
00158             }
00159         }
00160     }
00161     dispatcher.SetProfilingMode(isProfileEnabled);
00162 }
00163 
00164 void PerfTrace::OnKernelComplete(IGtKernelDispatch& dispatcher)
00165 {
00166     if (!dispatcher.IsProfilingEnabled())
00167     {
00168         return; // Nothing to read: profiling was not enabled for this dispatch or failed to start
00169     }
00170 
00171     const IGtKernel& kernel = dispatcher.Kernel();
00172     auto it = _kernels.find(kernel.Id());
00173     if (it != _kernels.end())
00174     {
00175         // Read the trace from the profile buffer
00176         PerfTraceKernel&  perfTraceKernel = it->second;
00177         perfTraceKernel.AddPerfTrace(dispatcher);
00178     }
00179 }
00180 
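The three callbacks above form a simple protocol: register the kernel at build time, decide per dispatch whether to profile, and read the buffer back on completion only if profiling was switched on. The model below mirrors that control flow with plain integers. `DispatchModel` and `KernelState` are hypothetical, no GTPin API is called, and the `IsInstrumented()` and `IsKernelExecProfileEnabled()` filters are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// HYPOTHETICAL, simplified model of OnKernelBuild / OnKernelRun / OnKernelComplete.
struct KernelState {
    bool enabled = false;               // stands in for PerfTraceKernel::IsEnabled()
    std::vector<uint64_t> dispatches;   // stands in for the PerfTraceDispatch list
};

class DispatchModel {
public:
    // OnKernelBuild: register the kernel once; a repeated build of the same ID is ignored.
    void OnBuild(uint64_t kernelId, bool enabled)
    {
        KernelState state;
        state.enabled = enabled;
        _kernels.emplace(kernelId, state);
    }

    // OnKernelRun: the value that would be passed to SetProfilingMode().
    bool OnRun(uint64_t kernelId, bool bufferInitialized) const
    {
        auto it = _kernels.find(kernelId);
        return it != _kernels.end() && it->second.enabled && bufferInitialized;
    }

    // OnKernelComplete: collect a trace only if profiling was enabled for this dispatch.
    void OnComplete(uint64_t kernelId, uint64_t dispatchId, bool profilingEnabled)
    {
        if (!profilingEnabled) { return; }
        auto it = _kernels.find(kernelId);
        if (it != _kernels.end()) { it->second.dispatches.push_back(dispatchId); }
    }

    std::size_t NumTraces(uint64_t kernelId) const
    {
        auto it = _kernels.find(kernelId);
        return (it == _kernels.end()) ? 0 : it->second.dispatches.size();
    }

private:
    std::unordered_map<uint64_t, KernelState> _kernels;
};
```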
00181 void PerfTraceFuntime::GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder)
00182 {
00183     IGtInsFactory&  insF = coder.InstructionFactory();
00184 
00185     proc += insF.MakeMov(GtDstRegion(_timeReg, 1, GED_DATA_TYPE_ud),
00186                          GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
00187 
00188     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00189 }
00190 
00191 void PerfTraceMemoryLatency::GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder)
00192 {
00193     coder.StartTimer(proc, _timeReg);
00194     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00195 }
00196 
00197 void PerfTraceFuntime::GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel)
00198 {
00199     // Initialize virtual registers
00200     IGtVregFactory&   vregs = coder.VregFactory();
00201     GtReg addrReg   = vregs.MakeMsgAddrScratch();
00202     GtReg dataReg   = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
00203     GtReg offsetReg = vregs.MakeScratch(VREG_TYPE_DWORD);
00204     GtReg tileIdReg = vregs.MakeScratch(VREG_TYPE_DWORD);
00205 
00206     auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };
00207 
00208     uint32_t recordSize = RecordSize();
00209 
00210     GtReg sr0FieldReg    = fieldReg(offsetof(PerfTraceFuntimeRecord, sr0));
00211     GtReg tileIdFieldReg = fieldReg(offsetof(PerfTraceFuntimeRecord, tileId));
00212     GtReg timeStartReg   = fieldReg(offsetof(PerfTraceFuntimeRecord, timeStart));
00213     GtReg timeEndReg     = fieldReg(offsetof(PerfTraceFuntimeRecord, timeEnd));
00214 
00215     IGtInsFactory&  insF = coder.InstructionFactory();
00216     GtPredicate     predicate(FlagReg(0));
00217 
00218     // dataReg[1-4] = { preTm[0-1], postTm[0-1] }
00219     proc += insF.MakeMov(timeEndReg, GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
00220     proc += insF.MakeMov(timeStartReg, GtRegRegion(_timeReg, GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
00221 
00222     // Set values of PerfTraceRecordHeader fields in dataReg
00223     proc += insF.MakeMov(sr0FieldReg, StateReg(0)); // sr0.0
00224 
00225     coder.LoadTileId(proc, tileIdReg);
00226     proc += insF.MakeMov(tileIdFieldReg, tileIdReg); // tile ID
00227 
00228     // Allocate new record in the trace.
00229     // Set offsetReg = offset of the allocated record in the profile buffer, addrReg = address of the allocated record
00230     perfTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, offsetReg);
00231     coder.ComputeAddress(proc, addrReg, offsetReg);
00232 
00233     // if (!predicate) { STORE buffer[offsetReg] = dataReg; }
00234     coder.StoreMemBlock(proc, addrReg, dataReg, recordSize, !predicate);
00235 
00236     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00237 }
00238 
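The `fieldReg` lambda maps each record field to a DWORD sub-register of `dataReg` by dividing the field's byte offset by 4. The sketch below shows that arithmetic on a hypothetical layout, `FuntimeRecordSketch`. The real `PerfTraceFuntimeRecord` definition is not part of this page, so the field order (taken from the `dataReg[1-4]` comment) and the low/high dword order of the timestamps are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL record layout, not the tool's actual PerfTraceFuntimeRecord.
struct FuntimeRecordSketch {
    uint32_t sr0;           // dword 0: copy of sr0.0, identifies the HW thread
    uint32_t timeStart[2];  // dwords 1-2: timestamp captured at kernel entry
    uint32_t timeEnd[2];    // dwords 3-4: timestamp captured just before EOT
    uint32_t tileId;        // dword 5: tile that ran the thread
};

// Same arithmetic as fieldReg: byte offset -> index of a DWORD sub-register.
// Only meaningful for dword-aligned fields.
constexpr uint32_t DwordSubRegIndex(std::size_t fieldOffset)
{
    return static_cast<uint32_t>(fieldOffset / sizeof(uint32_t));
}

// Host-side duration of one record, ASSUMING element [0] holds the low dword.
uint64_t DurationTicks(const FuntimeRecordSketch& r)
{
    uint64_t start = (uint64_t(r.timeStart[1]) << 32) | r.timeStart[0];
    uint64_t end   = (uint64_t(r.timeEnd[1]) << 32) | r.timeEnd[0];
    return end - start;
}
```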
00239 void PerfTraceMemoryLatency::GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder)
00240 {
00241     IGtInsFactory&  insF  = coder.InstructionFactory();
00242 
00243     coder.StopTimer(proc, _timeReg);
00244     proc += insF.MakeAdd(_cyclesAccumReg, _cyclesAccumReg, _timeReg);
00245     
00246     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00247 }
00248 
00249 void PerfTraceMemoryLatency::GenerateFiniCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel)
00250 {
00251     // Initialize virtual registers
00252     IGtVregFactory&   vregs = coder.VregFactory();
00253     GtReg addrReg   = vregs.MakeMsgAddrScratch();
00254     GtReg dataReg   = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
00255     GtReg offsetReg = vregs.MakeScratch(VREG_TYPE_DWORD);
00256     GtReg tileIdReg = vregs.MakeScratch(VREG_TYPE_DWORD);
00257 
00258     auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };
00259     GtReg sr0FieldReg    = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, sr0));
00260     GtReg tileIdFieldReg = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, tileId));
00261     GtReg cyclesReg      = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, cycles));
00262 
00263     uint32_t recordSize = RecordSize();
00264 
00265     IGtInsFactory&  insF = coder.InstructionFactory();
00266     GtPredicate     predicate(FlagReg(0));
00267 
00268     // Set values of PerfTraceRecordHeader fields in dataReg
00269     proc += insF.MakeMov(sr0FieldReg, StateReg(0));   // sr0.0
00270     coder.LoadTileId(proc, tileIdReg);
00271     proc += insF.MakeMov(tileIdFieldReg, tileIdReg);  // tile ID
00272     proc += insF.MakeMov(cyclesReg, _cyclesAccumReg); // accumulated memory-read cycles
00273 
00274     // Allocate new record in the trace.
00275     // Set offsetReg = offset of the allocated record in the profile buffer, addrReg = address of the allocated record
00276     perfTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, offsetReg);
00277     coder.ComputeAddress(proc, addrReg, offsetReg);
00278 
00279     // if (!predicate) { STORE buffer[offsetReg] = dataReg; }
00280     coder.StoreMemBlock(proc, addrReg, dataReg, recordSize, !predicate);
00281 
00282     if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
00283 }
00284 
00285 void PerfTraceFuntime::Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel)
00286 {
00287     const IGtCfg&      cfg   = instrumentor.Cfg();
00288     const IGtGenCoder& coder = instrumentor.Coder();
00289     IGtVregFactory&    vregs = coder.VregFactory();
00290 
00291     _timeReg = vregs.Make(VREG_TYPE_QWORD);
00292 
00293     // Generate code that starts/stops timer at entry/exit of the kernel
00294     GtGenProcedure preCode;  GeneratePreCode(preCode, coder);
00295     GtGenProcedure postCode; GeneratePostCode(postCode, coder, perfTraceKernel);
00296 
00297     // Instrument kernel entries
00298     instrumentor.InstrumentEntries(preCode);
00299 
00300     // Instrument kernel exits
00301     for (auto bblPtr : cfg.ExitBbls())
00302     {
00303         const IGtIns& ins = bblPtr->LastIns(); GTPIN_ASSERT(ins.IsEot());
00304         GtGenProcedure fakeConsumers;
00305         coder.GenerateFakeSrcConsumers(fakeConsumers, ins);
00306         instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), fakeConsumers);
00307         instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), postCode);
00308     }
00309 }
00310 
00311 void PerfTraceMemoryLatency::Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel)
00312 {
00313     const IGtCfg&     cfg = instrumentor.Cfg();
00314     IGtVregFactory& vregs = instrumentor.Coder().VregFactory();
00315 
00316     _cyclesAccumReg = vregs.Make(VREG_TYPE_DWORD);
00317 
00318     // Instrument every basic block: memory reads and kernel exits (EOT)
00319     for (auto bblPtr : cfg.Bbls())
00320     {
00321         InstrumentBbl(instrumentor, *bblPtr, perfTraceKernel);
00322     }
00323 }
00324 
00325 void PerfTraceMemoryLatency::InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel)
00326 {
00327     const IGtGenCoder& coder = instrumentor.Coder();
00328 
00329     for (auto& insPtr : bbl.Instructions())
00330     {
00331         const IGtIns& ins = *insPtr;
00332 
00333         if (ins.IsEot())
00334         {
00335             GtGenProcedure finiCode; GenerateFiniCode(finiCode, coder, perfTraceKernel);
00336             instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), finiCode);
00337         }
00338         if (ins.IsMemRead())
00339         {
00340             GtGenProcedure preCode;  GeneratePreCode(preCode, coder);
00341             GtGenProcedure postCode; GeneratePostCode(postCode, coder);
00342             GtGenProcedure fakeConsumers;
00343             coder.GenerateFakeDstConsumers(fakeConsumers, ins);
00344 
00345             instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), preCode);
00346             instrumentor.InstrumentInstruction(ins, GtIpoint::After(), fakeConsumers);
00347             instrumentor.InstrumentInstruction(ins, GtIpoint::After(), postCode);
00348         }
00349     }
00350 }
00351 
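In memory-latency mode, each memory read is bracketed by a timer start and stop, the elapsed value is added to a per-thread DWORD accumulator, and the accumulator is written as a single record before EOT. The host-side model below reproduces that arithmetic for one thread. `InsKind`, `InsEvent`, and `SimulateMemLatency` are hypothetical, and the accumulator is assumed to start at zero because its initialization is not shown in this listing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// HYPOTHETICAL model of what the memory-latency instrumentation computes per thread.
enum class InsKind { Alu, MemRead, Eot };

struct InsEvent {
    InsKind  kind;
    uint32_t startCycle;  // timer value before the read (MemRead only)
    uint32_t stopCycle;   // timer value after the fake destination consumer (MemRead only)
};

// Returns the records that would be stored: one DWORD per EOT.
std::vector<uint32_t> SimulateMemLatency(const std::vector<InsEvent>& stream)
{
    std::vector<uint32_t> records;
    uint32_t cyclesAccum = 0;  // plays the role of _cyclesAccumReg (assumed zero-initialized)
    for (const InsEvent& e : stream) {
        if (e.kind == InsKind::Eot) {
            records.push_back(cyclesAccum);             // GenerateFiniCode stores the accumulator
        } else if (e.kind == InsKind::MemRead) {
            cyclesAccum += e.stopCycle - e.startCycle;  // StopTimer followed by MakeAdd
        }
    }
    return records;
}
```

Because the accumulator is a DWORD, the model keeps the same unsigned 32-bit wrap-around that the instrumented code would exhibit.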
00352 void PerfTraceFuntime::OnFini(void)
00353 {
00354     PerfTraceFuntime& me = *Instance();
00355     IGtCore* gtpinCore = GTPin_GetCore();
00356     for (auto& ref : me._kernels)
00357     {
00358         const PerfTraceKernel&  perfTraceKernel = ref.second;
00359         PerfTracePostProcessorFuntime(*gtpinCore, perfTraceKernel)();
00360     }
00361 }
00362 
00363 void PerfTraceMemoryLatency::OnFini(void)
00364 {
00365     PerfTraceMemoryLatency& me = *Instance();
00366     IGtCore* gtpinCore = GTPin_GetCore();
00367     for (auto& ref : me._kernels)
00368     {
00369         const PerfTraceKernel&  perfTraceKernel = ref.second;
00370         PerfTracePostProcessorMemoryLatency(*gtpinCore, perfTraceKernel)();
00371     }
00372 }
00373 
00374 PerfTraceFuntime* PerfTraceFuntime::Instance()
00375 {
00376     static PerfTraceFuntime perfTrace;
00377     return &perfTrace;
00378 }
00379 
00380 PerfTraceMemoryLatency* PerfTraceMemoryLatency::Instance()
00381 {
00382     static PerfTraceMemoryLatency perfTrace;
00383     return &perfTrace;
00384 }
00385 
00386 /* ============================================================================================= */
00387 // PerfTracePreProcessor implementation
00388 /* ============================================================================================= */
00389 const char* PerfTracePreProcessor::_kernelPreProcessFileName   = "perftrace_pre_process.txt";
00390 const char* PerfTracePreProcessor::_dispatchPreProcessFileName = "perftrace_pre_process_dispatch.txt";
00391 
00392 PerfTracePreProcessor::PerfTracePreProcessor()
00393 {
00394     if (gKnobPhase == 2)
00395     {
00396         // Read the data collected during the preprocessing phase
00397         std::ifstream is(_kernelPreProcessFileName);
00398         GTPIN_ASSERT_MSG(is, string("File ") + _kernelPreProcessFileName + " does not exist. The trace won't be generated");
00399         is >> _kernelCounters;
00400     }
00401     else if (gKnobPhase == 1)
00402     {
00403         // Create the pre_process files, or clear their contents if they already exist
00404         CreateCleanFile(_kernelPreProcessFileName);
00405         CreateCleanFile(_dispatchPreProcessFileName);
00406     }
00407 
00408     if (gKnobMode == PERFTRACE_FUNTIME || gKnobMode == PERFTRACE_OCCUPANCY)
00409     {
00410         _perfTrace = PerfTraceFuntime::Instance();
00411     }
00412     if (gKnobMode == PERFTRACE_MEMLATENCY)
00413     {
00414         _perfTrace = PerfTraceMemoryLatency::Instance();
00415     }
00416 }
00417 
00418 PerfTracePreProcessor* PerfTracePreProcessor::Instance()
00419 {
00420     static PerfTracePreProcessor instance;
00421     return &instance;
00422 }
00423 
00424 void PerfTracePreProcessor::OnFini()
00425 {
00426     PerfTracePreProcessor&  tool = *Instance();
00427     tool.DumpKernelProfiles(_kernelPreProcessFileName);
00428     tool.DumpDispatchProfiles(_dispatchPreProcessFileName);
00429 }
00430 
00431 uint64_t PerfTracePreProcessor::TraceSize(const string& extKernelName) const
00432 {
00433     auto it = _kernelCounters.find(extKernelName);
00434     return ((it == _kernelCounters.end()) ? 0 : it->second.weight);
00435 }
00436 
00437 uint32_t PerfTracePreProcessor::GetBblWeight(IGtKernelInstrument&, const IGtBbl& bbl) const
00438 {
00439     return bbl.IsEot() ? _perfTrace->RecordSize() : 0;
00440 }
00441 
00442 void PerfTracePreProcessor::AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const
00443 {
00444     kc.weight = std::max(kc.weight, dc.weight);
00445     kc.freq += dc.freq;
00446 }
00447 
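Phase 1 keeps, per kernel, the largest trace size required by any single dispatch (`weight` takes the maximum) and the number of dispatches (`freq` is summed); phase 2 looks that size up by kernel name and treats a missing entry as 0. This matches the BitonicSort example earlier, where the kernel file lists 262144, the largest of its per-dispatch values. The sketch below parses the two-column `name weight` format and reproduces both operations; `WeightCounters`, `Aggregate`, `ParsePreProcessFile`, and this `TraceSize` overload are illustrative stand-ins, not the tool's own types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// HYPOTHETICAL stand-in for KernelWeightCounters: only the two fields used above.
struct WeightCounters {
    uint64_t weight = 0;  // largest trace size needed by one dispatch
    uint64_t freq   = 0;  // number of dispatches
};

// Merge one dispatch into the kernel totals, as AggregateDispatchCounters does.
void Aggregate(WeightCounters& kc, const WeightCounters& dc)
{
    kc.weight = std::max(kc.weight, dc.weight);
    kc.freq  += dc.freq;
}

// Parse whitespace-separated "<kernel-name> <weight>" pairs (illustrative parser).
std::map<std::string, uint64_t> ParsePreProcessFile(std::istream& is)
{
    std::map<std::string, uint64_t> sizes;
    std::string name;
    uint64_t weight = 0;
    while (is >> name >> weight) { sizes[name] = weight; }
    return sizes;
}

// Mirrors TraceSize(): 0 means "unknown kernel" and leads to the 8KB fallback.
uint64_t TraceSize(const std::map<std::string, uint64_t>& sizes, const std::string& name)
{
    auto it = sizes.find(name);
    return (it == sizes.end()) ? 0 : it->second;
}
```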
00448 /* ============================================================================================= */
00449 // PerfTracePostProcessor implementation
00450 /* ============================================================================================= */
00451 const char* PerfTracePostProcessor::_traceFileName = "perftrace_compressed.bin";
00452 
00453 PerfTracePostProcessor::PerfTracePostProcessor(const IGtCore& gtpinCore, const PerfTraceKernel& perfTraceKernel, const PerfTrace* perfTrace) :
00454     _kernel(&perfTraceKernel), _kernelDir(JoinPath(string(gtpinCore.ProfileDir()), perfTraceKernel.UniqueName())), _perfTrace(perfTrace) 
00455 {
00456     GTPIN_ASSERT(_perfTrace);
00457 }
00458 
00459 bool PerfTracePostProcessor::operator()()
00460 {
00461     if (!MakeDirectory(_kernelDir))
00462     {
00463         GTPIN_WARNING("PERFTRACE: Could not create directory " + _kernelDir);
00464         return false;
00465     }
00466 
00467     // Process traces recorded in kernel dispatches
00468     for (const auto& trace : _kernel->GetTraces())
00469     {
00470         if (!trace.IsEmpty())
00471         {
00472             if (trace.IsTrimmed())
00473             {
00474                 GTPIN_WARNING("PERFTRACE: Detected trace buffer overflow in kernel " + _kernel->Name());
00475             }
00476 
00477             string subdir   = trace.KernelExecDesc().ToString(_kernel->Platform(), ExecDescFileNameFormat());
00478             string dir      = MakeSubDirectory(_kernelDir, subdir);
00479             string filePath = JoinPath(dir, _traceFileName);
00480 
00481             ofstream fs(filePath, std::ios::binary);
00482             if (!fs)
00483             {
00484                 GTPIN_WARNING("PERFTRACE: Could not create file " + filePath);
00485                 continue;
00486             }
00487             StoreTrace(trace, fs);
00488         }
00489     }
00490     return true;
00491 }
00492 
00493 void PerfTracePostProcessor::StoreTrace(const PerfTraceDispatch& trace, std::ofstream& fs)
00494 {
00495     const uint8_t* traceData  = trace.Data();
00496     uint32_t       traceSize  = trace.Size();
00497 
00498     uint32_t       recordSize = _perfTrace->RecordSize();
00499 
00500     // Associate trace records with threads - populate _threadTraceRecords array
00501     uint32_t maxThreads = _kernel->GenModel().MaxThreads(); // Max number of HW threads
00502     _threadTraceRecords.resize(_kernel->NumTiles());
00503 
00504     for (uint32_t tile = 0; tile < _kernel->NumTiles(); tile++)
00505     {
00506         _threadTraceRecords[tile].clear();
00507         _threadTraceRecords[tile].resize(maxThreads);
00508     }
00509 
00510     std::vector<uint32_t> numProfiledThreads; // Number of profiled (active) threads
00511     numProfiledThreads.resize(_kernel->NumTiles(), 0);
00512 
00513     for (uint32_t recordOffset = 0; recordOffset + recordSize <= traceSize;)
00514     {
00515         const uint8_t* recordPtr = traceData + recordOffset;
00516         // Retrieve the thread ID and tile ID from the record
00517         uint32_t tid = GetGlobalTid(recordPtr);
00518         uint32_t tileId = GetTileId(recordPtr); GTPIN_ASSERT(tileId < _kernel->NumTiles());
00519         if (recordOffset + recordSize > traceSize)
00520         {
00521             break; // end of trace
00522         }
00523 
00524         auto& tileThreadRecords = _threadTraceRecords[tileId];
00525         auto& threadTraceRecords = tileThreadRecords[tid];
00526 
00527         // Add a new trace record reference to _threadTraceRecords
00528         if (threadTraceRecords.empty()) { ++numProfiledThreads[tileId]; } // Increment thread count on the first relevant record
00529         threadTraceRecords.emplace_back(TraceRecord{ recordPtr, recordSize });
00530 
00531         recordOffset += recordSize;
00532     }
00533 
00534     uint32_t mode = gKnobMode;
00535     Store(mode, fs);               // Store profiling mode
00536 
00537     // Compute and store the number of involved tiles
00538     uint32_t numOfTiles = 0;
00539     for (uint32_t i = 0; i < numProfiledThreads.size(); i++)
00540     {
00541         numOfTiles += (numProfiledThreads[i] == 0) ? 0 : 1;
00542     }
00543     Store(numOfTiles, fs);
00544 
00545     for (uint32_t tileId = 0; tileId < _threadTraceRecords.size(); tileId++)
00546     {
00547         if (numProfiledThreads[tileId] == 0) { continue; }
00548 
00549         Store(tileId, fs);
00550 
00551         // Store the number of profiled threads
00552         Store(numProfiledThreads[tileId], fs);
00553 
00554         // Store per-thread traces
00555         for (uint32_t tid = 0; tid < maxThreads; tid++)
00556         {
00557             const auto& tileThreadRecords = _threadTraceRecords[tileId];
00558             const auto& traceRecordList = tileThreadRecords[tid];
00559 
00560             if (traceRecordList.empty()) { continue; }
00561 
00562             StoreGlobalTid(tid, fs);    // Store Global Thread Identifier
00563 
00564             uint32_t numRecords = (uint32_t)traceRecordList.size();
00565             Store(numRecords, fs);      // Store #records collected in the thread
00566 
00567             // Store trace records
00568             for (const auto& r : traceRecordList)
00569             {
00570                 StoreTraceRecordData((const uint8_t*)r.record, fs);
00571             }
00572         }
00573     }
00574 }
00575 
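`StoreTrace()` writes a fixed sequence of fields, so the file can be read back in the same order: the mode, the number of tiles, then for each tile its ID and thread count, and for each thread five sr0-derived IDs, a record count, and the record payloads. The reader below is a sketch under stated assumptions: `Store()` is assumed to write raw host-order `uint32_t` bytes, and the caller supplies the payload size per record (`recordBytes`) because the record types are not shown on this page (for example, 16 bytes per Funtime record if `timeStart` and `timeEnd` are 64-bit each, or 4 bytes per memory-latency record if `cycles` is 32-bit). All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// HYPOTHETICAL reader for perftrace_compressed.bin, following StoreTrace()'s write order.
struct ThreadTrace {
    uint32_t ids[5];                   // slice, dual-subslice, subslice, EU, thread slot (UINT32_MAX = n/a)
    std::vector<std::string> records;  // raw record payloads
};
struct TileTrace {
    uint32_t tileId = 0;
    std::vector<ThreadTrace> threads;
};
struct PerfTraceFile {
    uint32_t mode = 0;
    std::vector<TileTrace> tiles;
};

static uint32_t ReadU32(std::istream& is)
{
    uint32_t v = 0;  // ASSUMPTION: stored as raw host-order bytes
    if (!is.read(reinterpret_cast<char*>(&v), sizeof(v))) { throw std::runtime_error("truncated file"); }
    return v;
}

PerfTraceFile ReadPerfTrace(std::istream& is, std::size_t recordBytes)
{
    PerfTraceFile f;
    f.mode = ReadU32(is);
    uint32_t numTiles = ReadU32(is);
    for (uint32_t t = 0; t < numTiles; ++t) {
        TileTrace tile;
        tile.tileId = ReadU32(is);
        uint32_t numThreads = ReadU32(is);
        for (uint32_t i = 0; i < numThreads; ++i) {
            ThreadTrace th;
            for (uint32_t& id : th.ids) { id = ReadU32(is); }
            uint32_t numRecords = ReadU32(is);
            for (uint32_t r = 0; r < numRecords; ++r) {
                std::string payload(recordBytes, '\0');
                if (!is.read(&payload[0], static_cast<std::streamsize>(recordBytes))) {
                    throw std::runtime_error("truncated record");
                }
                th.records.push_back(payload);
            }
            tile.threads.push_back(th);
        }
        f.tiles.push_back(tile);
    }
    return f;
}
```

In practice the stream would be a `std::ifstream` opened with `std::ios::binary` on a `perftrace_compressed.bin` file in a per-dispatch subdirectory of the kernel's profile directory.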
00576 void PerfTracePostProcessor::StoreGlobalTid(uint32_t gtid, std::ofstream& fs)
00577 {
00578     const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
00579     uint32_t sr0 = sra.SetGlobalTid(0, gtid);
00580 
00581     auto storeSr0Field = [&](const ScatteredBitFieldU32& sbf)
00582     {
00583         uint32_t val = (sbf.IsEmpty() ? UINT32_MAX : sbf.GetValue(sr0));
00584         Store(val, fs);
00585     };
00586 
00587     storeSr0Field(sra.SliceIdField());
00588     storeSr0Field(sra.DualSubSliceIdField());
00589     storeSr0Field(sra.SubSliceIdField());
00590     storeSr0Field(sra.EuIdField());
00591     storeSr0Field(sra.ThreadSlotField());
00592 }
00593 
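`StoreGlobalTid()` re-encodes the global thread ID into `sr0` and then extracts five fields, writing `UINT32_MAX` for any field the platform does not define. The sketch below illustrates the idea of a scattered bit field, a value assembled from several non-contiguous bit ranges of one register. `ScatteredField` and `FieldOrSentinel` are hypothetical and are not GTPin's `ScatteredBitFieldU32`; the bit positions used with them are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// HYPOTHETICAL scattered bit field: pieces are {lsb, width}, listed from
// least- to most-significant; total width is assumed to be at most 32 bits.
class ScatteredField {
public:
    explicit ScatteredField(std::vector<std::pair<uint32_t, uint32_t>> pieces) : _pieces(std::move(pieces)) {}

    bool IsEmpty() const { return _pieces.empty(); }

    // Gather each piece from the register and concatenate them.
    uint32_t GetValue(uint32_t reg) const
    {
        uint32_t value = 0, shift = 0;
        for (const auto& p : _pieces) {
            uint32_t mask = (p.second >= 32) ? UINT32_MAX : ((1u << p.second) - 1u);
            value |= ((reg >> p.first) & mask) << shift;
            shift += p.second;
        }
        return value;
    }

private:
    std::vector<std::pair<uint32_t, uint32_t>> _pieces;
};

// Same convention as storeSr0Field: absent fields are reported as UINT32_MAX.
uint32_t FieldOrSentinel(const ScatteredField& f, uint32_t reg)
{
    return f.IsEmpty() ? UINT32_MAX : f.GetValue(reg);
}
```

For example, with pieces `{0, 2}` and `{8, 1}`, the register value `0x103` yields 7: bits 0-1 give 3 and bit 8 contributes 4.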
00594 uint32_t PerfTracePostProcessorFuntime::GetGlobalTid(const uint8_t* traceRecord) const
00595 {
00596     return _kernel->GenModel().StateRegAccessor().GetGlobalTid((((const PerfTraceFuntimeRecord*)traceRecord)->sr0) & 0xFFFF);
00597 }
00598 
00599 uint32_t PerfTracePostProcessorFuntime::GetTileId(const uint8_t* traceRecord) const
00600 {
00601     return ((const PerfTraceFuntimeRecord*)traceRecord)->tileId;
00602 }
00603 
00604 void PerfTracePostProcessorFuntime::StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const
00605 {
00606     const PerfTraceFuntimeRecord* record = (const PerfTraceFuntimeRecord*)traceRecord;
00607     fs.write((const char*)&record->timeStart, sizeof(PerfTraceFuntimeRecord::timeStart));
00608     fs.write((const char*)&record->timeEnd, sizeof(PerfTraceFuntimeRecord::timeEnd));
00609 }
00610 
00611 uint32_t PerfTracePostProcessorMemoryLatency::GetGlobalTid(const uint8_t* traceRecord) const
00612 {
00613     return _kernel->GenModel().StateRegAccessor().GetGlobalTid((((const PerfTraceMemoryLatencyRecord*)traceRecord)->sr0) & 0xFFFF);
00614 }
00615 
00616 uint32_t PerfTracePostProcessorMemoryLatency::GetTileId(const uint8_t* traceRecord) const
00617 {
00618     return ((const PerfTraceMemoryLatencyRecord*)traceRecord)->tileId;
00619 }
00620 
00621 void PerfTracePostProcessorMemoryLatency::StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const
00622 {
00623     const PerfTraceMemoryLatencyRecord* record = (const PerfTraceMemoryLatencyRecord*)traceRecord;
00624     fs.write((const char*)&record->cycles, sizeof(PerfTraceMemoryLatencyRecord::cycles));
00625 }
00626 
00627 /* ============================================================================================= */
00628 // GTPin_Entry
00629 /* ============================================================================================= */
00630 EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
00631 {
00632     ConfigureGTPin(argc, argv);
00633     GTPIN_ASSERT_MSG((gKnobMode == 0 || gKnobMode == 1 || gKnobMode == 2), "PERFTRACE: Invalid mode value. Should be 0, 1 or 2, provided " + std::to_string(gKnobMode));
00634 
00635     if (gKnobPhase == 1)
00636     {
00637         PerfTracePreProcessor::Instance()->Register();
00638         atexit(PerfTracePreProcessor::OnFini);
00639     }
00640     else
00641     {
00642         GTPIN_ASSERT_MSG((gKnobPhase == 2), "PERFTRACE: Invalid phase value. Should be 1 or 2, provided " + std::to_string(gKnobPhase));
00643 
00644         if (gKnobMode == PERFTRACE_FUNTIME || gKnobMode == PERFTRACE_OCCUPANCY)
00645         {
00646             PerfTraceFuntime::Instance()->Register();
00647             atexit(PerfTraceFuntime::OnFini);
00648         }
00649         else if (gKnobMode == PERFTRACE_MEMLATENCY)
00650         {
00651             PerfTraceMemoryLatency::Instance()->Register();
00652             atexit(PerfTraceMemoryLatency::OnFini);
00653         }
00654         else
00655         {
00656             GTPIN_ASSERT(0);
00657         }
00658     }
00659 }

  Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT