GTPin
The Perftrace tool generates a dynamic trace of execution cycles for each kernel SW thread invocation on any given HW thread.
The trace is provided for each kernel, each Draw/Enqueue granularity, and each separate HW thread.
The Perftrace tool (as well as all GTPin tracing tools) works in two phases, which should be run separately:
To run the pre-processing phase of the perftrace tool in its default configuration, use the following command:
Profilers/Bin/gtpin -t perftrace --phase 1 -- app
To run the trace gathering phase of the perftrace tool in its default configuration, use the following command:
Profilers/Bin/gtpin -t perftrace --phase 2 -- app
The Perftrace tool supports these execution modes: 0 (Funtime), 1 (Memlatency, which instruments memory read instructions), and 2 (Occupancy).
To run Perftrace in a specific mode, specify the mode on the command line with the "--mode n" argument, where n = 0, 1, or 2. If you omit the argument, GTPin uses the default mode 0 (Funtime).
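The two phases and the optional mode can be scripted. The following is a minimal sketch, not part of GTPin: the helper name build_perftrace_command and the mode-name table are hypothetical, and only the -t, --phase and --mode arguments shown above are used. It only builds the argument list; it does not launch anything.

```python
from typing import List, Optional

# Mode numbers as documented for the "--mode" argument.
MODES = {"funtime": 0, "memlatency": 1, "occupancy": 2}


def build_perftrace_command(phase: int, app: List[str],
                            mode: Optional[str] = None,
                            gtpin: str = "Profilers/Bin/gtpin") -> List[str]:
    """Build the gtpin command line for one Perftrace phase (hypothetical helper)."""
    if phase not in (1, 2):
        raise ValueError("phase must be 1 (pre-processing) or 2 (trace gathering)")
    cmd = [gtpin, "-t", "perftrace", "--phase", str(phase)]
    if mode is not None:
        # Omitting --mode leaves the tool in its default mode 0 (Funtime).
        cmd += ["--mode", str(MODES[mode])]
    return cmd + ["--"] + list(app)
```

Each returned list can be passed to subprocess.run, first with phase 1 and then with phase 2, from the same working directory so that phase 2 finds the files written by phase 1.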
When you run the in-house GTPin Perftrace tool in its default configuration for the pre-processing phase (phase 1), the tool generates the directory GTPIN_PROFILE_PERFTRACE0. In addition, the following two files are created in the current directory:
perftrace_pre_process.txt: Describes the trace buffer required by each kernel dispatched for execution on the device. The file contains the maximum number of trace records generated by each kernel, taken over all Draw/Enqueue commands in which that kernel was executed on the device. For example, if a kernel generates between 5 and 15 records per execution, the buffer allocated for that kernel must be large enough to hold 15 records. This file is an input to the trace gathering phase. It has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144
where, for each kernel, the maximum number of required trace records is provided.
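As an illustration of this format, the file can be read into a dictionary that maps each kernel name to its maximum record count. This is a minimal sketch under the assumption that every line holds exactly two whitespace-separated fields, as in the sample above; parse_pre_process is a hypothetical helper, not a GTPin function.

```python
from typing import Dict


def parse_pre_process(text: str) -> Dict[str, int]:
    """Map kernel name -> maximum number of trace records (hypothetical helper)."""
    limits: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, count = fields
        limits[name] = int(count)
    return limits
```

For example, calling parse_pre_process on the contents of perftrace_pre_process.txt shows which kernels need the largest trace buffers before phase 2 is run.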
perftrace_pre_process_sw_threads.txt: Contains information about the SW threads created by each kernel when the kernel is executed on the device. Each line contains the name of the kernel executed on the device; the number of SW threads generated by that kernel's execution of a Draw or Enqueue command; the ID of the Draw or Enqueue command; and some other metadata. This file contains informational data only, and has the following format:
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 0
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 1
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 2
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 3
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 4
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 5
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 6
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 7
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 8
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 262144 OpenCL 0 9
BitonicSort___CS_asmf641279bbb4bc39f_simd32_7a6d7d0b064fb7e1 131072 OpenCL 0 10
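where each line corresponds to a single kernel for a single Draw/Enqueue command. Reading from left to right, the first field is the kernel name, the second is the number of SW threads, and the last is the ID of the Draw or Enqueue command; the fields in between carry additional metadata.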
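A per-dispatch file like this can be loaded for a quick sanity check. The sketch below is illustrative only: DispatchInfo, parse_sw_threads and max_threads_per_kernel are hypothetical names, and the interpretation of the third and fourth columns as an API name and a device index is an assumption based on the sample rows, not something the text confirms.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DispatchInfo:
    kernel: str        # kernel name
    sw_threads: int    # number of SW threads in this Draw/Enqueue command
    api: str           # ASSUMPTION: third column is the API name (e.g. "OpenCL")
    device_index: int  # ASSUMPTION: fourth column is a device index
    dispatch_id: int   # ID of the Draw or Enqueue command


def parse_sw_threads(text: str) -> List[DispatchInfo]:
    """Parse lines with five whitespace-separated fields; skip anything else."""
    records: List[DispatchInfo] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        name, threads, api, device, dispatch = fields
        records.append(DispatchInfo(name, int(threads), api, int(device), int(dispatch)))
    return records


def max_threads_per_kernel(records: List[DispatchInfo]) -> Dict[str, int]:
    """Largest SW thread count seen for each kernel across its dispatches."""
    result: Dict[str, int] = {}
    for rec in records:
        result[rec.kernel] = max(result.get(rec.kernel, 0), rec.sw_threads)
    return result
```

In the sample shown above, the largest per-dispatch thread count for BitonicSort (262144) equals the value recorded for that kernel in perftrace_pre_process.txt.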
When Perftrace is run for trace gathering (phase 2), the tool generates the directory GTPIN_PROFILE_PERFTRACE1. GTPin saves the profiling results in the folder GTPIN_PROFILE_PERFTRACE1\Session_Final. The traces for each kernel are saved in a separate sub-folder that has the same name as the kernel. Each Draw/Enqueue command has a separate trace, which is saved in a corresponding sub-directory, as shown in the following screenshot:
Each trace is saved in a compressed binary format, in a file called perftrace_compressed.bin, as shown above. To uncompress the trace, run the Profilers\Scripts\uncompress_perftrace.py Python* script as follows (Python 3.5 or later is required):
python3 Profilers\Scripts\uncompress_perftrace.py --input_dir GTPIN_PROFILE_PERFTRACE1\Session_Final\BitonicSort\device_0__enqueue_0 --occupancy --gen 9
Use the --funtime, --memlatency, or --occupancy flags to specify the mode for the generated trace.
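When a session contains many kernels and Draw/Enqueue commands, the script has to be run once per trace directory. The following sketch (hypothetical helpers, not part of GTPin) locates every directory that holds a perftrace_compressed.bin file and builds the matching command line, using only the --input_dir, mode and --gen arguments shown above. The value passed to --gen is taken from the caller; it is not derived here.

```python
import os
from typing import List

TRACE_FILE = "perftrace_compressed.bin"
MODE_FLAGS = ("funtime", "memlatency", "occupancy")


def find_trace_dirs(session_dir: str) -> List[str]:
    """Return, in sorted order, every directory under session_dir holding a compressed trace."""
    found = []
    for root, _dirs, files in os.walk(session_dir):
        if TRACE_FILE in files:
            found.append(root)
    return sorted(found)


def build_uncompress_command(input_dir: str, mode: str, gen: int,
                             script: str = os.path.join("Profilers", "Scripts",
                                                        "uncompress_perftrace.py")) -> List[str]:
    """Build the uncompress command for one trace directory (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError("mode must be one of " + ", ".join(MODE_FLAGS))
    return ["python3", script, "--input_dir", input_dir, "--" + mode, "--gen", str(gen)]
```

Each command can then be handed to subprocess.run; the mode flag should match the mode the trace was gathered in.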
Running the script expands the compressed trace into separate traces for each HW thread, as shown in the following screenshot:
where the trace generated on each HW thread is saved in a text file with a name such as occupancy___s_0_ss_0_eu_1_tid_5.out. The file name indicates the HW thread topology ID (where S means Slice, DSS means DualSubSlice, SS means SubSlice, EU means Execution Unit, and TID refers to the HW thread ID). The resulting trace (such as the occupancy trace below) is provided in the following format:
Invocation Start End
================================
0 0x00000000007692 0x0000000000899a duration = 4872
1 0x00000000008ff0 0x00000000009962 duration = 2418
2 0x00000000009d84 0x0000000000a614 duration = 2192
3 0x0000000000aae4 0x0000000000b40c duration = 2344
4 0x0000000000b818 0x0000000000c0e2 duration = 2250
5 0x0000000000c6a8 0x0000000000d096 duration = 2542
6 0x0000000000d4fe 0x0000000000df18 duration = 2586
7 0x0000000000e420 0x0000000000ed78 duration = 2392
8 0x00000000000f8a 0x0000000000223a duration = 4784
9 0x00000000002d32 0x00000000003e7a duration = 4424
10 0x000000000047de 0x000000000058ec duration = 4366
11 0x00000000006040 0x0000000000729a duration = 4698
12 0x00000000007c56 0x000000000089e2 duration = 3468
13 0x000000000099d0 0x0000000000abba duration = 4586
14 0x0000000000b52e 0x0000000000ca84 duration = 5462
15 0x0000000000d478 0x0000000000e62e duration = 4534
16 0x0000000000ef04 0x00000000010430 duration = 5420
17 0x00000000010970 0x00000000011e2a duration = 5306
18 0x00000000012408 0x000000000137fc duration = 5108
19 0x00000000014008 0x00000000015748 duration = 5952
20 0x000000000160d2 0x00000000017974 duration = 6306
21 0x00000000017f66 0x0000000001962a duration = 5828
22 0x00000000019936 0x0000000001b30e duration = 6616
23 0x0000000001b886 0x0000000001d4c0 duration = 7226
where each line corresponds to a single invocation of the kernel (a single kernel SW thread) on a specific HW thread during the execution of a Draw/Enqueue command. The left column indicates the invocation number. It is followed by the start and end timestamp counter values and, finally, by the difference between these two values (in other words, the duration of the SW thread).
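The uncompressed files are plain text, so they are easy to load for further analysis. The sketch below assumes the file-name pattern and the line layout of the occupancy sample shown above; other modes may use a different layout. Both helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as: "0 0x00000000007692 0x0000000000899a duration = 4872"
_LINE = re.compile(
    r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+duration\s*=\s*(\d+)\s*$")


def parse_trace_file_name(name: str) -> Tuple[str, Dict[str, int]]:
    """Split e.g. 'occupancy___s_0_ss_0_eu_1_tid_5.out' into mode and topology IDs."""
    stem = name[:-len(".out")] if name.endswith(".out") else name
    mode, _, topology = stem.partition("___")
    tokens = topology.split("_")
    # Tokens alternate between a key (s, dss, ss, eu, tid) and its numeric value.
    ids = {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}
    return mode, ids


def parse_trace(text: str) -> List[Tuple[int, int, int, int]]:
    """Return (invocation, start, end, duration) tuples; header lines are skipped."""
    records = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            invocation, start, end, duration = match.groups()
            records.append((int(invocation), int(start, 16), int(end, 16), int(duration)))
    return records
```

For the first row of the sample, the end value minus the start value (0x899a - 0x7692) equals the reported duration of 4872.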
This information allows you to create, for example, an occupancy graph of the workload:
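One way to derive the data for such a graph is a sweep over the start and end timestamps of all invocations. This is a minimal sketch, not the method used by GTPin: occupancy_steps is a hypothetical helper, and it assumes that the timestamps of the different HW threads come from a common clock and do not wrap around within the analysed range.

```python
from typing import List, Tuple


def occupancy_steps(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (timestamp, number of active SW threads) at every change point."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a SW thread becomes active
        events.append((end, -1))    # a SW thread finishes
    # Sort by time; at equal times, ends (-1) are processed before starts (+1).
    events.sort(key=lambda event: (event[0], event[1]))

    steps: List[Tuple[int, int]] = []
    active = 0
    for time, delta in events:
        active += delta
        if steps and steps[-1][0] == time:
            steps[-1] = (time, active)  # merge events that share a timestamp
        else:
            steps.append((time, active))
    return steps
```

The intervals are the (start, end) pairs collected from the trace files of all HW threads; plotting the resulting steps as a step function gives the number of concurrently running SW threads over time.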
/*========================== begin_copyright_notice ============================
Copyright (C) 2021-2023 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Perftrace tool definitions
 */

#ifndef PERFTRACE_H_
#define PERFTRACE_H_

#include <list>
#include <map>
#include <vector>

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "kernel_weight.h"

using namespace gtpin;

#pragma pack(push, 1)

/* ============================================================================================= */
// Struct PerfTraceRecord
/* ============================================================================================= */
/*!
 * Structure of the trace records.
 */
struct alignas(16) PerfTraceFuntimeRecord
{
    uint32_t sr0;       ///< State register sr0.0:ud
    uint32_t tileId;    ///< Tile ID
    uint64_t timeStart; ///< Time stamp counter before the first instruction of the kernel
    uint64_t timeEnd;   ///< Time stamp counter before the last (EOT) instruction of the kernel
};

struct alignas(16) PerfTraceMemoryLatencyRecord
{
    uint32_t sr0;       ///< State register sr0.0:ud
    uint32_t tileId;    ///< Tile ID
    uint32_t cycles;    ///< Accumulated cycles
};

/* ============================================================================================= */
// Class PerfTraceDispatch
/* ============================================================================================= */
/*!
 * Class that holds memory trace collected during a single kernel dispatch
 */
class PerfTraceDispatch
{
public:
    /// Construct a PerfTraceDispatch object with the empty trace
    explicit PerfTraceDispatch(const IGtKernelDispatch& dispatch) : _isTrimmed(false) { dispatch.GetExecDescriptor(_kernelExecDesc); }

    /// Read the entire trace from the specified profile buffer into this object
    bool ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer);

    const GtKernelExecDesc& KernelExecDesc() const { return _kernelExecDesc; }           ///< @return Descriptor of this kernel dispatch
    uint32_t                Size()           const { return (uint32_t)_rawTrace.size(); } ///< @return Trace size in bytes
    const uint8_t*          Data()           const { return _rawTrace.data(); }           ///< @return Trace data collected in this dispatch
    uint8_t*                Data()                 { return _rawTrace.data(); }           ///< @return Trace data collected in this dispatch
    bool                    IsEmpty()        const;                                       ///< @return true if the trace is empty
    bool                    IsTrimmed()      const { return _isTrimmed; }                 ///< @return true if the trace has been trimmed

private:
    GtKernelExecDesc     _kernelExecDesc; ///< Kernel execution descriptor
    std::vector<uint8_t> _rawTrace;       ///< Trace data collected in this kernel dispatch
    bool                 _isTrimmed;      ///< true if the trace has been trimmed to avoid buffer overflow
};

/* ============================================================================================= */
// Class PerfTraceKernel
/* ============================================================================================= */
class PerfTraceKernel
{
public:
    PerfTraceKernel() = default;

    explicit PerfTraceKernel(const IGtKernelInstrument& kernelInstrument, const uint32_t recordSize, uint32_t numTiles);

    /*!
     * Read a trace recorded by the specified kernel dispatch. Create and add the corresponding PerfTraceDispatch
     * instance to this object
     */
    PerfTraceDispatch& AddPerfTrace(IGtKernelDispatch& kernelDispatch);

    std::string           Name()          const { return _name; }              ///< @return Kernel's name
    std::string           ExtendedName()  const { return _extName; }           ///< @return Kernel's extended name
    std::string           UniqueName()    const { return _uniqueName; }        ///< @return Kernel's unique name
    const GtGpuPlatform   Platform()      const { return _platform; }          ///< @return Kernel's platform
    const IGtGenModel&    GenModel()      const { return GetGenModel(_genId); } ///< @return Kernel's GEN model
    const GtProfileTrace& TraceAccessor() const { return _traceAccessor; }     ///< @return Trace accessor
    void                  DumpAsm()       const;                               ///< Dump kernel's assembly text to file

    /// @return true, if tracing of this kernel is enabled
    uint32_t IsEnabled() const { return (_traceAccessor.MaxTraceSize() != 0); }

    /// @return Traces collected in kernel's dispatches
    typedef std::list<PerfTraceDispatch> Traces;
    const Traces& GetTraces() const { return _traces; }

    /// @return Number of tiles
    uint32_t NumTiles() const { return _numTiles; }

private:
    std::string    _name;          ///< Kernel's name
    std::string    _uniqueName;    ///< Kernel's unique name
    std::string    _extName;       ///< Kernel's extended name
    GtGpuPlatform  _platform;      ///< Kernel's platform
    GtGenModelId   _genId;         ///< Identifier of the GEN model, the kernel is compiled for
    std::string    _asmText;       ///< Kernel's assembly text
    GtProfileTrace _traceAccessor; ///< Trace accessor
    Traces         _traces;        ///< Traces collected in kernel's dispatches
    uint32_t       _numTiles;      ///< The number of supported tiles
};

/* ============================================================================================= */
// Class PerfTrace
/* ============================================================================================= */
/*!
 * Implementation of the IGtTool interface for the Perftrace tool
 */
class PerfTrace : public GtTool
{
public:
    /// Implementation of the IGtTool interface
    const char* Name() const { return "perftrace"; }

    void OnKernelBuild(IGtKernelInstrument& instrumentor);
    void OnKernelRun(IGtKernelDispatch& dispatcher);
    void OnKernelComplete(IGtKernelDispatch& dispatcher);

public:
    virtual uint32_t RecordSize() const = 0; ///< @return Record size aligned to OWORD

protected:
    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor     Interface of the kernel being instrumented
     * @param[in] perfTraceKernel  Object that holds information about the kernel
     */
    virtual void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel) = 0;

protected:
    PerfTrace() = default;
    PerfTrace(const PerfTrace&) = delete;
    PerfTrace& operator = (const PerfTrace&) = delete;
    ~PerfTrace() = default;

protected:
    std::map<GtKernelId, PerfTraceKernel> _kernels; ///< Collection of traces per kernel
};

/* ============================================================================================= */
// Class PerfTraceFuntime
/* ============================================================================================= */
class PerfTraceFuntime : public PerfTrace
{
public:
    static PerfTraceFuntime* Instance();
    static void OnFini(void);

    inline uint32_t RecordSize() const { return sizeof(PerfTraceFuntimeRecord); }

private:

    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor     Interface of the kernel being instrumented
     * @param[in] perfTraceKernel  Object that holds information about the kernel
     */
    void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel);

    /*!
     * Generate instrumentation for the specified basic block
     * @param[in] instrumentor     Interface of the kernel being instrumented
     * @param[in] bbl              Basic block to be instrumented
     * @param[in] perfTraceKernel  Object that holds information about the kernel
     */
    void InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel);

    void GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel);

private:
    PerfTraceFuntime() : PerfTrace() {}
    PerfTraceFuntime(const PerfTraceFuntime&) = delete;
    PerfTraceFuntime& operator = (const PerfTraceFuntime&) = delete;
    ~PerfTraceFuntime() {}

private:

    GtReg _timeReg; ///< Virtual timer register
};

/* ============================================================================================= */
// Class PerfTraceMemoryLatency
/* ============================================================================================= */
class PerfTraceMemoryLatency : public PerfTrace
{
public:
    static PerfTraceMemoryLatency* Instance();

    inline uint32_t RecordSize() const { return sizeof(PerfTraceMemoryLatencyRecord); }

    static void OnFini(void);

private:

    /*!
     * Generate instrumentation for kernel
     * @param[in] instrumentor     Interface of the kernel being instrumented
     * @param[in] perfTraceKernel  Object that holds information about the kernel
     */
    void Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel);

    /*!
     * Generate instrumentation for the specified basic block
     * @param[in] instrumentor     Interface of the kernel being instrumented
     * @param[in] bbl              Basic block to be instrumented
     * @param[in] perfTraceKernel  Object that holds information about the kernel
     */
    void InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel);

    void GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder);
    void GenerateFiniCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel);

private:
    PerfTraceMemoryLatency() : PerfTrace() {}
    PerfTraceMemoryLatency(const PerfTraceMemoryLatency&) = delete;
    PerfTraceMemoryLatency& operator = (const PerfTraceMemoryLatency&) = delete;
    ~PerfTraceMemoryLatency() {}

private:

    GtReg _timeReg;        ///< Virtual register to read time register
    GtReg _cyclesAccumReg; ///< Virtual register to accumulate cycles values
};


/* ============================================================================================= */
// Class PerfTracePreProcessor
/* ============================================================================================= */
/*!
 * Class that computes per-kernel trace sizes in the preprocessing phase, and provides access to
 * this data in the trace gathering phase
 */
class PerfTracePreProcessor : public KernelWeight
{
public:
    uint64_t TraceSize(const std::string& extKernelName) const; ///< Given extended kernel name, return the trace size in bytes
    static void OnFini();                                       ///< Callback function registered with atexit()

    static PerfTracePreProcessor* Instance();

    /*!
     * @return Weight of the specified basic block in the kernel.
     * By default this function returns the bbl.NumIns() value, which means that the tool counts the number of executed
     * instructions.
     * Derived classes may give a different interpretation of the basic block's "weight" by overriding this function.
     * For example, in order to count the number of executed basic blocks, the override function may return 1 (one).
     */
    uint32_t GetBblWeight(IGtKernelInstrument& ki, const IGtBbl& bbl) const;

private:
    PerfTracePreProcessor();
    PerfTracePreProcessor(const PerfTracePreProcessor&) = delete;
    PerfTracePreProcessor& operator = (const PerfTracePreProcessor&) = delete;

private:
    void AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const;

protected:
    KernelWeightProfileData _kernelCounters; ///< Per-kernel counters of required trace records; collected in preprocessing phase

    static const char* _kernelPreProcessFileName;   ///< Name of the file that contains preprocessing data per kernel
    static const char* _dispatchPreProcessFileName; ///< Name of the file that contains preprocessing data per kernel dispatch

    PerfTrace* _perfTrace; ///< Corresponding PerfTrace object
};

/* ============================================================================================= */
// Class PerfTracePostProcessor
/* ============================================================================================= */
/*!
 * Function object that processes kernel traces - stores them in files within the profile directory:
 *
 * kernel_name
 *   |
 *   |- kernel_dispatch_1
 *        |- memorytrace_compressed.bin
 *   |- kernel_dispatch_2
 *        |- memorytrace_compressed.bin
 * The .bin trace files can be uncompressed by the uncompress_memtrace.exe utility.
 *
 * Format of .bin trace files:
 * - Static information:
 *   - Number of BBLs that access memory
 *   - For each BBL that accesses memory:
 *     - BBL ID
 *     - Number of SEND instructions in this basic block
 *     - For each SEND instruction:
 *       - Decoded information including offset, address model, address type, address payload size, etc
 *
 * - Dynamic trace data:
 *   - Number of HW threads in which the trace was collected
 *   - For each HW thread:
 *     - HW Thread ID (in the format of sr0.0)
 *     - Number of records collected for this HW thread
 *     - All the records collected for this HW thread
 */
class PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessor object for the specified collection of kernel traces
    PerfTracePostProcessor(const IGtCore& gtpinCore, const PerfTraceKernel& perfTraceKernel, const PerfTrace* perfTrace);

    /// Process all kernel traces associated with this object - store them in files within the profile directory
    bool operator()();

protected:
    /// Derives global tid from the trace record
    virtual uint32_t GetGlobalTid(const uint8_t* traceRecord) const = 0;

    /// Derives tile ID from the trace record
    virtual uint32_t GetTileId(const uint8_t* traceRecord) const = 0;

    /// Stores data from the trace record
    virtual void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const = 0;

    /// Store the specified trace in the specified file stream
    void StoreTrace(const PerfTraceDispatch& trace, std::ofstream& fs);

    /// Store Global Thread Identifier
    void StoreGlobalTid(uint32_t gtid, std::ofstream& fs);

    /// Store the specified value in the specified file stream in the binary format
    template <typename T> void Store(const T& val, std::ofstream& fs) { fs.write((const char*)&val, sizeof(val)); }

protected:
    struct TraceRecord ///< Reference to the trace record
    {
        const void* record; ///< Pointer to the record
        uint32_t    size;   ///< Size of the record in bytes, including header
    };
    using TraceRecordList     = std::list<TraceRecord>;       ///< List of references to trace records
    using PerTileTraceRecords = std::vector<TraceRecordList>; ///< Per tile trace records

protected:
    const PerfTraceKernel*           _kernel;             ///< Kernel&traces to be processed
    std::string                      _kernelDir;          ///< Directory to store kernel's trace files
    std::vector<PerTileTraceRecords> _threadTraceRecords; ///< Map of tile ID to Lists of trace records, indexed by the thread ID

    static const char* _traceFileName; ///< Name of the file to store trace in

    const PerfTrace* _perfTrace; ///< Pointer to a corresponding instance of PerfTrace
};

/* ============================================================================================= */
// Class PerfTracePostProcessorFuntime
/* ============================================================================================= */
class PerfTracePostProcessorFuntime : public PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessorFuntime object for the specified collection of kernel traces
    PerfTracePostProcessorFuntime(const IGtCore& gtpinCore, const PerfTraceKernel& perfTraceKernel) :
        PerfTracePostProcessor(gtpinCore, perfTraceKernel, PerfTraceFuntime::Instance()) {}

private:

    /// Derives global tid from the trace record
    uint32_t GetGlobalTid(const uint8_t* traceRecord) const;

    /// Derives tile ID from the trace record
    uint32_t GetTileId(const uint8_t* traceRecord) const;

    /// Stores data from the trace record
    void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const;
};

/* ============================================================================================= */
// Class PerfTracePostProcessorMemoryLatency
/* ============================================================================================= */
class PerfTracePostProcessorMemoryLatency : public PerfTracePostProcessor
{
public:
    /// Construct a PerfTracePostProcessorMemoryLatency object for the specified collection of kernel traces
    PerfTracePostProcessorMemoryLatency(const IGtCore& gtpinCore, const PerfTraceKernel& perfTraceKernel) :
        PerfTracePostProcessor(gtpinCore, perfTraceKernel, PerfTraceMemoryLatency::Instance()) {}

private:

    /// Derives global tid from the trace record
    uint32_t GetGlobalTid(const uint8_t* traceRecord) const;

    /// Derives tile ID from the trace record
    uint32_t GetTileId(const uint8_t* traceRecord) const;

    /// Stores data from the trace record
    void StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const;
};

#pragma pack(pop)

#endif
/*========================== begin_copyright_notice ============================
Copyright (C) 2018-2025 Intel Corporation

SPDX-License-Identifier: MIT
============================= end_copyright_notice ===========================*/

/*!
 * @file Implementation of the Perftrace tool
 */

#include "gtpin_api.h"
#include "gtpin_tool_utils.h"
#include "perftrace.h"

using std::list;
using std::vector;
using std::string;
using std::map;
using std::ofstream;

// Profiling Mode enum
enum PERFTRACE_MODE
{
    PERFTRACE_FUNTIME,
    PERFTRACE_MEMLATENCY,
    PERFTRACE_OCCUPANCY,
};

// globals
Knob<int> gKnobMode("mode", 0, "Trace instrumentation scope\n { 0 - funtime, 1 - memory read instructions, 2 - occupancy} ");
Knob<int> gKnobPhase("phase", 0, "tracing tool - processing phase\n { 1 - pre-processing, 2 - processing - trace gathering} ");
Knob<int> gKnobMaxTraceBufferInMB("max_buffer_mb", 300, "perftrace - the max allowed size of the trace buffer per kernel in MB\n");


/* ============================================================================================= */
// PerfTraceDispatch implementation
/* ============================================================================================= */
bool PerfTraceDispatch::ReadTrace(const GtProfileTrace& traceAccessor, const IGtProfileBuffer& profileBuffer)
{
    uint32_t traceSize = traceAccessor.Size(profileBuffer);
    _rawTrace.resize(traceSize);
    _isTrimmed = traceAccessor.IsTruncated(profileBuffer);
    return traceAccessor.Read(profileBuffer, _rawTrace.data(), 0, traceSize);
}

bool PerfTraceDispatch::IsEmpty() const
{
    return _rawTrace.size() == 0;
}

/* ============================================================================================= */
// PerfTraceKernel implementation
/* ============================================================================================= */
PerfTraceKernel::PerfTraceKernel(const IGtKernelInstrument& kernelInstrument, const uint32_t recordSize, uint32_t numTiles) : _numTiles(numTiles)
{
    const IGtKernel& kernel = kernelInstrument.Kernel();
    const IGtCfg&    cfg    = kernelInstrument.Cfg();

    _name       = GlueString(kernel.Name());
    _extName    = ExtendedKernelName(kernel);
    _platform   = kernel.GpuPlatform();
    _genId      = kernel.GenModel().Id();
    _asmText    = CfgAsmText(cfg);
    _uniqueName = kernel.UniqueName();

    // Initialize trace accessor. The trace capacity is expected to be computed during the preprocessing phase.
    uint64_t traceCapacity = PerfTracePreProcessor::Instance()->TraceSize(_extName);
    if (traceCapacity == 0)
    {
        // Unknown trace capacity
        GTPIN_WARNING("PERFTRACE: unknown trace capacity for kernel " + _name + ". Assuming the kernel is filtered out. "
                      "Allocating a buffer of 8KB size. If the kernel is supposed to run, expect buffer overflow. "
                      "In this case, please re-run phase 1 and make sure the kernel is not filtered out.");
        traceCapacity = 0x2000;
    }
    else
    {
        traceCapacity += 0x2000; // Add some space to account for possible fluctuation of trace sizes between phases
        if (traceCapacity > UINT32_MAX)
        {
            GTPIN_WARNING("PERFTRACE: The kernel " + _name + " exceeded maximum trace capacity.");
            traceCapacity = UINT32_MAX;
        }
    }
    if (traceCapacity > (uint64_t(gKnobMaxTraceBufferInMB) * 0x100000))
    {
        GTPIN_WARNING("PERFTRACE: required capacity (" + DecStr(traceCapacity) + ") for kernel " + _name + " is too big - cut to " + DecStr(gKnobMaxTraceBufferInMB) + "MB. "
                      "Expect the final trace to contain partial data.");
        traceCapacity = uint64_t(gKnobMaxTraceBufferInMB) * 0x100000;
    }
    _traceAccessor = GtProfileTrace((uint32_t)traceCapacity, recordSize);
    _traceAccessor.Allocate(kernelInstrument.ProfileBufferAllocator());
}

PerfTraceDispatch& PerfTraceKernel::AddPerfTrace(IGtKernelDispatch& kernelDispatch)
{
    // Create a new PerfTraceDispatch object and store the entire trace within this object
    _traces.emplace_back(kernelDispatch);
    PerfTraceDispatch& perfTraceDispatch = _traces.back();
    if (!perfTraceDispatch.ReadTrace(_traceAccessor, *kernelDispatch.GetProfileBuffer()))
    {
        GTPIN_ERROR_MSG("PERFTRACE: Failed to read profile buffer for kernel " + _name);
    }
    return perfTraceDispatch;
}

void PerfTraceKernel::DumpAsm() const
{
    DumpKernelAsmText(_name, _uniqueName, _asmText);
}

/* ============================================================================================= */
// PerfTrace implementation
/* ============================================================================================= */
void PerfTrace::OnKernelBuild(IGtKernelInstrument& instrumentor)
{
    const IGtKernel& kernel = instrumentor.Kernel();
    uint32_t numTiles = (instrumentor.Coder().IsTileIdSupported()) ? GTPin_GetCore()->GenArch().MaxTiles(kernel.GpuPlatform()) : 1;

    // Create new KernelData object and add it to the data base
    auto ret = _kernels.emplace(std::piecewise_construct, std::forward_as_tuple(kernel.Id()), std::forward_as_tuple(instrumentor, RecordSize(), numTiles));
    if (ret.second)
    {
        PerfTraceKernel& perfTraceKernel = (*ret.first).second;
        if (!perfTraceKernel.IsEnabled())
        {
            GTPIN_WARNING("PERFTRACE: The trace won't be generated for kernel " + perfTraceKernel.Name());
            return;
        }
        Instrument(instrumentor, perfTraceKernel);
    }
}

void PerfTrace::OnKernelRun(IGtKernelDispatch& dispatcher)
{
    bool isProfileEnabled = false;

    const IGtKernel& kernel = dispatcher.Kernel();
    GtKernelExecDesc execDesc; dispatcher.GetExecDescriptor(execDesc);
    if (kernel.IsInstrumented() && IsKernelExecProfileEnabled(execDesc, kernel.GpuPlatform(), kernel.Name().Get()))
    {
        auto it = _kernels.find(kernel.Id());
        if (it != _kernels.end())
        {
            const PerfTraceKernel& perfTraceKernel = it->second;
            if (perfTraceKernel.IsEnabled())
            {
                IGtProfileBuffer* buffer = dispatcher.CreateProfileBuffer(); GTPIN_ASSERT(buffer);
                const GtProfileTrace& traceAccessor = perfTraceKernel.TraceAccessor();
                if (traceAccessor.Initialize(*buffer))
                {
                    isProfileEnabled = true;
                }
                else
                {
                    GTPIN_ERROR_MSG("PERFTRACE: Failed to write into memory buffer for kernel " + string(kernel.Name()));
                }
            }
        }
    }
    dispatcher.SetProfilingMode(isProfileEnabled);
}

void PerfTrace::OnKernelComplete(IGtKernelDispatch& dispatcher)
{
    if (!dispatcher.IsProfilingEnabled())
    {
        return; // Do nothing if kernel profiling has not been applied/failed
    }

    const IGtKernel& kernel = dispatcher.Kernel();
    auto it = _kernels.find(kernel.Id());
    if (it != _kernels.end())
    {
        // Read the trace from the profile buffer
        PerfTraceKernel& perfTraceKernel = it->second;
        perfTraceKernel.AddPerfTrace(dispatcher);
    }
}

void PerfTraceFuntime::GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder)
{
    IGtInsFactory& insF = coder.InstructionFactory();

    proc += insF.MakeMov(GtDstRegion(_timeReg, 1, GED_DATA_TYPE_ud),
                         GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void PerfTraceMemoryLatency::GeneratePreCode(GtGenProcedure& proc, const IGtGenCoder& coder)
{
    coder.StartTimer(proc, _timeReg);
    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void PerfTraceFuntime::GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel)
{
    // Initialize virtual registers
    IGtVregFactory& vregs = coder.VregFactory();
    GtReg addrReg   = vregs.MakeMsgAddrScratch();
    GtReg dataReg   = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
    GtReg offsetReg = vregs.MakeScratch(VREG_TYPE_DWORD);
    GtReg tileIdReg = vregs.MakeScratch(VREG_TYPE_DWORD);

    auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };

    uint32_t recordSize = RecordSize();

    GtReg sr0FieldReg    = fieldReg(offsetof(PerfTraceFuntimeRecord, sr0));
    GtReg tileIdFieldReg = fieldReg(offsetof(PerfTraceFuntimeRecord, tileId));
    GtReg timeStartReg   = fieldReg(offsetof(PerfTraceFuntimeRecord, timeStart));
    GtReg timeEndReg     = fieldReg(offsetof(PerfTraceFuntimeRecord, timeEnd));

    IGtInsFactory& insF = coder.InstructionFactory();
    GtPredicate predicate(FlagReg(0));

    // dataReg[1-4] = { preTm[0-1], postTm[0-1] }
    proc += insF.MakeMov(timeEndReg, GtRegRegion(TimeStampReg(), GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });
    proc += insF.MakeMov(timeStartReg, GtRegRegion(_timeReg, GtStride(2, 2, 1), GED_DATA_TYPE_ud), { 2 });

    // Set values of PerfTraceRecordHeader fields in dataReg
    proc += insF.MakeMov(sr0FieldReg, StateReg(0)); // sr0.0

    coder.LoadTileId(proc, tileIdReg);
    proc += insF.MakeMov(tileIdFieldReg, tileIdReg); // tile ID

    // Allocate new record in the trace.
    // Set offsetReg = offset of the allocated record in the profile buffer, addrReg = address of the allocated record
    perfTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, offsetReg);
    coder.ComputeAddress(proc, addrReg, offsetReg);

    //if (!predicate) { STORE buffer[offsetReg] = dataReg;
    coder.StoreMemBlock(proc, addrReg, dataReg, recordSize, !predicate);

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void PerfTraceMemoryLatency::GeneratePostCode(GtGenProcedure& proc, const IGtGenCoder& coder)
{
    IGtInsFactory& insF = coder.InstructionFactory();

    coder.StopTimer(proc, _timeReg);
    proc += insF.MakeAdd(_cyclesAccumReg, _cyclesAccumReg, _timeReg);

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void PerfTraceMemoryLatency::GenerateFiniCode(GtGenProcedure& proc, const IGtGenCoder& coder, const PerfTraceKernel& perfTraceKernel)
{
    // Initialize virtual registers
    IGtVregFactory& vregs = coder.VregFactory();
    GtReg addrReg   = vregs.MakeMsgAddrScratch();
    GtReg dataReg   = vregs.MakeMsgDataScratch(VREG_TYPE_HWORD);
    GtReg offsetReg = vregs.MakeScratch(VREG_TYPE_DWORD);
    GtReg tileIdReg = vregs.MakeScratch(VREG_TYPE_DWORD);

    auto fieldReg = [&](uint32_t fieldOffset) -> GtReg { return GtReg(dataReg, sizeof(uint32_t), fieldOffset / sizeof(uint32_t)); };

    GtReg sr0FieldReg = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, sr0));
    GtReg tileIdFieldReg = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, tileId));
    GtReg cyclesReg = fieldReg(offsetof(PerfTraceMemoryLatencyRecord, cycles));

    uint32_t recordSize = RecordSize();

    IGtInsFactory& insF = coder.InstructionFactory();
    GtPredicate predicate(FlagReg(0));

    // Set values of PerfTraceRecordHeader fields in dataReg
    proc += insF.MakeMov(sr0FieldReg, StateReg(0)); // sr0.0
    coder.LoadTileId(proc, tileIdReg);
    proc += insF.MakeMov(tileIdFieldReg, tileIdReg); // tile ID
    proc += insF.MakeMov(cyclesReg, _cyclesAccumReg); // accumReg

    // Allocate new record in the trace.
    // Set offsetReg = offset of the allocated record in the profile buffer, addrReg = address of the allocated record
    perfTraceKernel.TraceAccessor().ComputeNewRecordOffset(coder, proc, recordSize, offsetReg);
    coder.ComputeAddress(proc, addrReg, offsetReg);

    // if (!predicate) { STORE buffer[offsetReg] = dataReg; }
    coder.StoreMemBlock(proc, addrReg, dataReg, recordSize, !predicate);

    if (!proc.empty()) { proc.front()->AppendAnnotation(__func__); }
}

void PerfTraceFuntime::Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel)
{
    const IGtCfg& cfg = instrumentor.Cfg();
    const IGtGenCoder& coder = instrumentor.Coder();
    IGtVregFactory& vregs = coder.VregFactory();

    _timeReg = vregs.Make(VREG_TYPE_QWORD);

    // Generate code that starts/stops timer at entry/exit of the kernel
    GtGenProcedure preCode; GeneratePreCode(preCode, coder);
    GtGenProcedure postCode; GeneratePostCode(postCode, coder, perfTraceKernel);

    // Instrument kernel entries
    instrumentor.InstrumentEntries(preCode);

    // Instrument kernel exits
    for (auto bblPtr : cfg.ExitBbls())
    {
        const IGtIns& ins = bblPtr->LastIns(); GTPIN_ASSERT(ins.IsEot());
        GtGenProcedure fakeConsumers;
        coder.GenerateFakeSrcConsumers(fakeConsumers, ins);
        instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), fakeConsumers);
        instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), postCode);
    }
}

void PerfTraceMemoryLatency::Instrument(IGtKernelInstrument& instrumentor, const PerfTraceKernel& perfTraceKernel)
{
    const IGtCfg& cfg = instrumentor.Cfg();
    IGtVregFactory& vregs = instrumentor.Coder().VregFactory();

    _cyclesAccumReg = vregs.Make(VREG_TYPE_DWORD);

    // Instrument all basic blocks of the kernel
    for (auto bblPtr : cfg.Bbls())
    {
        InstrumentBbl(instrumentor, *bblPtr, perfTraceKernel);
    }
}

void PerfTraceMemoryLatency::InstrumentBbl(IGtKernelInstrument& instrumentor, const IGtBbl& bbl, const PerfTraceKernel& perfTraceKernel)
{
    const IGtGenCoder& coder = instrumentor.Coder();

    for (auto& insPtr : bbl.Instructions())
    {
        const IGtIns& ins = *insPtr;

        if (ins.IsEot())
        {
            GtGenProcedure finiCode; GenerateFiniCode(finiCode, coder, perfTraceKernel);
            instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), finiCode);
        }
        if (ins.IsMemRead())
        {
            GtGenProcedure preCode; GeneratePreCode(preCode, coder);
            GtGenProcedure postCode; GeneratePostCode(postCode, coder);
            GtGenProcedure fakeConsumers;
            coder.GenerateFakeDstConsumers(fakeConsumers, ins);

            instrumentor.InstrumentInstruction(ins, GtIpoint::Before(), preCode);
            instrumentor.InstrumentInstruction(ins, GtIpoint::After(), fakeConsumers);
            instrumentor.InstrumentInstruction(ins, GtIpoint::After(), postCode);
        }
    }
}

void PerfTraceFuntime::OnFini(void)
{
    PerfTraceFuntime& me = *Instance();
    IGtCore* gtpinCore = GTPin_GetCore();
    for (auto& ref : me._kernels)
    {
        const PerfTraceKernel& perfTraceKernel = ref.second;
        PerfTracePostProcessorFuntime(*gtpinCore, perfTraceKernel)();
    }
}

void PerfTraceMemoryLatency::OnFini(void)
{
    PerfTraceMemoryLatency& me = *Instance();
    IGtCore* gtpinCore = GTPin_GetCore();
    for (auto& ref : me._kernels)
    {
        const PerfTraceKernel& perfTraceKernel = ref.second;
        PerfTracePostProcessorMemoryLatency(*gtpinCore, perfTraceKernel)();
    }
}

PerfTraceFuntime* PerfTraceFuntime::Instance()
{
    static PerfTraceFuntime perfTrace;
    return &perfTrace;
}

PerfTraceMemoryLatency* PerfTraceMemoryLatency::Instance()
{
    static PerfTraceMemoryLatency perfTrace;
    return &perfTrace;
}

/* ============================================================================================= */
// PerfTracePreProcessor implementation
/* ============================================================================================= */
const char* PerfTracePreProcessor::_kernelPreProcessFileName = "perftrace_pre_process.txt";
const char* PerfTracePreProcessor::_dispatchPreProcessFileName = "perftrace_pre_process_dispatch.txt";

PerfTracePreProcessor::PerfTracePreProcessor()
{
    if (gKnobPhase == 2)
    {
        // Read the data collected during the preprocessing phase
        std::ifstream is(_kernelPreProcessFileName);
        GTPIN_ASSERT_MSG(is, string("File ") + _kernelPreProcessFileName + " does not exist. The trace won't be generated");
        is >> _kernelCounters;
    }
    else if (gKnobPhase == 1)
    {
        // Create pre_process files or remove old pre_process files' content if they exist
        CreateCleanFile(_kernelPreProcessFileName);
        CreateCleanFile(_dispatchPreProcessFileName);
    }

    if (gKnobMode == PERFTRACE_FUNTIME || gKnobMode == PERFTRACE_OCCUPANCY)
    {
        _perfTrace = PerfTraceFuntime::Instance();
    }
    if (gKnobMode == PERFTRACE_MEMLATENCY)
    {
        _perfTrace = PerfTraceMemoryLatency::Instance();
    }
}

PerfTracePreProcessor* PerfTracePreProcessor::Instance()
{
    static PerfTracePreProcessor instance;
    return &instance;
}

void PerfTracePreProcessor::OnFini()
{
    PerfTracePreProcessor& tool = *Instance();
    tool.DumpKernelProfiles(_kernelPreProcessFileName);
    tool.DumpDispatchProfiles(_dispatchPreProcessFileName);
}

uint64_t PerfTracePreProcessor::TraceSize(const string& extKernelName) const
{
    auto it = _kernelCounters.find(extKernelName);
    return ((it == _kernelCounters.end()) ? 0 : it->second.weight);
}

uint32_t PerfTracePreProcessor::GetBblWeight(IGtKernelInstrument&, const IGtBbl& bbl) const
{
    return bbl.IsEot() ? _perfTrace->RecordSize() : 0;
}

void PerfTracePreProcessor::AggregateDispatchCounters(KernelWeightCounters& kc, KernelWeightCounters dc) const
{
    kc.weight = std::max(kc.weight, dc.weight);
    kc.freq += dc.freq;
}

/* ============================================================================================= */
// PerfTracePostProcessor implementation
/* ============================================================================================= */
const char* PerfTracePostProcessor::_traceFileName = "perftrace_compressed.bin";

PerfTracePostProcessor::PerfTracePostProcessor(const IGtCore& gtpinCore, const PerfTraceKernel& perfTraceKernel, const PerfTrace* perfTrace) :
    _kernel(&perfTraceKernel), _kernelDir(JoinPath(string(gtpinCore.ProfileDir()), perfTraceKernel.UniqueName())), _perfTrace(perfTrace)
{
    GTPIN_ASSERT(_perfTrace);
}

bool PerfTracePostProcessor::operator()()
{
    if (!MakeDirectory(_kernelDir))
    {
        GTPIN_WARNING("PERFTRACE: Could not create directory " + _kernelDir);
        return false;
    }

    // Process traces recorded in kernel dispatches
    for (const auto& trace : _kernel->GetTraces())
    {
        if (!trace.IsEmpty())
        {
            if (trace.IsTrimmed())
            {
                GTPIN_WARNING("PERFTRACE: Detected trace buffer overflow in kernel " + _kernel->Name());
            }

            string subdir = trace.KernelExecDesc().ToString(_kernel->Platform(), ExecDescFileNameFormat());
            string dir = MakeSubDirectory(_kernelDir, subdir);
            string filePath = JoinPath(dir, _traceFileName);

            ofstream fs(filePath, std::ios::binary);
            if (!fs)
            {
                GTPIN_WARNING("PERFTRACE: Could not create file " + filePath);
                continue;
            }
            StoreTrace(trace, fs);
        }
    }
    return true;
}

void PerfTracePostProcessor::StoreTrace(const PerfTraceDispatch& trace, std::ofstream& fs)
{
    const uint8_t* traceData = trace.Data();
    uint32_t traceSize = trace.Size();

    uint32_t recordSize = _perfTrace->RecordSize();

    // Associate trace records with threads - populate _threadTraceRecords array
    uint32_t maxThreads = _kernel->GenModel().MaxThreads(); // Max number of HW threads
    _threadTraceRecords.resize(_kernel->NumTiles());

    for (uint32_t tile = 0; tile < _kernel->NumTiles(); tile++)
    {
        _threadTraceRecords[tile].clear();
        _threadTraceRecords[tile].resize(maxThreads);
    }

    std::vector<uint32_t> numProfiledThreads; // Number of profiled (active) threads
    numProfiledThreads.resize(_kernel->NumTiles(), 0);

    for (uint32_t recordOffset = 0; recordOffset + recordSize <= traceSize;)
    {
        const uint8_t* recordPtr = traceData + recordOffset;
        // Retrieve thread ID from the record
        uint32_t tid = GetGlobalTid(recordPtr);
        uint32_t tileId = GetTileId(recordPtr); GTPIN_ASSERT(tileId < _kernel->NumTiles());
        if (recordOffset + recordSize > traceSize)
        {
            break; // end of trace
        }

        auto& tileThreadRecords = _threadTraceRecords[tileId];
        auto& threadTraceRecords = tileThreadRecords[tid];

        // Add a new trace record reference to _threadTraceRecords
        if (threadTraceRecords.empty()) { ++numProfiledThreads[tileId]; } // Increment thread count on the first relevant record
        threadTraceRecords.emplace_back(TraceRecord{ recordPtr, recordSize });

        recordOffset += recordSize;
    }

    uint32_t mode = gKnobMode;
    Store(mode, fs); // Store profiling mode

    // Compute and store the number of involved tiles
    uint32_t numOfTiles = 0;
    for (uint32_t i = 0; i < numProfiledThreads.size(); i++)
    {
        numOfTiles += (numProfiledThreads[i] == 0) ? 0 : 1;
    }
    Store(numOfTiles, fs);

    for (uint32_t tileId = 0; tileId < _threadTraceRecords.size(); tileId++)
    {
        if (numProfiledThreads[tileId] == 0) { continue; }

        Store(tileId, fs);

        // Store the number of profiled threads
        Store(numProfiledThreads[tileId], fs);

        // Store per-thread traces
        for (uint32_t tid = 0; tid < maxThreads; tid++)
        {
            const auto& tileThreadRecords = _threadTraceRecords[tileId];
            const auto& traceRecordList = tileThreadRecords[tid];

            if (traceRecordList.empty()) { continue; }

            StoreGlobalTid(tid, fs); // Store Global Thread Identifier

            uint32_t numRecords = (uint32_t)traceRecordList.size();
            Store(numRecords, fs); // Store #records collected in the thread

            // Store trace records
            for (const auto& r : traceRecordList)
            {
                StoreTraceRecordData((const uint8_t*)r.record, fs);
            }
        }
    }
}

void PerfTracePostProcessor::StoreGlobalTid(uint32_t gtid, std::ofstream& fs)
{
    const GtStateRegAccessor& sra = _kernel->GenModel().StateRegAccessor();
    uint32_t sr0 = sra.SetGlobalTid(0, gtid);

    auto storeSr0Field = [&](const ScatteredBitFieldU32& sbf)
    {
        uint32_t val = (sbf.IsEmpty() ? UINT32_MAX : sbf.GetValue(sr0));
        Store(val, fs);
    };

    storeSr0Field(sra.SliceIdField());
    storeSr0Field(sra.DualSubSliceIdField());
    storeSr0Field(sra.SubSliceIdField());
    storeSr0Field(sra.EuIdField());
    storeSr0Field(sra.ThreadSlotField());
}

uint32_t PerfTracePostProcessorFuntime::GetGlobalTid(const uint8_t* traceRecord) const
{
    return _kernel->GenModel().StateRegAccessor().GetGlobalTid((((const PerfTraceFuntimeRecord*)traceRecord)->sr0) & 0xFFFF);
}

uint32_t PerfTracePostProcessorFuntime::GetTileId(const uint8_t* traceRecord) const
{
    return ((const PerfTraceFuntimeRecord*)traceRecord)->tileId;
}

void PerfTracePostProcessorFuntime::StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const
{
    const PerfTraceFuntimeRecord* record = (const PerfTraceFuntimeRecord*)traceRecord;
    fs.write((const char*)&record->timeStart, sizeof(PerfTraceFuntimeRecord::timeStart));
    fs.write((const char*)&record->timeEnd, sizeof(PerfTraceFuntimeRecord::timeEnd));
}

uint32_t PerfTracePostProcessorMemoryLatency::GetGlobalTid(const uint8_t* traceRecord) const
{
    return _kernel->GenModel().StateRegAccessor().GetGlobalTid((((const PerfTraceMemoryLatencyRecord*)traceRecord)->sr0) & 0xFFFF);
}

uint32_t PerfTracePostProcessorMemoryLatency::GetTileId(const uint8_t* traceRecord) const
{
    return ((const PerfTraceMemoryLatencyRecord*)traceRecord)->tileId;
}

void PerfTracePostProcessorMemoryLatency::StoreTraceRecordData(const uint8_t* traceRecord, std::ofstream& fs) const
{
    const PerfTraceMemoryLatencyRecord* record = (const PerfTraceMemoryLatencyRecord*)traceRecord;
    fs.write((const char*)&record->cycles, sizeof(PerfTraceMemoryLatencyRecord::cycles));
}

/* ============================================================================================= */
// GTPin_Entry
/* ============================================================================================= */
EXPORT_C_FUNC void GTPin_Entry(int argc, const char *argv[])
{
    ConfigureGTPin(argc, argv);
    GTPIN_ASSERT_MSG((gKnobMode == 0 || gKnobMode == 1 || gKnobMode == 2), "PERFTRACE: Invalid mode value. Should be 0, 1 or 2, provided " + std::to_string(gKnobMode));

    if (gKnobPhase == 1)
    {
        PerfTracePreProcessor::Instance()->Register();
        atexit(PerfTracePreProcessor::OnFini);
    }
    else
    {
        GTPIN_ASSERT_MSG((gKnobPhase == 2), "PERFTRACE: Invalid phase value. Should be 1 or 2, provided " + std::to_string(gKnobPhase));

        if (gKnobMode == PERFTRACE_FUNTIME || gKnobMode == PERFTRACE_OCCUPANCY)
        {
            PerfTraceFuntime::Instance()->Register();
            atexit(PerfTraceFuntime::OnFini);
        }
        else if (gKnobMode == PERFTRACE_MEMLATENCY)
        {
            PerfTraceMemoryLatency::Instance()->Register();
            atexit(PerfTraceMemoryLatency::OnFini);
        }
        else
        {
            GTPIN_ASSERT(0);
        }
    }
}
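The layout that StoreTrace writes to perftrace_compressed.bin can be read back with a small stand-alone parser. The sketch below is not part of GTPin and only illustrates the order of the stored fields: profiling mode, number of tiles, then per tile the tile ID and thread count, then per thread five sr0 fields (slice, dual-subslice, subslice, EU, thread slot, with UINT32_MAX marking an absent field), the record count, and the records. It rests on several assumptions: Store() writes the raw bytes of its argument in host byte order, timeStart and timeEnd are 64-bit values, and cycles is a 32-bit value. All names in it (RecordKind, ThreadTrace, TileTrace, TraceFile, ReadRaw, WriteRaw, ReadTrace, MakeSampleFuntimeTrace) are hypothetical. The caller selects the record layout explicitly, because the numeric value of each mode is not shown in this listing.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Record payload held by the file; chosen by the caller (hypothetical enum).
enum class RecordKind { Funtime, MemoryLatency };

struct ThreadTrace
{
    uint32_t sr0Fields[5];        // slice, dual-subslice, subslice, EU, thread slot
    std::vector<uint64_t> values; // Funtime: (start, end) pairs; MemoryLatency: cycles
};
struct TileTrace { uint32_t tileId = 0; std::vector<ThreadTrace> threads; };
struct TraceFile { uint32_t mode = 0; std::vector<TileTrace> tiles; };

// Read one raw value in host byte order (assumed to mirror Store()).
template <typename T>
T ReadRaw(std::istream& is)
{
    T value{};
    is.read(reinterpret_cast<char*>(&value), sizeof(T));
    if (!is) { throw std::runtime_error("perftrace: truncated trace file"); }
    return value;
}

template <typename T>
void WriteRaw(std::ostream& os, T value)
{
    os.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

TraceFile ReadTrace(std::istream& is, RecordKind kind)
{
    TraceFile file;
    file.mode = ReadRaw<uint32_t>(is);
    const uint32_t numTiles = ReadRaw<uint32_t>(is);
    for (uint32_t t = 0; t < numTiles; ++t)
    {
        TileTrace tile;
        tile.tileId = ReadRaw<uint32_t>(is);
        const uint32_t numThreads = ReadRaw<uint32_t>(is);
        for (uint32_t i = 0; i < numThreads; ++i)
        {
            ThreadTrace thread{};
            for (uint32_t& field : thread.sr0Fields) { field = ReadRaw<uint32_t>(is); }
            const uint32_t numRecords = ReadRaw<uint32_t>(is);
            for (uint32_t r = 0; r < numRecords; ++r)
            {
                if (kind == RecordKind::Funtime)
                {
                    thread.values.push_back(ReadRaw<uint64_t>(is)); // timeStart (assumed 64-bit)
                    thread.values.push_back(ReadRaw<uint64_t>(is)); // timeEnd (assumed 64-bit)
                }
                else
                {
                    thread.values.push_back(ReadRaw<uint32_t>(is)); // cycles (assumed 32-bit)
                }
            }
            tile.threads.push_back(std::move(thread));
        }
        file.tiles.push_back(std::move(tile));
    }
    return file;
}

// Synthetic Funtime trace for experimentation: one tile, one thread, one record.
std::string MakeSampleFuntimeTrace()
{
    std::ostringstream os(std::ios::binary);
    WriteRaw<uint32_t>(os, 0); // profiling mode
    WriteRaw<uint32_t>(os, 1); // number of tiles
    WriteRaw<uint32_t>(os, 0); // tile ID
    WriteRaw<uint32_t>(os, 1); // number of profiled threads
    const uint32_t sr0Fields[5] = { 0, UINT32_MAX, 1, 3, 2 };
    for (uint32_t field : sr0Fields) { WriteRaw<uint32_t>(os, field); }
    WriteRaw<uint32_t>(os, 1);   // number of records
    WriteRaw<uint64_t>(os, 100); // timeStart
    WriteRaw<uint64_t>(os, 250); // timeEnd
    return os.str();
}
```

To try the sketch, wrap the string returned by MakeSampleFuntimeTrace() in a std::istringstream and pass it to ReadTrace with RecordKind::Funtime. For a real trace file, open it with std::ifstream in binary mode and first check the record field sizes against the record structure definitions of the GTPin version in use.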
Copyright (C) 2013-2025 Intel Corporation
SPDX-License-Identifier: MIT