18. Queries
Queries provide a mechanism to return information about the processing of a sequence of Vulkan commands. Query operations are asynchronous, so their results are not returned immediately. Instead, the results and their availability status are stored in a query pool. The state of these queries can be read back on the host, or copied to a buffer object on the device.
The supported query types are Occlusion Queries, Pipeline Statistics Queries, Result Status Queries, Video Encode Feedback Queries, and Timestamp Queries. Performance Queries, Transform Feedback Queries, Intel Performance Queries, and Mesh Shader Queries are each supported if the associated extension is available.
Several additional queries with specific purposes associated with ray tracing are available if the corresponding extensions are supported, as described for VkQueryType.
18.1. Query Pools
Queries are managed using query pool objects. Each query pool is a collection of a specific number of queries of a particular type.
Query pools are represented by VkQueryPool handles:
// Provided by VK_VERSION_1_0
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkQueryPool)
To create a query pool, call:
// Provided by VK_VERSION_1_0
VkResult vkCreateQueryPool(
VkDevice device,
const VkQueryPoolCreateInfo* pCreateInfo,
const VkAllocationCallbacks* pAllocator,
VkQueryPool* pQueryPool);
- device is the logical device that creates the query pool.
- pCreateInfo is a pointer to a VkQueryPoolCreateInfo structure containing the number and type of queries to be managed by the pool.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pQueryPool is a pointer to a VkQueryPool handle in which the resulting query pool object is returned.
The VkQueryPoolCreateInfo structure is defined as:
// Provided by VK_VERSION_1_0
typedef struct VkQueryPoolCreateInfo {
VkStructureType sType;
const void* pNext;
VkQueryPoolCreateFlags flags;
VkQueryType queryType;
uint32_t queryCount;
VkQueryPipelineStatisticFlags pipelineStatistics;
} VkQueryPoolCreateInfo;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
- queryType is a VkQueryType value specifying the type of queries managed by the pool.
- queryCount is the number of queries managed by the pool.
- pipelineStatistics is a bitmask of VkQueryPipelineStatisticFlagBits specifying which counters will be returned in queries on the new pool, as described below in Pipeline Statistics Queries.
pipelineStatistics is ignored if queryType is not
VK_QUERY_TYPE_PIPELINE_STATISTICS.
// Provided by VK_VERSION_1_0
typedef VkFlags VkQueryPoolCreateFlags;
VkQueryPoolCreateFlags is a bitmask type for setting a mask, but is
currently reserved for future use.
The VkQueryPoolPerformanceCreateInfoKHR structure is defined as:
// Provided by VK_KHR_performance_query
typedef struct VkQueryPoolPerformanceCreateInfoKHR {
VkStructureType sType;
const void* pNext;
uint32_t queueFamilyIndex;
uint32_t counterIndexCount;
const uint32_t* pCounterIndices;
} VkQueryPoolPerformanceCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- queueFamilyIndex is the queue family index to create this performance query pool for.
- counterIndexCount is the length of the pCounterIndices array.
- pCounterIndices is a pointer to an array of indices into the vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR::pCounters array, selecting the counters to enable in this performance query pool.
To query the number of passes required to query a performance query pool on a physical device, call:
// Provided by VK_KHR_performance_query
void vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR(
VkPhysicalDevice physicalDevice,
const VkQueryPoolPerformanceCreateInfoKHR* pPerformanceQueryCreateInfo,
uint32_t* pNumPasses);
- physicalDevice is the handle to the physical device whose queue family performance query counter properties will be queried.
- pPerformanceQueryCreateInfo is a pointer to a VkQueryPoolPerformanceCreateInfoKHR of the performance query that is to be created.
- pNumPasses is a pointer to an integer related to the number of passes required to query the performance query pool, as described below.
The pPerformanceQueryCreateInfo member
VkQueryPoolPerformanceCreateInfoKHR::queueFamilyIndex must be a
queue family of physicalDevice.
The number of passes required to capture the counters specified in the
pPerformanceQueryCreateInfo member
VkQueryPoolPerformanceCreateInfoKHR::pCounterIndices is returned in
pNumPasses.
To destroy a query pool, call:
// Provided by VK_VERSION_1_0
void vkDestroyQueryPool(
VkDevice device,
VkQueryPool queryPool,
const VkAllocationCallbacks* pAllocator);
- device is the logical device that destroys the query pool.
- queryPool is the query pool to destroy.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
|
Note
|
Applications can verify that |
Possible values of VkQueryPoolCreateInfo::queryType, specifying
the type of queries managed by the pool, are:
// Provided by VK_VERSION_1_0
typedef enum VkQueryType {
VK_QUERY_TYPE_OCCLUSION = 0,
VK_QUERY_TYPE_PIPELINE_STATISTICS = 1,
VK_QUERY_TYPE_TIMESTAMP = 2,
// Provided by VK_KHR_video_queue
VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR = 1000023000,
// Provided by VK_EXT_transform_feedback
VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT = 1000028004,
// Provided by VK_KHR_performance_query
VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR = 1000116000,
// Provided by VK_KHR_acceleration_structure
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR = 1000150000,
// Provided by VK_KHR_acceleration_structure
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SERIALIZATION_SIZE_KHR = 1000150001,
// Provided by VK_NV_ray_tracing
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_NV = 1000165000,
// Provided by VK_INTEL_performance_query
VK_QUERY_TYPE_PERFORMANCE_QUERY_INTEL = 1000210000,
// Provided by VK_KHR_video_encode_queue
VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR = 1000299000,
// Provided by VK_EXT_mesh_shader
VK_QUERY_TYPE_MESH_PRIMITIVES_GENERATED_EXT = 1000328000,
// Provided by VK_EXT_primitives_generated_query
VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT = 1000382000,
// Provided by VK_KHR_ray_tracing_maintenance1
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SERIALIZATION_BOTTOM_LEVEL_POINTERS_KHR = 1000386000,
// Provided by VK_KHR_ray_tracing_maintenance1
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SIZE_KHR = 1000386001,
// Provided by VK_EXT_opacity_micromap
VK_QUERY_TYPE_MICROMAP_SERIALIZATION_SIZE_EXT = 1000396000,
// Provided by VK_EXT_opacity_micromap
VK_QUERY_TYPE_MICROMAP_COMPACTED_SIZE_EXT = 1000396001,
} VkQueryType;
- VK_QUERY_TYPE_OCCLUSION specifies an occlusion query.
- VK_QUERY_TYPE_PIPELINE_STATISTICS specifies a pipeline statistics query.
- VK_QUERY_TYPE_TIMESTAMP specifies a timestamp query.
- VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR specifies a performance query.
- VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT specifies a transform feedback query.
- VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT specifies a primitives generated query.
- VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR specifies an acceleration structure size query for use with vkCmdWriteAccelerationStructuresPropertiesKHR or vkWriteAccelerationStructuresPropertiesKHR.
- VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SERIALIZATION_SIZE_KHR specifies a serialization acceleration structure size query.
- VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SIZE_KHR specifies an acceleration structure size query for use with vkCmdWriteAccelerationStructuresPropertiesKHR or vkWriteAccelerationStructuresPropertiesKHR.
- VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SERIALIZATION_BOTTOM_LEVEL_POINTERS_KHR specifies a serialization acceleration structure pointer count query.
- VK_QUERY_TYPE_ACCELERATION_STRUCTURE_COMPACTED_SIZE_NV specifies an acceleration structure size query for use with vkCmdWriteAccelerationStructuresPropertiesNV.
- VK_QUERY_TYPE_PERFORMANCE_QUERY_INTEL specifies an Intel performance query.
- VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR specifies a result status query.
- VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR specifies a video encode feedback query.
- VK_QUERY_TYPE_MESH_PRIMITIVES_GENERATED_EXT specifies a generated mesh primitives query.
18.2. Query Operation
The operation of queries is controlled by the commands vkCmdBeginQuery, vkCmdEndQuery, vkCmdBeginQueryIndexedEXT, vkCmdEndQueryIndexedEXT, vkCmdResetQueryPool, vkCmdCopyQueryPoolResults, vkCmdWriteTimestamp2, and vkCmdWriteTimestamp.
In order for a VkCommandBuffer to record query management commands,
the queue family for which its VkCommandPool was created must support
the appropriate type of operations (graphics, compute) suitable for the
query type of a given query pool.
Each query in a query pool has a status that is either unavailable or available, and also has state to store the numerical results of a query operation of the type requested when the query pool was created. Resetting a query via vkCmdResetQueryPool or vkResetQueryPool sets the status to unavailable and makes the numerical results undefined. A query is made available by the operation of vkCmdEndQuery, vkCmdEndQueryIndexedEXT, vkCmdWriteTimestamp2, or vkCmdWriteTimestamp. Both the availability status and numerical results can be retrieved by calling either vkGetQueryPoolResults or vkCmdCopyQueryPoolResults.
After query pool creation, each query is in an uninitialized state and must be reset before it is used. Queries must also be reset between uses.
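The availability rules above can be pictured as a small state machine. The following is a minimal host-side sketch, not part of the Vulkan API: the QueryState enum and the sketch_* helpers are hypothetical names, and the model ignores device timing (a real query only becomes available once the device has finished executing the ending command).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of one query's lifecycle; not a Vulkan type. */
typedef enum QueryState {
    QUERY_UNINITIALIZED, /* state right after pool creation */
    QUERY_UNAVAILABLE,   /* after a reset */
    QUERY_ACTIVE,        /* between begin and end */
    QUERY_AVAILABLE      /* after the query has been ended */
} QueryState;

/* Models a reset over the index range [first, first + count - 1]. */
static void sketch_reset(QueryState *states, uint32_t poolSize,
                         uint32_t first, uint32_t count)
{
    assert(first < poolSize && count <= poolSize - first);
    for (uint32_t i = 0; i < count; ++i)
        states[first + i] = QUERY_UNAVAILABLE;
}

/* Models beginning a query: only valid on a query that has been reset. */
static bool sketch_begin(QueryState *states, uint32_t query)
{
    if (states[query] != QUERY_UNAVAILABLE)
        return false; /* must be reset before first use and between uses */
    states[query] = QUERY_ACTIVE;
    return true;
}

/* Models ending a query: completes an active query and marks it available. */
static bool sketch_end(QueryState *states, uint32_t query)
{
    if (states[query] != QUERY_ACTIVE)
        return false;
    states[query] = QUERY_AVAILABLE;
    return true;
}
```

In this model, beginning a query that was never reset, or that is already available from an earlier use, is rejected; this is the same rule the paragraph above states as a requirement on the application.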
If a logical device includes multiple physical devices, then each command that writes a query must execute on a single physical device, and any call to vkCmdBeginQuery must execute the corresponding vkCmdEndQuery command on the same physical device.
To reset a range of queries in a query pool on a queue, call:
// Provided by VK_VERSION_1_0
void vkCmdResetQueryPool(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t firstQuery,
uint32_t queryCount);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the handle of the query pool managing the queries being reset.
- firstQuery is the initial query index to reset.
- queryCount is the number of queries to reset.
When executed on a queue, this command sets the status of query indices
[firstQuery, firstQuery + queryCount - 1] to
unavailable.
This command defines an execution dependency between other query commands that reference the same query.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by firstQuery and queryCount that occur earlier in
submission order.
The second synchronization scope
includes all commands which reference the queries in queryPool
indicated by firstQuery and queryCount that occur later in
submission order.
The operation of this command happens after the first scope and happens before the second scope.
If the queryType used to create queryPool was
VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR, this command sets the status of
query indices [firstQuery, firstQuery +
queryCount - 1] to unavailable for each pass of queryPool, as
indicated by a call to
vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR.
|
Note
|
Because |
To reset a range of queries in a query pool on the host, call:
// Provided by VK_VERSION_1_2
void vkResetQueryPool(
VkDevice device,
VkQueryPool queryPool,
uint32_t firstQuery,
uint32_t queryCount);
or the equivalent command
// Provided by VK_EXT_host_query_reset
void vkResetQueryPoolEXT(
VkDevice device,
VkQueryPool queryPool,
uint32_t firstQuery,
uint32_t queryCount);
- device is the logical device that owns the query pool.
- queryPool is the handle of the query pool managing the queries being reset.
- firstQuery is the initial query index to reset.
- queryCount is the number of queries to reset.
This command sets the status of query indices [firstQuery,
firstQuery + queryCount - 1] to unavailable.
If the queryType used to create queryPool was
VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR, this command sets the status of
query indices [firstQuery, firstQuery + queryCount - 1] to unavailable
for each pass.
Once queries are reset and ready for use, query commands can be issued to a command buffer. Occlusion queries and pipeline statistics queries count events - drawn samples and pipeline stage invocations, respectively - resulting from commands that are recorded between a vkCmdBeginQuery command and a vkCmdEndQuery command within a specified command buffer, effectively scoping a set of drawing and/or dispatching commands. Timestamp queries write timestamps to a query pool. Performance queries record performance counters to a query pool.
A query must begin and end in the same command buffer, although if it is a
primary command buffer, and the inheritedQueries feature is enabled, it can execute secondary
command buffers during the query operation.
For a secondary command buffer to be executed while a query is active, it
must set the occlusionQueryEnable, queryFlags, and/or
pipelineStatistics members of VkCommandBufferInheritanceInfo to
conservative values, as described in the Command
Buffer Recording section.
A query must either begin and end inside the same subpass of a render pass
instance, or must both begin and end outside of a render pass instance
(i.e. contain entire render pass instances).
If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices in the
query pool (starting at query) where N is the number of bits set
in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the queries is
implementation-dependent.
For example, some implementations may write each view’s results to a
distinct query, while other implementations may write the total result to
the first query and write zero to the other queries.
However, the sum of the results in all the queries must accurately reflect
the total result of the query summed over all views.
Applications can sum the results from all the queries to compute the total
result.
Queries used with multiview rendering must not span subpasses, i.e. they must begin and end in the same subpass.
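Because the distribution of results across the N queries is implementation-dependent, an application reading back a multiview query typically sums them. The following is a minimal sketch of that summation, assuming the results were already read back as one 64-bit value per query; the sketch_* function names and the results array are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Number of consecutive queries a multiview query consumes: one per bit set
 * in the view mask of the subpass. */
static uint32_t sketch_multiview_query_count(uint32_t viewMask)
{
    uint32_t n = 0;
    for (; viewMask != 0; viewMask &= viewMask - 1)
        ++n; /* each iteration clears the lowest set bit */
    return n;
}

/* Sums the N consecutive results starting at index `query`.
 * `results` is assumed to hold one 64-bit value per query in the pool. */
static uint64_t sketch_multiview_total(const uint64_t *results,
                                       uint32_t resultCount,
                                       uint32_t query, uint32_t viewMask)
{
    uint32_t n = sketch_multiview_query_count(viewMask);
    uint64_t total = 0;

    assert(query <= resultCount && n <= resultCount - query);
    for (uint32_t i = 0; i < n; ++i)
        total += results[query + i];
    return total;
}
```

The sum is the same whether an implementation writes per-view results to distinct queries or writes the total to the first query and zero to the rest.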
A query must either begin and end inside the same video coding scope, or must both begin and end outside of a video coding scope and must not contain entire video coding scopes.
To begin a query, call:
// Provided by VK_VERSION_1_0
void vkCmdBeginQuery(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t query,
VkQueryControlFlags flags);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the query pool that will manage the results of the query.
- query is the query index within the query pool that will contain the results.
- flags is a bitmask of VkQueryControlFlagBits specifying constraints on the types of queries that can be performed.
If the queryType of the pool is VK_QUERY_TYPE_OCCLUSION and
flags contains VK_QUERY_CONTROL_PRECISE_BIT, an implementation
must return a result that matches the actual number of samples passed.
This is described in more detail in Occlusion Queries.
Calling vkCmdBeginQuery is equivalent to calling
vkCmdBeginQueryIndexedEXT with the index parameter set to zero.
After beginning a query, that query is considered active within the command buffer it was called in until that same query is ended. Queries active in a primary command buffer when secondary command buffers are executed are considered active for those secondary command buffers.
Furthermore, if the query is started within a video coding scope, the following command buffer states are initialized for the query type:
Each video coding operation stores a result to the query corresponding to the current active query index, followed by incrementing the active query index. If the active query index gets incremented past the last activatable query index, issuing any further video coding operations results in undefined behavior.
Note: In practice, this means that currently at most one video coding operation can be issued between a begin and end query pair.
This command defines an execution dependency between other query commands that reference the same query.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by query that occur earlier in
submission order.
The second synchronization scope
includes all commands which reference the queries in queryPool
indicated by query that occur later in
submission order.
The operation of this command happens after the first scope and happens before the second scope.
To begin an indexed query, call:
// Provided by VK_EXT_transform_feedback
void vkCmdBeginQueryIndexedEXT(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t query,
VkQueryControlFlags flags,
uint32_t index);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the query pool that will manage the results of the query.
- query is the query index within the query pool that will contain the results.
- flags is a bitmask of VkQueryControlFlagBits specifying constraints on the types of queries that can be performed.
- index is the query type specific index. When the query type is VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT or VK_QUERY_TYPE_PRIMITIVES_GENERATED_EXT, the index represents the vertex stream.
The vkCmdBeginQueryIndexedEXT command operates the same as the
vkCmdBeginQuery command, except that it also accepts a query type
specific index parameter.
This command defines an execution dependency between other query commands that reference the same query index.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by query and index that occur earlier in
submission order.
The second synchronization scope
includes all commands which reference the queries in queryPool
indicated by query and index that occur later in
submission order.
The operation of this command happens after the first scope and happens before the second scope.
Bits which can be set in vkCmdBeginQuery::flags, specifying
constraints on the types of queries that can be performed, are:
// Provided by VK_VERSION_1_0
typedef enum VkQueryControlFlagBits {
VK_QUERY_CONTROL_PRECISE_BIT = 0x00000001,
} VkQueryControlFlagBits;
- VK_QUERY_CONTROL_PRECISE_BIT specifies the precision of occlusion queries.
// Provided by VK_VERSION_1_0
typedef VkFlags VkQueryControlFlags;
VkQueryControlFlags is a bitmask type for setting a mask of zero or
more VkQueryControlFlagBits.
To end a query after the set of desired drawing or dispatching commands is recorded, call:
// Provided by VK_VERSION_1_0
void vkCmdEndQuery(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t query);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the query pool that is managing the results of the query.
- query is the query index within the query pool where the result is stored.
The command completes the query in queryPool identified by
query, and marks it as available.
This command defines an execution dependency between other query commands that reference the same query.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by query that occur earlier in
submission order.
The second synchronization scope includes only the operation of this command.
Calling vkCmdEndQuery is equivalent to calling
vkCmdEndQueryIndexedEXT with the index parameter set to zero.
To end an indexed query after the set of desired drawing or dispatching commands is recorded, call:
// Provided by VK_EXT_transform_feedback
void vkCmdEndQueryIndexedEXT(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t query,
uint32_t index);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the query pool that is managing the results of the query.
- query is the query index within the query pool where the result is stored.
- index is the query type specific index.
The command completes the query in queryPool identified by query
and index, and marks it as available.
The vkCmdEndQueryIndexedEXT command operates the same as the
vkCmdEndQuery command, except that it also accepts a query type
specific index parameter.
This command defines an execution dependency between other query commands that reference the same query index.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by query that occur earlier in
submission order.
The second synchronization scope includes only the operation of this command.
An application can retrieve results either by requesting they be written
into application-provided memory, or by requesting they be copied into a
VkBuffer.
In either case, the layout in memory is defined as follows:
- The first query’s result is written starting at the first byte requested by the command, and each subsequent query’s result begins stride bytes later.
- Occlusion queries, pipeline statistics queries, transform feedback queries, primitives generated queries, mesh shader queries, video encode feedback queries, and timestamp queries store results in a tightly packed array of unsigned integers, either 32 or 64 bits wide as requested by the command, storing the numerical results and, if requested, the availability status.
- Performance queries store results in a tightly packed array whose type is determined by the storage member of the corresponding VkPerformanceCounterKHR.
- If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is used, the final element of each query’s result is an integer indicating whether the query’s result is available, with any non-zero value indicating that it is available.
- If VK_QUERY_RESULT_WITH_STATUS_BIT_KHR is used, the final element of each query’s result is an integer value indicating the status of the query result. Positive values indicate success, negative values indicate failure, and 0 indicates that the result is not yet available. Specific error codes are encoded in the VkQueryResultStatusKHR enumeration.
- Occlusion queries write one integer value, the number of samples passed. Pipeline statistics queries write one integer value for each bit that is enabled in the pipelineStatistics when the pool is created, and the statistics values are written in bit order starting from the least significant bit. Timestamp queries write one integer value. Performance queries write one VkPerformanceCounterResultKHR value for each VkPerformanceCounterKHR in the query. Transform feedback queries write two integers; the first integer is the number of primitives successfully written to the corresponding transform feedback buffer and the second is the number of primitives output to the vertex stream, regardless of whether they were successfully captured or not. In other words, if the transform feedback buffer was sized too small for the number of primitives output by the vertex stream, the first integer represents the number of primitives actually written and the second is the number that would have been written if all the transform feedback buffers associated with that vertex stream were large enough. Primitives generated queries write the number of primitives output to the vertex stream, regardless of whether transform feedback is active or not, or whether they were successfully captured by transform feedback or not. This is identical to the second integer of the transform feedback queries if transform feedback is active. Mesh shader queries write a single integer. Video encode feedback queries write one or more integer values for each bit that is enabled in VkQueryPoolVideoEncodeFeedbackCreateInfoKHR::encodeFeedbackFlags when the pool is created, and the feedback values are written in bit order starting from the least significant bit, as described here.
- If more than one query is retrieved and stride is not at least as large as the size of the array of values corresponding to a single query, the values written to memory are undefined.
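The layout rules above determine how many bytes one query occupies and how large the destination needs to be. The following is a minimal sketch of that arithmetic for a few query types. It uses local stand-in constants so that it compiles without the Vulkan headers: the SKETCH_* names and the SketchQueryKind enum are hypothetical, and only the three flag values mirror VkQueryResultFlagBits. The sketch assumes at most one of the availability and status flags is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; the values mirror VkQueryResultFlagBits. */
#define SKETCH_RESULT_64_BIT            0x00000001u
#define SKETCH_RESULT_WITH_AVAILABILITY 0x00000004u
#define SKETCH_RESULT_WITH_STATUS       0x00000010u

/* Hypothetical enum covering only the query types modelled here. */
typedef enum SketchQueryKind {
    SKETCH_OCCLUSION,
    SKETCH_PIPELINE_STATISTICS,
    SKETCH_TIMESTAMP,
    SKETCH_TRANSFORM_FEEDBACK
} SketchQueryKind;

static uint32_t sketch_popcount(uint32_t bits)
{
    uint32_t n = 0;
    for (; bits != 0; bits &= bits - 1)
        ++n;
    return n;
}

/* Number of integers written for one query, including the trailing
 * availability or status element when one is requested. */
static uint32_t sketch_values_per_query(SketchQueryKind kind,
                                        uint32_t pipelineStatistics,
                                        uint32_t flags)
{
    uint32_t n;
    switch (kind) {
    case SKETCH_PIPELINE_STATISTICS: n = sketch_popcount(pipelineStatistics); break;
    case SKETCH_TRANSFORM_FEEDBACK:  n = 2; break;
    default:                         n = 1; break; /* occlusion, timestamp */
    }
    if (flags & (SKETCH_RESULT_WITH_AVAILABILITY | SKETCH_RESULT_WITH_STATUS))
        ++n;
    return n;
}

/* Size in bytes of one query's result array; also the smallest safe stride
 * when more than one query is retrieved. */
static size_t sketch_query_result_size(SketchQueryKind kind,
                                       uint32_t pipelineStatistics,
                                       uint32_t flags)
{
    size_t elem = (flags & SKETCH_RESULT_64_BIT) ? 8u : 4u;
    return elem * sketch_values_per_query(kind, pipelineStatistics, flags);
}

/* Bytes spanned by queryCount results laid out `stride` bytes apart. */
static size_t sketch_required_size(size_t resultSize, size_t stride,
                                   uint32_t queryCount)
{
    assert(queryCount > 0 && (queryCount == 1 || stride >= resultSize));
    return (size_t)(queryCount - 1) * stride + resultSize;
}
```

For example, an occlusion query read with 64-bit results and an availability value occupies two 8-byte elements, so any stride of 16 bytes or more avoids the undefined case described in the last list item.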
To retrieve status and results for a set of queries, call:
// Provided by VK_VERSION_1_0
VkResult vkGetQueryPoolResults(
VkDevice device,
VkQueryPool queryPool,
uint32_t firstQuery,
uint32_t queryCount,
size_t dataSize,
void* pData,
VkDeviceSize stride,
VkQueryResultFlags flags);
- device is the logical device that owns the query pool.
- queryPool is the query pool managing the queries containing the desired results.
- firstQuery is the initial query index.
- queryCount is the number of queries to read.
- dataSize is the size in bytes of the buffer pointed to by pData.
- pData is a pointer to an application-allocated buffer where the results will be written.
- stride is the stride in bytes between results for individual queries within pData.
- flags is a bitmask of VkQueryResultFlagBits specifying how and when results are returned.
Any results written for a query are written according to a layout dependent on the query type.
If no bits are set in flags, and all requested queries are in the
available state, results are written as an array of 32-bit unsigned integer
values.
Behavior when not all queries are available is described
below.
If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set, results for all
queries in queryPool identified by firstQuery and
queryCount are copied to pData, along with an extra availability or status
value written directly after the results of each query and interpreted as an
unsigned integer.
A value of zero indicates that the results are not yet available, otherwise
the query is complete and results are available.
The size of the availability or status values is 64 bits if
VK_QUERY_RESULT_64_BIT is set in flags.
Otherwise, it is 32 bits.
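Reading such a buffer back on the host amounts to stepping through it in stride-sized records and checking the trailing element of each. The following is a minimal sketch, assuming the buffer was filled with both the 64-bit and availability flags set and that the query type writes a single value (for example an occlusion query); sketch_read_result64 is a hypothetical helper, not a Vulkan function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads query `i` from a results buffer whose records are `stride` bytes
 * apart and consist of a 64-bit result followed by a 64-bit availability
 * value. Returns true and stores the result only if the availability value
 * is non-zero. */
static bool sketch_read_result64(const void *data, size_t dataSize,
                                 size_t stride, uint32_t i, uint64_t *value)
{
    uint64_t pair[2]; /* pair[0] = result, pair[1] = availability */
    size_t offset = (size_t)i * stride;

    assert(offset + sizeof pair <= dataSize);
    /* memcpy avoids alignment assumptions about the buffer and stride. */
    memcpy(pair, (const unsigned char *)data + offset, sizeof pair);
    if (pair[1] == 0)
        return false; /* not yet available; the result is not final */
    *value = pair[0];
    return true;
}
```

A caller would typically loop over the queries it read and skip or retry those for which the helper returns false.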
If VK_QUERY_RESULT_WITH_STATUS_BIT_KHR is set, results for all queries
in queryPool identified by firstQuery and queryCount are
copied to pData, along with an extra status value written directly
after the results of each query and interpreted as a signed integer.
A value of zero indicates that the results are not yet available.
Positive values indicate that the operations within the query completed
successfully, and the query results are valid.
Negative values indicate that the operations within the query completed
unsuccessfully.
VkQueryResultStatusKHR defines specific meaning for values returned here, though implementations are free to return other values.
If the status value written is negative, indicating that the operations within the query completed unsuccessfully, then all other results written by this command are undefined unless otherwise specified for any of the results of the used query type.
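Since implementations may return values other than those named in VkQueryResultStatusKHR, host code is usually safer classifying the status by sign than comparing against specific codes. The following is a minimal sketch of that classification for 32-bit results; SketchStatusClass and both helper functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification based only on the sign rule described above,
 * not on specific VkQueryResultStatusKHR codes. */
typedef enum SketchStatusClass {
    SKETCH_STATUS_FAILED = -1,
    SKETCH_STATUS_NOT_READY = 0,
    SKETCH_STATUS_COMPLETE = 1
} SketchStatusClass;

static SketchStatusClass sketch_classify_status(int32_t status)
{
    if (status < 0)
        return SKETCH_STATUS_FAILED;    /* other results are undefined */
    if (status == 0)
        return SKETCH_STATUS_NOT_READY; /* results not yet available */
    return SKETCH_STATUS_COMPLETE;      /* results are valid */
}

/* Reads the status from the last 32-bit element of one query's result and
 * reinterprets it as a signed integer. */
static SketchStatusClass sketch_result_status(const uint32_t *result,
                                              uint32_t elementCount)
{
    int32_t status;

    assert(elementCount > 0);
    memcpy(&status, &result[elementCount - 1], sizeof status);
    return sketch_classify_status(status);
}
```

Only when the class is SKETCH_STATUS_COMPLETE should the remaining elements of that query's result be consumed.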
|
Note
|
If |
Results written by this command for any available query are final.
If VK_QUERY_RESULT_PARTIAL_BIT is set, then for any query that is
unavailable, an intermediate result between zero and the final result value
is written for that query.
Otherwise, any result written by this command for an unavailable query is undefined.
If VK_QUERY_RESULT_64_BIT is set, results and, if returned,
availability or status values for all queries are written as an array of
64-bit values.
If the queryPool was created with
VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR, results for each query are
written as an array of the type indicated by
VkPerformanceCounterKHR::storage for the counter being queried.
Otherwise, results and availability or status values are written as an
array of 32-bit values.
If an unsigned integer query’s value overflows the result type, the value
may either wrap or saturate.
If the maintenance7 feature is enabled, for
an unsigned integer query, the 32-bit result value must be equal to the 32
least significant bits of the equivalent 64-bit result value.
If a signed integer query’s value overflows the result type, the value is
undefined.
If a floating-point query’s value is not representable as the result type,
the value is undefined.
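The two permitted overflow behaviours for unsigned integer results can be written out explicitly. The following is a small illustrative sketch, not taken from any implementation: the sketch_* functions are hypothetical and show what a 32-bit result would be for a given full 64-bit counter value under each behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit result when the value wraps: keep the 32 least significant bits.
 * This is also the relationship required between the 32-bit and 64-bit
 * results when the maintenance7 feature is enabled. */
static uint32_t sketch_wrap32(uint64_t full)
{
    return (uint32_t)(full & 0xFFFFFFFFu);
}

/* 32-bit result when the value saturates instead. */
static uint32_t sketch_saturate32(uint64_t full)
{
    return full > UINT32_MAX ? UINT32_MAX : (uint32_t)full;
}
```

For a counter value of 0x100000005, wrapping yields 5 and saturation yields 0xFFFFFFFF; for values that fit in 32 bits the two agree. Requesting 64-bit results avoids the ambiguity for counters that may exceed 32 bits.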
If VK_QUERY_RESULT_WAIT_BIT is set, this command defines an execution
dependency with any earlier commands that write one of the identified
queries.
The first synchronization scope
includes all instances of vkCmdEndQuery,
vkCmdEndQueryIndexedEXT,
vkCmdWriteTimestamp2,
and vkCmdWriteTimestamp that reference any query in queryPool
indicated by firstQuery and queryCount.
The second synchronization scope
includes the host operations of this command.
If VK_QUERY_RESULT_WAIT_BIT is not set, vkGetQueryPoolResults
may return VK_NOT_READY if there are queries in the unavailable
state.
|
Note
|
Applications must take care to ensure that use of the
For example, if a query has been used previously and a command buffer
records the commands The above also applies when A similar situation can arise with the
|
Note: Applications can double-buffer query pool usage, with a pool per frame, and reset queries at the end of the frame in which they are read.
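One possible reading of the double-buffering suggestion is sketched below as plain index arithmetic: each frame records into one pool, reads back the pool written by the previous frame, and resets that pool once its results have been read so it is ready for the next frame. The pool count of two and the sketch_* names are assumptions made for illustration; a real application would also have to ensure that the previous frame has finished executing before reading.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_POOL_COUNT 2u /* assumed: two pools, used alternately */

/* Index of the pool that frame `frame` records its queries into. */
static uint32_t sketch_write_pool(uint64_t frame)
{
    return (uint32_t)(frame % SKETCH_POOL_COUNT);
}

/* Index of the pool whose results frame `frame` reads back and then resets:
 * the one written during the previous frame. Returns 0 for frame 0, which
 * has nothing to read yet, and 1 otherwise. */
static int sketch_read_pool(uint64_t frame, uint32_t *pool)
{
    if (frame == 0)
        return 0;
    *pool = (uint32_t)((frame - 1) % SKETCH_POOL_COUNT);
    return 1;
}
```

With two pools, the pool read and reset during frame N is the same pool that frame N + 1 records into, so queries are always reset between uses.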
Bits which can be set in vkGetQueryPoolResults::flags and
vkCmdCopyQueryPoolResults::flags, specifying how and when
results are returned, are:
// Provided by VK_VERSION_1_0
typedef enum VkQueryResultFlagBits {
VK_QUERY_RESULT_64_BIT = 0x00000001,
VK_QUERY_RESULT_WAIT_BIT = 0x00000002,
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT = 0x00000004,
VK_QUERY_RESULT_PARTIAL_BIT = 0x00000008,
// Provided by VK_KHR_video_queue
VK_QUERY_RESULT_WITH_STATUS_BIT_KHR = 0x00000010,
} VkQueryResultFlagBits;
- VK_QUERY_RESULT_64_BIT specifies that the results will be written as an array of 64-bit unsigned integer values. If this bit is not set, the results will be written as an array of 32-bit unsigned integer values.
- VK_QUERY_RESULT_WAIT_BIT specifies that Vulkan will wait for each query’s status to become available before retrieving its results.
- VK_QUERY_RESULT_WITH_AVAILABILITY_BIT specifies that the availability status accompanies the results.
- VK_QUERY_RESULT_PARTIAL_BIT specifies that returning partial results is acceptable.
- VK_QUERY_RESULT_WITH_STATUS_BIT_KHR specifies that the last value returned in the query is a VkQueryResultStatusKHR value. See result status query for information on how an application can determine whether the use of this flag bit is supported.
// Provided by VK_VERSION_1_0
typedef VkFlags VkQueryResultFlags;
VkQueryResultFlags is a bitmask type for setting a mask of zero or
more VkQueryResultFlagBits.
Specific status codes that can be returned from a query are:
// Provided by VK_KHR_video_queue
typedef enum VkQueryResultStatusKHR {
VK_QUERY_RESULT_STATUS_ERROR_KHR = -1,
VK_QUERY_RESULT_STATUS_NOT_READY_KHR = 0,
VK_QUERY_RESULT_STATUS_COMPLETE_KHR = 1,
// Provided by VK_KHR_video_encode_queue
VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR = -1000299000,
} VkQueryResultStatusKHR;
- VK_QUERY_RESULT_STATUS_NOT_READY_KHR specifies that the query result is not yet available.
- VK_QUERY_RESULT_STATUS_ERROR_KHR specifies that operations did not complete successfully.
- VK_QUERY_RESULT_STATUS_COMPLETE_KHR specifies that operations completed successfully and the query result is available.
- VK_QUERY_RESULT_STATUS_INSUFFICIENT_BITSTREAM_BUFFER_RANGE_KHR specifies that a video encode operation did not complete successfully due to the destination video bitstream buffer range not being sufficiently large to fit the encoded bitstream data.
To copy query statuses and numerical results directly to buffer memory, call:
// Provided by VK_VERSION_1_0
void vkCmdCopyQueryPoolResults(
VkCommandBuffer commandBuffer,
VkQueryPool queryPool,
uint32_t firstQuery,
uint32_t queryCount,
VkBuffer dstBuffer,
VkDeviceSize dstOffset,
VkDeviceSize stride,
VkQueryResultFlags flags);
- commandBuffer is the command buffer into which this command will be recorded.
- queryPool is the query pool managing the queries containing the desired results.
- firstQuery is the initial query index.
- queryCount is the number of queries. firstQuery and queryCount together define a range of queries.
- dstBuffer is a VkBuffer object that will receive the results of the copy command.
- dstOffset is an offset into dstBuffer.
- stride is the stride in bytes between results for individual queries within dstBuffer. The required size of the backing memory for dstBuffer is determined as described above for vkGetQueryPoolResults.
- flags is a bitmask of VkQueryResultFlagBits specifying how and when results are returned.
Any results written for a query are written according to a layout dependent on the query type.
Results for any query in queryPool identified by firstQuery and
queryCount that is available are copied to dstBuffer.
If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set, results for all
queries in queryPool identified by firstQuery and
queryCount are copied to dstBuffer, along with an extra
availability value written directly after the results of each query and
interpreted as an unsigned integer.
A value of zero indicates that the results are not yet available, otherwise
the query is complete and results are available.
If VK_QUERY_RESULT_WITH_STATUS_BIT_KHR is set, results for all queries
in queryPool identified by firstQuery and queryCount are
copied to dstBuffer, along with an extra status value written directly
after the results of each query and interpreted as a signed integer.
A value of zero indicates that the results are not yet available.
Positive values indicate that the operations within the query completed
successfully, and the query results are valid.
Negative values indicate that the operations within the query completed
unsuccessfully.
VkQueryResultStatusKHR defines specific meaning for values returned here, though implementations are free to return other values.
If the status value written is negative, indicating that the operations within the query completed unsuccessfully, then all other results written by this command are undefined unless otherwise specified for any of the results of the used query type.
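The sign convention for status values can be sketched in C as follows. This is a minimal illustration rather than part of the API: `QueryOutcome` and `classify_query_status` are hypothetical names, and the status is taken as a plain `int32_t` (the 32-bit result layout) so that the sketch compiles without the Vulkan headers.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch; not part of Vulkan. */
typedef enum QueryOutcome {
    QUERY_OUTCOME_FAILED    = -1, /* negative status: operations completed unsuccessfully */
    QUERY_OUTCOME_NOT_READY =  0, /* zero status: results are not yet available */
    QUERY_OUTCOME_COMPLETE  =  1  /* positive status: results are valid */
} QueryOutcome;

/* Classify by sign only, because implementations are free to return
 * values other than those named in VkQueryResultStatusKHR. */
static QueryOutcome classify_query_status(int32_t status)
{
    if (status < 0) {
        return QUERY_OUTCOME_FAILED;
    }
    if (status == 0) {
        return QUERY_OUTCOME_NOT_READY;
    }
    return QUERY_OUTCOME_COMPLETE;
}
```

Classifying on the sign alone keeps the check valid for status codes that the application does not know by name. With VK_QUERY_RESULT_64_BIT the same logic would apply to an `int64_t` value.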
Results for any available query written by this command are final and
represent the final result of the query.
If VK_QUERY_RESULT_PARTIAL_BIT is set, then for any query that is
unavailable, an intermediate result between zero and the final result value
is written for that query.
Otherwise, any result written by this command is undefined.
If VK_QUERY_RESULT_64_BIT is set, results and availability
or status
values for all queries are written as an array of 64-bit values.
If the queryPool was created with
VK_QUERY_TYPE_PERFORMANCE_QUERY_KHR, results for each query are
written as an array of the type indicated by
VkPerformanceCounterKHR::storage for the counter being queried.
Otherwise, results and availability
or status
values are written as an array of 32-bit values.
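As a rough illustration of the layout rules above, the following C sketch computes how many bytes one query occupies in dstBuffer and where its trailing availability or status value lands. The `SKETCH_*` macros are local stand-ins that mirror the corresponding VkQueryResultFlagBits values so the code compiles without the Vulkan headers. The function names and the `value_count` parameter (the number of result values the query type writes, for example one for an occlusion query) are assumptions of this sketch, and it does not cover performance query pools.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring VkQueryResultFlagBits values. */
#define SKETCH_QUERY_RESULT_64_BIT                0x00000001u
#define SKETCH_QUERY_RESULT_WITH_AVAILABILITY_BIT 0x00000004u
#define SKETCH_QUERY_RESULT_WITH_STATUS_BIT_KHR   0x00000010u

/* Size in bytes of one written value: 64-bit if requested, else 32-bit. */
static size_t query_value_size(uint32_t flags)
{
    return (flags & SKETCH_QUERY_RESULT_64_BIT) ? sizeof(uint64_t)
                                                : sizeof(uint32_t);
}

/* Bytes written per query: value_count results plus one extra value
 * when availability or status is requested. */
static size_t query_result_bytes(uint32_t value_count, uint32_t flags)
{
    uint32_t extra_mask = SKETCH_QUERY_RESULT_WITH_AVAILABILITY_BIT |
                          SKETCH_QUERY_RESULT_WITH_STATUS_BIT_KHR;
    size_t extra = (flags & extra_mask) ? 1u : 0u;
    return ((size_t)value_count + extra) * query_value_size(flags);
}

/* Byte offset of the availability or status value of query number
 * query_index (relative to firstQuery), which directly follows that
 * query's results. */
static size_t query_extra_value_offset(size_t dst_offset, size_t stride,
                                       uint32_t query_index,
                                       uint32_t value_count, uint32_t flags)
{
    return dst_offset + (size_t)query_index * stride +
           (size_t)value_count * query_value_size(flags);
}
```

For example, an occlusion query copied with the 64-bit and availability flags occupies 16 bytes per query under these assumptions, so a tightly packed copy would use a stride of 16.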
If an unsigned integer query’s value overflows the result type, the value
may either wrap or saturate.
If the maintenance7 feature is enabled, for
an unsigned integer query, the 32-bit result value must be equal to the 32
least significant bits of the equivalent 64-bit result value.
If a signed integer query’s value overflows the result type, the value is
undefined.
If a floating-point query’s value is not representable as the result type,
the value is undefined.
This command defines an execution dependency between other query commands that reference the same query.
The first synchronization scope
includes all commands which reference the queries in queryPool
indicated by firstQuery and queryCount that occur earlier in
submission order.
If flags does not include VK_QUERY_RESULT_WAIT_BIT,
vkCmdEndQueryIndexedEXT,
vkCmdWriteTimestamp2,
vkCmdEndQuery, and vkCmdWriteTimestamp are excluded from this
scope.
The second synchronization scope
includes all commands which reference the queries in queryPool
indicated by firstQuery and queryCount that occur later in
submission order.
The operation of this command happens after the first scope and happens before the second scope.
vkCmdCopyQueryPoolResults is considered to be a transfer operation,
and its writes to buffer memory must be synchronized using
VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT
before using the results.
Rendering operations such as clears, MSAA resolves, attachment load/store operations, and blits may count towards the results of queries. This behavior is implementation-dependent and may vary depending on the path used within an implementation. For example, some implementations have several types of clears, some of which may include vertices and some not.
18.3. Occlusion Queries
Occlusion queries track the number of samples that pass the per-fragment
tests for a set of drawing commands.
As such, occlusion queries are only available on queue families supporting
graphics operations.
The application can then use these results to inform future rendering
decisions.
An occlusion query is begun and ended by calling vkCmdBeginQuery and
vkCmdEndQuery, respectively.
When an occlusion query begins, the count of passing samples always starts
at zero.
For each drawing command, the count is incremented as described in
Sample Counting.
If flags does not contain VK_QUERY_CONTROL_PRECISE_BIT an
implementation may generate any non-zero result value for the query if the
count of passing samples is non-zero.
|
Note
|
Not setting Setting |
When an occlusion query finishes, the result for that query is marked as
available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults) or request it be put into host memory (via
vkGetQueryPoolResults).
Note: If occluding geometry is not drawn first, samples can pass the depth test, but still not be visible in a final image.
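One way an application might act on an occlusion result is sketched below. The helper name `occlusion_is_visible` and its `min_samples` threshold are hypothetical. The sketch assumes the result has already been read back as a 64-bit value and that the caller knows whether the query was begun with VK_QUERY_CONTROL_PRECISE_BIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether an object counts as visible from an occlusion result.
 * Without the precise bit only the zero / non-zero distinction is
 * meaningful, so the threshold is ignored in that case. */
static bool occlusion_is_visible(uint64_t result, bool precise,
                                 uint64_t min_samples)
{
    if (!precise) {
        return result != 0;
    }
    return result >= min_samples;
}
```

A threshold is only useful with precise queries, since an imprecise query may report any non-zero value when at least one sample passes.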
18.4. Pipeline Statistics Queries
Pipeline statistics queries allow the application to sample a specified set
of VkPipeline counters.
These counters are accumulated by Vulkan for a set of either drawing or
dispatching commands while a pipeline statistics query is active.
As such, pipeline statistics queries are available on queue families
supporting either graphics or compute operations.
The availability of pipeline statistics queries is indicated by the
pipelineStatisticsQuery member of the VkPhysicalDeviceFeatures
object (see vkGetPhysicalDeviceFeatures and vkCreateDevice for
detecting and requesting this query type on a VkDevice).
A pipeline statistics query is begun and ended by calling
vkCmdBeginQuery and vkCmdEndQuery, respectively.
When a pipeline statistics query begins, all statistics counters are set to
zero.
While the query is active, the pipeline type determines which set of
statistics are available, but these must be configured on the query pool
when it is created.
If a statistic counter is issued on a command buffer that does not support
the corresponding operation, or the counter corresponds to a shading stage
which is missing from any of the pipelines used while the query is active,
the value of that counter is undefined after the query has been made
available.
At least one statistic counter relevant to the operations supported on the
recording command buffer must be enabled.
Bits which can be set in
VkQueryPoolCreateInfo::pipelineStatistics for query pools and in
VkCommandBufferInheritanceInfo::pipelineStatistics for secondary
command buffers, individually enabling pipeline statistics counters, are:
// Provided by VK_VERSION_1_0
typedef enum VkQueryPipelineStatisticFlagBits {
VK_QUERY_PIPELINE_STATISTIC_INPUT_ASSEMBLY_VERTICES_BIT = 0x00000001,
VK_QUERY_PIPELINE_STATISTIC_INPUT_ASSEMBLY_PRIMITIVES_BIT = 0x00000002,
VK_QUERY_PIPELINE_STATISTIC_VERTEX_SHADER_INVOCATIONS_BIT = 0x00000004,
VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_INVOCATIONS_BIT = 0x00000008,
VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT = 0x00000010,
VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT = 0x00000020,
VK_QUERY_PIPELINE_STATISTIC_CLIPPING_PRIMITIVES_BIT = 0x00000040,
VK_QUERY_PIPELINE_STATISTIC_FRAGMENT_SHADER_INVOCATIONS_BIT = 0x00000080,
VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_CONTROL_SHADER_PATCHES_BIT = 0x00000100,
VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_EVALUATION_SHADER_INVOCATIONS_BIT = 0x00000200,
VK_QUERY_PIPELINE_STATISTIC_COMPUTE_SHADER_INVOCATIONS_BIT = 0x00000400,
// Provided by VK_EXT_mesh_shader
VK_QUERY_PIPELINE_STATISTIC_TASK_SHADER_INVOCATIONS_BIT_EXT = 0x00000800,
// Provided by VK_EXT_mesh_shader
VK_QUERY_PIPELINE_STATISTIC_MESH_SHADER_INVOCATIONS_BIT_EXT = 0x00001000,
// Provided by VK_HUAWEI_cluster_culling_shader
VK_QUERY_PIPELINE_STATISTIC_CLUSTER_CULLING_SHADER_INVOCATIONS_BIT_HUAWEI = 0x00002000,
} VkQueryPipelineStatisticFlagBits;
- VK_QUERY_PIPELINE_STATISTIC_INPUT_ASSEMBLY_VERTICES_BIT specifies that queries managed by the pool will count the number of vertices processed by the input assembly stage. Vertices corresponding to incomplete primitives may contribute to the count.
- VK_QUERY_PIPELINE_STATISTIC_INPUT_ASSEMBLY_PRIMITIVES_BIT specifies that queries managed by the pool will count the number of primitives processed by the input assembly stage. If primitive restart is enabled, restarting the primitive topology has no effect on the count. Incomplete primitives may be counted.
- VK_QUERY_PIPELINE_STATISTIC_VERTEX_SHADER_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of vertex shader invocations. This counter’s value is incremented each time a vertex shader is invoked.
- VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of geometry shader invocations. This counter’s value is incremented each time a geometry shader is invoked. In the case of instanced geometry shaders, the geometry shader invocations count is incremented for each separate instanced invocation.
- VK_QUERY_PIPELINE_STATISTIC_GEOMETRY_SHADER_PRIMITIVES_BIT specifies that queries managed by the pool will count the number of primitives generated by geometry shader invocations. The counter’s value is incremented each time the geometry shader emits a primitive. Restarting primitive topology using the SPIR-V instructions OpEndPrimitive or OpEndStreamPrimitive has no effect on the geometry shader output primitives count.
- VK_QUERY_PIPELINE_STATISTIC_CLIPPING_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of primitives processed by the Primitive Clipping stage of the pipeline. The counter’s value is incremented each time a primitive reaches the primitive clipping stage.
- VK_QUERY_PIPELINE_STATISTIC_CLIPPING_PRIMITIVES_BIT specifies that queries managed by the pool will count the number of primitives output by the Primitive Clipping stage of the pipeline. The counter’s value is incremented each time a primitive passes the primitive clipping stage. The actual number of primitives output by the primitive clipping stage for a particular input primitive is implementation-dependent but must satisfy the following conditions:
  - If at least one vertex of the input primitive lies inside the clipping volume, the counter is incremented by one or more.
  - Otherwise, the counter is incremented by zero or more.
- VK_QUERY_PIPELINE_STATISTIC_FRAGMENT_SHADER_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of fragment shader invocations. The counter’s value is incremented each time the fragment shader is invoked.
- VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_CONTROL_SHADER_PATCHES_BIT specifies that queries managed by the pool will count the number of patches processed by the tessellation control shader. The counter’s value is incremented once for each patch for which a tessellation control shader is invoked.
- VK_QUERY_PIPELINE_STATISTIC_TESSELLATION_EVALUATION_SHADER_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of invocations of the tessellation evaluation shader. The counter’s value is incremented each time the tessellation evaluation shader is invoked.
- VK_QUERY_PIPELINE_STATISTIC_COMPUTE_SHADER_INVOCATIONS_BIT specifies that queries managed by the pool will count the number of compute shader invocations. The counter’s value is incremented every time the compute shader is invoked. Implementations may skip the execution of certain compute shader invocations or execute additional compute shader invocations for implementation-dependent reasons as long as the results of rendering otherwise remain unchanged.
- VK_QUERY_PIPELINE_STATISTIC_TASK_SHADER_INVOCATIONS_BIT_EXT specifies that queries managed by the pool will count the number of task shader invocations. The counter’s value is incremented every time the task shader is invoked.
- VK_QUERY_PIPELINE_STATISTIC_MESH_SHADER_INVOCATIONS_BIT_EXT specifies that queries managed by the pool will count the number of mesh shader invocations. The counter’s value is incremented every time the mesh shader is invoked.
These values are intended to measure relative statistics on one implementation. Various device architectures will count these values differently. Any or all counters may be affected by the issues described in Query Operation.
This counting difference is especially true if the pipeline contains mesh or task shaders, which may affect several of the counters in unexpected ways.
Note: For example, tile-based rendering devices may need to replay the scene multiple times, affecting some of the counts.
If a pipeline has rasterizerDiscardEnable enabled, implementations
may discard primitives after the final
pre-rasterization shader
stage.
As a result, if rasterizerDiscardEnable is enabled, the clipping input
and output primitives counters may not be incremented.
When a pipeline statistics query finishes, the result for that query is
marked as available.
The application can copy the result to a buffer (via
vkCmdCopyQueryPoolResults), or request it be put into host memory (via
vkGetQueryPoolResults).
// Provided by VK_VERSION_1_0
typedef VkFlags VkQueryPipelineStatisticFlags;
VkQueryPipelineStatisticFlags is a bitmask type for setting a mask of
zero or more VkQueryPipelineStatisticFlagBits.
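When reading back pipeline statistics, an application needs to know how many values each query holds and which slot belongs to which counter. The C sketch below assumes the result layout in which one value is written per enabled bit, in bit order starting from the least significant bit. The helper names are hypothetical, and the mask is passed as a plain `uint32_t` so the code compiles without the Vulkan headers.

```c
#include <stdint.h>

/* Portable population count. */
static uint32_t count_bits(uint32_t v)
{
    uint32_t n = 0;
    while (v != 0) {
        v &= v - 1u; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Number of values written per query for a given pipelineStatistics mask. */
static uint32_t statistics_value_count(uint32_t pipeline_statistics)
{
    return count_bits(pipeline_statistics);
}

/* Index of one counter within a query's values, or -1 if that counter
 * was not enabled in the mask. statistic_bit must have one bit set. */
static int32_t statistics_value_index(uint32_t pipeline_statistics,
                                      uint32_t statistic_bit)
{
    if ((pipeline_statistics & statistic_bit) == 0) {
        return -1;
    }
    /* Count the enabled counters that sit below this bit. */
    return (int32_t)count_bits(pipeline_statistics & (statistic_bit - 1u));
}
```

For instance, with the input assembly vertices (0x1), vertex shader invocations (0x4), and fragment shader invocations (0x80) bits enabled, each query holds three values and the fragment shader count is at index 2.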
18.5. Timestamp Queries
Timestamps provide applications with a mechanism for timing the
execution of commands.
A timestamp is an integer value generated by the VkPhysicalDevice.
Unlike other queries, timestamps do not operate over a range, and so do not
use vkCmdBeginQuery or vkCmdEndQuery.
The mechanism is built around a set of commands that allow the application
to tell the VkPhysicalDevice to write timestamp values to a
query pool and then either read timestamp values on the
host (using vkGetQueryPoolResults) or copy timestamp values to a
VkBuffer (using vkCmdCopyQueryPoolResults).
The number of valid bits in a timestamp value is determined by the
VkQueueFamilyProperties::timestampValidBits property of the
queue on which the timestamp is written.
Timestamps are supported on any queue which reports a non-zero value for
timestampValidBits via vkGetPhysicalDeviceQueueFamilyProperties.
If the timestampComputeAndGraphics limit is VK_TRUE, timestamps are
supported by every queue family that supports either graphics or compute
operations (see VkQueueFamilyProperties).
The number of nanoseconds it takes for a timestamp value to be incremented
by 1 can be obtained from
VkPhysicalDeviceLimits::timestampPeriod after a call to
vkGetPhysicalDeviceProperties.
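The relationship between raw timestamp values, timestampValidBits, and timestampPeriod can be sketched as below. The function names are hypothetical, and the sketch assumes that at most one wrap of the valid bit range occurred between the two timestamps. Values are passed as plain integers so the code compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed device ticks between two raw timestamps written on a queue
 * whose family reports valid_bits as timestampValidBits. The subtraction
 * is done modulo 2^valid_bits so a single wrap is handled. */
static uint64_t timestamp_delta_ticks(uint64_t begin, uint64_t end,
                                      uint32_t valid_bits)
{
    assert(valid_bits >= 1 && valid_bits <= 64);
    uint64_t mask = (valid_bits == 64) ? UINT64_MAX
                                       : ((UINT64_C(1) << valid_bits) - 1);
    return (end - begin) & mask;
}

/* Convert the tick delta to nanoseconds using timestampPeriod, which is
 * the number of nanoseconds per timestamp increment. */
static double timestamp_delta_ns(uint64_t begin, uint64_t end,
                                 uint32_t valid_bits, float timestamp_period)
{
    return (double)timestamp_delta_ticks(begin, end, valid_bits) *
           (double)timestamp_period;
}
```

Working on the difference rather than on the absolute values keeps the arithmetic correct when the counter wraps between the two writes.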
To request a timestamp and write the value to memory, call:
// Provided by VK_VERSION_1_3
void vkCmdWriteTimestamp2(
VkCommandBuffer commandBuffer,
VkPipelineStageFlags2 stage,
VkQueryPool queryPool,
uint32_t query);
or the equivalent command
// Provided by VK_KHR_synchronization2
void vkCmdWriteTimestamp2KHR(
VkCommandBuffer commandBuffer,
VkPipelineStageFlags2 stage,
VkQueryPool queryPool,
uint32_t query);
- commandBuffer is the command buffer into which the command will be recorded.
- stage specifies a stage of the pipeline.
- queryPool is the query pool that will manage the timestamp.
- query is the query within the query pool that will contain the timestamp.
When vkCmdWriteTimestamp2 is submitted to a queue, it defines an
execution dependency on commands that were submitted before it, and writes a
timestamp to a query pool.
The first synchronization scope
includes all commands that occur earlier in
submission order.
The synchronization scope is limited to operations on the pipeline stage
specified by stage.
The second synchronization scope includes only the timestamp write operation.
Note: Implementations may write the timestamp at any stage that is
logically later than stage.
Any timestamp write that happens-after another timestamp write in the same submission must not
have a lower value unless its value overflows the maximum supported integer
bit width of the query.
If
VK_KHR_calibrated_timestamps
or
VK_EXT_calibrated_timestamps
is enabled, this extends to timestamp writes across all submissions on the
same logical device: any timestamp write that
happens-after another must not
have a lower value unless its value overflows the maximum supported integer
bit width of the query.
Timestamps written by this command must be in the
VK_TIME_DOMAIN_DEVICE_KHR
time domain.
If an overflow occurs, the timestamp value must wrap back to zero.
If vkCmdWriteTimestamp2 is called while executing a render pass
instance that has multiview enabled, the timestamp uses N consecutive
query indices in the query pool (starting at query) where N is
the number of bits set in the view mask of the subpass or dynamic render
pass the command is executed in.
The resulting query values are determined by an implementation-dependent
choice of one of the following behaviors:
- The first query is a timestamp value and (if more than one bit is set in the view mask) zero is written to the remaining queries.
- All N queries are timestamp values.
Either way, if two timestamps are written in the same subpass or dynamic render pass with multiview enabled, each of the N consecutive queries written for a timestamp must not have a lower value than the queries with corresponding indices written by the timestamp that happens-before unless the value overflows the maximum supported integer bit width of the query.
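Because a timestamp written under multiview consumes N consecutive query indices, an application sizing its query pool may want to compute N from the view mask. The following is a minimal sketch with a hypothetical helper name. It treats a view mask of zero as "multiview not enabled", which is an assumption of this sketch.

```c
#include <stdint.h>

/* Number of consecutive query indices one timestamp write consumes,
 * given the view mask of the subpass or dynamic render pass. */
static uint32_t timestamp_queries_consumed(uint32_t view_mask)
{
    uint32_t n = 0;
    while (view_mask != 0) {
        view_mask &= view_mask - 1u; /* clear the lowest set bit */
        ++n;
    }
    /* Without multiview (mask of zero in this sketch) one query is used. */
    return (n == 0) ? 1u : n;
}
```

Since either of the two behaviors above may apply, code reading the results back should not assume that queries after the first hold non-zero values.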
To request a timestamp and write the value to memory, call:
// Provided by VK_VERSION_1_0
void vkCmdWriteTimestamp(
VkCommandBuffer commandBuffer,
VkPipelineStageFlagBits pipelineStage,
VkQueryPool queryPool,
uint32_t query);
- commandBuffer is the command buffer into which the command will be recorded.
- pipelineStage is a VkPipelineStageFlagBits value, specifying a stage of the pipeline.
- queryPool is the query pool that will manage the timestamp.
- query is the query within the query pool that will contain the timestamp.
When vkCmdWriteTimestamp is submitted to a queue, it defines an
execution dependency on commands that were submitted before it, and writes a
timestamp to a query pool.
The first synchronization scope
includes all commands that occur earlier in
submission order.
The synchronization scope is limited to operations on the pipeline stage
specified by pipelineStage.
The second synchronization scope includes only the timestamp write operation.
Note: Implementations may write the timestamp at any stage that is
logically later than pipelineStage.
Any timestamp write that happens-after another timestamp write in the same submission must not
have a lower value unless its value overflows the maximum supported integer
bit width of the query.
If
VK_KHR_calibrated_timestamps
or
VK_EXT_calibrated_timestamps
is enabled, this extends to timestamp writes across all submissions on the
same logical device: any timestamp write that
happens-after another must not
have a lower value unless its value overflows the maximum supported integer
bit width of the query.
Timestamps written by this command must be in the
VK_TIME_DOMAIN_DEVICE_KHR
time domain.
If an overflow occurs, the timestamp value must wrap back to zero.
If vkCmdWriteTimestamp is called while executing a render pass
instance that has multiview enabled, the timestamp uses N consecutive
query indices in the query pool (starting at query) where N is
the number of bits set in the view mask of the subpass or dynamic render
pass the command is executed in.
The resulting query values are determined by an implementation-dependent
choice of one of the following behaviors:
- The first query is a timestamp value and (if more than one bit is set in the view mask) zero is written to the remaining queries.
- All N queries are timestamp values.
Either way, if two timestamps are written in the same subpass or dynamic render pass with multiview enabled, each of the N consecutive queries written for a timestamp must not have a lower value than the queries with corresponding indices written by the timestamp that happens-before unless the value overflows the maximum supported integer bit width of the query.
18.6. Performance Queries
Performance queries provide applications with a mechanism for getting performance counter information about the execution of command buffers, render passes, and commands.
Each queue family advertises the performance counters that can be queried on a queue of that family via a call to vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR. Implementations may limit access to performance counters based on platform requirements or only to specialized drivers for development purposes.
Note: This may include no performance counters being enumerated, or a reduced set. Please refer to platform-specific documentation for guidance on any such restrictions.
Performance queries use the existing vkCmdBeginQuery and vkCmdEndQuery to control what command buffers, render passes, or commands to get performance information for.
Implementations may require multiple passes where the command buffer, render passes, or commands being recorded are the same and are executed on the same queue to record performance counter data. This is achieved by submitting the same batch and providing a VkPerformanceQuerySubmitInfoKHR structure containing a counter pass index. The number of passes required for a given performance query pool can be queried via a call to vkGetPhysicalDeviceQueueFamilyPerformanceQueryPassesKHR.
|
Note
|
Command buffers created with
|
Performance counter results from a performance query pool can be obtained with the command vkGetQueryPoolResults.
The VkPerformanceCounterResultKHR union is defined as:
// Provided by VK_KHR_performance_query
typedef union VkPerformanceCounterResultKHR {
int32_t int32;
int64_t int64;
uint32_t uint32;
uint64_t uint64;
float float32;
double float64;
} VkPerformanceCounterResultKHR;
- int32 is a 32-bit signed integer value.
- int64 is a 64-bit signed integer value.
- uint32 is a 32-bit unsigned integer value.
- uint64 is a 64-bit unsigned integer value.
- float32 is a 32-bit floating-point value.
- float64 is a 64-bit floating-point value.
Performance query results are returned in an array of
VkPerformanceCounterResultKHR unions containing the data associated
with each counter in the query, stored in the same order as the counters
supplied in pCounterIndices when creating the performance query.
VkPerformanceCounterKHR::storage specifies how to parse the
counter data.
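Selecting the right union member from the counter's storage type might look like the sketch below. `SketchCounterResult` and `SketchCounterStorage` are local stand-ins that mirror VkPerformanceCounterResultKHR and the VkPerformanceCounterStorageKHR values, so the code compiles without the Vulkan headers. The `counter_as_double` helper is hypothetical, and converting every counter to `double` is a convenience that can lose precision for large 64-bit integers.

```c
#include <stdint.h>

/* Stand-in mirroring VkPerformanceCounterResultKHR. */
typedef union SketchCounterResult {
    int32_t  int32;
    int64_t  int64;
    uint32_t uint32;
    uint64_t uint64;
    float    float32;
    double   float64;
} SketchCounterResult;

/* Stand-in mirroring the VkPerformanceCounterStorageKHR values. */
typedef enum SketchCounterStorage {
    SKETCH_STORAGE_INT32   = 0,
    SKETCH_STORAGE_INT64   = 1,
    SKETCH_STORAGE_UINT32  = 2,
    SKETCH_STORAGE_UINT64  = 3,
    SKETCH_STORAGE_FLOAT32 = 4,
    SKETCH_STORAGE_FLOAT64 = 5
} SketchCounterStorage;

/* Read the union member named by the storage type and widen it. */
static double counter_as_double(SketchCounterResult r,
                                SketchCounterStorage storage)
{
    switch (storage) {
    case SKETCH_STORAGE_INT32:   return (double)r.int32;
    case SKETCH_STORAGE_INT64:   return (double)r.int64;
    case SKETCH_STORAGE_UINT32:  return (double)r.uint32;
    case SKETCH_STORAGE_UINT64:  return (double)r.uint64;
    case SKETCH_STORAGE_FLOAT32: return (double)r.float32;
    case SKETCH_STORAGE_FLOAT64: return r.float64;
    }
    return 0.0; /* unknown storage type */
}
```

An application would apply such a helper to each element of the result array, pairing element i with the storage type of the i-th counter supplied at pool creation.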
18.6.1. Profiling Lock
To record and submit a command buffer containing a performance query pool the profiling lock must be held. The profiling lock must be acquired prior to any call to vkBeginCommandBuffer that will be using a performance query pool. The profiling lock must be held while any command buffer containing a performance query pool is in the recording, executable, or pending state. To acquire the profiling lock, call:
// Provided by VK_KHR_performance_query
VkResult vkAcquireProfilingLockKHR(
VkDevice device,
const VkAcquireProfilingLockInfoKHR* pInfo);
- device is the logical device to profile.
- pInfo is a pointer to a VkAcquireProfilingLockInfoKHR structure containing information about how the profiling lock is to be acquired.
Implementations may allow multiple actors to hold the profiling lock concurrently.
The VkAcquireProfilingLockInfoKHR structure is defined as:
// Provided by VK_KHR_performance_query
typedef struct VkAcquireProfilingLockInfoKHR {
VkStructureType sType;
const void* pNext;
VkAcquireProfilingLockFlagsKHR flags;
uint64_t timeout;
} VkAcquireProfilingLockInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is reserved for future use.
- timeout indicates how long the function waits, in nanoseconds, if the profiling lock is not available.
If timeout is 0, vkAcquireProfilingLockKHR will not block while
attempting to acquire the profiling lock.
If timeout is UINT64_MAX, the function will not return until the
profiling lock is acquired.
// Provided by VK_KHR_performance_query
typedef enum VkAcquireProfilingLockFlagBitsKHR {
} VkAcquireProfilingLockFlagBitsKHR;
// Provided by VK_KHR_performance_query
typedef VkFlags VkAcquireProfilingLockFlagsKHR;
VkAcquireProfilingLockFlagsKHR is a bitmask type for setting a mask, but is currently reserved for future use.
To release the profiling lock, call:
// Provided by VK_KHR_performance_query
void vkReleaseProfilingLockKHR(
VkDevice device);
- device is the logical device to cease profiling on.
18.7. Transform Feedback Queries
Transform feedback queries track the number of primitives attempted to be
written and actually written, by the vertex stream being captured, to a
transform feedback buffer.
This query is updated during drawing commands while transform feedback is
active.
The number of primitives actually written will be less than the number
attempted to be written if the bound transform feedback buffer size was too
small for the number of primitives actually drawn.
Primitives are not written beyond the bound range of the transform feedback
buffer.
A transform feedback query for vertex stream zero is begun and ended by
calling vkCmdBeginQuery and vkCmdEndQuery, respectively.
vkCmdBeginQueryIndexedEXT and vkCmdEndQueryIndexedEXT can be
used to begin and end transform feedback queries for any supported vertex
stream.
When a transform feedback query begins, the count of primitives written and
primitives needed starts from zero.
For each drawing command, the count is incremented as vertex attribute
outputs are captured to the transform feedback buffers while transform
feedback is active.
When a transform feedback query finishes, the result for that query is
marked as available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults) or request it be put into host memory (via
vkGetQueryPoolResults).
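Comparing the two counts lets an application detect that the transform feedback buffer was too small. The sketch below is illustrative only: `SketchXfbResult` and the helper names are hypothetical, and it assumes the results were read back as two 64-bit values with the primitives-written count first and the primitives-needed count second.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed readback layout for one transform feedback query. */
typedef struct SketchXfbResult {
    uint64_t primitives_written; /* actually written to the buffer */
    uint64_t primitives_needed;  /* attempted to be written */
} SketchXfbResult;

/* True if fewer primitives were written than were attempted. */
static bool xfb_buffer_too_small(SketchXfbResult r)
{
    return r.primitives_written < r.primitives_needed;
}

/* Number of primitives that did not fit; zero if everything fit. */
static uint64_t xfb_primitives_dropped(SketchXfbResult r)
{
    if (r.primitives_written >= r.primitives_needed) {
        return 0;
    }
    return r.primitives_needed - r.primitives_written;
}
```

An application could use the dropped count to size a larger buffer before repeating the capture.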
18.8. Primitives Generated Queries
When a primitives generated query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the transform feedback stage, whether or not transform
feedback is active.
A primitives generated query for vertex stream zero is begun and ended by
calling vkCmdBeginQuery and vkCmdEndQuery, respectively.
vkCmdBeginQueryIndexedEXT and vkCmdEndQueryIndexedEXT can be
used to begin and end primitives generated queries for any supported vertex
stream.
When a primitives generated query begins, the count of primitives generated
starts from zero.
When a primitives generated query finishes, the result for that query is
marked as available.
The application can then either copy the result to a buffer (via
vkCmdCopyQueryPoolResults) or request it be put into host memory (via
vkGetQueryPoolResults).
|
Note
|
The result of this query is typically identical to
|
18.9. Mesh Shader Queries
When a generated mesh primitives query is active, the mesh-primitives-generated count is incremented every time a primitive emitted from the mesh shader stage reaches the fragment shader stage. When a generated mesh primitives query begins, the mesh-primitives-generated count starts from zero.
Mesh and task shader pipeline statistics queries function the same way that invocation queries work for other shader stages, counting the number of times the respective shader stage has been run. When the statistics query begins, the invocation counters start from zero.
18.10. Intel Performance Queries
Intel performance queries allow an application to capture performance data for a set of commands. Performance queries are used in a similar way to other types of queries. The main difference from existing queries is that the resulting data should be handed over to a library capable of producing human-readable results rather than being read directly by an application.
Prior to creating a performance query pool, initialize the device for performance queries with the call:
// Provided by VK_INTEL_performance_query
VkResult vkInitializePerformanceApiINTEL(
VkDevice device,
const VkInitializePerformanceApiInfoINTEL* pInitializeInfo);
- device is the logical device used for the queries.
- pInitializeInfo is a pointer to a VkInitializePerformanceApiInfoINTEL structure specifying initialization parameters.
The VkInitializePerformanceApiInfoINTEL structure is defined as:
// Provided by VK_INTEL_performance_query
typedef struct VkInitializePerformanceApiInfoINTEL {
VkStructureType sType;
const void* pNext;
void* pUserData;
} VkInitializePerformanceApiInfoINTEL;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pUserData is a pointer for application data.
Once performance query operations have completed, uninitialize the device for performance queries with the call:
// Provided by VK_INTEL_performance_query
void vkUninitializePerformanceApiINTEL(
VkDevice device);
- device is the logical device used for the queries.
Some performance query features of a device can be discovered with the call:
// Provided by VK_INTEL_performance_query
VkResult vkGetPerformanceParameterINTEL(
VkDevice device,
VkPerformanceParameterTypeINTEL parameter,
VkPerformanceValueINTEL* pValue);
- device is the logical device to query.
- parameter is the parameter to query.
- pValue is a pointer to a VkPerformanceValueINTEL structure in which the type and value of the parameter are returned.
Possible values of vkGetPerformanceParameterINTEL::parameter,
specifying a performance query feature, are:
// Provided by VK_INTEL_performance_query
typedef enum VkPerformanceParameterTypeINTEL {
VK_PERFORMANCE_PARAMETER_TYPE_HW_COUNTERS_SUPPORTED_INTEL = 0,
VK_PERFORMANCE_PARAMETER_TYPE_STREAM_MARKER_VALID_BITS_INTEL = 1,
} VkPerformanceParameterTypeINTEL;
- VK_PERFORMANCE_PARAMETER_TYPE_HW_COUNTERS_SUPPORTED_INTEL has a boolean result which tells whether hardware counters can be captured.
- VK_PERFORMANCE_PARAMETER_TYPE_STREAM_MARKER_VALID_BITS_INTEL has a 32-bit integer result which tells how many bits can be written into the VkPerformanceValueINTEL value.
The VkPerformanceValueINTEL structure is defined as:
// Provided by VK_INTEL_performance_query
typedef struct VkPerformanceValueINTEL {
VkPerformanceValueTypeINTEL type;
VkPerformanceValueDataINTEL data;
} VkPerformanceValueINTEL;
- type is a VkPerformanceValueTypeINTEL value specifying the type of the returned data.
- data is a VkPerformanceValueDataINTEL union specifying the value of the returned data.
Possible values of VkPerformanceValueINTEL::type, specifying the
type of the data returned in VkPerformanceValueINTEL::data, are:
- VK_PERFORMANCE_VALUE_TYPE_UINT32_INTEL specifies that unsigned 32-bit integer data is returned in data.value32.
- VK_PERFORMANCE_VALUE_TYPE_UINT64_INTEL specifies that unsigned 64-bit integer data is returned in data.value64.
- VK_PERFORMANCE_VALUE_TYPE_FLOAT_INTEL specifies that floating-point data is returned in data.valueFloat.
- VK_PERFORMANCE_VALUE_TYPE_BOOL_INTEL specifies that VkBool32 data is returned in data.valueBool.
- VK_PERFORMANCE_VALUE_TYPE_STRING_INTEL specifies that a pointer to a null-terminated UTF-8 string is returned in data.valueString. The pointer is valid for the lifetime of the device parameter passed to vkGetPerformanceParameterINTEL.
// Provided by VK_INTEL_performance_query
typedef enum VkPerformanceValueTypeINTEL {
VK_PERFORMANCE_VALUE_TYPE_UINT32_INTEL = 0,
VK_PERFORMANCE_VALUE_TYPE_UINT64_INTEL = 1,
VK_PERFORMANCE_VALUE_TYPE_FLOAT_INTEL = 2,
VK_PERFORMANCE_VALUE_TYPE_BOOL_INTEL = 3,
VK_PERFORMANCE_VALUE_TYPE_STRING_INTEL = 4,
} VkPerformanceValueTypeINTEL;
The VkPerformanceValueDataINTEL union is defined as:
// Provided by VK_INTEL_performance_query
typedef union VkPerformanceValueDataINTEL {
uint32_t value32;
uint64_t value64;
float valueFloat;
VkBool32 valueBool;
const char* valueString;
} VkPerformanceValueDataINTEL;
- value32 represents 32-bit integer data.
- value64 represents 64-bit integer data.
- valueFloat represents floating-point data.
- valueBool represents VkBool32 data.
- valueString represents a pointer to a null-terminated UTF-8 string.
The correct member of the union is determined by the associated VkPerformanceValueTypeINTEL value.
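As an illustration of how the type member selects the union member, the following is a minimal sketch of a host-side helper that formats a returned value as text. The helper name format_performance_value is hypothetical and not part of the API. The type declarations are repeated locally only so that the sketch compiles without the Vulkan headers; a real application would include <vulkan/vulkan.h> instead of redeclaring them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirrors of the declarations in this section (assumption: used here
 * only to keep the sketch self-contained; do not redeclare them when
 * <vulkan/vulkan.h> is included). */
typedef uint32_t VkBool32;

typedef enum VkPerformanceValueTypeINTEL {
    VK_PERFORMANCE_VALUE_TYPE_UINT32_INTEL = 0,
    VK_PERFORMANCE_VALUE_TYPE_UINT64_INTEL = 1,
    VK_PERFORMANCE_VALUE_TYPE_FLOAT_INTEL = 2,
    VK_PERFORMANCE_VALUE_TYPE_BOOL_INTEL = 3,
    VK_PERFORMANCE_VALUE_TYPE_STRING_INTEL = 4
} VkPerformanceValueTypeINTEL;

typedef union VkPerformanceValueDataINTEL {
    uint32_t    value32;
    uint64_t    value64;
    float       valueFloat;
    VkBool32    valueBool;
    const char* valueString;
} VkPerformanceValueDataINTEL;

typedef struct VkPerformanceValueINTEL {
    VkPerformanceValueTypeINTEL type;
    VkPerformanceValueDataINTEL data;
} VkPerformanceValueINTEL;

/* Hypothetical helper: writes a textual form of *value into buf.
 * Returns what snprintf returns, or -1 for an unrecognized type. */
int format_performance_value(const VkPerformanceValueINTEL *value,
                             char *buf, size_t size)
{
    assert(value != NULL);
    switch (value->type) {
    case VK_PERFORMANCE_VALUE_TYPE_UINT32_INTEL:
        return snprintf(buf, size, "%" PRIu32, value->data.value32);
    case VK_PERFORMANCE_VALUE_TYPE_UINT64_INTEL:
        return snprintf(buf, size, "%" PRIu64, value->data.value64);
    case VK_PERFORMANCE_VALUE_TYPE_FLOAT_INTEL:
        return snprintf(buf, size, "%g", (double)value->data.valueFloat);
    case VK_PERFORMANCE_VALUE_TYPE_BOOL_INTEL:
        return snprintf(buf, size, "%s",
                        value->data.valueBool ? "true" : "false");
    case VK_PERFORMANCE_VALUE_TYPE_STRING_INTEL:
        return snprintf(buf, size, "%s", value->data.valueString);
    default:
        return -1; /* reading any union member here would be a guess */
    }
}
```

Reading only the member named by type avoids interpreting the union through the wrong member, which is the failure mode the rule above guards against.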
The VkQueryPoolPerformanceQueryCreateInfoINTEL structure is defined
as:
// Provided by VK_INTEL_performance_query
typedef struct VkQueryPoolPerformanceQueryCreateInfoINTEL {
VkStructureType sType;
const void* pNext;
VkQueryPoolSamplingModeINTEL performanceCountersSampling;
} VkQueryPoolPerformanceQueryCreateInfoINTEL;
// Provided by VK_INTEL_performance_query
typedef VkQueryPoolPerformanceQueryCreateInfoINTEL VkQueryPoolCreateInfoINTEL;
To create a pool for Intel performance queries, set
VkQueryPoolCreateInfo::queryType to
VK_QUERY_TYPE_PERFORMANCE_QUERY_INTEL and add a
VkQueryPoolPerformanceQueryCreateInfoINTEL structure to the
pNext chain of the VkQueryPoolCreateInfo structure.
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- performanceCountersSampling describes how performance queries should be captured.
Possible values of
VkQueryPoolPerformanceQueryCreateInfoINTEL::performanceCountersSampling
are:
// Provided by VK_INTEL_performance_query
typedef enum VkQueryPoolSamplingModeINTEL {
VK_QUERY_POOL_SAMPLING_MODE_MANUAL_INTEL = 0,
} VkQueryPoolSamplingModeINTEL;
- VK_QUERY_POOL_SAMPLING_MODE_MANUAL_INTEL is the default mode in which the application calls vkCmdBeginQuery and vkCmdEndQuery to record performance data.
To help associate query results with a particular point at which an application emitted commands, markers can be set into the command buffers with the call:
// Provided by VK_INTEL_performance_query
VkResult vkCmdSetPerformanceMarkerINTEL(
VkCommandBuffer commandBuffer,
const VkPerformanceMarkerInfoINTEL* pMarkerInfo);
The last marker set onto a command buffer before the end of a query will be part of the query result.
The VkPerformanceMarkerInfoINTEL structure is defined as:
// Provided by VK_INTEL_performance_query
typedef struct VkPerformanceMarkerInfoINTEL {
VkStructureType sType;
const void* pNext;
uint64_t marker;
} VkPerformanceMarkerInfoINTEL;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- marker is the marker value that will be recorded into the opaque query results.
When monitoring the behavior of an application within the dataset generated by the entire set of applications running on the system, it is useful to identify draw calls within a potentially huge amount of performance data. To do so, an application can generate stream markers that will be used to match a particular draw call with a particular performance data item.
// Provided by VK_INTEL_performance_query
VkResult vkCmdSetPerformanceStreamMarkerINTEL(
VkCommandBuffer commandBuffer,
const VkPerformanceStreamMarkerInfoINTEL* pMarkerInfo);
- commandBuffer is a VkCommandBuffer into which a stream marker is added.
- pMarkerInfo is a pointer to a VkPerformanceStreamMarkerInfoINTEL structure describing the marker to insert.
The VkPerformanceStreamMarkerInfoINTEL structure is defined as:
// Provided by VK_INTEL_performance_query
typedef struct VkPerformanceStreamMarkerInfoINTEL {
VkStructureType sType;
const void* pNext;
uint32_t marker;
} VkPerformanceStreamMarkerInfoINTEL;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- marker is the marker value that will be recorded into the reports consumed by an external application.
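The following is a minimal sketch of how an application might keep stream marker values within the supported width. It assumes that the value returned for VK_PERFORMANCE_PARAMETER_TYPE_STREAM_MARKER_VALID_BITS_INTEL is the number of usable low-order bits of marker; that interpretation and the helper names stream_marker_mask and stream_marker_fits are assumptions made for illustration, not part of the API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `valid_bits` bits set.
 * Assumption: valid_bits is the count queried through
 * VK_PERFORMANCE_PARAMETER_TYPE_STREAM_MARKER_VALID_BITS_INTEL. */
uint32_t stream_marker_mask(uint32_t valid_bits)
{
    assert(valid_bits <= 32u);
    if (valid_bits == 32u) {
        return UINT32_MAX; /* shifting a 32-bit value by 32 is undefined */
    }
    return (UINT32_C(1) << valid_bits) - 1u;
}

/* Hypothetical helper: nonzero if `marker` uses only the valid bits. */
int stream_marker_fits(uint32_t marker, uint32_t valid_bits)
{
    return (marker & ~stream_marker_mask(valid_bits)) == 0u;
}
```

An application could check stream_marker_fits before filling VkPerformanceStreamMarkerInfoINTEL::marker, or apply the mask and accept that distinct markers may then collide.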
Some applications might want to measure the effect of a set of commands with different settings. It is possible to override a particular setting using:
// Provided by VK_INTEL_performance_query
VkResult vkCmdSetPerformanceOverrideINTEL(
VkCommandBuffer commandBuffer,
const VkPerformanceOverrideInfoINTEL* pOverrideInfo);
- commandBuffer is the command buffer where the override takes place.
- pOverrideInfo is a pointer to a VkPerformanceOverrideInfoINTEL structure selecting the parameter to override.
The VkPerformanceOverrideInfoINTEL structure is defined as:
// Provided by VK_INTEL_performance_query
typedef struct VkPerformanceOverrideInfoINTEL {
VkStructureType sType;
const void* pNext;
VkPerformanceOverrideTypeINTEL type;
VkBool32 enable;
uint64_t parameter;
} VkPerformanceOverrideInfoINTEL;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- type is the particular VkPerformanceOverrideTypeINTEL to set.
- enable defines whether the override is enabled.
- parameter is a parameter that the override may require.
Possible values of VkPerformanceOverrideInfoINTEL::type,
specifying performance override types, are:
// Provided by VK_INTEL_performance_query
typedef enum VkPerformanceOverrideTypeINTEL {
VK_PERFORMANCE_OVERRIDE_TYPE_NULL_HARDWARE_INTEL = 0,
VK_PERFORMANCE_OVERRIDE_TYPE_FLUSH_GPU_CACHES_INTEL = 1,
} VkPerformanceOverrideTypeINTEL;
- VK_PERFORMANCE_OVERRIDE_TYPE_NULL_HARDWARE_INTEL turns all rendering operations into no-ops.
- VK_PERFORMANCE_OVERRIDE_TYPE_FLUSH_GPU_CACHES_INTEL stalls the stream of commands until all previously emitted commands have completed and all caches have been flushed and invalidated.
Before submitting command buffers containing performance query commands to a device queue, the application must acquire and set a performance query configuration. The configuration can be released once no command buffer containing performance query commands is in the pending state.
// Provided by VK_INTEL_performance_query
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkPerformanceConfigurationINTEL)
To acquire a device performance configuration, call:
// Provided by VK_INTEL_performance_query
VkResult vkAcquirePerformanceConfigurationINTEL(
VkDevice device,
const VkPerformanceConfigurationAcquireInfoINTEL* pAcquireInfo,
VkPerformanceConfigurationINTEL* pConfiguration);
- device is the logical device that the performance query commands will be submitted to.
- pAcquireInfo is a pointer to a VkPerformanceConfigurationAcquireInfoINTEL structure, specifying the performance configuration to acquire.
- pConfiguration is a pointer to a VkPerformanceConfigurationINTEL handle in which the resulting configuration object is returned.
The VkPerformanceConfigurationAcquireInfoINTEL structure is defined
as:
// Provided by VK_INTEL_performance_query
typedef struct VkPerformanceConfigurationAcquireInfoINTEL {
VkStructureType sType;
const void* pNext;
VkPerformanceConfigurationTypeINTEL type;
} VkPerformanceConfigurationAcquireInfoINTEL;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- type is a VkPerformanceConfigurationTypeINTEL value specifying the type of performance configuration that will be acquired.
Possible values of
VkPerformanceConfigurationAcquireInfoINTEL::type, specifying
performance configuration types, are:
// Provided by VK_INTEL_performance_query
typedef enum VkPerformanceConfigurationTypeINTEL {
VK_PERFORMANCE_CONFIGURATION_TYPE_COMMAND_QUEUE_METRICS_DISCOVERY_ACTIVATED_INTEL = 0,
} VkPerformanceConfigurationTypeINTEL;
To set a performance configuration, call:
// Provided by VK_INTEL_performance_query
VkResult vkQueueSetPerformanceConfigurationINTEL(
VkQueue queue,
VkPerformanceConfigurationINTEL configuration);
- queue is the queue on which the configuration will be used.
- configuration is the configuration to use.
To release a device performance configuration, call:
// Provided by VK_INTEL_performance_query
VkResult vkReleasePerformanceConfigurationINTEL(
VkDevice device,
VkPerformanceConfigurationINTEL configuration);
- device is the device associated with the configuration object to release.
- configuration is the configuration object to release.
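The ordering described above (acquire, set on a queue, submit, and release only when nothing is pending) can be tracked on the host. The following is a minimal bookkeeping sketch, not part of the API: the PerfConfigTracker type and the tracker_* functions are hypothetical names, and the application is assumed to call each one next to the corresponding Vulkan call or completion event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side state for one performance configuration. */
typedef struct PerfConfigTracker {
    bool     acquired;     /* vkAcquirePerformanceConfigurationINTEL succeeded */
    bool     set_on_queue; /* vkQueueSetPerformanceConfigurationINTEL succeeded */
    unsigned pending;      /* submitted command buffers that have not completed */
} PerfConfigTracker;

void tracker_on_acquire(PerfConfigTracker *t)
{
    assert(t != NULL);
    t->acquired = true;
}

void tracker_on_set(PerfConfigTracker *t)
{
    assert(t != NULL && t->acquired); /* set requires an acquired configuration */
    t->set_on_queue = true;
}

/* Submission of performance query commands needs both steps done. */
bool tracker_can_submit(const PerfConfigTracker *t)
{
    return t->acquired && t->set_on_queue;
}

void tracker_on_submit(PerfConfigTracker *t)
{
    assert(t != NULL && tracker_can_submit(t));
    t->pending++;
}

/* Call when a submitted command buffer leaves the pending state,
 * for example after waiting on the fence passed to the submission. */
void tracker_on_complete(PerfConfigTracker *t)
{
    assert(t != NULL && t->pending > 0);
    t->pending--;
}

/* Release is allowed only when no tracked command buffer is pending. */
bool tracker_can_release(const PerfConfigTracker *t)
{
    return t->acquired && t->pending == 0;
}
```

The tracker only records what the application tells it; it does not query the implementation, so it is useful mainly as a debug-time check around vkReleasePerformanceConfigurationINTEL.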
18.11. Result Status Queries
Result status queries serve a single purpose: allowing the application to
determine whether a set of operations have completed successfully or not, as
indicated by the VkQueryResultStatusKHR value written when retrieving
the result of a query using the VK_QUERY_RESULT_WITH_STATUS_BIT_KHR
flag.
Unlike other query types, result status queries do not track or maintain any other data beyond the completion status, thus no other data is written when retrieving their results.
Support for result status queries is indicated by
VkQueueFamilyQueryResultStatusPropertiesKHR::queryResultStatusSupport,
as returned by vkGetPhysicalDeviceQueueFamilyProperties2 for the
queue family in question.
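Because a result status query yields only a status value, processing a batch of results reduces to classifying one word per query. The sketch below assumes that the results were retrieved as one signed 32-bit status value per query, and relies on the VkQueryResultStatusKHR convention that negative values indicate failure, zero indicates that the result is not ready, and positive values indicate successful completion. The StatusSummary type and summarize_result_status function are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tally of a batch of result status values. */
typedef struct StatusSummary {
    size_t complete;  /* status > 0  */
    size_t not_ready; /* status == 0 */
    size_t failed;    /* status < 0  */
} StatusSummary;

/* Classifies `count` status words, one per query (assumed layout). */
StatusSummary summarize_result_status(const int32_t *status, size_t count)
{
    StatusSummary s = { 0, 0, 0 };
    assert(status != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (status[i] > 0) {
            s.complete++;
        } else if (status[i] == 0) {
            s.not_ready++;
        } else {
            s.failed++;
        }
    }
    return s;
}
```

Comparing against zero rather than against specific enumerants keeps the check valid if an implementation reports a more specific negative status.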
18.12. Video Encode Feedback Queries
Video encode feedback queries allow the application to capture feedback
values generated by video encode operations.
As such, video encode feedback queries are available on queue families
supporting video encode operations.
The availability of individual video encode feedback values is indicated by
the bits of
VkVideoEncodeCapabilitiesKHR::supportedEncodeFeedbackFlags, as
returned by vkGetPhysicalDeviceVideoCapabilitiesKHR for the
video profile the queries are intended to be used with.
The set of enabled video encode feedback values must be configured on the
query pool when it is created using the encodeFeedbackFlags member of
the VkQueryPoolVideoEncodeFeedbackCreateInfoKHR included in the
pNext chain of VkQueryPoolCreateInfo.
The VkQueryPoolVideoEncodeFeedbackCreateInfoKHR structure is defined
as:
// Provided by VK_KHR_video_encode_queue
typedef struct VkQueryPoolVideoEncodeFeedbackCreateInfoKHR {
VkStructureType sType;
const void* pNext;
VkVideoEncodeFeedbackFlagsKHR encodeFeedbackFlags;
} VkQueryPoolVideoEncodeFeedbackCreateInfoKHR;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- encodeFeedbackFlags is a bitmask of VkVideoEncodeFeedbackFlagBitsKHR values specifying the set of enabled video encode feedback values captured by queries of the new pool.
Bits which can be set in
VkQueryPoolVideoEncodeFeedbackCreateInfoKHR::encodeFeedbackFlags
for video encode feedback query pools are:
// Provided by VK_KHR_video_encode_queue
typedef enum VkVideoEncodeFeedbackFlagBitsKHR {
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR = 0x00000001,
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR = 0x00000002,
VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR = 0x00000004,
} VkVideoEncodeFeedbackFlagBitsKHR;
- VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR specifies that queries managed by the pool will capture the byte offset of the bitstream data written by the video encode operation to the bitstream buffer specified in VkVideoEncodeInfoKHR::dstBuffer relative to the offset specified in VkVideoEncodeInfoKHR::dstBufferOffset. For the first video encode operation issued by any video encode command, this value will always be zero, meaning that bitstream data is always written to the buffer specified in VkVideoEncodeInfoKHR::dstBuffer starting from the offset specified in VkVideoEncodeInfoKHR::dstBufferOffset.
- VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR specifies that queries managed by the pool will capture the number of bytes written by the video encode operation to the bitstream buffer specified in VkVideoEncodeInfoKHR::dstBuffer.
- VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR specifies that queries managed by the pool will capture a boolean value indicating that the data written to the bitstream buffer specified in VkVideoEncodeInfoKHR::dstBuffer contains overridden parameters.
When retrieving the results of video encode feedback queries, the values
corresponding to each enabled video encode feedback are written in the order
of the bits defined above, followed by an optional value indicating
availability or result status if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT
or VK_QUERY_RESULT_WITH_STATUS_BIT_KHR is specified, respectively.
If the result status of a video encode feedback query is negative, then the results of all enabled video encode feedback values will be undefined.
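The result layout described above (one value per enabled feedback bit, in bit order, then an optional trailing word) can be decoded as in the following minimal sketch. It assumes that results were retrieved as 32-bit words, that the caller knows whether a trailing availability or status word was requested, and that a status word is read as a signed value. The DecodedEncodeFeedback type and decode_encode_feedback function are hypothetical; the enum is repeated locally only so that the sketch compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the enum in this section (assumption: kept only for
 * self-containment; use <vulkan/vulkan.h> in a real application). */
typedef enum VkVideoEncodeFeedbackFlagBitsKHR {
    VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR = 0x00000001,
    VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR = 0x00000002,
    VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR = 0x00000004
} VkVideoEncodeFeedbackFlagBitsKHR;

/* Hypothetical decoded form of one query's result. A field is
 * meaningful only if the matching bit was enabled on the pool. */
typedef struct DecodedEncodeFeedback {
    uint32_t bitstream_offset;
    uint32_t bytes_written;
    uint32_t has_overrides;
    int32_t  trailing; /* availability or result status, if requested */
} DecodedEncodeFeedback;

/* Walks the words of one query result in bit order.
 * Returns the number of 32-bit words consumed. */
size_t decode_encode_feedback(const uint32_t *words, uint32_t enabled,
                              bool has_trailing_word,
                              DecodedEncodeFeedback *out)
{
    size_t i = 0;
    assert(words != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (enabled & VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR) {
        out->bitstream_offset = words[i++];
    }
    if (enabled & VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR) {
        out->bytes_written = words[i++];
    }
    if (enabled & VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_HAS_OVERRIDES_BIT_KHR) {
        out->has_overrides = words[i++];
    }
    if (has_trailing_word) {
        /* Copy the bits so that a negative status survives unchanged. */
        memcpy(&out->trailing, &words[i++], sizeof out->trailing);
    }
    return i;
}
```

When the trailing word is a result status and it is negative, the decoded feedback fields should be ignored, since their values are undefined in that case.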
|
Note
|
Applications should always specify |
// Provided by VK_KHR_video_encode_queue
typedef VkFlags VkVideoEncodeFeedbackFlagsKHR;
VkVideoEncodeFeedbackFlagsKHR is a bitmask type for setting a mask of
zero or more VkVideoEncodeFeedbackFlagBitsKHR.