khronosgroup / ktx-specification

KTX file format source

License: Other

Makefile 8.37% Shell 2.81% HTML 2.38% Batchfile 2.46% CSS 65.61% Ruby 8.00% C 10.37%

texture-file-format ktx ktx2 specification-source

ktx-specification's Introduction

Home of the KTX File Format Specification

KTX is a file format that can be used for storing GPU-ready texture data (with cubemaps, mip levels, etc.). It is like DDS but with more features and a more formal specification. It supports Basis Universal transcodable formats and supercompression, which can yield JPEG-sized universal textures. glTF will use Basis Universal textures in KTX v2 containers.

Click to see the latest published versions of the KTX File Format Specification or the KTX Fragment URI Specification from the Khronos KTX Registry (they look much better than the ersatz views provided by GitHub) or run

make

in a Unix-like environment with AsciiDoctor installed to generate the publishable specs. They are the files out/ktxspec.v2.html and out/ktx-frag.html. Everything needed is inlined.

The canonical KTX spec. text is in the file ktxspec.adoc. The canonical fragment URI spec. text is in the file ktx-frag.html.

If you have questions or comments that don't merit creating an issue such as "why did you do so-and-so?" use GitHub Discussions.

GPU texture format mappings

To ensure correct mappings from Vulkan's VkFormat to other GPU APIs, this repo additionally contains:

JSON database (schema) with mappings to OpenGL, Direct3D, and Metal enums.
Switch-case generator that produces 5 files with simple C-like case-return statements.

Usage: ./generate_format_switches.rb [<out_dir>]
Compile test of the case statements that serves as an example of use.

$Date$ keyword expansion

A few auxiliary files have $Date$ keywords. If you care about having the proper dates shown on files in your workspace, you must follow the instructions below.

$Date$ keywords are expanded via a smudge & clean filter. To install the filter, issue the following commands in the root of your clone.

On Unix (Linux, Mac OS X, etc.) platforms and Windows using Git for Windows' Git Bash or Cygwin's bash terminal:

./install-gitconfig.sh
rm TODO.md
git checkout TODO.md

On Windows with the Command Prompt (requires git.exe in a directory on your %PATH%):

install-gitconfig.bat
del TODO.md
git checkout TODO.md

The first command adds an [include] of the repo's .gitconfig to the local git config file .git/config, i.e. the one in your clone of the repo. .gitconfig contains the config of the "dater" filter. The remaining commands force a new checkout of the affected files to smudge them with the date. These two are unnecessary if you plan to edit these files.

ktx-specification's People

markcallow karanlos yazici gmadsciencelab warrenm lexaknyazev luoluozz waywardmonkeys donmccurdy hl0071 seanpm2001

ktx-specification's Issues

Tighten levelImages layout

levelImages structure is:

for each array_element in numberOfArrayElements 
   for each face in numberOfFaces 
       for each z_slice in pixelDepth 
           for each row or row_of_blocks in pixelHeight 
               for each pixel or block_of_pixels in pixelWidth
                   Byte data[format-specific-number-of-bytes] 
               end
           end
       end
   end
end

Since format-specific-number-of-bytes may be unknown, an additional restriction is needed:
bytesOfUncompressedLevelImages % (numberOfArrayElements * numberOfFaces) == 0
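The proposed restriction can be sketched as a small check. This is a hypothetical helper, not part of any KTX library, and it assumes that a numberOfArrayElements of 0 (a non-array texture) counts as one element for the division.

```python
def per_face_size(bytes_of_uncompressed_level_images: int,
                  number_of_array_elements: int,
                  number_of_faces: int) -> int:
    """Return the byte size of one face image within a level.

    Raises ValueError when the proposed divisibility restriction is violated.
    """
    # Assumption: 0 array elements means "not an array", i.e. one element.
    elements = max(1, number_of_array_elements)
    images = elements * number_of_faces
    if bytes_of_uncompressed_level_images % images != 0:
        raise ValueError("level size is not a multiple of the image count")
    return bytes_of_uncompressed_level_images // images
```

With this check a reader can step from face to face without knowing format-specific-number-of-bytes.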

Also z_slice should be changed to something like z_slice or rows_of_blocks to accommodate compressed 3D formats.

Clarify KTXglFormat usage for compressed formats

The metadata entry provides three enums, while compressed formats need only one.
The spec must say which field must be used and what values the unused fields must have.

Support multi-sample images?

Over the years I've received several requests to support multi-sample images in KTX. Since it was not possible, without breaking compatibility, I never pursued details of what the requesters wanted.

We will ask the Vulkan Advisory Panel for insights but any other comments on this topic are welcome.

mipPadding & cubePadding are unnecessary

Since uncompressed image rows are required to have the default OpenGL GL_UNPACK_ALIGNMENT of 4, i.e., all rows will be padded to the next 4-byte boundary, and all compressed formats have block sizes that are multiples of 4 bytes, the mipPadding and cubePadding fields are unnecessary and should be removed from the spec. Faces' data and levels' imageSizes will always be on 4-byte boundaries.
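The argument for uncompressed data can be illustrated with a short sketch. The function names are hypothetical; the only assumptions are the 4-byte row padding stated above and the mipPadding formula from the spec.

```python
def padded_row_size(pixel_width: int, bytes_per_pixel: int, alignment: int = 4) -> int:
    """Size of one image row after padding it to the next alignment boundary."""
    row = pixel_width * bytes_per_pixel
    return (row + alignment - 1) // alignment * alignment


def image_size(pixel_width: int, pixel_height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed 2D image whose rows are padded to 4 bytes."""
    return padded_row_size(pixel_width, bytes_per_pixel) * pixel_height


def mip_padding(size: int) -> int:
    """mipPadding as the spec computes it: 3 - ((imageSize + 3) % 4)."""
    return 3 - ((size + 3) % 4)
```

Because every row is a multiple of 4 bytes, the image size is too, so mip_padding(image_size(...)) comes out as 0 for any width, height and pixel size.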

Issues with specifications

Following issues/remarks apply to documentation in https://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/

Issues in An example KTX file: section

0x0D, 0x0A, 0x1A, 0x0A. // final four bytes of Byte[12] identifier
should be
0x0D, 0x0A, 0x1A, 0x0A, // final four bytes of Byte[12] identifier
(a comma, not a period, after the last byte).

0x6A, 0x6F, 0x6B, 0x65, // UTF8 v: 'gles2\0'
0x6A, 0x6F, 0x6B, 0x65 means j o k e, not g l e s.

Text above is copied from KhronosGroup/KTX-Software#43 as requested.
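The second remark is easy to check by decoding the bytes from the example and encoding the text that the comment claims. This is only a sanity check, not part of the spec.

```python
# Bytes as they appear in the spec's example KTX file.
example_bytes = bytes([0x6A, 0x6F, 0x6B, 0x65])

# Decode them as UTF-8 to see which text they actually spell.
decoded = example_bytes.decode("utf-8")

# Encode the first four characters of the commented value 'gles2\0'.
expected_bytes = "gles".encode("utf-8")

print(decoded)
print(", ".join(f"0x{b:02X}" for b in expected_bytes))
```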

Is it necessary to mark metadata as "required"?

It has been suggested that KTX2 should provide a way to mark metadata as required or not. This issue is for tracking that discussion.

Streamline build process

Why does TODO.md need to be updated with the latest git log date given that it never gets published?
inlineimages.pl could be rewritten in Ruby given that Ruby is required for Asciidoctor anyway. This would reduce the number of build deps: perl and base64.

[Discussion] Allow for multiple copies of the same image to be transmitted in one KTX2 file

Earlier last year we reached a verbal consensus that a KTX2 file could contain multiple encodings of the same image file. This would be an option for when the ETC1S transcodable format is not suitable, i.e., for lossless images or content that is not well suited to block compression.

The size of the file will obviously be larger with these multiple representations, but it will likely still be required.

Note that we would not be permitted to use this technique to transmit multiple different images, such as packaging up all textures used by a material (albedo, normals, specular, etc.), as that would break the glTF material model (noted by Alexey).

DFD usage and implementation

The following feedback is based on DFD revision 1.2.

The DFD spec gives a lot of flexibility for providing formal definitions of "unknown" pixel formats such as uneven or padded bit widths, Bayer patterns, etc. Many DFD options duplicate fixed vkFormat definitions, so this leads to lots of implementation corner cases.

DFD fields are:

Per texel fields

color_model

    KHR_DF_MODEL_UNSPECIFIED
    KHR_DF_MODEL_RGBSDA
    KHR_DF_MODEL_YUVSDA
    KHR_DF_MODEL_YIQSDA
    KHR_DF_MODEL_LABSDA
    KHR_DF_MODEL_CMYKA
    KHR_DF_MODEL_XYZW
    KHR_DF_MODEL_HSVA_ANG
    KHR_DF_MODEL_HSLA_ANG
    KHR_DF_MODEL_HSVA_HEX
    KHR_DF_MODEL_HSLA_HEX
    KHR_DF_MODEL_YCGCOA
    KHR_DF_MODEL_YCCBCCRC
    KHR_DF_MODEL_ICTCP
    KHR_DF_MODEL_CIEXYZ
    KHR_DF_MODEL_CIEXYY
    KHR_DF_MODEL_BC1A
    KHR_DF_MODEL_BC2
    KHR_DF_MODEL_BC3
    KHR_DF_MODEL_BC4
    KHR_DF_MODEL_BC5
    KHR_DF_MODEL_BC6H
    KHR_DF_MODEL_BC7
    KHR_DF_MODEL_ETC1
    KHR_DF_MODEL_ETC2
    KHR_DF_MODEL_ASTC

Obviously, the last ten options have 1:1 mappings with the corresponding compressed formats. Uncompressed formats can somewhat freely choose color_model from the 16 other options, and the application behavior is out of scope of KTX conformance. I think it's safe to assume that the overwhelming majority of writers would use RGBSDA while most readers would ignore this field altogether.

color_primaries

    KHR_DF_PRIMARIES_UNSPECIFIED
    KHR_DF_PRIMARIES_BT709 = KHR_DF_PRIMARIES_SRGB
    KHR_DF_PRIMARIES_BT601_EBU
    KHR_DF_PRIMARIES_BT601_SMPTE
    KHR_DF_PRIMARIES_BT2020
    KHR_DF_PRIMARIES_CIEXYZ
    KHR_DF_PRIMARIES_ACES
    KHR_DF_PRIMARIES_ACESCC
    KHR_DF_PRIMARIES_NTSC1953
    KHR_DF_PRIMARIES_PAL525
    KHR_DF_PRIMARIES_DISPLAYP3
    KHR_DF_PRIMARIES_ADOBERGB

Default choice should be KHR_DF_PRIMARIES_SRGB.

transfer_function

    KHR_DF_TRANSFER_UNSPECIFIED
    KHR_DF_TRANSFER_LINEAR
    KHR_DF_TRANSFER_SRGB
    KHR_DF_TRANSFER_ITU
    KHR_DF_TRANSFER_NTSC
    KHR_DF_TRANSFER_SLOG
    KHR_DF_TRANSFER_SLOG2
    KHR_DF_TRANSFER_BT1886
    KHR_DF_TRANSFER_HLG_OETF
    KHR_DF_TRANSFER_HLG_EOTF
    KHR_DF_TRANSFER_PQ_EOTF
    KHR_DF_TRANSFER_PQ_OETF
    KHR_DF_TRANSFER_DCIP3
    KHR_DF_TRANSFER_PAL_OETF
    KHR_DF_TRANSFER_PAL625_EOTF
    KHR_DF_TRANSFER_ST240
    KHR_DF_TRANSFER_ACESCC
    KHR_DF_TRANSFER_ACESCCT
    KHR_DF_TRANSFER_ADOBERGB

This one is quite difficult to implement properly. Hardware consistently supports only linear and sRGB filtering, with the latter available only for some specific formats (that have _SRGB in their names). Any other combination of DFD's transfer_function with vkFormat would require manual texture filtering which is out of scope for many platforms.

flags (alpha premultiplication)

Requires software pre-processing since this operation isn't available as an option on texture upload.

texel_block_dimension_0-3

These values are fixed for known vkFormats.

bytes_plane_0-7

All Vulkan 1.0 formats have only 1 plane, so these fields are mostly useless. Moreover, these fields are not sufficient to describe the layouts of "multi-plane" formats from the VK_KHR_sampler_ycbcr_conversion extension (part of Vulkan 1.1).

Per-channel fields

These are called "Sample Information" in the DFD spec.

bit_offset / bit_length

Channel data location. Fixed for all known vkFormats.

channel_type

Contains a mapping to the color model (e.g. red or depth) and some typing flags (e.g. float). Fixed for all known vkFormats when color_model is RGBSDA.

sample_position_0-3

Spatial location within the texel block. Makes sense mostly for formats with downsampled channels.

sample_lower/upper

Range remapping. Used for some video formats.

Conclusions:

Unless someone needs to signal an uncommon color model (e.g. CMYK), primaries, or transfer function, DFD doesn't provide any additional information for regular Vulkan 1.0 formats. And even in case of uncommon color model, there's no reason to reiterate each channel's information.
There are a few cases where DFD provides a little extra information for Vulkan 1.0 compressed formats:
- A _FLOAT flag with ASTC that signals the presence of HDR blocks and texel_block_dimension_2 to describe 3D blocks. These would become less relevant once we have corresponding vkFormat enums.
- An ETC1 color model that signals that texture data doesn't contain ETC2 modes (ETC2 is a superset of ETC1, so there's no separate vkFormat enum).
- Both these cases are covered with KTXglFormat with much less implementation complexity.
Only a subset of the new multi-plane formats introduced in Vulkan 1.1 (initially from VK_KHR_sampler_ycbcr_conversion) can be described with DFD. At the same time, for many DFD options there is no API support.

In theory, DFD could be used as the only source of texture format. This would require readers to understand all DFD fields and to be able to find the closest available API match or to perform some conversion. Such a feature would considerably raise the complexity of the loaders. Writers would have to do the same in the other direction. This DFD / vkFormat redundancy hits the same points as the vkFormat / glFormat discussion.

Order of metadata keys

Should KTX2 exporters/editors keep metadata keys sorted, so that the checksum doesn't change on a round-trip?

`keyValuePadding` spec. is wrong.

keyValuePadding is described as

keyValuePadding[7 - (bytesOfKeyValueData + 7) % 8]

but the number of padding bytes needed depends on the length of the preceding Data Format Descriptor, which is guaranteed to be a multiple of 4 only, and the length of the keyValue data. keyValuePadding should be renamed to ... oh how about, align8Padding which should be

align8Padding[7 - ( bytesOfDataFormatDescriptor + bytesOfKeyValueData + 7) % 8]
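A quick sketch shows where the two formulas disagree. The function names are hypothetical, and the sketch assumes the Data Format Descriptor itself starts on an 8-byte boundary.

```python
def key_value_padding(bytes_of_key_value_data: int) -> int:
    """Padding as currently specified: depends only on the key/value data."""
    return 7 - (bytes_of_key_value_data + 7) % 8


def align8_padding(bytes_of_data_format_descriptor: int,
                   bytes_of_key_value_data: int) -> int:
    """Proposed padding: also accounts for the length of the preceding DFD.

    Assumption: the DFD begins at an offset that is a multiple of 8.
    """
    total = bytes_of_data_format_descriptor + bytes_of_key_value_data
    return 7 - (total + 7) % 8


# A DFD whose length is a multiple of 4 but not of 8, followed by 16 bytes
# of key/value data (both sizes are made up for illustration).
dfd_bytes, kvd_bytes = 92, 16
print(key_value_padding(kvd_bytes))          # current formula
print(align8_padding(dfd_bytes, kvd_bytes))  # proposed formula
```

In this example the current formula yields no padding although the data ends 4 bytes short of an 8-byte boundary, which is what the proposed formula supplies.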

Consider specifying padding value

The spec doesn't require fields used for byte-alignment (blockPadding, valuePadding, keyValuePadding, sgdPadding, and mipPadding) to be filled with a particular value (such as 0x00 or 0xFF).

This may lead to two files with the same texture having different checksums because different implementations may use different padding values.

How to carry the Basis bitstream?

We have to figure out how to indicate that the texture data is using Basis supercompression and can be transcoded. We'll likely have to use VK_FORMAT_UNKNOWN, because inflation and transcoding are done together. If they were done separately, the post-inflation data would be ETC1S. IIUC as it is now you have to tell the transcoder you want ETC1 in order to get the ETC1S output.

The DFD will probably just provide the color model, primaries and transfer function and have no sample info.

So we need something else to identify it as Basis. Is identifying the supercompressionScheme as Basis enough?

Change offset and length names to match glTF

glTF will mirror part of the KTX2 header in a JSON schema. glTF has a convention to use byteOffset and byteLength as the names of the related items. For consistency and avoidance of confusion it would be good to use the same names within the KTX2 spec. Since these names do not appear in the KTX2 file, changing is not a problem.

So change offset to byteOffset and bytesOfXXX to XXXByteLength.

Also refer to the glTF KTX2 as an example of providing a separate KTX2 header.

Fix padding issues

ISV feedback:

“All this padding madness – can’t we just enforce that everything is 4-byte aligned and be done with it, and every “section” is 16-bytes aligned? Why does the previous section need to pad? Wouldn’t it be easier to just require that a section starts at an aligned position and have an offset stored to it? It seems weird that padding bytes are specifically spelled out in the specification. At the very least, I would expect padding bytes to be defined to be 0 to ensure that there is no ambiguity.”

How to handle RGBE?

There's a use for RGBE textures. What VkFormat should we use: RGBA_UNORM, RGBA_SNORM or UNKNOWN? The DFD can identify that the 4th channel is an exponent in the same way that R9G9B9E5 is described. For VkFormat we should probably go with whatever format is best used to upload the data to Vulkan or OpenGL.

How to specify handling of vkFormat == VK_UNKNOWN_FORMAT case

There are cases when we can have a valid DFD together with vkFormat == VK_UNKNOWN_FORMAT. Removing the GL format related fields as proposed in issue #8 will likely increase the chances of this happening. Examples might be older GL formats such as the formats in OES_compressed_paletted_texture.

Several questions:

Do we try to specify anything about handling this case or leave it entirely up to the applications how to handle a valid DFD and vkFormat == VK_UNKNOWN_FORMAT?
Should a file be considered valid if the DFD is valid and there is a matching VK_FORMAT yet vkFormat == VK_UNKNOWN_FORMAT? Note that there is not a 1:1 mapping of DFD to VK_FORMAT. So if this case is allowed the format used by a loader will not be deterministic.

Notes about levelCount (section 3.7)

Some things I noticed:

Calculation for max

The current max calculation is missing a final floor(), to avoid fractional indexes
(see Vulkan).
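For illustration, the maximum with the final floor() applied can be written as follows. The function name is hypothetical, and dimensions given as 0 (unused) are treated as 1.

```python
import math


def max_level_count(pixel_width: int, pixel_height: int = 0, pixel_depth: int = 0) -> int:
    """Number of levels in a full mip pyramid: floor(log2(max dimension)) + 1."""
    largest = max(pixel_width, pixel_height, pixel_depth, 1)
    return math.floor(math.log2(largest)) + 1


# Without the floor, a 5-texel-wide image would give log2(5) + 1 = 3.32...,
# which is not a usable level count; with it the pyramid is 5, 2, 1.
print(max_level_count(5))
```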

Ambiguous wording

The current wording is (emphasis mine):

(levelCount = 1) means that a file contains only the first level [...]

However, because the levels are stored in reverse order, this could be confusing, as "first" could either be first of the pyramid (level_0), or first in the file (level_p).
I suggest to always refer to levels by their index.

Base level name

In KTX, the base level is always level 0. It therefore seems unnecessary to introduce the name level_base:

Mip level data is ordered from the level with the smallest size images, (level_p) to that with the largest size images, (level_base) where (p = levelCount - 1) and (base = 0).

Simpler alternative:

Mip level data is ordered from the level with the smallest size images, (level_p) to that with the largest size images, (level_0) where (p = levelCount - 1).

Sections that use level_base: 3.7, 3.9.7, 3.13

Specification for mipPadding in KTX is wrong.

The spec says Byte mipPadding[3 - ((imageSize + 3) % 4)]. This is incorrect because imageSize does not include any cubePadding and that could affect the amount of padding needed. It should be changed to Byte mipPadding[0-3].

Support GIF-style animation?

Should we make changes to allow a .ktx2 file to contain an animation, i.e., indicate that the set of images is to be displayed sequentially?

An immediate question I have is why do people still use animated GIFs vs. some video file format? Is it about lack of video authoring tools or difficulty dealing with cross-platform video on the Web or something else?

This would be a ktx2 file with levels=1, faces=1 and layers=. Support is probably just a case of adding suitable metadata that indicates the layers should be displayed sequentially and specifying the interval between frames.

If the sequences are expected to be short, one can even imagine uploading the entire contents to the GPU as an array texture, redrawing at the appropriate interval, incrementing the third texture coordinate in between. It could probably be done entirely on the GPU.

How to handle type-awareness disparity between DFD and `VkFormat`

The Data Format Descriptor has no awareness of primitive types. A DFD describes the data as it is laid out in memory and is thus independent of endianness. However Vulkan & GL formats are type aware and, when the file writer and reader have different endianness, those with primitive type sizes of 2, 4 or 8 bytes must be byte-swapped into the reader's endianness before the data can be uploaded to GL or Vulkan using the vkFormat specified in the KTX file.

Note that there are a few VK_FORMATS which have opposite-endian equivalents that would enable uploads to be done without any byte swapping, e.g., VK_FORMAT_R8G8B8A8_UNORM and VK_FORMAT_A8B8G8R8_UNORM_PACK32. These cases are very much the minority, so they don't change the need to resolve this issue.

Changing the endianness of DFD-described data requires

recognizing that the data has a size that requires swapping which is somewhat involved.
modifying the DFD to describe the changed-endian data
byte-swapping the data.

Andrew G. has promised some sample code to do this.
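Pending that sample code, the last step on its own (byte-swapping the data once the type size is known) can be sketched as below. This is a hypothetical helper, not the promised sample, and it leaves out the harder parts: deriving the type size from the DFD and rewriting the DFD.

```python
def swap_endianness(data: bytes, type_size: int) -> bytes:
    """Reverse the byte order of every type_size-byte element in data."""
    if type_size == 1:
        return bytes(data)  # single-byte types need no swapping
    if type_size not in (2, 4, 8):
        raise ValueError("type size must be 1, 2, 4 or 8 bytes")
    if len(data) % type_size != 0:
        raise ValueError("data length is not a multiple of the type size")
    out = bytearray(len(data))
    # Byte i of each swapped element comes from byte (type_size - 1 - i)
    # of the corresponding source element.
    for i in range(type_size):
        out[i::type_size] = data[type_size - 1 - i::type_size]
    return bytes(out)
```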

The DFD spec. supports a much wider range of data layouts than KTX2 needs, which means Andrew finds making the DFD type-aware unattractive. However, placing restrictions on the DFDs used in KTX files is theoretically possible. I'm not sure what restrictions we could make in practice that would help here.

Meanwhile the KTX2 header is also type-aware. It has 32- and 64-bit fields so we can't remove the endianness field and swapping requirement we already have.

How do we handle the disparity?

1. Require that the image data should always be little-endian. Drawbacks here are that big-endian systems would have to swap bytes on both writing and reading KTX files, and writers on big-endian machines are very likely to overlook this requirement, leading to invalid files.
2. When the endianness field indicates a mismatch, require readers to determine from the DFD if the data needs swapping, byte swap it and rewrite the DFD.
3. Retain a typesize field which is used when vkFormat != VK_UNKNOWN_FORMAT. In this case the DFD must be rewritten as well. When it is VK_UNKNOWN_FORMAT, let the reader decide how it wants to handle the data.
4. Some other way?

Subject to seeing the code sample from Andrew, I am leaning towards 2.

Make supercompression schemes optional, except Basis

Following WG discussion the consensus is to make all except Basis optional. Subsidiary question is what optional schemes to include: LZMA only, LZMA & Zstd, LZMA, Zstd and Deflate?

LZMA has a higher compression ratio than Zstd but its decompression is slow (and reportedly much slower than Zstd).

ISV KTX usability feedback

ISV feedback (multiple sources). Repeat of request for indexability, use of JSON

“Overall, this seems like a format I wouldn’t like to support, because it’s more complicated than necessary and at the same time not as flexible as it should be. Something similar to a binary JSON representation would have been much easier for everyone, but at least it could be made into a format with a “index table” at the beginning to make it seek-friendly.”

Depth/stencil textures support

We need to verify whether KTX2 needs to support depth and/or stencil textures.

If it does:

Which texture types could be used with D/S data?
How does swizzling interact with D/S?

Update spec. when ASTC HDR & 3D Vulkan enums are available

Remove comments about using the DFD to distinguish these formats.

Format needs to be more directly seekable

ISV feedback:

“The format is non-seekable. Excellent work that I need to read through half the file so I can start skipping to the bits I want. Point in case, I just want the lowest level mip-map. I read the file header, then I need another request to read the bytesOfDataFormatDescriptor worth of data (and the bytesOfKeyValueData), which I can then use to skip to bytesOfSupercompressionGlobalData, then I can skip over that. Which means I need two seeks and reads, or, if I’m doing HTTP, three requests assuming I don’t download the whole file right away (assuming a file with both supercompressed and non-supercompressed data is valid.)“

Purpose of levelOrder field?

ISVs have requested clarification of why levelOrder needs to be specified. Can the order not just be fixed to highest->lowest mip? This way low bandwidth apps can take advantage of streaming low-to-high resolutions, and desktop apps can choose for themselves which level(s) to pull in.

[Discussion] Desired texture/sampler state

Since KTX2 already has swizzling metadata, it's fair to explore what other runtime properties may need to be stored in the same way (using optional metadata). Keep in mind that API support is not consistent for these and also depends on pixel format used.

Border color and width.
Wrapping modes.
Min / mag filters.
LOD min/max/bias.

KTXorientation format should be same in KTX 1 & KTX 2

KTXorientation has to be handled by the application, as only it knows what it needs to do to transform or convert the texture coordinates to display the image correctly. It will be simpler for applications, therefore, if the format of the KTXorientation metadata is the same in KTX 1 and KTX 2. With them different, the app has to be aware whether it is loading a KTX 1 or a KTX 2 file. If it is using the libktx loaders, unless it is relying on KTX 2 specific metadata, there is no other reason it needs to be aware.

Use 4 character code for compression type over uint

ISV feedback: Instead of using a UInt32 for the supercompression scheme, why not use 4 byte characters (or at least use those instead of 1, 2, 3, 4), i.e. ZLIB, ZSTD, CRNC. This would be strongly preferred
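A four-character code still occupies the same 4 bytes as the UInt32, so the request mainly changes how the value reads in a hex dump. A minimal sketch, assuming the code is stored as little-endian (the byte order is not specified in the feedback) and using hypothetical helper names:

```python
import struct


def fourcc_to_uint32(code: str) -> int:
    """Pack a 4-character ASCII code into a little-endian 32-bit integer."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a four-character code must be exactly 4 ASCII bytes")
    return struct.unpack("<I", raw)[0]


def uint32_to_fourcc(value: int) -> str:
    """Recover the 4-character code from its little-endian 32-bit value."""
    return struct.pack("<I", value).decode("ascii")


print(hex(fourcc_to_uint32("ZSTD")))
```

In the file the bytes would read "ZSTD" directly, which is the readability gain the ISV is asking for.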

Tighten dimensions restrictions

Following existing restrictions on pixelHeight and pixelDepth, the spec should state that pixelWidth cannot be 0.

Common dimensions combinations could be organized like this:

Type	`pixelWidth`	`pixelHeight`	`pixelDepth`	`numOfArrayElements`	`numOfFaces`
1D	> 0	0	0	0	1
2D	> 0	> 0	0	0	1
3D	> 0	> 0	> 0	0	1
CUBE	> 0	> 0	0	0	6
1D_ARRAY	> 0	0	0	> 0	1
2D_ARRAY	> 0	> 0	0	> 0	1
CUBE_ARRAY	> 0	> 0	0	> 0	6

Any combination that doesn't fit the table above should be declared invalid and the loader should refuse to continue reading a file.
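A loader could enforce the table with a lookup along these lines. This is a sketch only: the function and table names are hypothetical, and the cases still under discussion (incomplete cubemaps, 3D arrays) are simply rejected.

```python
# Key: (pixelHeight > 0, pixelDepth > 0, numOfArrayElements > 0, numOfFaces)
TEXTURE_TYPES = {
    (False, False, False, 1): "1D",
    (True,  False, False, 1): "2D",
    (True,  True,  False, 1): "3D",
    (True,  False, False, 6): "CUBE",
    (False, False, True,  1): "1D_ARRAY",
    (True,  False, True,  1): "2D_ARRAY",
    (True,  False, True,  6): "CUBE_ARRAY",
}


def classify_texture(pixel_width: int, pixel_height: int, pixel_depth: int,
                     num_array_elements: int, num_faces: int) -> str:
    """Return the texture type from the table, or raise for invalid headers."""
    if pixel_width <= 0:
        raise ValueError("pixelWidth cannot be 0")
    key = (pixel_height > 0, pixel_depth > 0, num_array_elements > 0, num_faces)
    try:
        return TEXTURE_TYPES[key]
    except KeyError:
        raise ValueError("dimension combination is not in the table") from None
```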

We should decide whether to allow (given that there's no API support for these):

incomplete cubemaps (numOfFaces > 1 && numOfFaces < 6);
3D arrays (pixelDepth > 0 && numOfArrayElements > 0);
cubemaps with 3D arrays (pixelDepth > 0 && numOfArrayElements > 0 && numOfFaces == 6).

Define keyValueDataOffset when metadata is not present

Likely something along (as in SGD offset):

The value must be 0 when bytesOfKeyValueData = 0.

Change supercompressionGlobalData offset to UInt32?

Currently it is 64 bits but the only things preceding the data in the file are the header, the index, the DFD and the key-value data. It is inconceivable that these things would total more than 4GB.

Better ASTC support

ASTC texture formats cover lots of use-cases and KTX2 spec requires additional details to fully support them.

vkFormat (as well as glFormat) gives only three properties of ASTC data:

Block dimension (2D or 3D)
Block footprint size (4x4, 5x4, ...)
sRGB output enabled or not

Blocking issue:

There are no Vulkan enums for ASTC 3D blocks.
- @dewilkinson or @MarkCallow, could you ask Vulkan WG about this?

Not every format value is valid for every possible KTX2 dimensions configuration. Moreover, LDR/HDR modes are signaled per-block within texture data.

ASTC is not supported for 1D textures.
- This is already enforced by KTX2 dimensions restrictions.
There are additional constraints based on block sizes - i.e., pixelWidth must be a multiple of block's width.
- We can enhance pixelWidth and friends section with that because it's relevant for all block formats.
Implementations supporting only LDR profile (without _sliced_3d extension) will not accept ASTC for 3D texture targets.
- Not a file format issue, but maybe it's worth mentioning.
sRGB formats cannot contain blocks with HDR modes (decoder will output error color).
- We can use DFD to tighten this. See (section 7.5):
  
  ... an ASTC texture that is guaranteed by the user to contain only LDR-encoded blocks should have the channel_id KHR_DF_SAMPLE_DATATYPE_FLOAT bit clear, ...
Some recent mobile GPUs support reducing ASTC's decoder precision for better cache performance and power saving. While the encoded data remains the same, the exact decoded bits will be different, so some pipeline tools may want to keep this information.
- If there's enough interest, we can add a new metadata entry (see VK_EXT_astc_decode_mode or EXT_texture_compression_astc_decode_mode).

Community Discussion - General

Please feel free to leave a comment here for general feedback or discussion on the KTX2 specification and implementation

Tighten Vulkan / GL enums

The KTX2 header includes 3 OpenGL-originated fields describing data format (glType, glFormat, and glInternalFormat) and one from Vulkan (vkFormat). The presence of all four of them drastically increases the number of possible combinations and may lead to various implementation issues unless strictly defined by the KTX2 spec.

Here are the most outstanding issues, as I see them.

`vkFormat` alone is usually enough to fully describe data

For example: VK_FORMAT_R8G8_UINT

glType: UNSIGNED_BYTE
glFormat: GL_RG_INTEGER
glInternalFormat: GL_RG8UI

It's not clear what implementations should do if they encounter mismatching GL values with known Vulkan format.

Desktop OpenGL allows in-driver data conversion on texture upload

So glInternalFormat doesn't have to match the other fields to be consumed by the OpenGL API. Behavior may vary by vendor and driver version.

Vulkan has a notion of "scaled" formats which have no direct equivalent in OpenGL ES

For example: VK_FORMAT_R8G8_USCALED. There's no valid glFormat / glInternalFormat combination to cover this. Since such formats are not mandatory in Vulkan, they could be disallowed in KTX2.

Vulkan supports sRGB with RED and RG formats

See VK_FORMAT_R8_SRGB and VK_FORMAT_R8G8_SRGB.
There are no specific glInternalFormat enums for them in core OpenGL. Desktop OpenGL may be able to accept GL_SRGB8 if conversion works. For embedded, there are the EXT_texture_sRGB_R8 and EXT_texture_sRGB_RG8 extensions. Which of these should KTX2 allow?

BGRA format requires different enums on OpenGL and OpenGL ES

Vulkan has all flavors of it: signed/unsigned/integer/normalized/sRGB. OpenGL supports GL_BGRA for glFormat but there's no corresponding internal format. On the other hand, OpenGL ES may accept GL_BGRA8_EXT as glInternalFormat with APPLE_texture_format_BGRA8888 extension. BGR without alpha has even worse API support.

Some compressed formats from OpenGL don't have matching `vkFormat`.

OES_compressed_paletted_texture
AMD_compressed_3DC_texture
AMD_compressed_ATC_texture
3DFX_texture_compression_FXT1
EXT_texture_compression_latc
3D formats from OES_texture_compression_astc

In case someone uses some of these formats, vkFormat could be extended with little effort.

With all that said, it seems that we have two options for achieving predictable behavior across different platforms.

1. List every valid combination of these four fields in the spec and implement all checks in the reference loader. Conforming implementations would need to perform such validation even when not using OpenGL (ES) to keep the ecosystem consistent, i.e., refuse to load a KTX2 file with invalid OpenGL fields even while running on Vulkan.
2. Keep only vkFormat, while deriving OpenGL values at runtime (so the reference implementation will be almost the same as in option one). Clients running on Metal or Direct3D will be able to completely omit OpenGL-related validation from their codebase since it's much easier to get MTLPixelFormat or DXGI_FORMAT from VkFormat rather than from three OpenGL values.
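Deriving the OpenGL values at runtime could look roughly like the lookup below. The table holds only the single mapping quoted earlier in this issue; the names GL_FIELDS_BY_VK_FORMAT and gl_fields_for are hypothetical, and a real table would need an entry for every supported format.

```python
# Enum names are kept as strings to avoid relying on numeric values here.
GL_FIELDS_BY_VK_FORMAT = {
    "VK_FORMAT_R8G8_UINT": {
        "glType": "UNSIGNED_BYTE",
        "glFormat": "GL_RG_INTEGER",
        "glInternalFormat": "GL_RG8UI",
    },
}


def gl_fields_for(vk_format: str) -> dict:
    """Return the OpenGL fields derived from a vkFormat name.

    Raises LookupError for formats with no known OpenGL equivalent,
    e.g. the "scaled" formats mentioned above.
    """
    try:
        return GL_FIELDS_BY_VK_FORMAT[vk_format]
    except KeyError:
        raise LookupError(f"no OpenGL mapping known for {vk_format}") from None
```

A Metal or Direct3D client would carry its own table keyed the same way and never touch this one.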

/cc @MarkCallow @dewilkinson @pjcozzi

Fix broken links

Some links in the current spec are broken. This could be debugged with:

$ asciidoctor -v ...

We also should update Makefile to prevent merging broken links in master:

$ asciidoctor -v --failure-level INFO ...

mipMap=0 should not imply autogeneration

ISVs are pointing out that this language makes strong assumptions about runtime behaviour. The choice to auto-generate levels is the responsibility of the app at runtime and should not be part of the file format.

Fill out format mapping table

Format mapping needs to be filled out from this data.

Clarify handling of multi-plane formats

Vulkan 1.1 has introduced multi-planar formats that need special layout. Namely, they consist of 1-3 planes that don't have to have the same dimensions across components. For example:

VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16
Each plane is a one-component image with pixel data stored in the top 10 bits of each 16-bit word; the bottom 6 bits are set to 0.

Plane 0: G component, full resolution
Plane 1: B component, half horizontal and half vertical resolution
Plane 2: R component, half horizontal and half vertical resolution
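To make the layout concrete, here is a small sketch of the plane sizes and the 10-in-16 packing described above. The helper names are hypothetical, and the sketch assumes even image dimensions so that halving is exact.

```python
def plane_extents_420(width: int, height: int) -> list:
    """Return (component, plane_width, plane_height) for the three planes."""
    # Assumption: dimensions are even, so the half-resolution planes are exact.
    if width % 2 or height % 2:
        raise ValueError("this sketch only handles even dimensions")
    return [
        ("G", width, height),            # plane 0: full resolution
        ("B", width // 2, height // 2),  # plane 1: half horizontal and vertical
        ("R", width // 2, height // 2),  # plane 2: half horizontal and vertical
    ]


def pack_10_in_16(value: int) -> int:
    """Store a 10-bit value in the top 10 bits of a 16-bit word."""
    if not 0 <= value < 1024:
        raise ValueError("value does not fit in 10 bits")
    return value << 6  # the bottom 6 bits stay 0


def unpack_10_in_16(word: int) -> int:
    """Recover the 10-bit value from a 16-bit word."""
    return (word & 0xFFFF) >> 6
```

Whatever KTX2 decides, a reader needs both pieces of information: the per-plane extents and the in-word bit placement.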

KTX2 must do one of:

disallow such formats;
require a specific layout for storing multi-plane images and document it;
explicitly delegate specification of multi-plane layout to DFD.

Relocate Supercompressed Global Data to start of bitstream block

It has been decided to remove the binary Global Data block from the current location under the header of KTX2 file, and move it to its own block at the start of the bitstream data, immediately before the highest mip level data offset. The datastream will therefore look like this:

Pos 0: Supercompressed Global Data [n]
Pos n: Miplevel m [n(m)]
Pos n + n(m): Miplevel m-1 [n(m-1)]
Pos n + n(m) + n(m-1): ...
...
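The positions in that layout are running sums of the preceding block sizes. A minimal sketch, assuming no padding between blocks (the issue does not mention any) and using a hypothetical function name:

```python
def stream_layout(global_data_size: int, level_sizes: list) -> list:
    """Return (name, position, size) for each block in the data stream.

    level_sizes[i] is the byte size of mip level i, with level 0 the base.
    Levels are written from the highest index (m) down to 0.
    """
    layout = [("Supercompressed Global Data", 0, global_data_size)]
    position = global_data_size
    for level in range(len(level_sizes) - 1, -1, -1):
        layout.append((f"Miplevel {level}", position, level_sizes[level]))
        position += level_sizes[level]
    return layout


for name, position, size in stream_layout(10, [64, 16, 4]):
    print(f"Pos {position}: {name} [{size}]")
```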

Basis information in the spec is outdated

Specifically, the spec repeatedly refers to Crunch CRN rather than Basis and should be updated to reflect the more recognizable public name. Additionally, there is a note in 3.8.1 that reads

A file that specifies Crunch CRN with base formats other than ETC, ETC2 and BC[1-3] (S3TC_DXT[1-5]) must be considered invalid.

It looks like base format is a reference to a no longer extant field, unless I'm missing something.

KTXwriter

I am having second thoughts about KTXwriter metadata. The problem is that it makes simple diffs less useful. KTX2 files could have identical content except for the KTXwriter value. Since the value includes the writer's version number, files created from the same sources, by the same user using the same tool could fail to compare equal. The user could easily overlook an updated tool version especially if the updates are automatic.

I see the attraction of identifying tools that create malformed files but I don't want to have to write custom diff tools for KTX2 that would ignore this field.

Comments? Suggestions?

MIME update

As of now, the IANA image/ktx record points here.
The registration information contains 12 "magic" bytes that include the KTX1 version string.

Since KTX2 uses a different "magic", we need to decide whether to:

introduce a new file extension (like .ktx2) and a new MIME type (like image/ktx2), or

add the KTX2 "magic" to the IANA registration to keep the file extension and MIME type intact.
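
Either way, a consumer can tell the two versions apart from the first 12 bytes alone. A minimal sketch, assuming the identifier bytes below match the published KTX1 and KTX2 identifiers (please verify them against the specs); `sniff_ktx` is a hypothetical name.

```python
# Assumed identifiers: 0xAB "KTX 11" 0xBB \r \n 0x1A \n for KTX1,
# and the same with "KTX 20" for KTX2. Check against the specs.
KTX1_MAGIC = bytes([0xAB, 0x4B, 0x54, 0x58, 0x20, 0x31, 0x31,
                    0xBB, 0x0D, 0x0A, 0x1A, 0x0A])
KTX2_MAGIC = bytes([0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32, 0x30,
                    0xBB, 0x0D, 0x0A, 0x1A, 0x0A])


def sniff_ktx(data):
    """Return "ktx1", "ktx2", or None based on the first 12 bytes."""
    head = bytes(data[:12])
    if head == KTX1_MAGIC:
        return "ktx1"
    if head == KTX2_MAGIC:
        return "ktx2"
    return None
```

This only shows that content sniffing works under both options; it does not settle whether a separate extension and MIME type are preferable.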

Purpose of Supercompressed Global Data block

Looking for clarification on this one.

IIUC, the SupercompressedGlobalData array block is designed to hold codebooks for the Huffman-encoded portions of the compressed data stream.

My question: the codebook should be encapsulated within the encoder. Only the decoder, not the app can use this data. It is also already contained in the bitstream - so why are we copying it out into the body of the file format where it is of no practical use? The codebook will be extracted by the decoder from the bitstream directly and is the only place it would be used.

Am I missing something that would make it necessary to pull codebook data up to the file format level?

What I do see a use for is a global block of opaque binary data that can optionally be populated with data meaningful to the target app, certainly not limited to codebooks or even compression.

In summary, I find the Supercompressed Global Data block redundant, but I am in favor of a general-purpose opaque ‘Global Data’ block.
So effectively, the only change I’d propose is the naming; having that binary block there is still useful.

Confirm supported ENV_MAP formats

ISV request to clarify what formats we support for environment maps

Key/Value size formula (section 3.11.)

As I understand it, there are some issues with the current size formula (excuse the formatting):

sum_(i=0)^(n-1)(ceil(keyAndValueByteLength[i] / 4)) * 4
  + keyAndValueByteLength[n]
  = kvdByteLength

It is missing the size of the keyAndValueByteLength fields themselves (... + n*4)
keyAndValueByteLength[n] is out of bounds (sum bound should be n-2?)

However, I wonder if this formula is necessary to begin with. Because keyAndValueByteLength is arbitrary, the pairs need to be iterated anyway.
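
For reference, here is a sketch of one reading of the corrected formula, next to the iteration a reader has to do anyway. Both function names are hypothetical. The sketch assumes little-endian 32-bit length fields, padding of every entry except the last to a 4-byte boundary, and that the length fields themselves count toward the total.

```python
import struct


def round_up_4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) // 4 * 4


def expected_kvd_byte_length(lengths):
    """Corrected total for entries 0..n-1 (last entry left unpadded)."""
    if not lengths:
        return 0
    # Each entry contributes its 4-byte length field plus its payload.
    padded = sum(4 + round_up_4(n) for n in lengths[:-1])
    return padded + 4 + lengths[-1]


def iter_key_value_lengths(kvd):
    """Walk the block and yield each keyAndValueByteLength in order."""
    pos = 0
    while pos < len(kvd):
        (n,) = struct.unpack_from("<I", kvd, pos)
        yield n
        pos += 4 + round_up_4(n)
```

If the two disagree for a given file, the block is malformed under these assumptions, which is about the only thing the formula adds over plain iteration.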

Key/Value ambiguities (section 3.11)

There are some edge-case ambiguities in the key/value metadata (section 3.11) that might be worth clarifying.

Empty key

As I understand it, a key can be empty (keyAndValue begins with 0x00). Is this a good idea? I don't expect empty keys to break any implementation, but they could be quite confusing.

Duplicate keys

Currently, the spec does not say what should happen if a key is included multiple times, possibly with different values. The pragmatic approach is to require that keys must be unique per file. Otherwise, there is a need for precedence and sorting rules.
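
A validator could flag both cases with very little code. This is a sketch only: the function names are hypothetical, and it assumes each keyAndValue entry is a UTF-8 key terminated by a NUL byte, followed by the value bytes.

```python
def split_key_value(entry):
    """Split one keyAndValue entry into (key, value) at the first NUL."""
    key, sep, value = entry.partition(b"\x00")
    if not sep:
        raise ValueError("missing NUL terminator after key")
    return key.decode("utf-8"), value


def check_keys(entries):
    """Return a list of problems: empty keys and repeated keys."""
    seen = set()
    problems = []
    for entry in entries:
        key, _value = split_key_value(entry)
        if key == "":
            problems.append("empty key")
        elif key in seen:
            problems.append(f"duplicate key: {key}")
        seen.add(key)
    return problems
```

Requiring unique, non-empty keys means a reader can load the metadata into a plain dictionary without needing any precedence rule.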

Is KTX2 specification finalized yet?

I've been trying to google this, and it's very hard to even find the right hits for KTX.

I can see there was a draft in 2019. Is the KTX2 spec out yet?

Provide formal syntax for KTXorientation metadata

The spec gives some examples and descriptions, but there's no formal parsing process defined.
Here's a list from the spec:

-   S=r,T=d
-   S=r,T=u
-   S=r,T=d,R=i
-   S=r,T=u,R=o

A robust parser for this kind of syntax has to cover lots of edge cases like:

-   S=r,S=l // overriding S
-   S=u // wrong value for S
-   S=r, // trailing comma
-   S=r, T=u // space after comma
-   S=r,Q=z // unknown parameter
-   R=o,T=u,S=r // inverse order of dimensions
-   T=r // S orientation is undefined
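
For illustration, here is a sketch of one possible strict reading of the current syntax, which rejects every edge case listed above. It is not what the spec says; the spec leaves these cases open. The names are hypothetical, and the sketch assumes axes must appear in S, T, R order without gaps, repeats, or whitespace.

```python
# Axis names in required order, each with its allowed direction letters.
ALLOWED = [("S", "rl"), ("T", "du"), ("R", "oi")]


def parse_orientation(text):
    """Parse e.g. "S=r,T=d" into {"S": "r", "T": "d"} or raise ValueError."""
    parts = text.split(",")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected one to three comma-separated items")
    result = {}
    # The i-th item must describe the i-th axis, so order is fixed.
    for (axis, values), part in zip(ALLOWED, parts):
        if (len(part) != 3 or part[0] != axis or part[1] != "="
                or part[2] not in values):
            raise ValueError(f"expected {axis}=<{'|'.join(values)}>, got {part!r}")
        result[axis] = part[2]
    return result


def is_valid_orientation(text):
    """Return True if text parses under the strict reading above."""
    try:
        parse_orientation(text)
        return True
    except ValueError:
        return False
```

Even this strict version does not know the texture's dimensionality, so it cannot answer the over- or under-definition questions on its own.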

Follow-up questions:

Is it valid to define all 3 orientations for 1D/2D textures?
Is it valid to under-define orientations for 2D/3D textures?

To simplify implementations, I propose changing the syntax as follows:

1D: exactly one byte: {r,l}
2D: exactly two bytes: {r,l}{d,u}
3D: exactly three bytes: {r,l}{d,u}{o,i}
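
With the proposed syntax, validation reduces to a length check and a per-position lookup. A minimal sketch, with hypothetical function names and the value given as a string:

```python
# Allowed letters per position: S (r/l), T (d/u), R (o/i).
AXIS_VALUES = ("rl", "du", "oi")


def parse_compact_orientation(value, dimensions):
    """Validate the proposed compact form and return a tuple of letters."""
    if dimensions not in (1, 2, 3):
        raise ValueError("dimensions must be 1, 2 or 3")
    if len(value) != dimensions:
        raise ValueError(f"expected exactly {dimensions} byte(s)")
    for ch, allowed in zip(value, AXIS_VALUES):
        if ch not in allowed:
            raise ValueError(f"unexpected {ch!r}, allowed: {allowed}")
    return tuple(value)


def is_valid_compact_orientation(value, dimensions):
    """Return True if value is valid for a texture of this dimensionality."""
    try:
        parse_compact_orientation(value, dimensions)
        return True
    except ValueError:
        return False
```

Because the length must equal the dimensionality, over- and under-defined orientations are rejected by construction.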