Implement ff_ssl_*_key_cert()
This can lead to confusion when trying to decide if the file contains an explicit duration, since we end up with a track whose last event has a non-zero delta time from the previous one.
The b_adapt option allows users to control adaptive B-frame decision when lookahead is enabled in HEVC encoding.
Applying an LCD filter to spans rather than the entire image improves the performance of ClearType-like rendering by about 40% at 32 ppem and much more at larger sizes.
The existing dlinfo request type RTLD_DI_ORIGIN used for querying the value of the '$ORIGIN' dynamic string token is prone to buffer overflows. This commit adds a new request type named RTLD_DI_ORIGIN_PATH that returns a pointer to the dynamic string token (i.e.
This commit introduces a video filter `mestimate_d3d12` that provides hardware-accelerated motion estimation using DirectX 12 Video Encoding APIs.
Implement NEON optimization for HEVC dequant at 8-bit depth.

The NEON implementation uses srshr (Signed Rounding Shift Right) which does both the add with offset and right shift in a single instruction.

Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench

hevc_dequant_4x4_8_c:        11.3 ( 1.00x)
hevc_dequant_4x4_8_neon:      6.3 ( 1.78x)
hevc_dequant_8x8_8_c:        33.9 ( 1.00x)
hevc_dequant_8x8_8_neon:      6.6 ( 5.11x)
hevc_dequant_16x16_8_c:     153.8 ( 1.00x)
hevc_dequant_16x16_8_neon:    9.0 (17.02x)
hevc_dequant_32x32_8_c:      78.1 ( 1.00x)
hevc_dequant_32x32_8_neon:   31.9 ( 2.45x)

Note on Performance Anomaly:
The observation that hevc_dequant_32x32_8_c is faster than 16x16 (78.1 vs 153.8) is due to Clang auto-vectorizing only for sizes >= 32x32.
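
For reference, the per-coefficient arithmetic that srshr folds into one instruction can be modelled in scalar C. This is only a sketch, not the code from this commit: the names rounding_shift_right() and dequant_8bit_sketch() are hypothetical, and the 8-bit shift is assumed to be 15 - 8 - log2_size.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed rounding shift right: add half of the divisor,
 * then shift. Assumes >> on a negative int is an arithmetic shift, which
 * holds on mainstream compilers but is implementation-defined in C. */
static int16_t rounding_shift_right(int16_t x, int shift)
{
    assert(shift >= 1 && shift <= 15);
    int32_t offset = 1 << (shift - 1);      /* rounding offset */
    return (int16_t)((x + offset) >> shift);
}

/* Hypothetical 8-bit dequant over a square block of (1 << log2_size)^2
 * coefficients, with shift assumed to be 15 - 8 - log2_size. */
static void dequant_8bit_sketch(int16_t *coeffs, int log2_size)
{
    assert(log2_size >= 2 && log2_size <= 5);
    int shift = 15 - 8 - log2_size;         /* 5, 4, 3, 2 for 4x4..32x32 */
    int n     = 1 << (2 * log2_size);       /* 16, 64, 256, 1024 coeffs  */

    for (int i = 0; i < n; i++)
        coeffs[i] = rounding_shift_right(coeffs[i], shift);
}
```

Since the shift is always positive at 8-bit depth, every block size takes the rounding path, which is why a single rounding-shift instruction per vector is enough in this model.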
The HEVC dequantization uses:
    shift = 15 - bit_depth - log2_size

When shift equals 0, the operation becomes an identity transform:
- For shift > 0: output = (input + offset) >> shift
- For shift < 0: output = input << (-shift)
- For shift = 0: output = input << 0 = input (no change)

This occurs in the following cases:
- 10-bit, 32x32 block: shift = 15 - 10 - 5 = 0
- 12-bit, 8x8 block: shift = 15 - 12 - 3 = 0

Previously, the code would still iterate through all coefficients and perform redundant read-modify-write operations even when shift=0. This patch adds an early return for shift=0, avoiding unnecessary memory operations.
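
The three cases and the early return can be sketched in C as follows. This is an illustration under assumptions, not the patched function itself: dequant_shift() and dequant_sketch() are hypothetical names, bit_depth is passed at run time for simplicity, and offset is assumed to be 1 << (shift - 1).

```c
#include <assert.h>
#include <stdint.h>

/* shift = 15 - bit_depth - log2_size, as given in the description above. */
static int dequant_shift(int bit_depth, int log2_size)
{
    return 15 - bit_depth - log2_size;
}

/* Hypothetical dequant over a square block of (1 << log2_size)^2 coeffs. */
static void dequant_sketch(int16_t *coeffs, int bit_depth, int log2_size)
{
    assert(log2_size >= 2 && log2_size <= 5);
    int shift = dequant_shift(bit_depth, log2_size);
    int n     = 1 << (2 * log2_size);

    if (shift == 0)
        return;                             /* identity: skip the whole pass */

    if (shift > 0) {
        int offset = 1 << (shift - 1);      /* assumed rounding offset */
        for (int i = 0; i < n; i++)
            coeffs[i] = (int16_t)((coeffs[i] + offset) >> shift);
    } else {
        /* Shift the unsigned bit pattern so negative inputs are well defined. */
        for (int i = 0; i < n; i++)
            coeffs[i] = (int16_t)((uint16_t)coeffs[i] << -shift);
    }
}
```

With these definitions, dequant_shift(10, 5) and dequant_shift(12, 3) both evaluate to 0, so those two block configurations return before touching memory.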
Implement NEON optimization for HEVC dequant at 10-bit depth.
Implement NEON optimization for HEVC dequant at 12-bit depth. For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size.
- Currently supports Core family CPUs starting at Nehalem series, up to Coffee Lake, as well as some ATOM CPUs.
Add chroma_location option so that, in the subsampled-to-subsampled case, the destination's chroma siting can be changed from the source's without having to use other filters.
This is needed for the recently added EXIT_UNSUPPORTED return value.