Opened 12 years ago

Closed 9 years ago

#671 closed defect (fixed)

DeltaQpRD mismatch when multiple slices are used

Reported by: ksuehring Owned by: karlsharman
Priority: minor Milestone: HM-16.5
Component: HM Version: HM-8.0
Keywords: Cc: fbossen, ksuehring, davidf, jct-vc@…

Description

When DeltaQpRD is enabled and multiple slices (also dependent slices) are used, encoder and decoder mismatch.

Attachments (4)

depslices_cabac_reset.cfg (7.8 KB) - added by pieterkapsenberg 12 years ago.
RaceHorses_lowdelay_dep_slices_cabac_reset.bin (32.9 KB) - added by pieterkapsenberg 12 years ago.
precompress_slice_r4154.patch (8.5 KB) - added by karlsharman 9 years ago.
precompress_slice_r4392.patch (8.5 KB) - added by karlsharman 9 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 12 years ago by DefaultCC Plugin

  • Cc fbossen ksuehring davidf jct-vc@… added

comment:2 Changed 12 years ago by ksuehring

Ticket #669 has been marked as a duplicate of this ticket.

Changed 12 years ago by pieterkapsenberg

Changed 12 years ago by pieterkapsenberg

comment:3 Changed 12 years ago by pieterkapsenberg

The attached files show a different dependent slices digest mismatch when decoding the first B picture.

comment:4 Changed 12 years ago by pieterkapsenberg

Here the problem may be that getSlice()->getMvdL1ZeroFlag() does not use the value from the first slice in the pic, rather it uses the default of 0.

comment:5 Changed 12 years ago by ksuehring

Can we try to keep different issues separate? It's hard to find a dependent slice issue in the comments under a different heading. Also if the main issue is solved, I would like to be able to close the ticket.

Please either file a new ticket for the dependent slice mismatch or reopen #669.

comment:6 Changed 12 years ago by pieterkapsenberg

Done. The other issue is filed under separate ticket #673

comment:7 Changed 9 years ago by karlsharman

The base problem, as seen by running:
./bin/TAppEncoderStatic -c cfg/encoder_intra_main.cfg

-c cfg/per-sequence/RaceHorses.cfg -f 1
--DeltaQpRD=1 --SliceMode=1 --SliceArgument=10
--SEIDecodedPictureHash=1

goes wrong due to the corruption of the reconstruction buffer in the encoder. In TEncSlice::precompressSlice, the loop filter is applied to the entire frame after trial encoding a slice at a particular QP. This will therefore update (corrupt) the reconstructed data of previously coded slices.

The code is also not ideal as future slices have not yet been coded and these impact the distortion being calculated.

A TODO was added recently in precompressSlice with a query about the validity of the software when multiple slice-segments are used.

comment:8 Changed 9 years ago by karlsharman

I have prepared a patch that handles multiple slices and slice-segments when DeltaQpRD is used. The patch, however, bypasses the deblocking filter (SAO was already being ignored) - there is a TODO to highlight this. If someone wants the loop-filters to be used, please update the patch.

In addition, the distortion is based upon the SSE distortion calculated during compressSlice (see TComRdCost::getDistPart), which has a weighting applied to the chroma channels. This is different to the current software where a simple SSE was used (of the whole reconstructed frame, some of which may not have been reconstructed). Which is the preferred measure?

Changed 9 years ago by karlsharman

comment:9 Changed 9 years ago by ksuehring

  • Milestone set to HM-16.5
  • Owner set to karlsharman
  • Status changed from new to assigned

Changed 9 years ago by karlsharman

comment:10 Changed 9 years ago by karlsharman

I have updated the patch due to other changes in the code base.

Just to clarify about the problems that are currently in the code.

Problem 1:

If there were multiple dependent slice segments, then the second
slice segment would be analysed separately and would
be given its own slice-level QP, which is not permitted.

So this patch causes precompress slice to optmise on a slice-by-slice
basis. If a dependent slice were input, then it will be ignored
as the slice to which it belongs was analysed in its entirety at
a previous call.

(In a previous patch, an assert was added that forbid a dependent
slice from changing its QP.)

Problem 2:

Imagine the first slice is the top row of CTUs, and the second slice is
everything else

For the first slice, the DeltaQPRD code calculates the
reconstruction for the first row of CTUs. It then applies
the deblocking algorithm on the entire picture.

Parts of the picture will become corrupted at the boundary
between the parts that have been coded and the parts that
are in slices that have yet to be coded.
The cost measure is also based upon the entire frame,
and so this will produce sub-optimal results.

Problem 3 (encoder-decoder mismatches):

Again, imagine the first slice is the top row of CTUs, and the second
slice is everything else

For the second slice, the code again calculates the reconstruction
for the last "Height-1" rows of CTUs. It then applies the deblocking
algorithm on the entire reconstruction picture.

It then decides what to do for the second slice and then
generates the actual reconstruction data for the second slice.

Later, once all slices are coded, the deblocking filter is
applied. However, the first row of CTUs have now had the
deblocking filter applied multiple times (once for each deltaQPRD
tested during the second slice).

This will lead to encoder-decoder mismatches.

To sort these 2 problems out, the patch no longer applies the deblocking filter
(SAO was not applied anyway).
In addition, the distortion is based upon the SSE distortion calculated during
compressSlice (see TComRdCost::getDistPart), which has a weighting applied to
the chroma channels. This is different to the current software where a simple
SSE was used (of the whole reconstructed frame, some of which may not have been
reconstructed).

Problem 4:

It is unclear what to do when the slice-mode is byte-limited.

Comparing the cost of coding 10 CTUs at one slice-QP with the
cost of coding 9 CTUs at another slice-QP may not be applicable.

An average may not be good enough either because all the
bits might be in the 10th CTU.

So for that reason, this combination of settings is disabled
with an assert and there is a TODO in the code.

comment:11 Changed 9 years ago by ksuehring

  • Resolution set to fixed
  • Status changed from assigned to closed

The latest patch has been applied in r4393

Note: See TracTickets for help on using tickets.

This list contains all users that will be notified about changes made to this ticket.

These roles will be notified: Reporter, Owner, Subscriber, Participant

  • David Flynn(Subscriber)
  • Frank Bossen(Subscriber)
  • jct-vc@…(Subscriber)
  • Karl Sharman(Owner, Participant)
  • karl.sharman@…(Always)
  • Karsten Suehring(Reporter, Subscriber, Participant, Always)
  • Pieter Kapsenberg(Participant)