The HM encoder currently fails to perform startcode emulation prevention on all NAL units.

Worse, the current method of doing this has led to some horrible hacks to signal where slices start so that the anti-emulation code can skip over the start codes.

A solution is to assemble separate NAL units and deal with them as a vector (much like writev(2)). This also allows for trivial reordering of the NAL units in cases where SEI messages are calculated after the picture has been encoded.
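A rough sketch of what that could look like, purely for illustration. NalUnitBytes, AccessUnitBytes, writeAccessUnit and demoSeiBeforeSlice are hypothetical names, not HM types, and the header bytes are placeholders. The assumption is that each NAL unit is escaped on its own before it is put in the list, and that start code prefixes are added only at output time.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one finished NAL unit (not an HM type).
struct NalUnitBytes
{
  bool isSei;                    // true for SEI messages
  std::vector<uint8_t> payload;  // NAL header plus already-escaped RBSP
};

typedef std::list<NalUnitBytes> AccessUnitBytes;

// Write the NAL units in list order, each preceded by a start code prefix.
// Start codes are added only here, so the escaping code never sees them.
void writeAccessUnit(std::ostream& out, const AccessUnitBytes& au)
{
  static const char start_code_prefix[] = {0, 0, 1};
  for (AccessUnitBytes::const_iterator it = au.begin(); it != au.end(); ++it)
  {
    assert(!it->payload.empty());  // a NAL unit has at least a header byte
    out.write(start_code_prefix, sizeof(start_code_prefix));
    out.write(reinterpret_cast<const char*>(&it->payload[0]), it->payload.size());
  }
}

// Encode the slice first, compute the SEI afterwards, yet emit the SEI first.
std::string demoSeiBeforeSlice()
{
  AccessUnitBytes au;

  NalUnitBytes slice;
  slice.isSei = false;
  slice.payload.push_back(0xBB);  // placeholder byte, not a real NAL header
  au.push_back(slice);

  NalUnitBytes sei;
  sei.isSei = true;
  sei.payload.push_back(0xAA);    // placeholder byte, not a real NAL header
  au.push_front(sei);             // reordering is just a list insertion

  std::ostringstream out;
  writeAccessUnit(out, au);
  return out.str();
}
```

With this layout the anti-emulation code only ever sees a single NAL unit payload, so it needs no side information about where slices start.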

The current HM-3.0-dev r909 software still has another emulation problem.
In write() in NALwrite.cpp, emulation_prevention_three_byte itself can be emulated.
The emulation occurs at least for BQTerrace, low delay, low complexity, QP=27.
An example fix follows.

  for (vector<uint8_t>::const_iterator it = rbsp.begin(); it != rbsp.end();)
  {
    /* 1) find the next emulated start_code_prefix
     * 2a) if not found, write all remaining bytes out, stop.
     * 2b) otherwise, write all non-emulated bytes out
     * 3) insert emulation_prevention_three_byte
     */
    static const uint8_t two_zero_bytes[] = {0,0};
    vector<uint8_t>::const_iterator curr_it = it;
    vector<uint8_t>::const_iterator found;
    bool end_search = false;
    do
    {
      /* look for the next 00 00 pair */
      found = search(curr_it, rbsp.end(), two_zero_bytes, two_zero_bytes+2);
      unsigned num_last_bytes = rbsp.end() - found;
      if ( num_last_bytes >= 3 && *(found+2) > 0x3 )
      {
        /* 00 00 followed by a byte above 03 needs no escaping: keep looking */
        curr_it += 3;
      }
      else
      {
        end_search = true;
      }
    }
    while ( !end_search );
    /* The loop above appears to replace the r909 lookup (both declare "found"):
     *   static const char start_code_prefix[] = {0,0,1};
     *   vector<uint8_t>::const_iterator found = search(it, rbsp.end(), start_code_prefix, start_code_prefix+3);
     */
    unsigned num_nonemulated_bytes = found - it;
    /* rest of the loop body is not shown in this excerpt */
  }

Sorry, the configuration mentioned above is high efficiency, not low complexity.

It looks like I was too literal when I wrote down "prevent emulation of start_code_prefix": only 00 00 01 was being escaped.
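To illustrate why the literal reading breaks, here is a small sketch. Both functions are hypothetical stand-ins written for this comment, not the HM code: one escapes only 00 00 01, the other does the usual decoder-side removal of every 03 that follows two zero bytes. A payload that genuinely contains 00 00 03 goes out unescaped, and the decoder then drops the 03.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<uint8_t> Bytes;

// Hypothetical "too literal" escaping: only 00 00 01 is protected.
Bytes escapeStartCodePrefixOnly(const Bytes& rbsp)
{
  Bytes out;
  unsigned zeros = 0;  // consecutive zero bytes written so far
  for (size_t i = 0; i < rbsp.size(); i++)
  {
    if (zeros >= 2 && rbsp[i] == 0x01)
    {
      out.push_back(0x03);  // emulation_prevention_three_byte
      zeros = 0;
    }
    out.push_back(rbsp[i]);
    zeros = (rbsp[i] == 0x00) ? zeros + 1 : 0;
  }
  return out;
}

// Decoder-side removal: drop every 03 that follows two zero bytes.
Bytes removeEmulationPrevention(const Bytes& nal)
{
  Bytes out;
  unsigned zeros = 0;  // consecutive zero bytes read so far
  for (size_t i = 0; i < nal.size(); i++)
  {
    if (zeros >= 2 && nal[i] == 0x03)
    {
      zeros = 0;
      continue;  // treated as emulation_prevention_three_byte
    }
    out.push_back(nal[i]);
    zeros = (nal[i] == 0x00) ? zeros + 1 : 0;
  }
  return out;
}

// A payload containing a genuine 00 00 03 does not survive the round trip.
void demoTooLiteral()
{
  const uint8_t raw[] = {0xff, 0x00, 0x00, 0x03, 0xff};
  const Bytes rbsp(raw, raw + sizeof(raw));
  const Bytes nal = escapeStartCodePrefixOnly(rbsp);
  assert(nal == rbsp);                     // nothing was escaped
  const Bytes back = removeEmulationPrevention(nal);
  assert(back != rbsp);                    // the genuine 03 was dropped
  assert(back.size() == rbsp.size() - 1);
}
```

This is the emulated emulation_prevention_three_byte case reported above: the encoder has to escape 00 00 03 as well, otherwise the decoder cannot tell a payload 03 from an inserted one.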

Using the following test cases, this has now been resolved in r917:

Self-Test: 0, {ff,ff,ff,ff} -> {ff,ff,ff,ff} OK
Self-Test: 1, {ff,ff,ff,ff,0,0,0} -> {ff,ff,ff,ff,0,0,3,0,3} OK
Self-Test: 2, {ff,ff,ff,ff,0,0,1} -> {ff,ff,ff,ff,0,0,3,1} OK
Self-Test: 3, {ff,ff,ff,ff,0,0,2} -> {ff,ff,ff,ff,0,0,3,2} OK
Self-Test: 4, {ff,ff,ff,ff,0,0,3} -> {ff,ff,ff,ff,0,0,3,3} OK
Self-Test: 5, {ff,ff,ff,ff,0,0,4} -> {ff,ff,ff,ff,0,0,4} OK
Self-Test: 6, {ff,ff,ff,ff,0,0,0,ff,ff} -> {ff,ff,ff,ff,0,0,3,0,ff,ff} OK
Self-Test: 7, {ff,ff,ff,ff,0,0,1,ff,ff} -> {ff,ff,ff,ff,0,0,3,1,ff,ff} OK
Self-Test: 8, {ff,ff,ff,ff,0,0,2,ff,ff} -> {ff,ff,ff,ff,0,0,3,2,ff,ff} OK
Self-Test: 9, {ff,ff,ff,ff,0,0,3,ff,ff} -> {ff,ff,ff,ff,0,0,3,3,ff,ff} OK
Self-Test: a, {ff,ff,ff,ff,0,0,4,ff,ff} -> {ff,ff,ff,ff,0,0,4,ff,ff} OK
Self-Test: b, {ff,ff,ff,ff,0,0,0,0,0,0,ff,ff} -> {ff,ff,ff,ff,0,0,3,0,0,3,0,0,ff,ff} OK
Self-Test: c, {ff,ff,ff,ff,0,0,1,0,0,1,ff,ff} -> {ff,ff,ff,ff,0,0,3,1,0,0,3,1,ff,ff} OK
Self-Test: d, {ff,ff,ff,ff,0,0,2,0,0,2,ff,ff} -> {ff,ff,ff,ff,0,0,3,2,0,0,3,2,ff,ff} OK
Self-Test: e, {ff,ff,ff,ff,0,0,3,0,0,3,ff,ff} -> {ff,ff,ff,ff,0,0,3,3,0,0,3,3,ff,ff} OK
Self-Test: f, {ff,ff,ff,ff,0,0,4,0,0,4,ff,ff} -> {ff,ff,ff,ff,0,0,4,0,0,4,ff,ff} OK
Self-Test: 10, {ff,ff,ff,ff,0} -> {ff,ff,ff,ff,0,3} OK
Self-Test: 11, {ff,ff,ff,ff,0,0} -> {ff,ff,ff,ff,0,0,3} OK
Self-Test: 12, {ff,ff,ff,ff,0,0,0} -> {ff,ff,ff,ff,0,0,3,0,3} OK
Self-Test: 13, {ff} -> {ff} OK
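For anyone who wants to reproduce these vectors outside HM, below is a minimal sketch of an escaping routine that is consistent with them. It is not the r917 code. escapeRbsp, escapesTo and runSelectedSelfTests are hypothetical names, and the two rules are inferred from the vectors above: insert 03 before any byte in the range 00..03 that follows two zero bytes, and append a final 03 when the data ends in a zero byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<uint8_t> Bytes;

// Escape one RBSP according to the two rules inferred from the self-tests.
Bytes escapeRbsp(const Bytes& rbsp)
{
  Bytes out;
  unsigned zeros = 0;  // consecutive zero bytes at the end of out
  for (size_t i = 0; i < rbsp.size(); i++)
  {
    if (zeros == 2 && rbsp[i] <= 0x03)
    {
      out.push_back(0x03);  // emulation_prevention_three_byte
      zeros = 0;
    }
    out.push_back(rbsp[i]);
    zeros = (rbsp[i] == 0x00) ? zeros + 1 : 0;
  }
  if (!out.empty() && out.back() == 0x00)
  {
    out.push_back(0x03);    // data must not end in a zero byte
  }
  return out;
}

// Escape in[0..nIn) and compare against expect[0..nExpect).
bool escapesTo(const uint8_t* in, size_t nIn, const uint8_t* expect, size_t nExpect)
{
  return escapeRbsp(Bytes(in, in + nIn)) == Bytes(expect, expect + nExpect);
}

// Re-check a few of the vectors listed above (tests 1, 5, b and 10).
void runSelectedSelfTests()
{
  const uint8_t in1[] = {0xff,0xff,0xff,0xff,0,0,0};
  const uint8_t ex1[] = {0xff,0xff,0xff,0xff,0,0,3,0,3};
  assert(escapesTo(in1, sizeof(in1), ex1, sizeof(ex1)));

  const uint8_t in5[] = {0xff,0xff,0xff,0xff,0,0,4};
  assert(escapesTo(in5, sizeof(in5), in5, sizeof(in5)));

  const uint8_t inb[] = {0xff,0xff,0xff,0xff,0,0,0,0,0,0,0xff,0xff};
  const uint8_t exb[] = {0xff,0xff,0xff,0xff,0,0,3,0,0,3,0,0,0xff,0xff};
  assert(escapesTo(inb, sizeof(inb), exb, sizeof(exb)));

  const uint8_t in10[] = {0xff,0xff,0xff,0xff,0};
  const uint8_t ex10[] = {0xff,0xff,0xff,0xff,0,3};
  assert(escapesTo(in10, sizeof(in10), ex10, sizeof(ex10)));
}
```

The byte-at-a-time loop is chosen here only because it is easy to check against the table; a search()-based version that writes runs of non-emulated bytes in one go should produce the same output.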

