[The following transcript is more for the techies of my readership. For those of a less technical inclination, feel free to wait for the next post on “Active Design Ideas” which I have separated out due to the length of this post.]
I want to underpin the philosophical aspect of this discussion by using an example software architecture and considering some design problems that I have experienced with multi-threaded video player pipelines. The issues I highlight could apply to many video player designs.
The following image is a highly simplified top-level schematic, the original being just an A4 pdf captured from a whiteboard, a tool that I find much better for working on designs than using any computer-based UML drawing tool. The gross motor movement of hand drawing ‘in the large’ seems to help the thinking process.
There are 3 basic usual commands for controlling any video player that has random access along a video timeline:
In this example there is a main controller thread that handles the commands and controlling the whole pipeline. I am going to conveniently ignore the hard problem of actually reading anything off a disk fast enough to keep a high resolution high frame-rate player fed with data!
The first operation for the pipeline to do is to render the display frames in a parallel manner. The results of these parallel operations, since they will likely be produced out of order, need to be made into an ordered image stream that can then be buffered ahead to cope with any operating system latencies. The buffered images are then transferred into an output video card, which has only a relatively small amount of video frame storage. This of course needs to be modeled in the software so that (a) you know when the card is full; and (b) you know when to switch the right frame to the output without producing nasty image tearing artefacts.
These are all standard elements you will get with many video player designs, but I want to highlight three design issues that I experienced in order to get an understanding of what I will later term an “Organising Principle”.
First there was slow operation resulting in non real-time playout. Second, occasionally you would get hanging playout or stuttering frames. Third, you could very occasionally get frame jitter on stopping.
Given what I said about Goethe and his concept of Delicate Empiricism, the very first thing to do was to reproduce the problem and collect data, i.e. measure the phenomenon WITHOUT jumping to conclusions. In this case it required the development of logging instrumentation software within the system – implemented in a way that did not disturb the real-time operation.
With this problem I initially found that the image processing threads were taking too long, though the processes were doing their job in time once they had their data. So it was slowing down BEFORE they could get to start their processing.
The processing relied on fairly large processing control structures that were built from some controlling metadata. This build process could take some time so these structures were cached with their access keyed by that metadata, which was a much smaller structure. Accessing this cache would occasionally take a long time and would give slow operation, seemingly of the image processing threads. This cache had only one mutex in its original design and this mutex was locked both for accessing the cache key and for building the data structure item. Thus when thread A was reading the cache to get at an already built data item, it would occasionally block behind thread B which was building a new data item. The single mutex was getting locked for too long while thread B built the new item and put it into the cache.
So now I knew exactly where the problem was. Notice the difference between the original assumption of the problem being with the image processing, rather than with the cache access.
It would have been all too easy to jump to an erroneous conclusion, especially prevalent in the Journeyman phase, and change what was thought to be the problem. Although such a change would not actually fix the real issue, it could have changed the behaviour and timing so that the problem may not present itself, thus looking like it was fixed. It would likely resurface months later – a costly and damaging process for any business.
In this case the solution here was to have finer grained mutexes: one for the key access into the cache and a separate one for accessing the data item, which was then lazily built on first access.
Hanging Playout or Stuttering Frames
The second bug was that the playout would either hang or stutter. This is a great example because it illustrates a principle that we need to learn when dealing with any streamed playout system.
The measurement technique in this case was extremely ‘old school’, simply printing data to a log output file. Of course only a few characters were output per frame, because at 60fps (a typical modern frame-rate) you only have 16ms per frame.
In this case the streaming at the output end of the pipeline was happening out of order, a bad fault for a video playout design. Depending upon how the implementation was done, it would either cause the whole player to hang or produce a stuttered playout. Finding the cause of this took a lot of analysis of the output logs and many changes to what was being logged. An example of needing to be clear about the limits of one’s knowledge and of properly identifying the data that next needed to be collected.
I found that there was an extra ‘hidden’ thread added within the output card handling layer in order to pass off some other output processing that required. However it turned out that there was no enforcement of frame streaming order. This meant that the (relatively) small amount of memory in the output card would get fully allocated and this would give rise to a gap in the output frame ordering. The output control stage was unable to fill the gap in the frame sequence with the correct frame, because there was no room in the output card for that frame. This would usually result in the playout hanging.
This is why, with a streaming pipeline, where you always have limited resources at some level, allocation of those resources MUST be done in streaming order. This is a dynamic principle that can take a lot of hard won experience to learn.
The usual Journeyman approach to such a problem is just to add more memory, i.e. more resource! This will hide the problem because though processing will still be done out of order, the spare capacity has been increased and it will not go wrong until you next modify the system to use more resource. At this point the following statement is usually made:
“But this has been working ok for years!”
The instructions I need to tell less experienced programmers when trying to debug such problems will usually include the following:
“Do not change any of the existing functionality.
Disturb the system as little as possible.
Keep the bug reproducible so you
can measure what is happening.
Then you will truly know when you have fixed the fault.”
Frame Jitter on Stop
The third fault case was an issue of frame jitter when stopping playout. The problem was that although the various buffers would get cleared, there could still be some frames ‘in flight’ in the handover threads. This is a classic multi-threading problem and one that needs careful thought.
In this case when it came time to show the frame at the current position, an existing playout had to be stopped and the correct frame would need to be processed for output. This correct frame for the current position would make its way through to the end of the pipeline, but could get queued behind a remnant frame from the original stopped playout. This remnant frame would most likely have been ahead of the stop position because of the pre-buffering that needed to take place. Then when it came time to re-enable the output frame viewing in order to show the correct frame, both frames would get displayed, with the playout remnant one being shown first. This manifested on the output as a frame jitter.
One likely fix of an inexperienced programmer would be to make the system sit around waiting for seconds while the buffers were cleared and possibly cleared again, just in case! (The truly awful “sleep” fix.) This is one of those cases where, again due to lack of deep analysis, a defensive programming strategy is used to try and force a fix of what initially seems to be the problem. Again, it is quite likely that this may SEEM to fix the problem, and is likely to happen if the developer is under heavy time pressure.
The final solution to this particular problem was to use the concept of uniquely identified commands, i.e. ‘command ids’. Thus each command from the controlling thread, whether it was a play request or a show frame request, would get a unique id. This id was then tagged on to each frame as it was passed through the pipeline. By using a low-level globally accessible (within the player) ‘valid command id set’ the various parts of the pipeline could decide, by looking at the tagged command id, if they had a valid frame that could be allowed through or quietly ignored.
When stopping the playout all that had to be done was to clear the buffers, remove the relevant id from the ‘valid command id set’ and this would allow any pesky remaining ‘in flight’ frames to be ignored since they had an invalid command id. This changed the stop behaviour from being an occasional, yet persistent bug, into a completely reliable operation and without the need for ‘sleep’ calls anywhere.
In the next post I will recap the above process of finding and fixing the problems from a human development perspective.
→ACCU2016: Talk on Software Architecture Design 5: Active Design Ideas
←ACCU2016: Talk on Software Architecture Design 3: The Issue of Doubt