Monday, 26 December 2011

Philips Healthcare's Sierra ECG format XLI Compression Scheme

One of the services I work for has recently acquired a Philips HeartStart MRx cardiac monitor. It came complete with Bluetooth transmission of 12-Lead and event data. At roughly the same time our service installed a computer, an MDT, into the cab of our unit to interface with our county's CAD software.

Naturally I linked our monitor and our MDT via Bluetooth and transmitted a 12-Lead from a rhythm generator. When the file landed on the MDT, I looked for an application to view the 12-Leads and rhythm strips; however, none appeared to be able to use the file as-is.

For the non-technical, the ECGs are shipped compressed--somewhat like a ZIP file--which contains all of your monitored vital signs, printed rhythm strips, and your 12-Leads. The format of the 12-Leads is an Open Standard; Philips Healthcare provides most of the details needed to use the files. The 12-Lead data is also compressed to save space. Unfortunately, there is no documentation which tells you how to decompress the 12-Lead data.

The technically-faint-of-heart should skip these next bits.

For the technical, the ECGs are contained in a Gzip'd TAR archive. The 12-Leads are stored inside in an XML format known as the Sierra ECG format (currently at version 1.03 or 1.04, as far as I can tell). Inside this XML is Base64-encoded, XLI-compressed data comprising the leads acquired during a 12-Lead (it appears that up to 16 leads can be stored).
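As a rough illustration of those container layers, here is a minimal Python sketch that opens a Gzip'd TAR archive, collects its XML members, and Base64-decodes the text of one element. The function names and the `tag` parameter are my own placeholders; the name of the element that actually holds the waveform data is not given in this post, so the caller has to supply it.

```python
import base64
import io
import tarfile
import xml.etree.ElementTree as ET


def find_xml_members(archive_bytes):
    """Return {member name: raw bytes} for every .xml file in a .tar.gz blob."""
    members = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for info in tar.getmembers():
            if info.isfile() and info.name.lower().endswith(".xml"):
                members[info.name] = tar.extractfile(info).read()
    return members


def decode_base64_element(xml_bytes, tag):
    """Base64-decode the text of the first element whose local name is `tag`.

    `tag` is supplied by the caller; this sketch does not assume any
    particular Sierra ECG element name.
    """
    root = ET.fromstring(xml_bytes)
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace
        if local_name == tag:
            # Base64 text in XML is often wrapped, so strip all whitespace.
            return base64.b64decode("".join((elem.text or "").split()))
    raise KeyError(tag)
```

The bytes returned by `decode_base64_element` would still be XLI-compressed at this point.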

I searched for a description of the XLI compression format; however, I was only able to find a reference implementation for Microsoft Windows which simply decoded the files. No code or description was provided, and the implementation itself is not portable. (ed: it appears this may be a reference to the HP PageWriter XLi which Philips acquired)

At this point I decided my only option was to reverse engineer the XLI compression format, and I began with simple guesses. I tried decompressing the data using Deflate, Zip, and RLE without any progress. I was able to determine that the first 8 bytes of the compressed data included a compressed length and some uncompressed data, and that each of the 12 to 16 leads was stored in a chunk with one of these headers:
offset   2        4        6        8  ...
+--------+--------+--------+--------+--------+--------+--------+
|      Size       |  Unk.  | Delta? | Compressed data...       |
+--------+--------+--------+--------+                          |
| ... [Size bytes]                                             |
+--------+--------+--------+--------+--------+--------+--------+
| Next lead chunk ...                                          |
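Reading the diagram as a 4-byte size, a 2-byte unknown field, and a 2-byte delta, a chunk walker could be sketched in Python as follows. The little-endian byte order, the signedness of the delta field, and all of the names are assumptions on my part, not something the format documents.

```python
import struct

# Assumed layout: uint32 size, uint16 unknown, int16 delta (little-endian).
HEADER = struct.Struct("<IHh")


def iter_lead_chunks(blob):
    """Yield one dict per lead chunk found in the Base64-decoded blob."""
    offset = 0
    while offset + HEADER.size <= len(blob):
        size, unknown, delta = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        end = start + size
        if end > len(blob):
            raise ValueError("chunk at offset %d runs past the end" % offset)
        yield {
            "size": size,        # number of compressed bytes that follow
            "unknown": unknown,  # purpose not determined
            "delta": delta,      # seed for the delta decoding step
            "data": blob[start:end],
        }
        offset = end
```

Each chunk's `data` is still compressed; only the 8-byte header is stored as plain values.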
Once the simple guesses were ruled out, I began exploring the behavior of the reference implementation provided for the Sierra ECG format. Using OllyDbg I noticed certain code tells which made me believe the decompression algorithm read 10 bits at a time:
SHR   EAX, 16h   ; reduce EAX to the 10-bit code word
SHL   ECX, 0Ah   ; prepare to read 10 more bits from the input
The compressed data also did not appear to contain a compression dictionary referenced by the code. At this point I suspected I was looking at a form of Lempel-Ziv-Welch, or LZW, compression. LZW is a popular, lossless compression scheme which builds its compression dictionary on the fly. It is used by the GIF and TIFF image formats, and its use in GIF became the subject of controversy due to patent licensing requirements.

In my quest to quickly reach a conclusion I found an excellent LZW implementation from Mark Nelson in C and it successfully decompressed the data. In fact, the structure of the C code was so familiar, I realized the reference implementation from Philips used the exact same code!
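For readers who have not met LZW before, here is a minimal Python sketch of a fixed-width decoder reading 10-bit codes, most significant bit first. It is written from the general LZW algorithm, not copied from Mark Nelson's code or from the Philips binary. The bit order, the use of the all-ones code (1023) as an end-of-stream marker, and the function names are assumptions, so it may need adjusting before it decodes real files.

```python
def read_codes(data, bits=10):
    """Yield fixed-width codes from a byte string, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    mask = (1 << bits) - 1
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits:
            nbits -= bits
            yield (acc >> nbits) & mask
        acc &= (1 << nbits) - 1  # keep only the unread bits


def lzw_decompress(data, bits=10):
    """Decode an LZW stream whose dictionary is rebuilt on the fly."""
    end_code = (1 << bits) - 1                    # assumed end-of-stream marker
    table = {i: bytes([i]) for i in range(256)}   # single-byte entries
    next_code = 256
    out = bytearray()
    prev = None
    for code in read_codes(data, bits):
        if code == end_code:
            break
        if code in table:
            entry = table[code]
        elif code == next_code and prev is not None:
            entry = prev + prev[:1]   # code refers to the entry being built
        else:
            raise ValueError("invalid LZW code %d" % code)
        out += entry
        if prev is not None and next_code < end_code:
            table[next_code] = prev + entry[:1]
            next_code += 1
        prev = entry
    return bytes(out)
```

Note that no dictionary is passed in: the decoder derives every multi-byte entry from the codes it has already seen, which is why the compressed data carries none.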

If you've reached this step while following along at home, you'll notice the decompressed data seems front-loaded with 0's. This is a case of intelligently streaming the data to the compression algorithm to take advantage of data duplication.

The uncompressed data represents 16-bit delta codes, of which the majority include 0x00 or 0xFF in their most significant byte (MSB). This is because they are either small and positive or small and negative, and as ECG data is rhythmic the delta codes are likely to retain the same sign for numerous samples.

To take advantage of this fact during compression, the delta codes are first deinterleaved into two halves. The first half includes each MSB and the second half includes each LSB. The pseudo-code for interleaving the decompressed data looks like the following:
# input contains the decompressed data
# output will contain the interleaved 16-bit delta codes
fun unpack( input[], output[], nSamples )
    for i <- 1..nSamples
        output[i] <- (input[i] << 8) | input[nSamples + i]
    endfor
endfun
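The same step as a short Python sketch, using 0-based indexing. Since the MSB is 0xFF for small negative values, I treat each recombined value as a signed 16-bit integer; the function name is only illustrative.

```python
def unpack(data, n_samples):
    """Recombine the MSB half and LSB half into signed 16-bit delta codes."""
    msb = data[:n_samples]
    lsb = data[n_samples:2 * n_samples]
    if len(msb) != n_samples or len(lsb) != n_samples:
        raise ValueError("need at least 2 * n_samples bytes")
    return [
        int.from_bytes(bytes((hi, lo)), "big", signed=True)
        for hi, lo in zip(msb, lsb)
    ]
```

For example, the six bytes `00 FF 01 05 FE 00` with three samples recombine to `0x0005`, `0xFFFE`, and `0x0100`.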
At this point the delta compression scheme will need to be decoded to produce the actual signal data for each of the leads. The delta compression scheme is a simple recurrence relation (a second-order difference relation) using the prior two decoded values and the previous delta code:
# output contains the 16-bit delta codes
# first is the 16-bit delta code from the chunk header
fun deltaDecompression( output[], nSamples, first )
    x <- output[1]
    y <- output[2]
    prev <- first
    for i <- 3..nSamples
        z <- (2 * y) - x - prev
        prev <- output[i] - 64 # is -64 to 64 the range?
        output[i] <- z
        x <- y
        y <- z
    endfor
endfun
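A direct Python translation of that recurrence, again with 0-based indexing. This is a sketch: Python integers do not wrap at 16 bits the way a C `short` would, which I assume does not matter for well-formed data.

```python
def delta_decode(codes, first):
    """Turn delta codes into samples; `first` is the chunk header's delta."""
    out = list(codes)
    if len(out) < 3:
        return out          # the first two values are stored as-is
    x, y = out[0], out[1]
    prev = first
    for i in range(2, len(out)):
        z = 2 * y - x - prev
        prev = out[i] - 64  # the stored code is biased by 64
        out[i] = z
        x, y = y, z
    return out
```

With `codes = [10, 12, 64, 65, 64]` and `first = 2`, the loop produces 12, 12, and 11 for the last three samples, giving `[10, 12, 12, 12, 11]`.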
Now that you have the actual per-signal data, all you need to do is recreate leads III, aVR, aVL, and aVF. As on most ECG machines, this is done using the data from leads I and II. I've omitted the actual formulas for brevity.
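For reference, the textbook Einthoven and Goldberger relations between the limb leads can be sketched in Python as below. These are the standard electrocardiography formulas, not necessarily the exact ones the Sierra ECG format requires; how the file stores these four leads may call for a different calculation, and the function name is my own.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Compute III, aVR, aVL and aVF from leads I and II, sample by sample."""
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],         # III = II - I
        "aVR": [-(i + ii) / 2 for i, ii in pairs],  # aVR = -(I + II) / 2
        "aVL": [i - ii / 2 for i, ii in pairs],     # aVL = I - II / 2
        "aVF": [ii - i / 2 for i, ii in pairs],     # aVF = II - I / 2
    }
```

The augmented leads come out as floats here because of the division by two; an integer pipeline would need to choose a rounding rule.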

Using my reference implementation of the decompression algorithm I was able to feed the original acquired 12-Lead to the Philips ECG to SVG converter, with the following results:


If you'd like to start playing with my code, I welcome you to join my GitHub project: sierra-ecg-tools. I am also working on a C implementation, and likely an Android implementation. Stay tuned, and apologies for the technical post.

The author has no financial ties to Philips Healthcare and received no compensation for this work.
