Investigating a legacy document delivery DRM system

Background

This post concerns a DRM system used in an online document delivery platform (think PDFs, but proprietary), established circa 2000 and still in widespread use. Documents purchased through the platform are delivered in a proprietary encrypted file format, which can be opened using a proprietary viewer and printed a limited number of times.

As always, the particular DRM system is not relevant and will not be identified.

Webapps, the reverse engineer's best friend

The viewing software for this platform has gone through a number of iterations, beginning with a browser plugin, then expanding to a desktop client in the early 2000s (which appears to have received only occasional minor updates since then). More recently, Android and iOS clients have also been released, but all of these sound rather challenging to reverse-engineer.

Thankfully for us, an HTML5-based JavaScript ‘printer’ viewer has also recently been released – the ability to easily view, manipulate and debug the JavaScript code will make reverse-engineering much easier. The viewer happens to be quite limited, with only the minimum functionality required to print documents, but this is more than sufficient for our purposes.

Rehosting the webapp

The HTML5 viewer is encapsulated within a single HTML page, /view/Viewer.aspx. The page loads various minified JavaScript libraries, and then calls into JavaScript, passing a number of parameters:

<script src="/js/Viewer1.min.js"></script>
<script src="/js/Viewer2.min.js"></script>
<script src="JSViewer.min.js"></script>
<script>
	window.onload = function() {
		Html5Init({
			'doc': 'XY1234567',
			'key': '1234567890ABCDEF',
			'uid': '1234567890ABCDEFGHIJKLMNOPQRSTUV'
		});
	}
</script>

These parameters are, ostensibly, the document ID, a decryption key, and some user identifier for fingerprinting purposes.

It is a simple matter to download this HTML file, parameters and all, along with the JavaScript files it references, and serve them locally. To allow us to emulate any dynamic behaviours, we will serve the application using a simple Flask app:

from flask import Flask, render_template, request, send_from_directory
app = Flask(__name__, template_folder='jinja2')

@app.route('/view/Viewer.aspx')
def viewer():
	return render_template('Viewer.html', doc=request.args['doc'], key=..., ...)

@app.route('/<path:fname>')
def viewer_static(fname):
	return send_from_directory('static', fname)

Through a combination of the Firefox Developer Tools Network tab and the Flask request logs, we can continue to download the remainder of the assets required by the application:

$ ls -al static
total 56
drwxr-xr-x 13 runassudo runassudo 4096 Jul 18 17:08 .
drwxr-xr-x  7 runassudo runassudo 4096 Jul 18 16:50 ..
drwxr-xr-x  3 runassudo runassudo 4096 Jul 18 17:08 assets
drwxr-xr-x  3 runassudo runassudo 4096 Jul 18 13:54 css
drwxr-xr-x  3 runassudo runassudo 4096 Jul 18 13:54 fonts
drwxr-xr-x  2 runassudo runassudo 4096 Jul 18 13:56 images
...

Reimplementing the document delivery endpoint

At this point, we have rehosted the HTML5 viewer itself, but we have not implemented the endpoint for fetching the documents themselves. Through the Developer Tools Network tab, we identify an XHR POST request to /view/DocumentDelivery.aspx with the following payload:

<?xml version="1.0" encoding="utf-8"?>
<DocumentDelivery>
	<ver>0</ver>
	<rev>1.01</rev>
	<msg>reqDocument</msg>
	<uid>1234567890ABCDEFGHIJKLMNOPQRSTUV</uid>
	<doc>XY1234567</doc>
	<why><to>view</to></why>
	<ran>42</ran>
</DocumentDelivery>

The response payload is a binary object, which the file utility identifies for us as a RIFF container. We can simply download this file, and update our Flask code to serve it at this endpoint:

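import xml.etree.ElementTree as ET
from flask import abort

@app.route('/view/DocumentDelivery.aspx', methods=['POST'])
def delivery_api():
	xmlroot = ET.fromstring(request.data)
	if xmlroot.findtext('msg') == 'reqDocument':
		# Serve the previously captured payload, saved under docs/<document ID>
		doc_id = xmlroot.findtext('doc')
		return send_from_directory('docs', doc_id)
	# Any other message type is not implemented by our stand-in server
	abort(400)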

Examining the document payload

With the knowledge that the document is encapsulated in a RIFF container, we can use rifftree to outline the structure of the file:

$ rifftree /tmp/b.mtd
RIFF(DOCF)->
            docp;
            LIST(INFO)->
                        Iuid;
                        Idlr;
                        Ipop;
                        Iclt;
                        Idoc;
                        ...

Looking at the file in more detail in a hex editor, we can see a header containing a variety of metadata:

00000000: 5249 4646 ac56 0100 444f 4346 646f 6370  RIFF.V..DOCFdocp
00000010: 0800 0000 3031 3030 0d00 0000 4c49 5354  ....0100....LIST
00000020: bd04 0000 494e 464f 4975 6964 2100 0000  ....INFOIuid!...
00000030: 3132 3334 3536 3738 3930 4142 4344 4546  1234567890ABCDEF
00000040: 4748 494a 4b4c 4d4e 4f50 5152 5354 5556  GHIJKLMNOPQRSTUV
00000050: 005c 4964 6c72 0b00 0000 4143 4d45 2043  .\Idlr....ACME C
00000060: 6f72 702e 0000 4970 6f70 1a00 0000 6874  orp...Ipop....ht
00000070: 7470 3a2f 2f65 7861 6d70 6c65 2e63 6f6d  tp://example.com
00000080: 2f61 6263 3132 3300 4963 6c74 0c00 0000  /abc123.Iclt....
00000090: 4a61 6d65 7320 536d 6974 6800 4964 6f63  James Smith.Idoc
000000a0: 0a00 0000 5859 3132 3334 3536 3700 4963  ....XY1234567.Ic
...
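The metadata header follows the usual RIFF layout: a four-character chunk ID, a little-endian 32-bit length, the payload, and a pad byte whenever the length is odd (visible after the Iuid field at offset 0x51). As a minimal sketch, assuming the remaining INFO fields follow the same NUL-terminated string convention as those visible in the dump, the header could be read like this. The names iter_chunks, read_info and chunk are hypothetical helpers, and the sample bytes are a synthetic reconstruction of a few fields from the dump, not a real document.

```python
import struct

def iter_chunks(buf, start, end):
    """Yield (fourcc, payload) for each RIFF chunk in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        fourcc = buf[pos:pos + 4].decode('latin-1')
        (size,) = struct.unpack_from('<I', buf, pos + 4)
        yield fourcc, buf[pos + 8:pos + 8 + size]
        # Chunks are padded to an even length
        pos += 8 + size + (size & 1)

def read_info(buf):
    """Return the INFO metadata of a RIFF(DOCF) file as a dict of strings."""
    if buf[:4] != b'RIFF' or buf[8:12] != b'DOCF':
        raise ValueError('not a RIFF(DOCF) file')
    (riff_size,) = struct.unpack_from('<I', buf, 4)
    end = min(len(buf), 8 + riff_size)
    info = {}
    for fourcc, payload in iter_chunks(buf, 12, end):
        if fourcc == 'LIST' and payload[:4] == b'INFO':
            # Assumption: every INFO sub-chunk holds a NUL-terminated string
            for key, value in iter_chunks(payload, 4, len(payload)):
                info[key] = value.rstrip(b'\x00').decode('latin-1')
    return info

def chunk(fourcc, payload):
    """Build one chunk (used only to construct the demo input below)."""
    pad = b'\x00' if len(payload) & 1 else b''
    return fourcc + struct.pack('<I', len(payload)) + payload + pad

# Synthetic sample mirroring a few of the fields shown in the hex dump
info_list = (b'INFO'
             + chunk(b'Iuid', b'1234567890ABCDEFGHIJKLMNOPQRSTUV\x00')
             + chunk(b'Idlr', b'ACME Corp.\x00')
             + chunk(b'Idoc', b'XY1234567\x00'))
body = (b'DOCF'
        + chunk(b'docp', b'0100\x0d\x00\x00\x00')
        + chunk(b'LIST', info_list))
sample = b'RIFF' + struct.pack('<I', len(body)) + body

print(read_info(sample))
```

Run against a real capture, this would list only the plaintext metadata; the chunks holding the page data would still need decrypting.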

The actual document data, though, appears to be binary nonsense, and likely encrypted. Considering the complexity of the documents involved, a black-box approach to reverse-engineering the format would probably take some time.

Nonsense nonces, or: ‘Reality can be whatever I want’

Returning to our rehosted HTML5 viewer, we have now essentially performed a replay attack, feeding our local HTML5 viewer with the payload data captured from the server. With this, the viewer dutifully outputs a preview of the document and allows us to print it.

Something is wrong, however – the preview is only displaying the first page of the document, and subsequent pages remain inaccessible.

By setting breakpoints in both the real HTML5 viewer and our local copy and painstakingly comparing the execution flow between the two, we see that our local viewer diverges as follows:

var u = 0, y = 0;
...
while (...) {
	...
	
	if (C.Foobar && C.Foobar.Ok()) {
		// The real HTML5 viewer proceeds down this branch
		p(C, C.Foobar.Prep(w))
	} else {
		// Our viewer proceeds down this branch
		v.Skip(z.Len);
		u = 7
	}
	
	...
	if (u == 0) {
		y += 1;
	}
	...
}
...
C.PageCount = y;

At the point in the code where the divergence occurs, y is 1, i.e. one page (the first page) has already been successfully loaded. In the real viewer, the code continues to load the remainder of the pages, but in our rehosted viewer, something about Foobar is not ‘Ok’, an error flag is set, and the remainder of the pages are not loaded.

Digging into the file where Foobar.Ok is defined, we find a large number of DataViews, buffers and bitwise operations with funny magic numbers, of the kind we might expect to see in a decryption algorithm. But we have already provided the JavaScript code with the correct decryption keys, so what has gone wrong?

Within this file, we find a curiously-named Ran function:

d.prototype.Ran = function() {
	var e = new Date();
	return e.getSeconds() * e.getMinutes() * e.getMilliseconds() / 13585
};

This code apparently serves as a rather crappy random number generator. It's so bad, in fact, that once every hour, when e.getMinutes() is zero, the function will simply return 0 for an entire minute! Why the developers did not use Math.random is beyond me. Presumably this was an attempt at obfuscation, but if you ask me, the function is hardly well-disguised.
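To get a feel for just how little randomness this provides, here is a rough Python port of the arithmetic. It is a sketch, not the viewer's code: ran takes the clock fields as explicit arguments instead of reading the current time, and the counting assumes each millisecond of the hour is an equally likely moment for the call.

```python
def ran(seconds, minutes, millis):
    """Python port of Ran(), with the clock fields passed in explicitly."""
    return seconds * minutes * millis / 13585

# Largest possible value: 59 s, 59 min, 999 ms
largest = ran(59, 59, 999)

# Clock readings (to the millisecond, within one hour) where at least one
# field is zero, which forces the whole product to zero
total = 60 * 60 * 1000
zero = total - 59 * 59 * 999

print(f'max = {largest:.3f}')
print(f'zero for {zero} of {total} readings ({zero / total:.1%})')
```

The output never reaches 256, so its integer part fits in a single byte, and it is exactly zero for a little over 3% of the hour: not only during minute zero, but also during the first second of every minute and whenever the millisecond field happens to be zero.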

This Ran function is called in only one place:

h.H = h.r.Ran() ^ 117;

h.H is then promptly fed into one of those confusing-looking functions with many bitwise operations and magic numbers. H also appears in the following revealing snippet:

f.prototype.N = function(h) {
	var g = this.ToDeliver("reqDocument", h);
	g += "<ran>" + this.H + "</ran>";
	g += "</DocumentDelivery>";
	return g
};

Aha! So the <ran> element in the XML request from earlier is some kind of nonce used by the server to further protect the subsequent pages of the document. Naturally, this defence is easily foiled, as we also control this random value.
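This would also explain the earlier failure: the captured payload was issued for one particular <ran> value, while our local viewer rolls a fresh one on every load. A minimal sketch of a check that would make the mismatch visible on the server side follows. The helper names request_nonce and matches_capture are hypothetical, the request bodies are modelled on the example request shown earlier, and the assumption that the payload is bound to the nonce is inferred from the viewer's behaviour rather than confirmed.

```python
import xml.etree.ElementTree as ET

def request_nonce(xml_bytes):
    """Return the <ran> value of a reqDocument request body, or None."""
    root = ET.fromstring(xml_bytes)
    if root.findtext('msg') != 'reqDocument':
        return None
    return root.findtext('ran')

def matches_capture(xml_bytes, captured_ran='42'):
    """True if the request carries the nonce the captured payload was issued for."""
    return request_nonce(xml_bytes) == captured_ran

TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<DocumentDelivery>'
    '<ver>0</ver><rev>1.01</rev><msg>reqDocument</msg>'
    '<uid>1234567890ABCDEFGHIJKLMNOPQRSTUV</uid>'
    '<doc>XY1234567</doc><why><to>view</to></why>'
    '<ran>{ran}</ran>'
    '</DocumentDelivery>'
)

# The request we originally captured, and one with a freshly generated nonce
captured = TEMPLATE.format(ran=42).encode('utf-8')
fresh = TEMPLATE.format(ran=200).encode('utf-8')

print(matches_capture(captured))
print(matches_capture(fresh))
```

Logging a warning from the Flask endpoint whenever such a check fails would have pointed at the nonce much sooner than stepping through minified code.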

We make a one-line change to the JavaScript code, pinning H to the value sent in the request we captured:

//h.H = h.r.Ran() ^ 117;
h.H = 42;

And with that, our local viewer now happily decrypts the remainder of the document, and will allow us to print as many copies as we like.

Under normal circumstances, the viewer would make requests to the server to keep track of the number of times the document has been printed, and lock us out after the quota has been reached – but as our server does not implement this feature, the viewer will continually allow us to print and print, none the wiser that its attempts to phone home are falling on deaf ears.

Suppose, however, that we wanted to convert these documents into standard PDFs. The print process still relies on the built-in functionality of the HTML5 viewer, which inserts copyright notices, watermarks the document with the name of the purchaser, and so on. Perhaps there might be a way around this?

Next part

Directly extracting a PDF file, or: automating DRM-breaking for ?fun and ?profit