Distorting the Web
Today I had to deal with a customization: for SEO reasons, the canonical link on some pages of a website had to change under certain conditions. The problem: the website's code could not be touched, as it runs as a binary application that is quite difficult to modify and recompile.
So, I managed to put a proxy in front of it.
The above-mentioned binary application was already proxied by an nginx instance that handles SSL connections and gzip compression and bounces requests to local TCP port 4000. So I added another location directive, specific to the path of the pages I had to rewrite, and re-routed those requests to my other local proxy.
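As a rough sketch of that routing, the nginx side could look like the fragment below. The `/pages/` prefix is a hypothetical placeholder (the real path is not named here), and the header lines are assumptions about a typical setup rather than the configuration actually in use. Port 8042 is where the rewriting proxy listens and port 4000 is the binary application.

```nginx
# Hypothetical path prefix for the pages whose canonical link must change.
location /pages/ {
    # Send these requests to the rewriting proxy instead of the application.
    proxy_pass http://127.0.0.1:8042;
    proxy_set_header Host $host;
    # An empty value stops nginx from forwarding Accept-Encoding, so the
    # upstream returns uncompressed HTML that can be parsed as text.
    proxy_set_header Accept-Encoding "";
}

# Everything else keeps going straight to the binary application.
location / {
    proxy_pass http://127.0.0.1:4000;
    proxy_set_header Host $host;
}
```

Compression towards the browser can still be done by nginx itself, so removing the header on the upstream side should not cost anything to the visitor.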
With the JS module http-proxy, implementing the required transformation is almost trivial, and the code can be summarized as:
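const http = require('http');
const httpProxy = require('http-proxy');
const { JSDOM } = require('jsdom');

// selfHandleResponse: the proxy does not send the upstream response by itself,
// so the body can be buffered and rewritten before it reaches the client.
const proxy = httpProxy.createProxyServer({
  selfHandleResponse: true,
});

proxy.on('proxyRes', function (proxyRes, req, res) {
  const chunks = [];
  proxyRes.on('data', function (chunk) {
    chunks.push(chunk);
  });
  proxyRes.on('end', function () {
    let body = Buffer.concat(chunks).toString();
    // conditions_are_verified is a placeholder for the actual business rule.
    if (conditions_are_verified) {
      const dom = new JSDOM(body);
      const canonical = dom.window.document.querySelector('link[rel="canonical"]');
      // Pages without a canonical link are passed through untouched.
      if (canonical) {
        canonical.setAttribute('href', 'https://something.i.like.com/');
        body = dom.serialize();
      }
    }
    // With selfHandleResponse, status and headers must be forwarded by hand;
    // the length is recomputed because the body may have changed.
    const headers = Object.assign({}, proxyRes.headers);
    delete headers['transfer-encoding'];
    headers['content-length'] = Buffer.byteLength(body);
    res.writeHead(proxyRes.statusCode, headers);
    res.end(body);
  });
});

const httpServer = http.createServer(function (req, res) {
  proxy.web(req, res, {
    target: 'http://localhost:4000',
  });
});

httpServer.listen(8042, () => {
  console.log('HTTP Server running on port 8042');
});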
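One caveat: the buffered body is treated as plain text, which only works if the upstream answers uncompressed. If it might reply with a Content-Encoding, the bytes have to be decoded before they are handed to jsdom. The helper below is a minimal sketch of that step using Node's built-in zlib module; `decodeBody` is a hypothetical name and is not part of http-proxy.

```javascript
const zlib = require('zlib');

// Hypothetical helper: turn the buffered upstream body into a string,
// undoing the Content-Encoding (if any) first.
function decodeBody(buffer, contentEncoding) {
  switch ((contentEncoding || '').toLowerCase()) {
    case 'gzip':
      return zlib.gunzipSync(buffer).toString('utf8');
    case 'deflate':
      return zlib.inflateSync(buffer).toString('utf8');
    case 'br':
      return zlib.brotliDecompressSync(buffer).toString('utf8');
    default:
      // No (or unknown) encoding: assume the bytes are already plain text.
      return buffer.toString('utf8');
  }
}

// Usage inside the 'end' handler would look roughly like:
//   decodeBody(Buffer.concat(<collected chunks>), proxyRes.headers['content-encoding'])
const sample = zlib.gzipSync('<html></html>');
console.log(decodeBody(sample, 'gzip'));
```

If the body is decoded this way, the content-encoding header should not be forwarded to the client as it is, since the rewritten body is no longer compressed.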
The same bold solution can easily be applied to any other HTML customization of a website that cannot be modified, and it would also be easy to add a Memcached integration to cache the transformed pages and speed up the whole process.
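The caching idea can be sketched as a small "get or compute" wrapper. To keep the example runnable without a Memcached server, it uses an in-memory stand-in; `MemoryCache`, `cacheKey` and `getOrCompute` are all hypothetical names, and a real Memcached client would have to be adapted to the same asynchronous get/set shape (check its documentation for the exact calls).

```javascript
const crypto = require('crypto');

// Stand-in for a Memcached client (assumption: the real client can be wrapped
// to offer the same asynchronous get/set with an expiry in seconds).
class MemoryCache {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return entry.value;
  }
  async set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Hypothetical key scheme: hash the URL so keys stay short and free of spaces.
function cacheKey(url) {
  return 'page:' + crypto.createHash('sha1').update(url).digest('hex');
}

// Return the cached page for this URL, or call compute() and store its result.
async function getOrCompute(cache, url, ttlSeconds, compute) {
  const key = cacheKey(url);
  const hit = await cache.get(key);
  if (hit !== null) return hit;
  const page = await compute();
  await cache.set(key, page, ttlSeconds);
  return page;
}
```

In the proxy, `compute` would wrap the fetch-and-rewrite step, so that a cache hit can be answered without contacting the application on port 4000 at all. The expiry has to be short enough that changes in the original pages still show up in reasonable time.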