While generating a PDF from a dynamically created HTML file, I found that the PDF generation failed because the HTML file contained non-UTF-8 characters.
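To find these characters, I used the strings command, which prints only runs of printable characters. The -n 8 switch sets the minimum run length to 8 characters, so non-printable bytes (and any printable runs shorter than 8 characters) are dropped from the output: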
cat original.html | strings -n 8 > nonUTF.html
I was then able to compare the two HTML files to find out where the non-UTF-8 characters were appearing.
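
A more direct check is also possible. The following is only a minimal sketch, not the method used above: it assumes the iconv utility is installed, and find_invalid_utf8 is a hypothetical helper name chosen for illustration. It feeds each line through iconv, converting UTF-8 to UTF-8, and prints the line number of every line that iconv rejects as invalid.

```shell
# Hypothetical helper: print the line number of each line in file "$1"
# that is not valid UTF-8. Assumes iconv is available on the system.
find_invalid_utf8() {
    n=0
    # "|| [ -n "$line" ]" also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # iconv exits with a non-zero status when the input contains an invalid sequence.
        if ! printf '%s' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
}
```

Running find_invalid_utf8 original.html would then list the suspect line numbers. It starts one iconv process per line, so it may be slow on very large files.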