- The Encoder
- The Sanitizer
- The Security Runtime Engine (SRE)
The Sanitizer (Microsoft.Security.Application.Sanitizer) is meant to parse HTML content submitted by a user and return only “safe” HTML tags, attributes, and values. This is very difficult to do correctly: Microsoft’s Sanitizer library and OWASP’s AntiSamy library have both had vulnerabilities (or strange edge cases) in the past, even though they were carefully crafted by skilled security developers. Part of the problem is that browsers try to be helpful and guess at what the author meant rather than strictly parsing tags.
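To make the idea concrete, here is a minimal allow-list sketch in Python using only the standard library. It is not how Microsoft’s Sanitizer or AntiSamy work internally, and it has not been hardened against real attack payloads. The tag, attribute, and URL-scheme lists and the names `AllowListSanitizer` and `sanitize` are assumptions made up for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical policy for illustration; a real allow-list needs careful review.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
# Elements whose text content is dropped along with the tags themselves.
DROP_CONTENT = {"script", "style"}


def _safe_url(value):
    """Accept only absolute URLs with an allowed scheme (relative URLs are rejected)."""
    try:
        return urlparse(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are removed, their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not _safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser has already decoded entities, so re-escape on output.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment):
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

With this policy, `sanitize('<p onclick="x()">Hi <b>there</b></p>')` returns `<p>Hi <b>there</b></p>`. The sketch does not balance unclosed tags or account for browser parsing quirks, which is exactly where real sanitizers tend to fail.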
Ideally, these types of features wouldn’t be needed at all. Applications should use alternative approaches or markups like Wiki syntax and convert that content into allowed HTML elements. But in the real world, many organizations choose to write applications that accept HTML from users and redisplay it on pages. For those applications, sanitizer libraries make redisplaying user HTML measurably safer (but probably not completely safe).
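As a rough illustration of the markup alternative, the Python sketch below escapes every character of the user’s input first and only then turns a tiny, made-up markup into a fixed set of HTML elements. The syntax (`**bold**`, `__italic__`, blank-line paragraphs) and the function name `render_markup` are assumptions for this example, not any particular wiki dialect.

```python
import re
from html import escape


def render_markup(text):
    # Neutralize all HTML up front; nothing the user typed survives as a tag.
    safe = escape(text, quote=True)
    # Only the application itself ever emits markup, from a fixed set of elements.
    safe = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", safe)
    safe = re.sub(r"__(.+?)__", r"<em>\1</em>", safe)
    # Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", safe) if p.strip()]
    return "".join(f"<p>{p}</p>" for p in paragraphs)
```

Because the escaping happens before any tags are generated, the output can only contain the elements the converter writes itself. That is why this design is much easier to reason about than filtering arbitrary HTML.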
So, why am I writing about alternatives to Microsoft’s Sanitizer library? Well, it turns out that Microsoft’s library has been broken for quite a while. The sanitizer is overzealous and strips out pretty much everything. Here are a couple of references on the WPL CodePlex site about the issue:
- GetSafeHtmlFragment replacing all html tags (Discussion/Issue)
- Update on the sanitizer.
- http://wpl.codeplex.com/discussions/377019 (Discussion)
Since the Microsoft solution isn’t an option, I looked at the AntiSamy .NET library (the original is for Java). I learned that it is no longer maintained; the last code check-in appears to have been in 2009. The library may well work, but that history didn’t give me much confidence, so I kept looking without investigating AntiSamy .NET further.
This led me to the following blog post:
It recounts the author’s frustration with Microsoft’s Sanitizer and provides a solution using the HTMLAgilityPack library. I noticed HTMLAgilityPack.dll was also present in the root directory of the Anti-Samy .NET svn repository. I wrote a quick sample application that implemented everything provided by the author, with one change: “html.DocumentNode.DescendantNodesAndSelf()” is deprecated, so I replaced it with “html.DocumentNode.DescendantsAndSelf()”. It worked right away and seemed easy to customize.
I spent a short amount of time sending XSS payloads to the application and inspecting the sanitized output. It protected against all the attacks I sent; however, I would call my testing very preliminary. A more concentrated testing effort would be critical before this code is used in a production environment.
And, as stated above, this approach is dangerous and difficult to get right. There will likely still be vulnerabilities to contend with down the road as browsers change and try to “help” more. Regardless, some web sites NEED this functionality because of the design decisions already made, which makes this code a good alternative to Microsoft’s Sanitizer library.