Jul 23, 2011

On DOM Snitch internals and some of the rationales behind them

Given some of the feedback I've received during the first couple of weeks after releasing DOM Snitch, I'd like to shed more light on some of the tool's inner workings and the rationales behind some of the decisions that were made while building it.


Topic 1: Why is DOM Snitch not catching security issues executing at load time through inline JavaScript?

In its current implementation, DOM Snitch is set to start running on a page as soon as the DOMWindow object is created, but before the DOM tree is built. This allows the extension to act either as soon as it's instantiated or when any of the DOM modification events get dispatched. Relying on the events, however, comes at a cost: the tool cannot know what caused an event to be dispatched in the first place, and therefore cannot gather proper debug information. (I should add the disclaimer that currently DOM Snitch does not use any of the V8 debugging functionality.)


Topic 2: Is DOM Snitch using any of the experimental APIs in Chrome?

The short answer is "no". One of the early goals I set while building the tool was to stay away from experimental APIs and from touching the Chromium code base, since doing either would get in the way of deploying easily without changing the security posture of the user. Additionally, using unsupported functionality might result in maintenance issues further down the line. That being said, I'm quite keen on using the chrome.experimental.debugger API should it become supported by the Chromium team.


Topic 3: On innerHTML, outerHTML, and stale pointers inside JavaScript

Stale JavaScript pointers are one side effect that really worries me when intercepting innerHTML. Although the tool has gone through lots of iterations to get this right, I must admit that people still find creative ways to introduce a stale pointer somewhere in their code. By giving a bit more detail on how innerHTML is intercepted, I hope that developers will pay some attention to what may go wrong in their code from a testability perspective.

In its current version, WebKit does not reveal the internal innerHTML pointers through the getter and setter methods. As a result, by overwriting the innerHTML setter, DOM Snitch loses the value of the original pointer. To counter this, DOM Snitch re-creates the action of setting an element's innerHTML by appending a new child element and setting the child's outerHTML to the intended innerHTML value (as it turns out, this forces WebKit to throw away the newly created child element and replace it with whatever children are introduced through the HTML content). You can see this implementation here.
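For comparison, in an engine that does expose accessor properties through property descriptors, a hook can keep the original setter and call through to it instead of re-creating the assignment. The sketch below shows that general pattern on a stand-in class so it runs outside a browser. `FakeElement`, `hookSetter`, and `log` are hypothetical names used only for illustration; none of this is DOM Snitch code.

```javascript
// Stand-in for a DOM element (an assumption for illustration);
// a real page would hook the element prototype instead.
class FakeElement {
  constructor() { this._html = ''; }
  get innerHTML() { return this._html; }
  set innerHTML(value) { this._html = String(value); }
}

// Replace a setter with a wrapper that records the write and then
// calls the original setter, so the original behavior is not lost.
function hookSetter(proto, name, onSet) {
  const original = Object.getOwnPropertyDescriptor(proto, name);
  if (!original || !original.set) {
    // This is the situation the post describes: no setter to hold on to.
    throw new Error('No accessible setter for ' + name);
  }
  Object.defineProperty(proto, name, {
    configurable: true,
    enumerable: original.enumerable,
    get: original.get,
    set(value) {
      onSet(this, value);              // record the write
      original.set.call(this, value);  // then perform it unchanged
    },
  });
}

const log = [];
hookSetter(FakeElement.prototype, 'innerHTML', (el, value) => log.push(value));

const el = new FakeElement();
el.innerHTML = '<b>hi</b>';
```

When the descriptor lookup fails, as the post describes for WebKit at the time, a workaround such as the outerHTML approach is needed.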

Jun 4, 2011

Creating spreadsheets in Google Docs... mission possible?

Exporting data as spreadsheets into Google Docs sounds easy and quite feasible. However, it also has a few gotchas. After struggling for a day, I've decided to share my notes on approaching this problem.

Creating the spreadsheet
My initial requirement for my application is to export my data into a brand new spreadsheet where the user will have ownership of it. A quick stroll through the Spreadsheet API led me to a section describing the need to upload a spreadsheet beforehand or create one manually. This was definitely against my requirements.

However, buried inside the Google Documents List Data API is a mechanism for creating documents, either empty or from a template (I'll come to this in a minute). Voila!

Changing the term attribute to http://schemas.google.com/docs/2007#spreadsheet surely does the trick if I want a completely blank spreadsheet.
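As a rough sketch of what that request body might look like, the snippet below builds an Atom entry whose category term is the spreadsheet kind. The entry layout (the `category` element with the `#kind` scheme, plus a `title`) is my recollection of the Documents List Data API format and should be treated as an assumption; `buildNewDocumentEntry` and `escapeXml` are hypothetical helper names, and the endpoint and authentication are deliberately left out.

```javascript
// Kind term for a blank spreadsheet, as quoted in the post.
const SPREADSHEET_KIND = 'http://schemas.google.com/docs/2007#spreadsheet';

// Minimal XML escaping for text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the Atom entry that asks for a new, empty document of a given kind.
// The element layout is an assumption based on the API documentation.
function buildNewDocumentEntry(title, kindTerm) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<entry xmlns="http://www.w3.org/2005/Atom">',
    '  <category scheme="http://schemas.google.com/g/2005#kind"',
    '            term="' + escapeXml(kindTerm) + '"/>',
    '  <title>' + escapeXml(title) + '</title>',
    '</entry>',
  ].join('\n');
}

const body = buildNewDocumentEntry('Exported data', SPREADSHEET_KIND);
```

Swapping `SPREADSHEET_KIND` for another kind term is the only change needed to request a different document type.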

Feeding data into the spreadsheet
This is where it becomes more tricky -- as it turns out, there isn't any convenient way to use the list approach (something I preferred over the cell approach) to populate an empty spreadsheet without first telling Google Docs the schema that your list will use. One observation to make is that the list schema is derived from the header row in your spreadsheet; therefore, if the A1 cell stores "sample text" as its value, every subsequent row will have the "A" column described as <gsx:sampletext>. In my case, I opted to use a pre-built template from which to create my spreadsheet (see above about copying documents); however, there might also be a workaround involving a batch cell update.
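To make the header-to-schema mapping concrete, here is a minimal sketch that derives the `gsx:` element names from header text and builds a list-feed row entry. The normalization rule (lowercase, drop everything except letters and digits) is an assumption generalized from the single "sample text" example above; `headerToGsxName` and `buildListRowEntry` are hypothetical helper names.

```javascript
const GSX_NS = 'http://schemas.google.com/spreadsheets/2006/extended';

// Minimal XML escaping for element text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Assumed rule: lowercase the header and keep only letters and digits,
// so "sample text" becomes "sampletext".
function headerToGsxName(header) {
  return String(header).toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Build one list-feed row; keys of `row` are the header-row values.
function buildListRowEntry(row) {
  const cells = Object.keys(row).map((header) => {
    const name = headerToGsxName(header);
    return '  <gsx:' + name + '>' + escapeXml(row[header]) + '</gsx:' + name + '>';
  });
  return [
    '<entry xmlns="http://www.w3.org/2005/Atom"',
    '       xmlns:gsx="' + GSX_NS + '">',
    ...cells,
    '</entry>',
  ].join('\n');
}

const rowXml = buildListRowEntry({ 'sample text': 'hello' });
```

A row built this way only makes sense once the header row exists, which is why the template (or a cell update) has to come first.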

Some random tips
Here are some random bits that I found useful during this exercise:
- Always pay attention to the response object when posting new items (be it documents, worksheets, list entries, etc.) to Google Docs. The response object provides some very useful information (stored inside the link elements of the feed/entry) about which APIs to call next.
- The _worksheetId_ for the default worksheet in a newly created spreadsheet is... yes, "default". Should you need to add a row to it, you can simply call POST https://spreadsheets.google.com/feeds/list/[spreadsheet key]/default/private/full.


Edit: I went back to experiment with alternative ways of setting the header row of the default worksheet. As expected, one can set the header row by fetching the cells there and updating them with whatever contents they need to have.

Mar 13, 2011

Separating code from data... are we there yet?

Disclaimer: The opinions in this post are entirely my own and should not be associated with any current or previous employer of mine.

Mixing code and data has been a big (and I mean BIG) problem for decades in security. Setting security design flaws aside, having the ability to transform data into code has led to some of the biggest and probably most notorious examples of why software security really matters. Period.

While there is a lot of progress in handling this problem in native code (examples are plenty: GS, SafeSEH, DEP, ASLR, etc.), where does this leave the web? Unlike native applications, the web has one major disadvantage: it relies heavily on parsers that help with the execution of dynamically written code. Browser security aside, there have been a few initiatives to help developers separate code from data through validating data on input or rendering data in a safe format. Although they've been done with a noble intent, I am still not convinced that this is a comprehensive enough solution. Here is why:
  1. Input validation. Input validation limits the user in submitting data... both malicious and legitimate; therefore limiting the application's usability and its users' productivity. For instance, should the folks maintaining the input validation need to be aware of every context in which their data is used (e.g. database query, JavaScript/JSON, HTML, etc.)?

  2. Output encoding/escaping. Output encoding/escaping is useful for transforming data into a format that is then interpreted as data within the context where the data will be used. However, the major question here is what is the actual context in which the data will be used? In the following sample snippet the data passes through at least 2 different contexts (HTML and JavaScript):
    <html>
    <head>
    <script>
    document.write('<div id="' + [USER_CONTROLLED_DATA] + '">bla</div>');

    So naturally the question: how should the encoding/escaping be done in such cases? According to which context should the encoding/escaping be tailored?

  3. Contextual escaping via templates. In most cases and if (a big IF) properly enforced, templates are a good thing to use. Needless to say, with their help development teams can streamline a solution to the problem. From that point on lint and checks on commit are the way to go. However, what if the templates are not strictly enforced? What if the development team provides special cases where data should not be escaped? What if data from upstream already comes escaped/encoded?
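To illustrate the layering question from item 2, here is a minimal sketch of a server-side helper that generates the document.write line from the snippet. It encodes for the HTML attribute first (the last parser to see the data) and then wraps the result as a JavaScript string literal (the first parser to see it). The helper names are hypothetical, and this is one possible ordering under those assumptions, not a general-purpose escaping library.

```javascript
// Encode for the HTML attribute context, which sees the data last.
function escapeHtmlAttribute(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Encode for the JavaScript string context, which sees the data first.
// JSON.stringify yields a quoted literal; the extra replacements keep
// "<" and the line separators U+2028/U+2029 out of the script source.
function toJsStringLiteral(text) {
  return JSON.stringify(String(text))
    .replace(/</g, '\\u003c')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Generate the script line with the user data escaped for both contexts.
function renderWriteCall(userData) {
  const literal = toJsStringLiteral(escapeHtmlAttribute(userData));
  return "document.write('<div id=\"' + " + literal + " + '\">bla</div>');";
}

const line = renderWriteCall('x" onmouseover="alert(1)');
```

Getting the order or either layer wrong re-opens the hole, which is exactly why the question of "which context?" is harder than it first looks.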

So where does this leave the web? I suspect the biggest challenge is to separate the channels through which code and data are communicated. Avoiding or limiting the use of code parsers and eval() (or equivalents of it) for parsing data are definitely steps in the right direction...

P.S. - This post has been picking a lot on cross-site scripting as an example. However, the same logic holds for other types of security issues that arise from injection-based attacks.

Feb 25, 2011

The evil magic of eval

Eval is a special, very powerful function in many scripting languages. JavaScript is not an exception. Although quite useful for dynamic manipulation of the existing code, any misuse of eval can cause a lot of problems and headaches. StackOverflow covers this topic well, with the exception of one very important bit: eval can get in the way of testability.

Static analysis
With JavaScript being what it is, it is easy to assume that one can painlessly audit the scripting code that runs inside the browser and understand how the client-side part of a web application functions. Well, that's not entirely true. With web applications becoming more complex, we often run into cases where we have 60k lines of obfuscated code that look a lot like this:
function a() {
  this.c = eval; // alias eval under an innocuous name
}
...
var b = new a();
var d = b.c(e); // eval in disguise

How should this be tested or audited? Simple answer: you can't really, or at least not easily.

Dynamic analysis
This leads me to the next bit: what about dynamic analysis? What if we "hook" into eval and listen to the code as it goes by? As it turns out, this isn't a foolproof solution either. Eval is not exactly a function. In fact, if you ask any experienced developer, he/she may flat out tell you that eval is magic and should not be touched. To re-use the example from StackOverflow, what value should the alert box show?
var a = 10;
function foo() {
var a = 1;
eval("a+=1");
alert(a);
}
foo();

As it turns out, eval is aware of the scope from which it was called; therefore the alert box will show 2 and not 11. This is important because as you overload eval, you change the scope in which it operates (off by 1 stack frame). As a result, eval will attempt to manipulate a global object and not the local one. This in turn may lead to a half-baked page.
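The effect can be reproduced with a small sketch. `hookedEval` is a hypothetical stand-in for whatever wrapper a dynamic analysis tool would install; the global is set through `globalThis` so the example behaves the same in a browser and in Node.

```javascript
// The global that the hooked call ends up touching by accident.
globalThis.a = 10;

// A naive hook: the eval call now lives in this function's frame,
// so the evaluated code no longer sees the original caller's locals.
function hookedEval(code) {
  // a real hook would log or inspect `code` here
  return eval(code);
}

function direct() {
  var a = 1;
  eval('a += 1');        // called directly: updates the local `a`
  return a;
}

function hooked() {
  var a = 1;
  hookedEval('a += 1');  // goes through the hook: updates the global `a`
  return a;
}

const directResult = direct();   // 2
const hookedResult = hooked();   // 1, while globalThis.a is now 11
```

The page's own code expected the local variable to change; with the hook in place it silently does not, which is how a page ends up half working.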

Conclusion
Although eval is very powerful, it is also overly misused for parsing and/or processing data (example: using eval to parse JSON). In a nutshell, eval provides one of the easiest ways to mix code and data... a dangerous thing to do nowadays.
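As a small illustration of keeping data on a data-only channel, the sketch below parses input with JSON.parse instead of eval. The sample payloads and the `parseData` helper are made up for this example.

```javascript
// Well-formed data and a payload that is really code.
const trusted = '{"name": "snitch", "count": 2}';
const hostile = '(function () { globalThis.hijacked = true; return {}; })()';

// JSON.parse accepts only JSON syntax; anything else is rejected
// with a SyntaxError instead of being executed.
function parseData(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const good = parseData(trusted);
const bad = parseData(hostile);
```

Passing `hostile` to eval would have run the function and set the global; with JSON.parse the same input is simply an error.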

P.S. - Although some browsers (namely Firefox) may allow you to overload eval at a lower level through an extension, it is not necessarily the same case with all mainstream browsers; therefore, some edge cases may be missed.

Jan 28, 2011

The importance of the end-user experience: internationalization

A friend of mine always says: "I don't want more features. I want working features." This friend is also a big fan of Apple and the simple, clean, pleasant, intuitive user interfaces. In principle I agree with him (although I'm not an Apple fan boy), but I'd also like to add that software should work not only in its own little environment, but also in the user's environment.

I've ranted privately about the importance of localization and internationalization so many times that I think it's time I express my opinion in public. I'd like to pick on the Apple iTunes Store as an example and point out some of the reasons for my rants.

Here it goes...

Different country means different store.
Be it due to legal issues or some business strategy, Apple has decided to have separate entities operate each store at the country level. So for instance, an Inc. operates the U.S. store, a B.V. operates the Dutch store, and so on. One thing worth noting is that the content in each store differs. While the U.S. store offers games, music, movies, and so on, the Bulgarian store, for example, offers only a limited subset of the iOS apps that are available to U.S. customers.

Transferring from one store to another you ask? Not so easy. Buying a product in the Dutch store does not guarantee that you will receive your updates if you change to the Bulgarian or U.S. stores. Sure, you will get notified that your apps are out of date and a determined attacker may use them to compromise your email and private messaging... but in reality, the app you've installed has been purchased from a separate entity and is not available at the store which you are using. This brings me to the next point...

Your billing address determines your locale.
This is the fun bit in Apple's model. Everything is tied to the customer's payment card. Here is a trivial question: you're an expat living in the U.K., having just moved from Germany (thus having a credit card from a German bank), and are receiving your monthly statements in the Netherlands. Which locale of the Apple Store would you say you are using? The Dutch one, of course: your billing address is in the Netherlands.

Your locale determines your language.
Here's an even better question: what are the supported languages for your locale? English? German? Actually... Dutch is the only supported language in the Dutch locale. It's sad to see big companies like Apple deciding for their users that if a user receives his/her bank statements in a given country, then he/she surely speaks the local language.


So why am I sharing all this? Well... I, for one, experienced it as a user. It is not pleasant and it turns users away. Users deserve the right to use software at their own comfort level without being victims of engineering mistakes. Let's learn from this.

Nov 26, 2010

The cyber crime ecosystem

I'm re-sharing via Dexter a presentation from Albert Hui on cyber crime. It's impressive to see how far this has grown technology-wise since the early 2000s. It seems like yesterday when I read the reports on MyDoom experimenting with its powers to bring DDoS against Yahoo and Google in the UK.

Anyhow, slide 9 in Albert's presentation is particularly interesting as it shows a nice overview of the players in the cyber crime underground.

Sep 9, 2010

Manager index continued...

After my last post I did a bit more experimenting with the manager statistics. This time I've added a second Bulgarian manager -- Yassen Petrov, the current manager of Levski Sofia. Since both Stoilov and Petrov started their careers at roughly the same time, it's interesting to see how they stack up against each other. Feedback is always welcome. Enjoy! :)

Edit: I've now added Ilian Iliev, the manager of Beroe, to the mix.