Modeling the HTML DOM and Browser API
in Static Analysis
of JavaScript Web Applications
ESEC/FSE 2011
Anders Møller, Magnus Madsen
and Simon Holm Jensen
1 / 28
Motivation
• How can we help developers who write JavaScript web applications?
– by providing tools for finding bugs early in the development cycle
• In this work we focus on finding bugs in the
way JavaScript programs interact with the web
browser
2 / 28
JavaScript in a browser
[Figure: the user and the web browser are connected by user interaction and rendering; the Document Object Model and the JavaScript code are connected by events and DOM manipulation]
3 / 28
Example
• The el.button property is always absent (it is undefined)
• An HTMLImageElement object does not have a button property
• Result: unreachable code
• The programmer has confused el and ev
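The slide's original code is not reproduced here, so the following is a hypothetical illustration of the same kind of mistake: a click handler meant to test which mouse button was pressed reads `button` from the image element `el` instead of from the event `ev`. Plain objects stand in for the DOM objects so the sketch runs outside a browser; `makeClickHandler` and the other names are illustrative.

```javascript
// Hypothetical example: names and structure are illustrative only.
// Plain objects stand in for an HTMLImageElement and a mouse event.
function makeClickHandler(el) {
  return function (ev) {
    // Bug: `button` belongs to the event, not to the image element.
    // el.button is always undefined, so this branch can never run.
    if (el.button === 0) {
      return "left button";
    }
    return "ignored";
  };
}

const image = { tagName: "IMG", src: "a.png" }; // has no `button` property
const onImageClick = makeClickHandler(image);
console.log(onImageClick({ button: 0 })); // ignored, even for a left click
```

Replacing `el.button` with `ev.button` makes the branch reachable, which is exactly the kind of fix an "always absent property" warning points toward.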
4 / 28
TAJS: Type Analysis for JavaScript
[S.H. Jensen, A. Møller and P. Thiemann SAS '09]
A tool for static analysis of plain JavaScript
– the starting point for our work
– flow-sensitive dataflow analysis
– interprocedural
– whole-program analysis
– intended for non-minified, non-obfuscated code
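To make "flow-sensitive" concrete, here is a deliberately simplified sketch: abstract values are plain sets of type names, each program point gets its own value, and values are joined by set union where control flow merges. This tiny lattice is an assumption for illustration only; the abstract domain used by TAJS is much richer.

```javascript
// Simplified abstract value: the set of types a variable may hold.
// Least upper bound of two abstract values = set union.
function join(a, b) {
  return new Set([...a, ...b]);
}

// Abstract interpretation of this snippet, one value per program point:
//   let x = 42;                // point 1
//   if (cond) { x = "hi"; }    // point 2: inside the then-branch
//   use(x);                    // point 3: after the branches merge
function analyzeSnippet() {
  const atPoint1 = new Set(["number"]);
  const atPoint2 = new Set(["string"]);      // x was reassigned
  const atPoint3 = join(atPoint2, atPoint1); // then-branch joined with the skipped branch
  return { atPoint1, atPoint2, atPoint3 };
}

const states = analyzeSnippet();
console.log([...states.atPoint1]); // [ 'number' ]
console.log([...states.atPoint3]); // [ 'string', 'number' ]
```

A flow-insensitive analysis would keep a single value for x across the whole snippet, so it would report {number, string} even at point 1.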
5 / 28
Bug Finding
We look for general errors such as:
– dead or unreachable code
– invocations of built-in functions with an incorrect
number of arguments or wrong argument types
– undefined dereference
– reading absent properties
– etc.
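Each error kind listed above can be shown in a few lines of plain JavaScript that run without a browser. The function names are hypothetical; the point is that TAJS aims to report such problems statically, before the code is ever run.

```javascript
// Reading an absent property: the read silently yields undefined.
function readAbsentProperty() {
  const point = { x: 1, y: 2 };
  return point.z;
}

// Undefined dereference: reading a property of undefined throws a TypeError.
function undefinedDereference() {
  const config = {};
  return config.options.verbose; // config.options is undefined
}

// Built-in called with the wrong number of arguments:
// parseInt() receives undefined, converts it to "undefined", and returns NaN.
function wrongArgumentCount() {
  return parseInt();
}

// Dead code: the statement after `return` can never execute.
function deadCode(n) {
  return n * 2;
  console.log("never printed");
}
```

Only `undefinedDereference` fails loudly at run time; the other three quietly produce undefined, NaN, or code that never runs, which is why a static report is useful.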
6 / 28
Contributions
We extend the static analysis of TAJS to reason about JavaScript programs that execute in a browser:
– how to model the browser API?
• hundreds of non-standardized objects and functions
– how to model the HTML page?
• complex prototype hierarchy of the W3C DOM
– how to model the event system?
• many kinds of events
• dynamic registration of event handlers
7 / 28
Architecture


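[Figure: Architecture. From the HTML page, the tool takes JavaScript code (<script>...</script>), event handler code (<div onclick="..."/>), and named tags (<form id="foo">...</form>). The browser API model, the DOM model, and the flow graph extension are combined with TAJS, which reports potential errors.]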
8 / 28
The Browser API 
• The global window object
– history, location, navigator, screen
– alert(...), print(...), encodeURI(...)
– setTimeout(...), setInterval(...)
– addEventListener(...)
• Non-standard and legacy functionality
9 / 28
The HTML DOM 
• The Document Object Model (W3C)
– tree-like structure
– e.g. one JavaScript object for each HTML tag
• HTMLInputElement, HTMLFontElement, etc.
– arranged in a large prototype hierarchy
• A huge number of properties and functions
– most properties are string or integer constants
10 / 28
The HTML DOM
• Important functions
– createElement(...)
– getElementById(...)
– getElementsByName(...)
– getElementsByTagName(...)
• The analysis tracks elements by tag, ID, and name:
<img id="foo" name="bar"/>
(tag: img, ID: foo, name: bar)
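A minimal sketch, under stated assumptions, of how a model could index the elements of the initial HTML page by ID, name, and tag so that a call such as getElementById("foo") resolves to specific abstract objects. Abstracting each element by its kind, the `TAG_TO_KIND` table, and the `UNKNOWN` marker are illustrative choices rather than the tool's actual data structures, and the sketch ignores elements added later with createElement.

```javascript
// Marker for a lookup key whose value the analysis cannot determine.
const UNKNOWN = Symbol("unknown string");

// Hypothetical, deliberately tiny mapping from tag name to element kind.
const TAG_TO_KIND = { img: "HTMLImageElement", div: "HTMLDivElement", form: "HTMLFormElement" };

// Index the elements of the initial page by ID, name, and tag.
function buildIndex(elements) {
  const index = { byId: new Map(), byName: new Map(), byTag: new Map(), allKinds: new Set() };
  const add = (map, key, kind) => {
    if (key === undefined) return; // attribute not present on this element
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(kind);
  };
  for (const el of elements) {
    const kind = TAG_TO_KIND[el.tag] ?? "HTMLElement";
    index.allKinds.add(kind);
    add(index.byId, el.id, kind);
    add(index.byName, el.name, kind);
    add(index.byTag, el.tag, kind);
  }
  return index;
}

// Abstract getElementById: a known ID yields the matching kinds (or null if no
// element has it); an unknown ID must conservatively yield every kind and null.
function getElementById(index, id) {
  if (id === UNKNOWN) return [...index.allKinds, null];
  return index.byId.has(id) ? [...index.byId.get(id)] : [null];
}

const page = [
  { tag: "img", id: "foo", name: "bar" }, // <img id="foo" name="bar"/>
  { tag: "div", id: "main" },
];
const pageIndex = buildIndex(page);
console.log(getElementById(pageIndex, "foo"));     // [ 'HTMLImageElement' ]
console.log(getElementById(pageIndex, "missing")); // [ null ]
console.log(getElementById(pageIndex, UNKNOWN));   // [ 'HTMLImageElement', 'HTMLDivElement', null ]
console.log([...pageIndex.byName.get("bar")]);     // [ 'HTMLImageElement' ]
```

The unknown-key case is where precision is lost: if the analysis cannot determine the ID string, every element kind becomes a possible result.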
11 / 28
Prototype Hierarchy
The complete model has ~250 objects and ~500 properties
12 / 28
Choice of Abstraction
Model the DOM objects as:
– a single abstract object
– a single abstract object for every element kind
– an abstract object for every element in the initial HTML page
[Figure: the three options applied to a page containing <img>, <img>, and <div>; one option is marked "Our Choice"]
13 / 28
Straightforward Hierarchy?
• The image tag looks pretty innocent:
<img src="a.png" alt=""/>
• Image objects can be created in several ways:
new Image();
document.createElement("img");
14 / 28
Example
15 / 28
Image Prototype Hierarchy
[Figure: Image prototype hierarchy. Prototype objects: Object, HTMLImageElement, Image. Instance objects: HTMLImageElement and Image, created by document.createElement("img") and new Image(). Constructor objects: HTMLImageElement and Image, both attached to window.]
Blue arrows are internal prototype links (an object's [[Prototype]])
Red arrows are external prototype links (a constructor's prototype property)
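A hypothetical sketch of the two prototype chains using ordinary constructor functions. The link structure, with the Image prototype object inheriting from the HTMLImageElement prototype object, is an assumption about the figure rather than a statement about any particular browser, and `createImgElement` is an illustrative stand-in for document.createElement("img").

```javascript
// Model constructors; in a browser both would be attached to window.
function HTMLImageElement() {}
HTMLImageElement.prototype.src = ""; // shared image properties live here
HTMLImageElement.prototype.alt = "";

function Image() {}
// Assumed internal prototype link: Image.prototype -> HTMLImageElement.prototype
Image.prototype = Object.create(HTMLImageElement.prototype);
Image.prototype.constructor = Image;

// Stand-in for document.createElement("img"): an instance whose
// internal prototype is HTMLImageElement.prototype.
function createImgElement() {
  return Object.create(HTMLImageElement.prototype);
}

const fromConstructor = new Image();
const fromCreateElement = createImgElement();
console.log(fromConstructor instanceof HTMLImageElement, fromConstructor instanceof Image);     // true true
console.log(fromCreateElement instanceof HTMLImageElement, fromCreateElement instanceof Image); // true false
```

Both objects see `src` and `alt` through their chains, but they reach them along different paths, which is why the model has to represent both creation routes.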
16 / 28
Registration of Event Handlers 
• Directly in the HTML source
– <div onclick="...">
• Using the Browser API
– setTimeout(...), setInterval(...)
– addEventListener(...)
• Writes to "magic properties"
– x.onclick = ...
– special properties that have side effects on the DOM when written to
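A minimal sketch of why a write to a property like onclick matters to the analysis: the property behaves like an accessor, so an ordinary-looking assignment runs code that registers a handler. Here Object.defineProperty stands in for the browser's built-in behavior; `makeElement` and `registeredHandlers` are hypothetical names.

```javascript
// Hypothetical registry standing in for the browser's internal handler table.
const registeredHandlers = [];

function makeElement(tagName) {
  const el = { tagName };
  let current = null;
  // A "magic property": assigning to it registers a handler as a side effect.
  // (This sketch does not remove an earlier onclick handler when it is replaced.)
  Object.defineProperty(el, "onclick", {
    get() {
      return current;
    },
    set(fn) {
      current = typeof fn === "function" ? fn : null;
      if (current) registeredHandlers.push({ target: el, type: "click", fn: current });
    },
  });
  // Explicit registration through an API function.
  el.addEventListener = (type, fn) => registeredHandlers.push({ target: el, type, fn });
  return el;
}

const div = makeElement("DIV");
div.onclick = () => "clicked"; // looks like an ordinary property write
div.addEventListener("mouseover", () => "hovered");
console.log(registeredHandlers.map((h) => h.type)); // [ 'click', 'mouseover' ]
```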
17 / 28
Tracking Event Handlers
Separate event handlers based on their kind
– page load (onload)
– keyboard (onkeypress, ...)
– mouse (onclick, onmouseover, ...)
– timed (setTimeout, setInterval, ...)
– etc.
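A small sketch of the classification above, mapping how a handler was registered to its kind. The event names in the table are a small illustrative subset, and normalizing addEventListener types by prefixing "on" is an assumption of this sketch.

```javascript
// Partial, illustrative table from handler property to handler kind.
const KIND_BY_PROPERTY = new Map([
  ["onload", "load"],
  ["onkeypress", "keyboard"], ["onkeydown", "keyboard"], ["onkeyup", "keyboard"],
  ["onclick", "mouse"], ["onmouseover", "mouse"], ["onmousedown", "mouse"],
]);

// `how` is a property name such as "onclick", an addEventListener type such
// as "click", or the name of a timer function.
function handlerKind(how) {
  if (how === "setTimeout" || how === "setInterval") return "timed";
  const property = how.startsWith("on") ? how : "on" + how;
  return KIND_BY_PROPERTY.get(property) ?? "other";
}

// Group registered handlers by kind so each kind can be treated separately.
function groupByKind(registrations) {
  const groups = new Map();
  for (const { how, fn } of registrations) {
    const kind = handlerKind(how);
    if (!groups.has(kind)) groups.set(kind, []);
    groups.get(kind).push(fn);
  }
  return groups;
}

console.log(handlerKind("onclick"), handlerKind("keypress"), handlerKind("setTimeout"));
// mouse keyboard timed
```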
18 / 28
Flow Graph Extension
Event handlers are executed by introducing an event-handler loop that
– separates page-load event handlers from other event handlers
– executes event handlers in two non-deterministic loops
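A concrete simulation, under assumptions, of one execution path through the two loops: page-load handlers first, then all other handlers, with each loop choosing nondeterministically which handler runs next or whether to exit. The `choose` callback is a hypothetical stand-in for that nondeterminism. A static analysis does not sample paths like this; it covers all of them at once by iterating to a fixpoint.

```javascript
// `choose(n)` returns an index in [0, n]; the value n means "leave this loop".
function simulateEventLoop(loadHandlers, otherHandlers, choose) {
  const trace = [];
  // Loop 1 runs only page-load handlers; loop 2 runs every other handler.
  for (const handlers of [loadHandlers, otherHandlers]) {
    for (;;) {
      const i = choose(handlers.length);
      if (i === handlers.length) break; // nondeterministic loop exit
      trace.push(handlers[i]());
    }
  }
  return trace;
}

// Scripted choices make the run reproducible.
const choices = [0, 1, 1, 0, 2];
let step = 0;
const eventTrace = simulateEventLoop(
  [() => "load"],
  [() => "click", () => "timeout"],
  () => choices[step++],
);
console.log(eventTrace); // [ 'load', 'timeout', 'click' ]
```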
19 / 28
Evaluation
• With these extensions TAJS can reason about
JavaScript applications that run in a browser
• Is the analysis precise enough to be useful?
20 / 28
Benchmarks
Evaluated on a series of benchmarks:
– Chrome Experiments
– Internet Explorer 9 Test Drive
– 10K Challenge – A List Apart
– (excluding benchmarks that use eval or jQuery, or that are not relevant for JavaScript)
21 / 28
Research Questions
Q1: Ability to show absence of errors?
The analysis is able to show that
• 85-100% of call sites are safe
• 80-100% of property reads are safe
22 / 28
Research Questions
Q2: Ability to locate sources of errors?
– We randomly introduce spelling errors
– The analysis is able to pinpoint most of them
(details in the paper)
23 / 28
Research Questions
Q3: Precision of computed call graph?
The analysis is able to show that
90-100% of call sites are monomorphic
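One way to compute this metric, assuming the analysis result is available as a map from each call site to the set of functions it may call: a call site is monomorphic when that set has exactly one element. The call-site labels and function names below are hypothetical.

```javascript
// callGraph: Map from call-site label to the Set of possible callees.
function monomorphicRatio(callGraph) {
  const calleeSets = [...callGraph.values()];
  if (calleeSets.length === 0) return 1; // no call sites: vacuously all monomorphic
  const mono = calleeSets.filter((callees) => callees.size === 1).length;
  return mono / calleeSets.length;
}

const callGraph = new Map([
  ["main.js:10", new Set(["init"])],
  ["main.js:24", new Set(["drawCircle", "drawSquare"])], // polymorphic
  ["main.js:31", new Set(["update"])],
  ["main.js:40", new Set(["render"])],
]);
console.log(monomorphicRatio(callGraph)); // 0.75
```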
24 / 28
Research Questions
Q4: Precision of inferred types?
– boolean, number, string, object and undefined
– the analysis is able to show that the average type
size is 1.0-1.3
• e.g. if the average type size is 1.0 then every read in the
program results in values of a single type
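The average type size can be computed as below, assuming each property read has been summarized as the set of types (from the five listed above) that its value may have. The read labels are hypothetical.

```javascript
const TYPES = new Set(["boolean", "number", "string", "object", "undefined"]);

// reads: Map from read label to the Set of types the analysis infers for it.
function averageTypeSize(reads) {
  let total = 0;
  for (const [label, types] of reads) {
    for (const t of types) {
      if (!TYPES.has(t)) throw new Error(`unexpected type ${t} at ${label}`);
    }
    total += types.size;
  }
  return total / reads.size;
}

const reads = new Map([
  ["obj.width", new Set(["number"])],
  ["obj.title", new Set(["string"])],
  ["obj.next", new Set(["object", "undefined"])], // may be absent
]);
console.log(averageTypeSize(reads)); // 1.3333333333333333
```

A single read that may also be undefined is enough to push the average above 1.0.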
25 / 28
Research Questions
Q5: Ability to detect dead or unreachable code?
– found several unreachable functions
– most appear to be unused library code copied and pasted directly into the benchmark programs
26 / 28
Future / Current Work
• Dynamically generated code
– eval
• Library support
– jQuery, MooTools, etc.
27 / 28
Conclusion
We extended previous work to reason precisely about JavaScript programs that execute in a browser-based environment. This allows us to discover general errors such as:
• reading absent properties
• dereferencing null or undefined
• invoking functions with incorrect arguments
• etc.
28 / 28
29 / 28
DOM Modules & Levels
Module \ Level   Level 0 (~1996)   Level 1 (1998)   Level 2 (2000)   Level 3 (2004)
Core Module      -                 ✓                ✓                (✓)
HTML Module      -                 ✓                ✓                (✓)
Event Module     -                 -                ✓                (✓)
CSS Module       -                 -                (✓)              (✓)
Browser API      ✓                 -                -                -
In addition we support the HTMLCanvasElement from HTML5.
30 / 28
Soundness Issues?
Assignment to computed property names
foo[bar] = "baz"
foo[bar] = function() {...}
If the exact value of bar is unknown:
– it could be a write to a "magic property"
– or a registration of an event handler
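A sketch of the conservative decision this forces, assuming the possible values of bar are known either as a finite set of strings or not at all (`UNKNOWN_NAME`). The list of magic property names is a small illustrative subset, not a complete one.

```javascript
const UNKNOWN_NAME = Symbol("unknown property name");

// Illustrative subset of properties whose assignment has a side effect.
const MAGIC_PROPERTIES = new Set(["onclick", "onload", "onkeypress", "onmouseover", "innerHTML"]);

// possibleNames: a Set of strings, or UNKNOWN_NAME when the analysis cannot
// bound the value of the computed property name.
function mayHaveSideEffect(possibleNames) {
  if (possibleNames === UNKNOWN_NAME) return true; // sound: must assume the worst
  for (const name of possibleNames) {
    if (MAGIC_PROPERTIES.has(name)) return true;
  }
  return false;
}

console.log(mayHaveSideEffect(new Set(["width", "height"])));  // false
console.log(mayHaveSideEffect(new Set(["width", "onclick"]))); // true
console.log(mayHaveSideEffect(UNKNOWN_NAME));                  // true
```

Treating every unknown write as a possible handler registration keeps the analysis sound but can cost precision, which is the trade-off this slide raises.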
31 / 28
