The Shared Source CLI Archive

An overview of the SSCLI
Mark Lewin
Microsoft Research
Stats for SSCLI Archive
(approximate as of 27 March 2002)
• Packaged as a single compressed archive file
• 1.9 million lines of code
– 1.15M of C and C++
– 625K of C#
– 125K of CIL (intermediate language)
– Smattering of assembly code
• 5900 source files (9700 total)
– 2900 tests
• Build output
– 1200 defined types
– About 20 dynamically loadable libraries
– About 22 executable programs
The Lay of the Land
• Four major areas in source code
– Runtime “execution engine”
– Frameworks
– Compilers and tools
– Portability layer, tests, and build infrastructure
• Other important points of interest
– License
– Documentation
– Samples
Root Directory: License & Doc
• Shared source license
– Allows for non-commercial redistribution of both
binary and source, including modified versions
– Simple restrictions (primarily non-commercial use)
– Points to relnotes.html for late-breaking items
– Points to docs/index.html
• Also important, but not included:
– ECMA specifications
– Microsoft .NET Framework SDK
Area #1: Execution Engine
• Heart of component-oriented infrastructure
• Converts metadata, resources, and CIL (on disk as
PE/COFF image) into running code
• Great interoperability with “unmanaged code”
• JIT compilation and IL verification
• Cross-language exception handling
• Language-agnostic, object-capable type system
• Automatic heap and stack management
• Dynamic code loading
• Evidence-based security (code access security)
Look For: PE Reading/Writing
• “Portable Executable” format
– Based on PE/COFF (see ceefilegen for details)
– Uses existing extension mechanism
– Three kinds of CLI data: metadata, CIL instructions, resources
• Persistent packaging for types
– Methods include stack frame size, types of local variables and
parameters, pinned variable info, and exception handler table
– Powerful metadata extensibility
• Logical components as “assemblies”
– Enforced security boundary, see fusion for binding logic
– Also contains refs to other assemblies and version info
– Cryptographic “strong names” an option (required for sharing)
Look For: Metadata Facilities
• Metadata is heart of CLI execution model
– Metadata interwoven with instruction set
– Read-only PE pages shared between processes
– Tokens used for addressing metadata
– Data stored in tables (heaps when variable length)
– Tokens are concatenated table + record number
– Optimized for load time at expense of generation complexity/speed; compact within these parameters
• Three levels of API
– Reflection for managed code
– Both internal and external API for C/C++, see md
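The "table + record number" token layout can be sketched in a few lines. This is an illustration, not code from the archive: it assumes the 32-bit layout given in the ECMA CLI specification, where the top byte selects the metadata table and the low 24 bits hold the 1-based record number. The function names are hypothetical.

```python
# Sketch of metadata token packing, assuming the ECMA-335 layout:
# high 8 bits = table index, low 24 bits = 1-based record number (RID).
# Function names are illustrative, not part of any SSCLI API.

TYPEDEF_TABLE = 0x02    # TypeDef table index in ECMA-335
METHODDEF_TABLE = 0x06  # MethodDef table index in ECMA-335

RID_BITS = 24
RID_MASK = (1 << RID_BITS) - 1


def make_token(table, rid):
    """Concatenate a table index and a record number into one token."""
    if not 0 <= table <= 0xFF:
        raise ValueError("table index must fit in 8 bits")
    if not 0 <= rid <= RID_MASK:
        raise ValueError("record number must fit in 24 bits")
    return (table << RID_BITS) | rid


def split_token(token):
    """Recover (table index, record number) from a token."""
    return token >> RID_BITS, token & RID_MASK


token = make_token(METHODDEF_TABLE, 1)
print(hex(token))  # 0x6000001
```

Because the table index travels with the record number, a single 32-bit operand embedded in the instruction stream is enough to address any metadata row.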
Look For: CIL
• Basis for most managed code execution in the CLI
• Simple stack machine model – see vm directory
• Typeless opcodes defined in opcodes.def
– Verifiable subset – see fjit directory for verification
– Signed and unsigned via opcode, not type
• CIL code is JIT compiled to run under CLI management
– Type layout, control, and dispatching
• Typed variable argument lists, dynamically typed pointers
• Tail calls, virtual dispatch, call via function pointer
– Rich set of conversion operations, calling conventions, interop
– Exception handling (two-pass)
– Garbage collection
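To make "signed and unsigned via opcode, not type" concrete, here is a minimal stack-machine sketch. The opcode names (ldc.i4, add, div, div.un) follow CIL, but the evaluator is a toy written for this overview and is unrelated to the code in the vm or fjit directories. Values are kept as raw 32-bit patterns, and only the opcode decides how they are interpreted.

```python
# Minimal stack-machine sketch. Opcode names follow CIL, but the
# evaluator itself is an illustration, not SSCLI code.

MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def run(program):
    """Execute a list of (opcode, operand) pairs and return the stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "ldc.i4":
            stack.append(operand & MASK32)   # push a 32-bit constant
        elif opcode == "add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)   # same bits for either signedness
        elif opcode == "div":
            b, a = to_signed(stack.pop()), to_signed(stack.pop())
            q = abs(a) // abs(b)             # signed divide truncates toward zero
            if (a < 0) != (b < 0):
                q = -q
            stack.append(q & MASK32)
        elif opcode == "div.un":
            b, a = stack.pop(), stack.pop()
            stack.append(a // b)             # unsigned divide on the raw bits
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


load = [("ldc.i4", -8), ("ldc.i4", 2)]
print(to_signed(run(load + [("div", None)])[0]))  # -4
print(hex(run(load + [("div.un", None)])[0]))     # 0x7ffffffc
```

The same two stack slots produce different results under div and div.un, which is why the stack itself does not need to record signedness.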
Area #2: Frameworks
• A “minimal toolkit” for the 21st century programmer
• Base class library
– Collections, arrays, strings, and other compound datatypes
– Globalization and formatting
– System services (threads, I/O, synchronization, etc.)
• Floating point and extended arrays libraries
• Networking, regular expressions, and XML libraries
• Includes access to runtime infrastructure
– Reflection and custom attributes
– Remoting, unmanaged interop, serialization, and marshaling
– AppDomains, Assemblies, GC, other execution engine features
Area #3: Compilers and Tools
• Full-featured C# compiler (also used in build)
• JScript compiler written entirely in C#
• Additional developer tools
– clix, the shared source CLI program launcher
– Assembly tools: resource compiler, assembly linker,
metainfo metadata viewer, assembler, disassembler
– Debuggers: cordbg managed debugger, plus a debug
extension for working on managed code from C/C++
Interesting C# Features to Examine
Unified type system (boxing)
Extensible attributes
struct types
Enumeration types
Delegate types
Unsafe code
Conditional methods
Interop (parameter marshaling)
Multi-dimensional arrays
Operator overloading
User-defined conversions
Variable parameter lists
ref and out parameters
Unsigned types
Decimal type
Conditional compilation
foreach statement
Overflow checking
Explicit interface implementations
Area #4: Build, Port, & Test
• Platform Adaptation Layer (PAL) in pal directory
– System resources all consumed via PAL
– Fewer than 250 partially complete Win32 APIs
– Missing system features layered into PAL on any given platform
– “PAL runtime” contains shared code for upper level tools
• Good specification can be found in pal_guide.html
– Loading and teardown of context
– Threading model, scheduling, and timers
– Synchronous file I/O, asynchronous network I/O
– Synchronization (critical sections, mutexes, semaphores, events)
– Debugging and tool support
• Additional work for port:
– JIT (very portable three-layer macro emitter design)
– Small amount of assembly code in execution engine
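The adaptation-layer idea can be illustrated with a toy example: upper layers call a small Win32-shaped API, and each platform supplies an implementation over its native primitives. The sketch below borrows Win32 function names and wait codes but uses simplified, hypothetical signatures. It is not the PAL's actual interface, and it stands in Python's threading.Event for the host primitive.

```python
import threading

WAIT_OBJECT_0 = 0x000   # Win32 value: the wait was satisfied
WAIT_TIMEOUT = 0x102    # Win32 value: the timeout elapsed first


class _Event:
    """Host-side state behind an event handle (hypothetical)."""

    def __init__(self, manual_reset, initial_state):
        self.manual_reset = manual_reset
        self.flag = threading.Event()
        if initial_state:
            self.flag.set()


def CreateEvent(manual_reset, initial_state):
    """Simplified signature: the real Win32 call takes more arguments."""
    return _Event(manual_reset, initial_state)


def SetEvent(event):
    event.flag.set()
    return True


def ResetEvent(event):
    event.flag.clear()
    return True


def WaitForSingleObject(event, timeout_ms):
    signaled = event.flag.wait(timeout_ms / 1000.0)
    if not signaled:
        return WAIT_TIMEOUT
    if not event.manual_reset:
        # Auto-reset: return to non-signaled after a successful wait.
        # Simplified: not race-free when several threads wait at once.
        event.flag.clear()
    return WAIT_OBJECT_0


ev = CreateEvent(manual_reset=False, initial_state=False)
SetEvent(ev)
print(WaitForSingleObject(ev, 10) == WAIT_OBJECT_0)  # True
```

Code written against these four functions does not need to know which threading primitive sits underneath, which is the property that lets a port replace only the layer below.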
Build Infrastructure and Utilities
• Build scripts and programs
– Scripts for environment setup and build
– build, binplace, nmake, and resource compiler utilities
• Utility configuration programs
– gacutil: manage the “global assembly cache”
– sn: manage assembly signing
– peverify: verify CIL in an assembly
– caspol: modify policy for “code access security”
– storeadm: manage “isolated storage”
Build Bootstrap Process
• Initial phase
– Build PAL using native toolchain
– Build build tools against PAL
– Complete by building resource compiler and PAL runtime using
newly compiled build tools
• Main phase has complex interdependencies
– Unmanaged frameworks, compilers, and tools are built
– Managed portions of runtime and frameworks are built
– Assemblies and runtime are configured
• Finally, when managed portions are used at runtime, they
are, of course, JIT compiled
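The bootstrap sequence above is a dependency-ordering problem, which can be sketched as a topological sort. The stage names and the exact edges below are one reading of this slide, not something taken from the build scripts, so treat the graph as an assumption. The sketch needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on. Names and edges are a
# hypothetical reading of the slide, not taken from the SSCLI build.
BOOTSTRAP_DEPS = {
    "pal": [],                                   # built with native toolchain
    "build_tools": ["pal"],                      # built against the PAL
    "resource_compiler": ["build_tools"],
    "pal_runtime": ["build_tools"],
    "unmanaged": ["resource_compiler", "pal_runtime"],
    "managed": ["unmanaged"],                    # needs the C# compiler
    "configure": ["managed"],
}


def build_order(deps):
    """Return one valid build order; raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


print(build_order(BOOTSTRAP_DEPS))
```

Any order the sorter returns starts with the PAL and ends with configuration. The two stages that both depend only on the build tools may appear in either order.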
Test Suites
• Incredibly handy when modifying code!
• Two major areas: managed code and PAL
• PAL suites
– Test driver script is
– Can be used on new PAL implementations
• Quality suites
– Test driver script is
– Three primary areas in this release
• IL verification
• “BVT” smoke tests – many small programs
• Tests for the BCL
Additional Resources
• ECMA specs:
• Microsoft commercial C# and CLR SDK
• Shared source info:
