VectorWise Key Features - Spatial Database Group

Presented by
Marie-Gisele Assigue
Hon Shea
Thursday, March 31st 2011
relational database software for reporting,
data analysis and Business Intelligence
- Meaning it has to analyze terabytes of data
- VectorWise recently set a new record on the TPC-H benchmark
- Financial services, such as banks and Wall Street
Desirable to query historical data as well as
current positions
Data volume is simply too large to store cost
effectively in memory
VectorWise delivers in-memory performance with
data stored on-disk
- Social media: for example, for advertising
- E-commerce
Performance of database suffers as the amount
of historical data grows
VectorWise is able to deliver good performance
even when analyzing large amounts of data
Using large CPU cache as execution memory
SIMD instructions allow the same operation to be performed on
multiple data elements simultaneously
Traditional databases process data one tuple at a time
VectorWise processes vectors of hundreds of elements at once
Size of vector is tuned to fit into cache
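
The contrast between tuple-at-a-time and vector-at-a-time processing can be sketched in Python. This is only an illustration of the control flow, not VectorWise's engine: the names `VECTOR_SIZE`, `vectors`, and `select_gt` are hypothetical, and the vector size of 1024 is an assumed value (the slides only say "hundreds of elements").

```python
from typing import Iterator, List

# Assumed vector size; a real engine would tune this so a vector fits in CPU cache.
VECTOR_SIZE = 1024


def sum_tuple_at_a_time(prices: List[int], threshold: int) -> int:
    """Traditional style: one filter-and-add step per tuple."""
    total = 0
    for price in prices:
        if price > threshold:
            total += price
    return total


def vectors(column: List[int], size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Split a column into fixed-size vectors (the last one may be shorter)."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_gt(vector: List[int], threshold: int) -> List[int]:
    """Selection primitive that handles a whole vector per call."""
    return [value for value in vector if value > threshold]


def sum_vector_at_a_time(prices: List[int], threshold: int) -> int:
    """Vectorized style: each operator call works on a whole vector."""
    total = 0
    for vector in vectors(prices):
        total += sum(select_gt(vector, threshold))
    return total


prices = [i % 100 for i in range(10_000)]
print(sum_tuple_at_a_time(prices, 50), sum_vector_at_a_time(prices, 50))
```

Both functions return the same result; the difference is that the vectorized version pays the per-operator overhead once per vector rather than once per tuple.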
Supported by Intel Xeon processor
Speeds up operations like:
Selections on strings using wild card matching
Aggregations on string-based values
Joins or sorts using string keys
Up to 2 – 4 times faster
- Use of COLUMN-BASED storage
For data warehouse databases, most queries retrieve
many rows
Row-based storage would generate a lot of unnecessary I/O
Column-based storage is generally accepted as a
superior storage model for this type of workload
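
A small sketch of why the storage layout matters for this kind of query, assuming a toy four-column table and a query that only needs the `amount` column. The table, column names, and the "values touched" counter are all illustrative assumptions, standing in for the amount of data that has to be read.

```python
rows = [
    (1, "alice", "NL", 120),
    (2, "bob", "US", 80),
    (3, "carol", "NL", 200),
]
column_names = ("id", "name", "country", "amount")

# Row store: all values of one row are adjacent.
row_store = [value for row in rows for value in row]

# Column store: all values of one column are adjacent.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def sum_amount_row_store(store, n_cols=4, col=3):
    """Sum one column; every full row has to be read to get at it."""
    total, touched = 0, 0
    for start in range(0, len(store), n_cols):
        record = store[start:start + n_cols]
        touched += len(record)
        total += record[col]
    return total, touched


def sum_amount_column_store(store):
    """Sum one column; only that column is read."""
    column = store["amount"]
    return sum(column), len(column)


print(sum_amount_row_store(row_store))       # (total, values touched)
print(sum_amount_column_store(column_store))
```

Both layouts give the same total, but the row layout touches every value in the table while the column layout touches only the values of the requested column.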
- VectorWise’s Hybrid Column Store
By default data is stored column by column
For tables that are indexed on more than one column,
indexed columns are stored together in the same block
This data storage model is known as PAX (Partition
Attributes Across)
PAX delivers better cache performance
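
A minimal sketch of a PAX-style layout, assuming fixed-size pages and a two-column table. Within each page the values are grouped per attribute (one "minipage" per column), so a row's values stay in the same block while each column remains contiguous. The function names and the page size are hypothetical, not taken from VectorWise.

```python
def build_pax_pages(rows, rows_per_page):
    """Group rows into pages; inside a page, store one list per attribute."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        # zip(*chunk) transposes the rows of this page into per-column lists.
        pages.append([list(column) for column in zip(*chunk)])
    return pages


def read_row(pages, rows_per_page, row_number):
    """Reassemble one row; all of its values come from a single page."""
    page, slot = divmod(row_number, rows_per_page)
    return tuple(minipage[slot] for minipage in pages[page])


rows = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]
pages = build_pax_pages(rows, rows_per_page=2)
print(pages[0])
print(read_row(pages, 2, 4))
```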
- Guarantees ACID properties
- Supports multi-version read consistency
Small inserts, updates, or deletes are expensive in a
column-based database (as opposed to large bulk
data load operations)
A PDT (Positional Delta Tree) is an in-memory structure that stores the
position and the change (delta) at that position
PDTs use a configurable amount of memory. Once
the memory pool is exhausted, updates are
written to disk
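
The idea of keeping changes by position in memory and merging them during a scan can be sketched as follows. This is a deliberately simplified stand-in, not the actual PDT: it uses a flat dictionary and set instead of a tree, supports only in-place updates, deletes, and appends, and ignores the memory limit and the flush to disk. The class and method names are hypothetical.

```python
class PositionalDeltas:
    """Simplified in-memory delta store keyed by position in the stable column."""

    def __init__(self):
        self.modified = {}    # position -> new value
        self.deleted = set()  # positions removed from the stable column
        self.appended = []    # new values logically added at the end

    def update(self, position, value):
        self.modified[position] = value

    def delete(self, position):
        self.deleted.add(position)

    def append(self, value):
        self.appended.append(value)


def scan(stable_column, deltas):
    """Merge the unchanged on-disk column with the in-memory deltas."""
    for position, value in enumerate(stable_column):
        if position in deltas.deleted:
            continue
        yield deltas.modified.get(position, value)
    yield from deltas.appended


stable = [10, 20, 30, 40]
deltas = PositionalDeltas()
deltas.update(1, 25)
deltas.delete(2)
deltas.append(50)
print(list(scan(stable, deltas)))
```

The stable column itself is never rewritten for these small changes; readers see the merged result.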
- VectorWise compresses data on a column-by-column basis using any one of these
algorithms: RLE (Run Length Encoding),
PFOR (Patched Frame Of Reference), or delta
encoding on top of PFOR
- VectorWise uses data compression to improve
performance. It allocates a portion of
physical memory to a memory-based disk
buffer called the CBM (Column Buffer
Manager). Data is automatically prefetched from disk and stored in the CBM.
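
Two of the compression schemes named above, RLE and PFOR, can be sketched in a few lines. These are simplified illustrations under stated assumptions, not VectorWise's implementation: the PFOR version keeps small offsets from a base value in a `codes` list and stores values that do not fit in the chosen bit width as exceptions in a dictionary, whereas a real implementation packs the codes into bits and patches exceptions differently. All function names are hypothetical, and inputs are assumed to be non-empty integer lists.

```python
def rle_encode(values):
    """Run Length Encoding: collapse repeated neighbours into (value, count)."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


def pfor_encode(values, bits):
    """Frame of reference with exceptions for values outside the frame."""
    base = min(values)
    limit = 1 << bits  # offsets below this fit in `bits` bits
    codes, exceptions = [], {}
    for position, value in enumerate(values):
        offset = value - base
        if offset < limit:
            codes.append(offset)
        else:
            codes.append(0)               # placeholder, patched on decode
            exceptions[position] = value  # outlier stored uncompressed
    return base, codes, exceptions


def pfor_decode(base, codes, exceptions):
    return [exceptions.get(pos, base + code) for pos, code in enumerate(codes)]


print(rle_encode([7, 7, 7, 3, 3, 9]))
print(pfor_encode([100, 101, 103, 100, 9000, 102], bits=2))
```

RLE pays off on columns with long runs of equal values; the frame-of-reference approach pays off when most values lie close to a common base and only a few outliers need to be stored in full.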
- In extreme cases, storage indexes can provide
the same benefit that data partitioning does
for other databases, without the overhead of
multiple database objects or of maintaining a
partitioning strategy.
- VectorWise automatically maintains a storage
index per column storing minimum and
maximum values for the data block.
- Very efficient in determining whether a
database block is a candidate block for a
particular query.
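
The min/max storage index described above can be sketched as follows, assuming a column split into fixed-size blocks and a range predicate `low <= value <= high`. The function names and the block size are illustrative assumptions.

```python
def build_storage_index(column, block_size):
    """Record the (minimum, maximum) value of every block of the column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append((min(block), max(block)))
    return index


def candidate_blocks(index, low, high):
    """Blocks whose [min, max] range overlaps [low, high]; others are skipped."""
    return [
        block_number
        for block_number, (block_min, block_max) in enumerate(index)
        if block_max >= low and block_min <= high
    ]


column = list(range(100))  # e.g. a column loaded in sorted order
index = build_storage_index(column, block_size=25)
print(index)
print(candidate_blocks(index, low=30, high=55))
```

Only the candidate blocks need to be read from disk. The benefit is largest when the data is clustered on the filtered column, since then most blocks fall entirely outside the requested range.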
- Parallel execution provides the greatest
performance improvements in DSS (decision
support system) and data warehousing
environments. The VectorWise engine is able to
sustain a large number of concurrent queries
efficiently on a multi-core system
Example of parallel execution: server connections and buffers
- The new record set by Ingres for the TPC-H benchmark at
the 100 GB scale factor is an astounding 3.4 times faster
than the old mark.
- The new record of 251,561 QphH (queries per hour) for
100 GB of data was set by Ingres's VectorWise
database running on one HP ProLiant DL380 G7.
- Enables you to a workload on a server
- Can lower the cost instantly by better
utilizing your hardware (dynamic).
- Achieve extremely fast performance for
typical data warehouse and data mart
INGRES VectorWise Whitepaper, a technical
Ailamaki, Anastassia. A Storage Model to Bridge the
Processor/Memory Speed Gap. Carnegie Mellon
University, 2001
