Hash Partitioning Pattern - Windows Azure Table Design Patterns 

Tags: Azure, Design

The Hash Partitioning Pattern aims to improve the performance of parallel and batch queries.

Example

As part of the pre-processing done for Wikipedia Explorer, the HTML/XML representation of a Wikipedia page is converted into a set of POCO objects, and the inbound and outbound links are analysed. The core data is stored in a Windows Azure Table and is used when pre-processing a page, which is a very CPU-intensive operation. As optimisations, this work is done in parallel and on batches of records to reduce latency.


Challenge

The challenge is to select a partition key that is optimised for both parallel and batch processing.

PartitionKey (PageId)  RowKey  Title                           Status     RenderedUrl
123456                 (guid)  St. Laurence’s College          Extracted
123457                 (guid)  David Hartman (TV personality)  Converted
123458                 (guid)  Liz Parker Evans                Converted

To maximise the performance of a single query, all the data would ideally be in a single partition. To maximise the performance of parallel queries, the data would be distributed across many partitions.

Solution

A solution is to define a finite number of partitions to be used as buckets. The number of buckets is a trade-off between the performance of individual reads and writes and the performance of the parallel and batch queries being performed. The bucket names should not relate directly to the entities stored in them, so that entities are distributed pseudo-randomly between partitions.

PartitionKey  RowKey  PageId  Title                           Status
00            (guid)  123456  St. Laurence’s College          Extracted
00            (guid)  123457  David Hartman (TV personality)  Converted
00            (guid)  123458  Liz Parker Evans                Converted

In this example, 256 buckets (partitions) were used and each page was assigned pseudo-randomly to a bucket. This provides an approximately uniform distribution of entities across the partitions, which can be visualised as:

[Image: distribution of entities across the partitions]
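One way to sketch the bucket assignment is to hash the PageId and reduce it modulo the number of buckets. This is only an illustration under assumptions: the post does not say how pages were assigned to buckets, and the function name `bucket_key`, the choice of SHA-256 and the two-digit hexadecimal key format are all hypothetical.

```python
import hashlib
from collections import Counter

BUCKET_COUNT = 256  # the number of buckets used in the example above


def bucket_key(page_id: int, bucket_count: int = BUCKET_COUNT) -> str:
    """Map a page id to a bucket name such as '00' or 'ff' (hypothetical helper).

    The key is derived from a hash of the id, so the bucket name does not
    relate directly to the entity stored in it.
    """
    digest = hashlib.sha256(str(page_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % bucket_count
    # Two hex digits are enough for up to 256 buckets.
    return format(bucket, "02x")


# Rough check of the distribution over 100,000 consecutive page ids.
counts = Counter(bucket_key(page_id) for page_id in range(123456, 223456))
print(len(counts), min(counts.values()), max(counts.values()))
```

Because the key is computed from the PageId alone, a reader can recompute the partition for a known page without a lookup, while consecutive PageIds still land in unrelated buckets.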

Summary

Motivation:

To improve the performance of parallel and batch queries.

Implementation:

By creating a finite number of partitions and distributing entities approximately uniformly across them.

Uses:

  • To improve the performance of parallel entity reads or writes.
  • To support random batch reads or writes.
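The two uses can be sketched together: one worker per bucket runs in parallel, and within a bucket the entities are handled in batches. This is a minimal sketch, not the Wikipedia Explorer implementation. The `table` dictionary is a hypothetical in-memory stand-in for the Azure table, and `process_bucket` and `batches` are illustrative names. The batch size of 100 reflects the Azure Table limit for an entity group transaction, which also requires all entities in the batch to share a PartitionKey.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

MAX_BATCH_SIZE = 100  # limit for one entity group transaction (single partition)

# Hypothetical in-memory stand-in for the table: PartitionKey -> entities.
table = {
    format(b, "02x"): [
        {"PartitionKey": format(b, "02x"), "PageId": b * 1000 + i, "Status": "Extracted"}
        for i in range(250)
    ]
    for b in range(4)
}


def batches(entities, size=MAX_BATCH_SIZE):
    """Yield successive lists of at most `size` entities from one partition."""
    iterator = iter(entities)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process_bucket(partition_key):
    """Process one bucket batch by batch; return (key, number of entities updated)."""
    converted = 0
    for batch in batches(table[partition_key]):
        for entity in batch:
            entity["Status"] = "Converted"  # placeholder for the real page conversion
        converted += len(batch)
    return partition_key, converted


# Each bucket is an independent unit of work, so buckets can be processed in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(process_bucket, sorted(table)))

print(results)
```

Since every batch is drawn from a single bucket, the batches stay within one partition, and the number of buckets caps how many workers can usefully run at once.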

Reference

Azure Application Demonstrations of Wikipedia Explorer and ScrumWall

Windows Azure Tables and Queues Deep Dive

Hash partitioning

Also See

Table Name Key Pattern

Hash Partitioning Pattern

Transactional Master-Item Record Pattern

Chronological Query Pattern

Starts With Query Pattern

Author: Marcus Tillett
@drmarcustillett

Published: 28 May 2010  09:16