Recently I had a question from a client whether there was really any performance improvement in SSAS cube processing when using partitioning strategies vs. just using a single partition.
While intuitively I "knew" that partitioning cube fact tables results in better parallelism and thus increased performance (when adequate CPU, memory and I/O resources can support it), I realized I really didn't have any empirical "proof" to backup my professional opinion. So--I setup a simple test.
The following is a very simple comparison of processing the same data set with a cube design that's identical--except for the introduction of multiple partitions. For this test I used Analysis Services 2008R2 x64 running on a 4-core Xeon server
The data set processed includes 37 million fact rows, and several dimensions--the largest including around 4,000 members. In the first test, a single partition is used.
In the second, the cube is partitioned along months, resulting in about 1.3 million rows per partition. This is below the recommended threshold for partition sizing, but provides a decent (if basic) evaluation of the effect of parallelism on cube processing.
Results:
- Time to process the cube using a single partition: 21 minutes
- Time to process the cube using 23 partitions: 12.7 minutes
- Bottom line: by partitioning the processing of the cube, there's a time savings of around 40% in this case
Observation: The server I used is in my lab and has 4 Xeon cores, 4GB RAM, and 1.3TB RAID 1E storage. During the single partition processing, memory demand didn't exceed 3GB, and CPU utilization didn't exceed 60% or so. With partitioned processing, both CPU and memory were completely saturated. When I get the chance (and the required loose change) to upgrade the server to 8 cores and more memory, I'd like to re-run this to see whether increased resources would provide an even greater benefit. However, having at least this much empirical evidence is a good start.
Single partition processing
23 Partitions processing
