Monday, January 23, 2017

Oracle Database Cloud (DBaaS) Performance - Part 2 - Storage

In this second part of this installment I'll focus on the performance figures related to I/O encountered when the corresponding tests were performed on the platform.

IOPS

When running with minimum sized buffer cache, direct and asynchronous I/O enabled, the following average read-only IOPS figures were measured over a period of several days (this is the test described in part three of the "performance consistency" series) .

First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) with either four or eight sessions:


Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) with either four or eight sessions:


So for the 8 KB block size the single instance test shows an average of almost 40.000 IOPS for read-only tests, and the two instance RAC even scales to almost 60.000 IOPS on average. These are pretty impressive IOPS figures for a general purpose shared / virtualized environment, and - at least - for the read part - are way above what other DBaaS cloud providers offer out of the box.

It's also worth mentioning that I got the same IOPS results independent from the amount of storage allocated - in contrast to Amazon RDS for example, where even the "Provisioned IOPS" storage class requires you to allocate at least 3 TB of storage in order to get the maximum of 30,000 IOPS. I've repeated the same test setup with the table size inflated to the maximum possible within my test account limitations (so total size of storage allocated close to the 1 TB storage quota) and still got comparable IOPS results to this test that only allocates approx. 1 GB for the 8 KB block size test (total storage allocated approx. 100 GB).

I/O Throughput

In order to measure the maximum I/O throughput corresponding Parallel Execution statements were used to create tables of sufficient size and read them via Parallel Query.

First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) at a degree of 16:


Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) at a degree of 16:


So the single instance configuration writes at 120 MB per second, and reads at 640 MB per second, whereas the two instance RAC configuration writes at 80 MB per second and reads at 1.100 MB per second. Clearly the storage layer is read optimized and writes are much slower than reads. The read performance is very good however, and again above what other cloud providers deliver (for example Amazon RDS "Provisioned IOPS" offers a maximum throughput of 320 MB per second, but again only if a sufficient amount of storage is allocated).

I/O Latency

In order to measure the I/O latency the test used above to determine the IOPS rate was modified to use random single block reads ("db file sequential read") instead of asynchronous reads ("db file parallel read"). The reason for this is that from a database perspective the latency for reads performed asynchronously cannot be measured properly, due to the way the database handles the I/O.

The results were the following, this time running on a 2 OCPU (4 CPUs / 4 cores) single instance configuration with 8 KB block size and 8 sessions:



So again rather impressive 0.45 ms average wait time for a random single block read, and the wait event histogram also shows that the performance was very consistent, with almost 75 percent of the waits taking less than 512 microseconds at a rate of more than 16.000 IOPS.

Write Performance

The read-only test above was modified to be 100 percent updates (for more details see here). Since the instance was configured to operate with a minimum sized buffer cache this puts maximum pressure on the database writer to write dirty blocks as fast as possible in order to allow new blocks to be read into the buffer cache.

Running on a 4 OCPU single instance configuration (8 CPUs / 8 cores) with eight sessions the following profile was measured - using NOARCHIVELOG mode and Flashback disabled:


So clearly the database writer couldn't write the dirty blocks quick enough - almost 50 percent of the database time the sessions had to wait for free buffers. This means that write intensive workloads might not work too well and run into these limitations.

Summary

The storage layer is clearly optimized for reads and delivers at least for the read-only tests a very good I/O performance. Write intensive workloads might not work too well if they consistently require more write performance than provided by the service.

Monday, January 16, 2017

Oracle Parallel Execution Deep Dive Session

Here is a recording of a session I did a while ago, covering how to understand the essentials of Oracle Parallel Execution and how to read the corresponding execution plans.