Understanding Data Center Workloads with Virtual Instruments
Abstract
Virtual Instruments combines a storage workload modeling application, WorkloadWisdom, with purpose-built load
generation appliances and data capture probes to help storage architects and engineers accurately characterize
storage performance. Step one is to acquire and analyze production data to characterize the workload model. In 2015,
Virtual Instruments changed the industry by automating the analysis of production workload data via the Workload
Data Importer feature of WorkloadWisdom.
This document summarizes how to characterize network storage workloads and how to determine the level of data
required for a good, better, or best outcome. The more complete the data reported by an array or performance
monitor, and the higher the resolution of the reporting period, the more accurately a production application can be
modeled for testing. We describe what should be monitored and provided to produce a superior workload model from
a production workload.
The goals of this document are two-fold:
1. Outline ways to characterize workloads for Enterprise data center applications. The goal is to cover the most
   common characteristics that are readily available and that matter when developing a good emulation of the
   workload. It is not intended to cover corner-case characteristics that would be required to test the full
   functionality of a storage array. The intent is to enable the development of a workload realistic enough to
   compare different devices, configurations, or firmware versions, or to detect degraded components in an
   infrastructure. Characterizing a workload completely enough to perform a storage manufacturer's full
   regression test of a storage subsystem is outside the scope of these recommendations.
2. Describe how the Workload Data Importer simplifies and automates the analysis of production storage
   workloads. While understanding the section "What to Characterize" is useful, it is no longer necessary, as the
   Importer now does the heavy lifting.
Virtual Instruments Services
As the leader in infrastructure performance optimization, the Virtual Instruments Professional Services organization
helps teams characterize their workloads and build configurable workload models to test 'what if' scenarios against
their most common workloads.
What to Characterize
There are four basic areas to consider when characterizing a workload for a storage environment (a sketch of how
these areas fit together follows the list):
1. Description of the size, scope, and configuration of the environment, including the number of servers, LUNs,
   volumes, and shares
2. The patterns that describe when, how frequently, and in what ways data are accessed
3. IOPS and throughput rates during the time period over which data is gathered
4. The impact on the network subsystem in addition to the patterns observed on the array itself. Though this
   information is not used directly to model the workload, it enables an impact comparison between the emulated
   traffic and the actual production traffic, which helps measure how representative the emulated traffic is of
   real-world traffic.
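To make the four areas concrete, here is a minimal sketch in Python of how a workload descriptor might be
organized. The class and field names are illustrative assumptions made for this document, not a WorkloadWisdom
schema.

    # Illustrative structure only; names are assumptions of this document,
    # not a WorkloadWisdom data model.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class StorageEnvironment:          # Area 1: size, scope, configuration
        servers: int
        luns: List[str]
        volumes: List[str]
        shares: List[str]

    @dataclass
    class AccessPattern:               # Area 2: when, how often, in what ways
        read_pct: float                # fraction of operations that are reads
        random_pct: float              # random vs. sequential access
        block_sizes: Dict[int, float]  # block size (bytes) -> share of I/Os

    @dataclass
    class RateSample:                  # Area 3: rates over the capture window
        timestamp: float
        iops: float
        throughput_mbps: float

    @dataclass
    class WorkloadDescriptor:
        environment: StorageEnvironment
        pattern: AccessPattern
        rates: List[RateSample] = field(default_factory=list)
        # Area 4 (network impact) is deliberately kept outside the model:
        # it is used to validate the emulation, not to drive it.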
The Importance of Granularity in Workload Characterization
Besides knowing what to characterize, it is important to understand the granularity of the collected data. Data
granularity determines the quality of the resulting emulated workloads in terms of a good, better, or best
characterization (a sketch of the effect of the sampling interval follows the list). It is important to consider:
1. How often and for how long the data is collected
2. How completely the data represents workload elements such as LUNs, volumes/mount points, shares,
   directories/files, and LBAs
3. How detailed the access patterns are in terms of data and metadata protocol command coverage
4. How detailed the information is in terms of access patterns and request sizes
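To illustrate why the collection interval matters, the short sketch below (with made-up numbers) averages
one-second IOPS samples over a five-minute window. A burst that is obvious at 1s granularity all but vanishes at
5min granularity.

    # Illustrative numbers: a 300-second window sampled at 1-second
    # granularity, containing a single one-second burst.
    samples_1s = [1000] * 300          # steady 1,000 IOPS baseline
    samples_1s[150] = 50000            # one-second burst of 50,000 IOPS

    peak_1s = max(samples_1s)                      # 50,000 IOPS
    avg_5min = sum(samples_1s) / len(samples_1s)   # ~1,163 IOPS

    print(f"Peak at 1s granularity:     {peak_1s} IOPS")
    print(f"Average over 5min interval: {avg_5min:.0f} IOPS")

An emulation driven by the 5-minute average would understate the burst by more than 40x, which is exactly the kind
of fidelity loss the interval guidelines in the next section are meant to bound.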
High Level Guidelines
The realism of the Workload Model that can be created from the analyzed Workload Data depends heavily on the
fidelity of the workload data provided by the storage arrays and performance monitoring tools. Because there is no
industry standard for the data structures provided by the various storage arrays and performance monitoring tools,
and because different data sources can arrive at the same metrics in different ways, only a high-level guideline can
be provided here as a starting point; a deeper technical discussion must be held to define the exact data set needed
to create a parser or tool that simplifies workload modeling.
Protocol / General   Description                                            Model Quality [1]
FC / iSCSI           Per-LUN R/W, block size distribution, KPIs [2]         Best
FC / iSCSI           Per-LUN R/W, KPIs                                      Good
FC / iSCSI           Per-array R/W, KPIs                                    Minimum
NFS / SMB            Per-volume R/W, metadata buckets, block sizes, KPIs    Best
NFS / SMB            Per-array commands AND per-volume R/W/Other, KPIs      Good
NFS / SMB            Per-array commands OR per-volume R/W/Other, KPIs       Minimum
General              Data interval: 1s – 1min                               Best
General              Data interval: 1min – 5min                             Good
General              Data interval: 5min – 10min                            Minimum
Most commercial storage arrays and performance monitoring tools provide data sources that fall under the Good
category. The VirtualWisdom solution from Virtual Instruments and a few commercial solutions provide data sources
that fall under the Best category above.
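As a quick way to apply the "General" rows of the table, a hypothetical helper might map a data source's collection
interval to a quality tier. The function name and return values below are our own illustration, not part of any
product.

    # Hypothetical helper: classify a data source's collection interval
    # against the "General" rows of the table above.
    def interval_quality(interval_seconds: float) -> str:
        if interval_seconds <= 60:
            return "Best"         # 1s - 1min
        if interval_seconds <= 300:
            return "Good"         # 1min - 5min
        if interval_seconds <= 600:
            return "Minimum"      # 5min - 10min
        return "Insufficient"     # coarser than any tier in the table

    print(interval_quality(10))    # Best
    print(interval_quality(300))   # Good
    print(interval_quality(900))   # Insufficient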
Workload Details
What to Characterize
Storage environment
Understanding the storage environment differs for file, block, and object storage. Each environment has unique
characteristics that must be understood to create and map an emulated workload that resembles the observed
production environment (the sketch below lists typical elements per storage type).
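As a rough illustration of how the inventoried elements differ by storage type, the sketch below groups the
elements named in this document by environment. The dictionary and its entries are illustrative assumptions, not a
product schema.

    # Illustrative inventory of environment elements per storage type;
    # the dictionary and its entries are assumptions, not a product schema.
    ENVIRONMENT_ELEMENTS = {
        "block (FC/iSCSI)": ["initiators", "targets", "LUNs", "LBA ranges"],
        "file (NFS/SMB)": ["clients", "volumes/mount points", "shares",
                           "directories/files"],
        "object": ["clients", "buckets", "objects"],
    }

    for storage_type, elements in ENVIRONMENT_ELEMENTS.items():
        print(f"{storage_type}: {', '.join(elements)}")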
File (NAS)
[1] The Model Quality is assessed only by the Virtual Instruments Product Management team, in the context of
creating realistic Workload Models that sufficiently represent the observed Production Workload Data.
[2] Key Performance Indicators are Throughput, IOPS, and Latency.