06-08-2013, 03:43 PM
Performance Analysis in Loosely-Coupled Distributed Systems: the case for a data-driven approach
Web Services
App-to-app communication over the Internet using standardized XML messaging
Examples:
Google Web API
Microsoft Passport
Web services can be combined arbitrarily
What happens when something goes wrong?
How can we monitor performance?
Web Services Profiling
Account for the resources consumed by a service invocation throughout its lifetime
Cross-machine
Cross-administrative domain
This is not the same as monitoring performance counters on a system
The bottleneck resource for all requests may not explain the performance of a particular single request
Throughput vs response time
Magpie: Data-driven Performance Monitoring
Unit of interest is the service invocation in a loosely-coupled distributed system
System resource consumption is accounted to individual requests
e.g. CPU, disk accesses and network bandwidth used by each HTTP request in web server
Online measurements are taken by a set of profiling components
Offline processing of the recorded data derives a model of the system workload
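The per-request accounting described above can be sketched in a few lines. This is only an illustration, not Magpie's actual implementation: the event format (request id, resource name, amount) and the names `events` and `account_per_request` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical event records emitted by online profiling components:
# (request_id, resource, amount). The format is an assumption for this sketch.
events = [
    ("req-1", "cpu_ms", 12.0),
    ("req-2", "cpu_ms", 3.5),
    ("req-1", "disk_reads", 4),
    ("req-1", "net_bytes", 2048),
    ("req-2", "net_bytes", 512),
    ("req-1", "cpu_ms", 8.0),
]

def account_per_request(events):
    """Offline step: sum each resource's consumption per request id."""
    totals = defaultdict(lambda: defaultdict(float))
    for request_id, resource, amount in events:
        totals[request_id][resource] += amount
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {rid: dict(res) for rid, res in totals.items()}

usage = account_per_request(events)
```

Here `usage["req-1"]` holds the CPU, disk and network totals for that one request, which is the kind of information a system-wide performance counter cannot provide.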
Magpie for Performance Prediction
Scope (currently) is multi-tier server farms running .NET web sites
Goals:
Acquire a workload description with less human effort than conventional benchmarking
Extract a detailed model from a ‘representative’ system
Not just a long-term average across all transactions
Measure with a realistic mix of transaction types (cache behaviour depends on the mix)
Build a probabilistic model of the workload which includes “hidden” transaction types, e.g. error conditions
Complex behaviour may not be easily observable manually, e.g. the web transaction type discriminator is not necessarily the URL
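As a rough illustration of the probabilistic workload model mentioned in the goals, the simplest possible version is an empirical transaction mix: the relative frequency of each observed transaction type, error cases included. The labels and the function name `workload_mix` below are hypothetical, and a real model would carry far more detail than a frequency table.

```python
from collections import Counter

# Hypothetical transaction-type labels, e.g. assigned to traces offline.
# "error" stands in for a "hidden" type that is not visible from the URL.
observed = ["browse", "browse", "checkout", "browse",
            "error", "checkout", "browse", "browse"]

def workload_mix(labels):
    """Estimate the probability of each transaction type by relative frequency."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

mix = workload_mix(observed)
```

With these eight sample labels, `mix` assigns "browse" a probability of 5/8, "checkout" 2/8 and "error" 1/8.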
Why is this data useful?
Observe the behaviour of a single request
Resources demanded
Dependencies and interactions
Service received
Characterize the workload
Problem diagnosis
Performance prediction
Charging and SLAs
Workload Characterization by Clustering
Borrowing algorithms from gene-sequence comparison and speech recognition
Construct a “string” representation of traces
Cluster using String Edit Distance
Start with a ‘representative’ trace as cluster centroid
Compute distance from each trace to each cluster centroid
Compare inter/intra-cluster mean distances to decide when to create a new cluster
Approximately O(N * C) distance computations, where N is the number of traces and C is the number of clusters
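The clustering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the slides. Each trace is encoded as a string of event letters (the hypothetical alphabet here is C = CPU, D = disk, N = network). The centroid is simply the first trace placed in a cluster. A fixed distance `threshold` stands in for the inter/intra-cluster mean distance comparison.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def cluster_traces(traces, threshold):
    """Assign each trace to its nearest centroid, or start a new cluster.

    Assumption: a fixed threshold replaces the inter/intra-cluster
    mean-distance test described in the text.
    """
    centroids, clusters = [], []
    for trace in traces:
        best_index, best_distance = None, None
        for index, centroid in enumerate(centroids):  # C comparisons per trace
            d = edit_distance(trace, centroid)
            if best_distance is None or d < best_distance:
                best_index, best_distance = index, d
        if best_index is not None and best_distance <= threshold:
            clusters[best_index].append(trace)
        else:
            centroids.append(trace)  # this trace becomes a new centroid
            clusters.append([trace])
    return centroids, clusters

# Hypothetical trace strings: C = CPU, D = disk, N = network.
traces = ["CDNC", "CDNNC", "CC", "CDNC", "CNC"]
centroids, clusters = cluster_traces(traces, threshold=1)
```

Each trace is compared against every current centroid once, which is where the N * C count of distance computations comes from. Each edit-distance computation itself costs time proportional to the product of the two string lengths.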
Conclusion
Data-driven performance profiling is important for:
Understanding request structure
Components in the critical path may change
Services call other services unknown to the user
Understanding request service
How does response time relate to throughput?
Web Services need Magpie!