Step by Step Trouble Shooting
Performance SOP
Identify the Bottleneck
Usually the bottleneck can be found in one of these areas:
- client
- network
- server
To breakdown a slow performing web application is a good idea to start, where the bad performance is perceived... in the browser. Most modern browsers provide very sophisticated tools supporting the search for a bottleneck and draw some conclusions, and eliminate other possible causes easily.
- Developer tools - Net(work)
- Chrome -> [F12] / [CTRL + SHIFT + I]
- Firefox -> [CTRL + SHIFT + Q] or Firebug [F12]
- IE9+ -> [F12]
- IE8 & others -> fiddler2
Investigating the network traffic by following the questions below the biggest problem area(s) should become apparent after a few minutes:
1. Are there one or more long running requests?
NO → #Client Side Issue
YES
↓
2. Is it a static resource?
YES (js, css, images...) → check #ZK Server Configuration (debug / cache / compression) → STILL SLOW → #Network Issue
NO (dynamic request into ZK application)
- *.zul = full page request (can be followed by ajax requests)
- zkau/* = ajax request
↓
3. Which PHASE of the request is slowest ? (examples based on Chrome developer tools)
CONNECTING (or one of Proxy, DNS Lookup, Blocking, SSL)
- 3.a) Is this a network problem (everything between browser and ZK Application)?
- test ping / trace route to different servers
- test dns lookup timing
- YES → #Network Issue
- NO → #Server Side Issue (application takes long time to accept connection, or even times out)
SENDING
- 3.b) Is the request unreasonably big? (rare case, usually due to an upload (reasonable), or form posting a lot of data)
- YES → #Client Side Issue
- NO
- ↓
- 3.c) Is the bandwidth low?
- e.g. try upload the same amount of data to the server via ftp/scp to check possible upload speed
- YES → #Network Issue
- NO → #Server Side Issue (application server receiving request data slowly)
WAITING → #Server Side Issue (application server taking long time to prepare response)
RECEIVING
- 3.d) Is the response unreasonably big?
- YES → #ZK Server Configuration (render on demand / compression)
- NO
- ↓
- 3.e) Is the bandwidth low?
- e.g.try to download the same amount of data from the server via ftp/scp to check download speed
- YES → ask your administrator to fix it ;)
- NO → #Server Side Issue (appserver sending response data slowly)
Client Side Issue
If there is no significant time spend on the Network and Server side, the slowdown must happen somewhere on the client side.
Client side performance is affected my many factors and may vary with different browser types versions. Other factors are:
- operating system
- available memory
- CPU speed / load
- screen resolution
- graphics card speed
So it is good to compare the client performance on different computers with different browsers, to identify the configuration causing the issue.
If client performance among configurations / browsers is equally bad, the issue will more likely be found in the Rendering area → #Client Side Profiling
Once you identified the client side rendering takes very long, check the size of the response, if the Client engine needs to render a lot (e.g. a Grid with 1000 lines) it will take its time. So compare the timing with a smaller response, and consider if this can be prevented by reducing the data sent to the client using Render on Demand or Pagination (Most users don't need 1000 lines visible at once)
Performance degrading over time when using the application (while network and server timings remain constant) might indicate a client side memory leak → #Client Memory Issue
Client Side Profiling
Make sure your local computer is not under heavy CPU load, and has "enough" Memory available, before starting to profile the Javascript execution in the browser.
To measure and break down the time spent in the JS engine you can try the following steps (in chrome), and interpret the results:
- switch to the "profiles tab"
- choose "Collect Javascript CPU Profile"
- click "start"
- perform your action e.g. reload the screen, or load the search results
- click "stop"
- switch to "Flame Chart" (choice at the bottom)
You'll get a nice view like this:
This brilliant visualization of the JS execution flow and stack depth can be used / interpreted in many ways to extract the information you require.
The timeline on the top indicates the whole period between "start" and "stop", I selected the range we are interested in, and the colorful area at the bottom gives details about which methods are actually called and their timing (you can zoom in and out using the mouse wheel too), clicking on one method will directly lead you to the associated line in the source code (enabling debug-js will help when using this feature).
The small peak (at 1300ms) on the left side is my actual event refreshing the page. The gap until 1500ms represents the idle time of the JS engine waiting for the first response from the server, then you can follow in which order the JS files are executed and which of them consume most of the time.
Additional waiting times in the middle indicate load time of additional JS files and garbage collection times. At about 1840ms the JS engine stops meaning ZK has finished rendering the page widgets and updating the DOM elements.
We can conclude our page took about 540ms (after the initial event) to load and render in order to become available the user and here is no significant slowdown on either JS or network side.
General conclusions:
- more wider mountains → mean more JS time (e.g. ZK render time, a third party library)
- more valleys (flat lines) → mean more waiting time (mostly network, maybe also CSS formatting or other time the browser does not assign to the JS engine)
A similar but not as colorful view is available in Firefox [Shift + F5] showing some basic timeline. And in IE you can also get profiling data.
I find the flame-chart in chrome most powerful, and even if the Performance issue only occurs in a different browser it is a good starting point visualize the timing of the problem, and understand the complexity of the execution path (e.g. in IE you get the information that most time is spend in method x(), and in chrome you can actually see the method in a bigger context). Or when using an older version of IE, you can define the methods of interest where to put some log statements to trace performance manually.
Another interesting view to determine the render time is the Timeline - Events view in Chrome e.g. :
Client Memory Issue
Use a system management tool (e.g. on windows: task manager or process explorer) to watch the memory consumption of your browser over time.
If memory is increasing repeating an action on a page -> subsequently remove items from your ZUL file to identify the component causing this issue.
Again Chrome offers a memory timeline view, real time memory profiling functions, and JS heap dumps.
Server Side Issue
http://www.computerperformance.co.uk/HealthCheck/Disk_Health.htm#Disk%20Bottleneck%202 OS or VM
Perfmon.exe
LogicalDisk(_Total) -> Avg. Disk Queue Length LogicalDisk(_Total) -> Current Disk Queue Length PhysicalDisk(_Total) -> % Disk Time Processor Information(_Total) -> % Processor Time
JVisualVM
- Busy vs. IO wait - Java Debugging - Task manager / Process explorer / top - GC logging - profiling - memory leaks
Appserver config e.g. check number of concurrent requests
ZK Monitor/Statistics...
- Sessions - Desktops
Server Side Profiling
Memory Issue
ZK Server Configuration
- check debug mode → zk config should not be enabled (disabled by default)
- check caching config → should not be disabled (enabled by default)
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.web.classWebResource.cache
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zk.WPD.cache
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zk.WCS.cache
- check compression settings → should not be disabled (enabled by default)
- consider/check render on demand settings
- http://books.zkoss.org/wiki/ZK_Developer%27s_Reference/Performance_Tips/Client_Render_on_Demand
- http://books.zkoss.org/wiki/ZK_Developer%27s_Reference/Performance_Tips/Listbox,_Grid_and_Tree_for_Huge_Data/Turn_on_Render_on_Demand
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zul.client.rod
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zul.grid.initRodSize
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zul.listbox.initRodSize
- http://books.zkoss.org/wiki/ZK_Configuration_Reference/zk.xml/The_Library_Properties/org.zkoss.zul.tree.initRodSize (ZK 7)
- TODO ??? maybe more tweaks ???
- check http://books.zkoss.org/wiki/ZK_Developer%27s_Reference/Performance_Tips
Network Issue
If something in your network infrastrucure (routers, proxies, web servers...) is causing the performance issues, there is little you can do as web application developer.
Here some ideas to identify possible bottlenecks trying to reduce network complexity by:
- using ip addresses directly → indicates DNS problem
- avoiding proxies, routers, firewalls (e.g. access the application server from a remote desktop "closer" to the actual server)
- accessing the application server directly, instead of going through a webserver or load balancer
- disabling SSL and check difference
→ "Kindly" inform your network administrator about your observations and ask for help identifying, excluding, fixing these infrastructure problems.