To start J2EE application performance QA and tuning:

Set a goal:

Before you begin tuning your J2EE application’s performance, set a goal. Often this goal addresses the maximum number of concurrent users the application must support within a given limit on response times. But the goal can also focus on other variables – for example, that response times should not increase by more than 10 percent during the peak hour of user load.

Identify problem areas:

It is important to identify the bottlenecks before you start making changes to improve performance. A little investigation often reveals the specific component responsible for the poor performance. For example, if CPU usage on an application server is high, you will want to focus on tuning the application server first.
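
As a rough illustration of that first bit of investigation, the JVM itself can report basic load information through the standard management API. The following is a minimal sketch; the class name LoadCheck is invented for the example.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    // Minimal sketch: report processor count and system load average from inside
    // the JVM while the application is under test, to help decide where to focus.
    public class LoadCheck {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            System.out.println("Available processors: " + os.getAvailableProcessors());
            // One-minute load average; returns a negative value if unavailable on this platform.
            System.out.println("System load average:  " + os.getSystemLoadAverage());
        }
    }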

Follow a methodical and focused path:

Once the goal is set, make the changes that you expect to have the biggest impact on performance first. Your time is better spent tuning a method that takes 10 seconds but is called 100 times (1,000 seconds in total) than a method that takes one minute but is called only once. Ideally, test one change at a time before using it in a production environment: make a single change, stress-test it, and only make it permanent if the impact is positive.
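
When choosing between candidates for tuning, a simple measurement of cumulative time makes the trade-off concrete. The sketch below is illustrative only; CumulativeTiming and expensiveCall() are hypothetical names standing in for the code being measured.

    // Minimal sketch: measure the cumulative time spent in a frequently called method.
    public class CumulativeTiming {
        public static void main(String[] args) {
            long totalNanos = 0;
            for (int i = 0; i < 100; i++) {
                long start = System.nanoTime();
                expensiveCall();
                totalNanos += System.nanoTime() - start;
            }
            System.out.println("Cumulative time: " + (totalNanos / 1000000) + " ms over 100 calls");
        }

        // Stand-in for the method being tuned.
        private static void expensiveCall() {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }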

Performance planning for managers:

  • Include budget for performance management.
  • Create internal performance experts.
  • Set performance requirements in the specifications.
  • Include a performance focus in the analysis.
  • Require performance predictions from the design.
  • Create a performance test environment.
  • Test a simulation or skeleton system for validation.
  • Integrate performance logging at the application layer boundaries (see the logging sketch after this list).
  • Performance test the system at multiple scales, and tune using the resulting information.
  • Deploy the system with performance logging features.
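
As a rough sketch of logging at a layer boundary, a service call can be wrapped with timing code that logs the elapsed time each time the boundary is crossed. The OrderService interface, method name, and logger name below are invented for the example; a real application would apply the same idea at its own tier boundaries.

    import java.util.logging.Logger;

    // Sketch: time a call as it crosses a layer boundary (e.g. web tier to business tier)
    // and log the elapsed time. OrderService and placeOrder() are hypothetical names.
    interface OrderService {
        void placeOrder(String orderId);
    }

    class TimedOrderService implements OrderService {
        private static final Logger PERF_LOG = Logger.getLogger("perf.boundary");
        private final OrderService delegate;

        TimedOrderService(OrderService delegate) {
            this.delegate = delegate;
        }

        public void placeOrder(String orderId) {
            long start = System.nanoTime();
            try {
                delegate.placeOrder(orderId);
            } finally {
                PERF_LOG.info("placeOrder(" + orderId + ") took "
                        + (System.nanoTime() - start) / 1000000 + " ms");
            }
        }
    }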

Balancing Network Load with Priority Queues Tips:

  • Hardware traffic managers redirect user requests to a farm of servers based on server availability, IP address, or port number. All traffic is routed to the load balancer, which then fans requests out to the servers according to its balancing algorithm.
  • Popular load-balancing algorithms include: server availability (find a server with available processing capability); IP address management (route to the nearest server by IP address); port number (locate different types of servers on different machines, and route by port number); HTTP header checking (route by URI, cookie, etc.).
  • Capacity planning for web hits should cater for the peak hit rate, not the average rate.
  • You can model hit rates using a Gaussian distribution to determine the average hit rate per time unit (e.g. per second) at peak usage; a Poisson probability then gives the probability of a given number of users simultaneously hitting the server within that time unit. [The article gives an example in which a Gaussian fitted to peak traffic of 4000 users with a standard deviation of 20 minutes results in an average of 1.33 users per second at the peak, which in turn gives the probabilities of 0, 1, 2, 3, 4, 5, or 6 users hitting the server within one second as 26%, 35%, 23%, 10%, 3%, 1%, and 0.2% respectively. Service time was 53 milliseconds, which means the server can service about 19 hits per second, comfortably above these arrival rates, so requests should not need to be queued.] The arithmetic is reproduced in the first sketch after this list.
  • The load on the system is the arrival rate divided by the service rate. If this ratio becomes greater than one, requests exceed the system's capacity and will be lost or need to be queued.
  • If requests are queued because capacity has been exceeded, the load must drop far enough for the queued requests to be cleared or the system will fail (the service rate must increase or the arrival rate must decrease). If the average load stays above 1, the system will fail.
  • Sort incoming requests into different priority queues, and service the requests according to the priority assigned to each queue (see the queue sketch after this list).
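
The following is a minimal sketch of the modelling arithmetic referenced above, assuming the 4000 peak users are spread over time as a Gaussian with a 20-minute standard deviation (so the peak arrival rate is 4000 / (sigma * sqrt(2*pi)), about 1.33 hits per second) and that arrivals within any one second follow a Poisson distribution.

    // Sketch of the peak-load arithmetic: peak arrival rate from the Gaussian fit,
    // Poisson probabilities of k simultaneous hits in one second, and the service rate.
    public class PeakLoadModel {
        public static void main(String[] args) {
            double sigmaSeconds = 20 * 60;   // 20-minute standard deviation, in seconds
            double peakRate = 4000 / (sigmaSeconds * Math.sqrt(2 * Math.PI)); // ~1.33 hits/sec
            System.out.printf("Peak arrival rate: %.2f hits/second%n", peakRate);

            for (int k = 0; k <= 6; k++) {
                double p = Math.exp(-peakRate) * Math.pow(peakRate, k) / factorial(k);
                System.out.printf("P(%d hits in one second) = %.1f%%%n", k, p * 100);
            }
            // 53 ms per request gives roughly 19 requests per second of service capacity.
            System.out.printf("Service rate at 53 ms/request: %.0f hits/second%n", 1000.0 / 53);
        }

        private static double factorial(int k) {
            double result = 1;
            for (int i = 2; i <= k; i++) {
                result *= i;
            }
            return result;
        }
    }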
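
And a minimal sketch of the priority-queue idea itself, using java.util.concurrent.PriorityBlockingQueue; the Request class and the priority values are invented for the example.

    import java.util.Comparator;
    import java.util.concurrent.PriorityBlockingQueue;

    // Sketch: incoming requests are classified into priorities and queued;
    // the consumer always services the most urgent queued request next.
    public class PriorityQueueDemo {

        static class Request {
            final int priority;      // lower number = more urgent
            final String payload;
            Request(int priority, String payload) {
                this.priority = priority;
                this.payload = payload;
            }
        }

        public static void main(String[] args) throws InterruptedException {
            PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>(
                    64, Comparator.comparingInt((Request r) -> r.priority));

            // Producer side: classify and enqueue requests as they arrive.
            queue.put(new Request(5, "background report"));
            queue.put(new Request(0, "paying customer checkout"));
            queue.put(new Request(2, "catalogue browse"));

            // Consumer side: drain in priority order.
            while (!queue.isEmpty()) {
                Request next = queue.take();
                System.out.println("Servicing priority " + next.priority + ": " + next.payload);
            }
        }
    }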

My query was fine last week and now it is slow. Why?

The likely cause is that the execution plan has changed. Generate a current explain plan for the offending query and compare it with a plan captured when the query was performing well (although, in practice, the earlier plan is often not available).
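
On reasonably recent Oracle versions the current plan can be captured with EXPLAIN PLAN and DBMS_XPLAN. The sketch below does this from a Java client; the JDBC URL, credentials, and query text are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Sketch: capture the current execution plan of a query so it can be compared
    // with an earlier plan. Connection details and the query are placeholders.
    public class ShowPlan {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                 Statement stmt = con.createStatement()) {

                // Populate PLAN_TABLE with the optimizer's plan for the query.
                stmt.execute("EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 42");

                // Read the formatted plan back.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY)")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }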

Some factors that can cause a plan to change are:

  • Which tables are currently analyzed? Were they previously analyzed? (i.e. was the query using RBO before and CBO now?)
  • Has OPTIMIZER_MODE been changed in INIT.ORA?
  • Has the DEGREE of parallelism been defined/changed on any table?
  • Have the tables been re-analyzed? Were the tables analyzed using estimate or compute? If estimate, what percentage was used?
  • Have the statistics changed?
  • Has the SPFILE/INIT.ORA parameter DB_FILE_MULTIBLOCK_READ_COUNT been changed?
  • Has the INIT.ORA parameter SORT_AREA_SIZE been changed?
  • Have any other INIT.ORA parameters been changed?

What do you think the plan should be? Run the query with hints to see if this produces the required performance.
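
For example, if you suspect the optimizer has stopped using a particular index, a hint can force the access path you think is right. The table, alias, and index names below are placeholders for your own schema objects.

    // Sketch: a hinted version of the query, to test whether forcing an index access
    // path restores the expected performance.
    public class HintedQuery {
        public static void main(String[] args) {
            String hinted = "SELECT /*+ INDEX(o orders_cust_idx) */ * "
                          + "FROM orders o WHERE o.customer_id = 42";
            // The hinted text could also be fed to the EXPLAIN PLAN sketch above
            // to compare the forced plan with the current one.
            System.out.println(hinted);
        }
    }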

A sudden slowdown can also be caused by a very high high-water mark, typically when a table used to be big but now contains only a couple of records. A full scan still has to read every block below the high-water mark to see whether it contains data.