ZK Huge Grouping Model

Author
Robert Wenzel, Engineer, Potix Corporation
Date
August, 2013
Version
ZK 6.5 (or later)

This page is under construction, so we cannot guarantee the accuracy of the content!

Introduction

Sooner or later you have to display some big data, far too much to simply load into memory at once.

The article Handling huge data using ZK (http://books.zkoss.org/wiki/Small_Talks/2009/July/Handling_huge_data_using_ZK) already describes how to display big data in a flat list.

Based on the concepts introduced there (paging at DB level, separating the paging control from the grid), this article shows how to do the same with grouping.

The challenges

Grouping is a powerful feature in ZK, but it also adds an extra layer of complexity.

Combining paging and grouping poses several challenges:

1. Groups can be open or closed (also interactively)

-> the total row count and the number of pages change when opening/closing nodes (this needs efficient counting and state keeping)

2. Groups can have an arbitrary number of children

-> random access to a specific page requires knowing the current group, and the position inside that group, for that page
--> this requires a feasible (efficient) position search

3. DB operations should be minimized (network/DB latency accumulates)

-> caching vs. memory consumption

The real challenge is combining all three (1. + 2. + 3.) in an efficient, memory-preserving way.

One limitation: the open/close state needs to be kept in memory. Since humans are limited, a single user is unlikely to toggle 100+ groups, so this state stays small.

Maxing out what is possible: ZK's GroupsModel supports int indexes, so we can go up to Integer.MAX_VALUE (~2,000,000,000 records). That is not unrealistic: Hibernate, for example, has already changed its paging API to long.

Implementation

Accessing the data

For this example we do not use a real DB; the data is generated from deterministic (pseudo-)random values on the fly. Note that simply caching all group sizes in memory does not scale either: storing the child counts of e.g. 400,000,000 groups in an int[] would already take about 1.6 GB of memory (4 bytes per int).
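
Just to illustrate the idea (this is a sketch for this article, not part of the example download; the class and method names are made up here): group sizes can be derived deterministically from the group index, so repeated calls always return the same value without storing anything per group.

import java.util.Random;

/**
 * Hypothetical helper: derives a stable child count from the group index,
 * so repeated calls return the same value without storing anything per group.
 */
public class DeterministicGroupSizes {

	private final long seed;

	public DeterministicGroupSizes(long seed) {
		this.seed = seed;
	}

	/** Returns a reproducible child count between 1 and 1000 for a group index. */
	public int childCountOf(int groupIndex) {
		//seed a Random from the group index -> same result for every call
		Random random = new Random(seed * 31 + groupIndex);
		return 1 + random.nextInt(1000);
	}
}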

Data Record and Dao

package org.zkoss.grouping.dao;

import java.util.Date;

public class AccessDataRecord {
	private String ipAddress;
	private String browser;
	private long contentLength;
	private String country;
	private Date accessTime;
	private String url;

	//... Constructor + getters omitted
}

package org.zkoss.grouping.dao;

import java.util.List;

import org.zkoss.grouping.model.GroupInfo;

public class AccessLogDao {

	public int getGroupCount() {
		//your DB query to count the groups
	}

	public List<GroupInfo<String, String>> getGroupInfos(int startIndex, int pageSize) {
		//your DB query and mapping to create GroupInfos for this page of groups
	}

	public List<AccessDataRecord> getChildInfos(int groupIndex, int startIndex, int pageSize) {
		//your DB query to load this page of children in the group
	}

	/**
	 * This method has to count a lot of records and returns a constant value as long as
	 * the data does not change, so caching its result is desirable.
	 * @return the total number of children over all groups
	 */
	public int getTotalChildCount() {
		//your DB impl to calculate the total number of children
	}

	/**
	 * This method will be called quite often with recurring parameters,
	 * so caching (of the most frequently used parameters) is desirable!
	 * @return the number of children between two groups (including groupIndexFrom and excluding groupIndexTo)
	 */
	public int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
		//your DB impl to calculate the number of children between two groups
	}
}
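
The GroupInfo<H, F> class used above is part of the example code. To give an idea of what such a holder needs to carry, here is a minimal sketch; the accessor names are assumptions and not necessarily those of the downloadable example.

package org.zkoss.grouping.model;

/**
 * Minimal sketch of a GroupInfo holder: the group head value, the group foot
 * value and the group's child count. Accessor names are assumptions here.
 */
public class GroupInfo<H, F> {

	private final H head;
	private final F foot;
	private final int childCount;

	public GroupInfo(H head, F foot, int childCount) {
		this.head = head;
		this.foot = foot;
		this.childCount = childCount;
	}

	public H getHead() {
		return head;
	}

	public F getFoot() {
		return foot;
	}

	public int getChildCount() {
		return childCount;
	}
}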

The Paging Model

AccessDataGroupsModel, our specialized PagingGroupsModel<D, H, F>, uses AccessDataRecord as the "data" (D) type and String for both "head" (H) and "foot" (F). The class implements the four model methods and a GroupingPositionSearch strategy, all delegating to the AccessLogDao from above, to keep the workload at DB level.

package org.zkoss.grouping;

import java.util.List;

import org.zkoss.grouping.dao.AccessLogDao;
import org.zkoss.grouping.dao.AccessDataRecord;
import org.zkoss.grouping.model.GroupInfo;
import org.zkoss.grouping.model.PagingGroupsModel;
import org.zkoss.grouping.model.search.BinaryGroupingPositionSearch;
import org.zkoss.grouping.model.search.GroupingPositionSearch;

class AccessDataGroupsModel extends
		PagingGroupsModel<AccessDataRecord, String, String> {

	private AccessLogDao groupsDao;

	public AccessDataGroupsModel(AccessLogDao groupsDao, int pageSize, boolean initialOpen, boolean hasGroupfoot) {
		super(pageSize, initialOpen, hasGroupfoot);
		this.groupsDao = groupsDao;
		setPositionSearch(binarySearch());
	}

	@Override
	protected int loadGroupCount() {
		return groupsDao.getGroupCount();
	}

	@Override
	protected List<GroupInfo<String, String>> loadGroupPage(int startIndex, int pageSize) {
		return groupsDao.getGroupInfos(startIndex, pageSize);
	}

	@Override
	protected List<AccessDataRecord> loadChildrenPage(int groupIndex, int startIndex, int pageSize) {
		return groupsDao.getChildInfos(groupIndex, startIndex, pageSize);
	}

	@Override
	protected int getTotalChildCount() {
		return groupsDao.getTotalChildCount();
	}

	private GroupingPositionSearch binarySearch() {
		return new BinaryGroupingPositionSearch(this, 4096) {
			@Override
			protected int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
				return groupsDao.getChildCountBetween(groupIndexFrom, groupIndexTo);
			}
		};
	}

}

As we can see, the methods focus on counting and on retrieving data in page-sized chunks. Nothing magical here; the DB is doing the heavy work (selecting, grouping, counting, sorting), which is what it is optimized for. With huge data we will not attempt to compete with the DB, we just want to transfer as little data as possible.
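
To give an impression of what "letting the DB do the work" can look like, here is a purely hypothetical JDBC variant of getChildCountBetween(). The table access_log, its grouping column country and the use of LIMIT/OFFSET are assumptions for this sketch and not part of the example code.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

public class AccessLogJdbcDao {

	private final DataSource dataSource;

	public AccessLogJdbcDao(DataSource dataSource) {
		this.dataSource = dataSource;
	}

	/** Counts the children of the groups in [groupIndexFrom, groupIndexTo) with a single aggregate query. */
	public int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
		String sql =
			"SELECT COUNT(*) FROM access_log WHERE country IN ("
			+ " SELECT country FROM ("
			+ "  SELECT DISTINCT country FROM access_log ORDER BY country LIMIT ? OFFSET ?"
			+ " ) g )";
		try (Connection con = dataSource.getConnection();
				PreparedStatement ps = con.prepareStatement(sql)) {
			ps.setInt(1, groupIndexTo - groupIndexFrom); //number of groups in the range
			ps.setInt(2, groupIndexFrom); //skip the first groupIndexFrom groups
			try (ResultSet rs = ps.executeQuery()) {
				rs.next();
				return rs.getInt(1);
			}
		} catch (SQLException e) {
			throw new RuntimeException("counting children between groups failed", e);
		}
	}
}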

Page/Caching, Counting & Open/Close State keeping

Open/Close State keeping

The PagingGroupsModel keeps track of the toggled state only for the changed groups (together with the childCount of each such group). If the model is initialized with INITIALLY_OPEN, then only the closed groups are tracked, and vice versa. Assuming a human will not take the time to toggle an outrageous number of groups, this map (groupIndex => childCount) stays relatively small and should not compromise our performance and memory expectations. Additionally, the total number of toggled groups is updated on every open/close interaction.

This information is used to adjust the total number of children (for the UI paging calculations) and the number of children between two groups (for the position search), since the DB does not know about that UI state. A simplified sketch of this bookkeeping follows the list below.

see:

  • PagingGroupsModel#getCurrentRowCount()
  • PagingGroupsModel#getToggledCountBetween(int groupIndexFrom, int groupIndexTo)
  • PagingGroupsModel#toggleGroup(int groupIndex, boolean open)
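
Here is a strongly simplified sketch of that bookkeeping (not the actual PagingGroupsModel code; the class and method names are made up, and it only counts child rows, ignoring group head/foot rows):

import java.util.SortedMap;
import java.util.TreeMap;

public class ToggleStateSketch {

	private final boolean initiallyOpen;
	//groupIndex => childCount, only for groups whose state differs from the initial state
	private final SortedMap<Integer, Integer> toggledGroups = new TreeMap<Integer, Integer>();
	private int toggledChildrenCount = 0; //sum of the childCounts of all toggled groups

	public ToggleStateSketch(boolean initiallyOpen) {
		this.initiallyOpen = initiallyOpen;
	}

	public void toggleGroup(int groupIndex, boolean open, int childCount) {
		if (open != initiallyOpen) {
			if (toggledGroups.put(groupIndex, childCount) == null) {
				toggledChildrenCount += childCount;
			}
		} else {
			Integer removed = toggledGroups.remove(groupIndex);
			if (removed != null) {
				toggledChildrenCount -= removed;
			}
		}
	}

	/** Children affected by toggled groups in [groupIndexFrom, groupIndexTo). */
	public int getToggledCountBetween(int groupIndexFrom, int groupIndexTo) {
		int count = 0;
		for (int childCount : toggledGroups.subMap(groupIndexFrom, groupIndexTo).values()) {
			count += childCount;
		}
		return count;
	}

	/** Currently visible children, given the total child count reported by the DB. */
	public int getCurrentChildCount(int totalChildCount) {
		return initiallyOpen
				? totalChildCount - toggledChildrenCount //closed groups hide their children
				: toggledChildrenCount; //only the opened groups show children
	}
}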

Page/Caching

The GroupsModel interface offers methods to retrieve single groups and children. Whenever one of these methods is called, the PagingGroupsModel checks whether that position is already cached; if not, it reloads a page-sized chunk from the DB and caches that chunk, both at group and at child level.

One problem here is that the chunks retrieved from the DB will usually not match the page boundaries in the UI.

e.g.

    1         2         3         4                
    |.........|.........|.........|.........           -> UI pages (page size 10)

    0.....1.........|......2...3......4....5           -> Groups | Chunks Group 0 open
    01.........|......2...3......4....5.....           -> Groups | Chunks Group 0 closed

Page size vs. group size: while the UI pages have a constant size, the groups may have variable sizes and are shifted by previously opened/closed nodes, so the offsets of chunks aligned to UI pages would change whenever nodes are toggled.

Since we want to benefit from caching at all possible levels, keeping the method/query parameters constant reduces the required cache size at the Dao method level or in the DB, and avoids loading overlapping chunks multiple times (e.g. when multiple users have different UI state).

So in the scenario above, whenever e.g. element 3 of group 1 needs to be loaded, the PagingGroupsModel loads the chunk "Group 1, children 0-9" by calling this.loadChildrenPage(1, 0, 10). That is perfect for caching, as it will never call e.g. this.loadChildrenPage(1, 3, 10).

see:

  • AccessDataGroupsModel#loadGroupPage(int startIndex, int pageSize)
  • AccessDataGroupsModel#loadChildrenPage(int groupIndex, int startIndex, int pageSize)
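
The chunk alignment itself boils down to snapping the requested index down to the nearest multiple of the page size, so the load parameters never depend on the UI paging offset. A small sketch (not the actual model code):

public class ChunkAlignmentSketch {

	/** Snaps an index down to the start of its chunk. */
	public static int chunkStart(int index, int pageSize) {
		return (index / pageSize) * pageSize;
	}

	public static void main(String[] args) {
		int pageSize = 10;
		System.out.println(chunkStart(3, pageSize)); //element 3 -> load the chunk starting at child 0
		System.out.println(chunkStart(27, pageSize)); //element 27 -> load the chunk starting at child 20
	}
}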

The same strategy applies to caching the groups: the PagingGroupsModel also loads chunks of groups at fixed intervals. In case groups are displayed with a group foot, the groups cache size can be halved, as each group foot consumes one row too.

cachedGroups = Collections.synchronizedMap(new LRUMap(pageSize * (hasGroupFoot ? 1 : 2))); //cache at least 2 pages ...
cachedChildren = Collections.synchronizedMap(new LRUMap(pageSize * 4)); //cache enough for at least a few pages

The cache size is reasonably small, and the LRUMap makes sure it does not grow beyond its capacity, throwing out the oldest groups/children when it overflows.

The disadvantage of this approach is that in most cases at least two chunks need to be loaded to display a single page. However, when caching a few of these chunks in the model, one of the two will usually already be loaded from the previous page, no matter in which direction you navigate.

I think the better cacheability outweighs the slight memory overhead. At a page size of 50 we might hold up to 49 records too many in memory (which will likely be reused when the user navigates to the next page), but we reduce the permutations of method/query parameters by 98%, which greatly improves the chance of a cache hit at method or DB level. I would even keep the cache a bit bigger, so that navigating back by a few pages does not cause a reload.

Also very small groups will result in more DB round trips, but hey ... we are talking about BIG data :)

Random Access Paging
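
To jump to an arbitrary page, the model has to translate an absolute child index into a group index plus an offset inside that group. The BinaryGroupingPositionSearch used above does this with a binary search over cumulative child counts obtained from getChildCountBetween(), adjusted by the toggle state via getToggledCountBetween(). The following is a strongly simplified sketch of that idea, assuming all groups are open and ignoring group head/foot rows and toggling; it is not the actual implementation.

public abstract class PositionSearchSketch {

	/** Stands in for AccessLogDao#getChildCountBetween (from inclusive, to exclusive). */
	protected abstract int childCountBetween(int groupIndexFrom, int groupIndexTo);

	/** Returns the index of the group containing the given absolute child index. */
	public int groupIndexOf(int absoluteChildIndex, int groupCount) {
		int low = 0;
		int high = groupCount - 1;
		while (low < high) {
			int mid = (low + high) >>> 1;
			int childrenUpToMid = childCountBetween(0, mid + 1); //children in groups [0, mid]
			if (absoluteChildIndex < childrenUpToMid) {
				high = mid; //the target group is mid or an earlier one
			} else {
				low = mid + 1; //the target group comes after mid
			}
		}
		return low;
	}

	/** Offset of the absolute child index inside its group. */
	public int offsetInGroup(int absoluteChildIndex, int groupIndex) {
		return absoluteChildIndex - childCountBetween(0, groupIndex);
	}
}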

Appendix

Download

selenium-IDE-example.zip

Comments



Copyright © Potix Corporation. This article is licensed under GNU Free Documentation License.