ZK Huge Grouping Model
Revision as of 06:14, 2 August 2013
Robert Wenzel, Engineer, Potix Corporation
August, 2013
ZK 6.5 (or later)
Introduction
you have some big data... how to display it?
an earlier article already covers displaying big data in a flat list: http://books.zkoss.org/wiki/Small_Talks/2009/July/Handling_huge_data_using_ZK
based on the concepts introduced there (paging at DB level, paging control separated from the grid), this article shows how to add grouping...
The challenges
grouping is a powerful feature in ZK but also adds an extra layer of complexity
paging and grouping challenges
1. groups can be open or closed (also interactively)
- -> the total count and the number of pages change when opening/closing nodes... (needs efficient counting and state keeping)
2. groups can have arbitrary number of children
- -> random access to a specific page ... how to know the current group and position inside the group for that page
- --> implement a feasible search
3. minimize DB operations (accumulating network/DB latency)
- -> caching vs. memory consumption
1. + 2. + 3. !!!! combining all three in an efficient, memory preserving way
limitations... needs to store the state in memory (humans are limited, so one is unlikely to toggle 100+ groups)
Maxing out what is possible... ZK GroupsModel supports int indexes so go up to Integer.MAX_VALUE (~ 2.000.000.000 records)
not unrealistic -> Hibernate has changed its paging api to long already...
Implementation
Accessing the data
no real DB is used here, just deterministic random values... caching the group sizes in memory has its price: at 4 bytes per int, the child counts of 40.000.000 groups stored in an int[] already take 160 MB of memory (400.000.000 groups would take 1.6 GB)
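As a rough check of the memory estimate, the following minimal sketch computes the payload size of an int[] holding one child count per group. The class and method names are hypothetical, and the array header and any JVM overhead are ignored.

```java
// Hypothetical helper, not part of the article's code: estimates the payload
// size of an int[] that stores one child count per group.
class GroupCountMemory {

    /** Payload bytes for groupCount ints (array header ignored). */
    static long bytesFor(long groupCount) {
        return groupCount * Integer.BYTES; // 4 bytes per int
    }

    public static void main(String[] args) {
        System.out.println(bytesFor(40_000_000L) / 1_000_000 + " MB");  // prints "160 MB"
        System.out.println(bytesFor(400_000_000L) / 1_000_000 + " MB"); // prints "1600 MB"
    }
}
```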
Data Record and Dao
package org.zkoss.grouping.dao;

import java.util.Date;

public class AccessDataRecord {
    private String ipAddress;
    private String browser;
    private long contentLength;
    private String country;
    private Date accessTime;
    private String url;

    //... Constructor + getters
}
package org.zkoss.grouping.dao;

import java.util.List;

import org.zkoss.grouping.model.GroupInfo;

public class AccessLogDao {

    public int getGroupCount() {
        //your DB query to count the groups
    }

    public List<GroupInfo<String, String>> getGroupInfos(int startIndex, int pageSize) {
        //your DB query and mapping to create GroupInfos for this page of groups
    }

    public List<AccessDataRecord> getChildInfos(int groupIndex, int startIndex, int pageSize) {
        //your DB query and load this page of children in the group
    }

    /**
     * This method will count a lot and return a constant value as long as data does not change,
     * so caching would be desirable
     * @return the total number of children
     */
    public int getTotalChildCount() {
        //your DB impl to calc the total number of children
    }

    /**
     * This method will be called quite often with recurring parameters,
     * so caching (of most frequently used params) would be desirable !!!
     * @return the childrenCount between 2 groups... (including groupIndexFrom and excluding groupIndexTo)
     */
    public int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
        //your DB impl to calc the number of children between 2 groups
    }
}
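The Javadoc above suggests caching the count between two groups for recurring parameters. One possible way to do that is a small LRU memoizer, sketched below. ChildCountCache and its loader parameter are hypothetical names, and the loader merely stands in for the real DAO query; a production setup might prefer an existing cache library or DB-level caching instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical LRU memoizer for (groupIndexFrom, groupIndexTo) -> child count.
class ChildCountCache {
    private final IntBinaryOperator loader; // stands in for the real DAO call
    private final Map<Long, Integer> cache;

    ChildCountCache(IntBinaryOperator loader, final int maxEntries) {
        this.loader = loader;
        // accessOrder = true makes the map evict the least recently used entry
        this.cache = new LinkedHashMap<Long, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
        // pack both int parameters into one long key
        long key = ((long) groupIndexFrom << 32) | (groupIndexTo & 0xFFFFFFFFL);
        Integer cached = cache.get(key);
        if (cached == null) {
            cached = loader.applyAsInt(groupIndexFrom, groupIndexTo);
            cache.put(key, cached);
        }
        return cached;
    }
}
```

A DAO could create one such cache per application and pass its own query method as the loader, so repeated calls with the same group range hit memory instead of the DB.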
The Paging Model
AccessDataGroupsModel, our specialized PagingGroupsModel<D, H, F>, uses AccessDataRecord as "data" (D) and String as "head" (H) and "foot" (F) implementation. This class implements the 4 model methods and a GroupingPositionSearch strategy, all delegating to AccessLogDao from above, to keep the workload at DB level.
package org.zkoss.grouping;

import java.util.List;

import org.zkoss.grouping.dao.AccessLogDao;
import org.zkoss.grouping.dao.AccessDataRecord;
import org.zkoss.grouping.model.GroupInfo;
import org.zkoss.grouping.model.PagingGroupsModel;
import org.zkoss.grouping.model.search.BinaryGroupingPositionSearch;
import org.zkoss.grouping.model.search.GroupingPositionSearch;

class AccessDataGroupsModel extends
        PagingGroupsModel<AccessDataRecord, String, String> {

    private AccessLogDao groupsDao;

    public AccessDataGroupsModel(AccessLogDao groupsDao, int pageSize, boolean initialOpen, boolean hasGroupfoot) {
        super(pageSize, initialOpen, hasGroupfoot);
        this.groupsDao = groupsDao;
        setPositionSearch(binarySearch());
    }

    @Override
    protected int loadGroupCount() {
        return groupsDao.getGroupCount();
    }

    @Override
    protected List<GroupInfo<String, String>> loadGroupPage(int startIndex, int pageSize) {
        return groupsDao.getGroupInfos(startIndex, pageSize);
    }

    @Override
    protected List<AccessDataRecord> loadChildrenPage(int groupIndex, int startIndex, int pageSize) {
        return groupsDao.getChildInfos(groupIndex, startIndex, pageSize);
    }

    @Override
    protected int getTotalChildCount() {
        return groupsDao.getTotalChildCount();
    }

    private GroupingPositionSearch binarySearch() {
        return new BinaryGroupingPositionSearch(this, 4096) {
            @Override
            protected int getChildCountBetween(int groupIndexFrom, int groupIndexTo) {
                return groupsDao.getChildCountBetween(groupIndexFrom, groupIndexTo);
            }
        };
    }
}
As we can see, the methods focus on counting and on retrieving data in page-sized chunks. Nothing magical here; our DB is doing the heavy work (selecting, grouping, counting, sorting) ... that's what it is optimized for, and with huge data we'll not attempt to compete with our DB, we just want to load as little data as possible.
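To illustrate how a position search can be built on nothing but a "child count between two groups" query, here is a minimal sketch. It is not the actual BinaryGroupingPositionSearch implementation: the class PositionSearchSketch and the method locate are hypothetical, and the sketch assumes all groups are open and there are no group foots, so every group occupies one head row plus one row per child.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: finds the group containing a given row by binary search.
class PositionSearchSketch {

    /**
     * Returns {groupIndex, offsetInGroup}; offset 0 is the group head,
     * offset n (n >= 1) is the child with index n - 1.
     * childCountBetween counts children of groups [from, to).
     */
    static int[] locate(int rowIndex, int groupCount, IntBinaryOperator childCountBetween) {
        int lo = 0;
        int hi = groupCount - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle, overflow-safe
            // row of the head of group mid: mid heads plus all children before it
            long headRow = (long) mid + childCountBetween.applyAsInt(0, mid);
            if (headRow <= rowIndex) {
                lo = mid;      // group mid starts at or before the row
            } else {
                hi = mid - 1;  // group mid starts after the row
            }
        }
        long headRow = (long) lo + childCountBetween.applyAsInt(0, lo);
        return new int[] { lo, (int) (rowIndex - headRow) };
    }

    public static void main(String[] args) {
        final int[] sizes = { 5, 15, 3 }; // made-up group sizes
        IntBinaryOperator count = (from, to) -> {
            int sum = 0;
            for (int i = from; i < to; i++) sum += sizes[i];
            return sum;
        };
        int[] pos = locate(9, sizes.length, count);
        System.out.println(pos[0] + " / " + pos[1]); // prints "1 / 3"
    }
}
```

Each search needs roughly log2(groupCount) count queries, which is why caching the counts for recurring parameters pays off. A real implementation additionally has to correct the counts for groups that were toggled in the UI.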
Page/Caching, Counting & Open/Close State keeping
Open/Close State keeping
The PagingGroupsModel keeps track of the toggled state only for the changed groups (together with the childCount of each such group). If the model is initialized with INITIALLY_OPEN, then only the closed nodes are tracked, and vice versa. Assuming a human will not take the time to toggle an outrageous number of groups, this Map (groupIndex => childCount) will stay relatively small and should not compromise our performance and memory expectations. Additionally, the total number of toggled groups is updated on every open/close interaction.
This information is used to adjust the number of total children (for the UI paging calculations) or the number of children between 2 groups (for Position search) - as the DB won't keep that UI state.
see:
- PagingGroupsModel#getCurrentRowCount()
- PagingGroupsModel#getToggledCountBetween(int groupIndexFrom, int groupIndexTo)
- PagingGroupsModel#toggleGroup(int groupIndex, boolean open)
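The bookkeeping described above might look roughly like the following sketch. It is not the PagingGroupsModel source: ToggleStateSketch and its members are hypothetical, group foots are ignored, and the child count of the toggled group is passed in by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks only groups whose state differs from the initial state.
class ToggleStateSketch {
    private final boolean initiallyOpen;
    private final int groupCount;
    private final int totalChildCount;
    // groupIndex -> childCount, only for groups toggled away from the initial state
    private final Map<Integer, Integer> toggled = new HashMap<>();
    private long toggledChildren; // sum of the child counts in the map

    ToggleStateSketch(boolean initiallyOpen, int groupCount, int totalChildCount) {
        this.initiallyOpen = initiallyOpen;
        this.groupCount = groupCount;
        this.totalChildCount = totalChildCount;
    }

    void toggle(int groupIndex, boolean open, int childCount) {
        if (open == initiallyOpen) {
            // back to the initial state: stop tracking this group
            Integer removed = toggled.remove(groupIndex);
            if (removed != null) toggledChildren -= removed;
        } else if (!toggled.containsKey(groupIndex)) {
            toggled.put(groupIndex, childCount);
            toggledChildren += childCount;
        }
    }

    /** Rows currently visible in the UI: one head per group plus visible children. */
    long currentRowCount() {
        long visibleChildren = initiallyOpen
                ? totalChildCount - toggledChildren // closed groups hide their children
                : toggledChildren;                  // only opened groups show children
        return groupCount + visibleChildren;
    }
}
```

Because only the deviations from the initial state are stored, the memory used grows with the number of user interactions, not with the number of groups.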
Page/Caching
The GroupsModel interface offers methods to retrieve single groups and children. Whenever one of these methods is called, the PagingGroupsModel checks whether that position is cached for the current page; if not, it reloads a page-sized chunk from the DB and caches that chunk at both group and child level.
One problem here is that the chunks retrieved from the DB will not match the page boundaries in the UI.
e.g.
1         2         3         4
|.........|.........|.........|......... -> UI pages (page size 10)
0.....1.........|......2...3......4....5 -> Groups | Chunks Group 0 open
01.........|......2...3......4....5..... -> Groups | Chunks Group 0 closed
Page size vs. group size: While the UI pages have a constant size, the groups have variable sizes and are shifted by previously opened/closed nodes. If chunks were loaded to match the UI pages, their offsets would therefore change whenever nodes are toggled.
As we want to benefit from caching at all possible levels, keeping the method/query parameters constant will reduce our required cache size at Dao method level or in the DB, and will avoid duplicate loads of overlapping chunks (with multiple users having different UI state).
So in the scenario above whenever e.g. Element 3 of Group 1 needs to be loaded the PagingGroupsModel will load the chunk "Group1 Child0-9" calling this.loadChildrenPage(1, 0, 10). Perfect for caching, as it would never call e.g. this.loadChildrenPage(1, 3, 10).
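The fixed chunk boundaries can be derived with a single integer division. The helper below is a hypothetical sketch (ChunkAlignment and chunkStart are not names from the article's code) of how the start index passed to loadChildrenPage could be aligned to the page size.

```java
// Hypothetical helper: aligns a child index to the start of its page-sized chunk.
class ChunkAlignment {

    /** Start index of the chunk that contains childIndex. */
    static int chunkStart(int childIndex, int pageSize) {
        return (childIndex / pageSize) * pageSize; // integer division rounds down
    }

    public static void main(String[] args) {
        System.out.println(chunkStart(3, 10));  // prints 0  -> loadChildrenPage(1, 0, 10)
        System.out.println(chunkStart(14, 10)); // prints 10 -> loadChildrenPage(1, 10, 10)
    }
}
```

With this alignment every child of a group maps to exactly one chunk, so the same (groupIndex, startIndex, pageSize) triple recurs regardless of which groups a particular user has toggled.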
The disadvantage of this approach is that in most cases at least two chunks need to be loaded to display a single page. However, when caching a few of these chunks in the model, you'll have one of the two already loaded from the previous page, no matter in which direction you navigate.
I think the better cacheability outweighs the slight memory overhead. At a page size of 50, we might have 49 records too many in memory, but we reduce the permutations of method/query parameters by 98% (only 1 of 50 possible start offsets is ever used), which greatly improves the chance of a cache hit at method or DB level.
Also very small groups will result in more DB round trips, but hey ... we are talking about BIG data :)
Random Access Paging
Appendix
Download
Comments
Copyright © Potix Corporation. This article is licensed under GNU Free Documentation License.