Home
Documentation
This is an implementation of the visualization presented by Broeksema et al. Analysing multidimensional categorical datasets is hard, and they developed a tool for this task. The tool makes it possible to analyse not only the correlation of data points, but also the correlation of the attributes themselves. For example, if a value of an attribute is plotted at the center, it is commonly selected by data points. Two values plotted next to each other either belong to the same attribute, and the data points selecting them often share their other values, or belong to different attributes and are often selected together. Observations plotted next to each other have values in common. Based on this knowledge, the data can be filtered and merged to gain further insight and to detect clusters and correlations in the data.
Browser support
I tested the visualisation in Firefox and Google Chrome. Both work, but the CSS only behaves as intended in Firefox; in Google Chrome the dimensions view is sized incorrectly. The runtime should be better in Chrome, though. The processor used for testing is an Intel 4770K with 4 cores at 3.9 GHz. The runtime for autoInsurance3000 should be acceptable on machines with a similar specification. The full dataset does work on my machine, but takes about 20-40 seconds to calculate the observations. Out-of-memory errors may come up, but most of them are fixed by splitting the calculation across several web workers. It can, however, consume up to 2 GB of RAM.
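As a rough illustration of splitting a calculation across web workers, the sketch below cuts the observations into contiguous chunks and sends each chunk to its own worker. It is not the code used in this project: the names splitIntoChunks and projectInWorkers, the worker script URL, and the message format are all assumptions made for the example.

```javascript
// Hypothetical helper: split `rows` into at most `parts` contiguous chunks.
function splitIntoChunks(rows, parts) {
  const count = Math.max(1, Math.min(parts, rows.length));
  const size = Math.ceil(rows.length / count);
  const chunks = [];
  for (let start = 0; start < rows.length; start += size) {
    chunks.push(rows.slice(start, start + size));
  }
  return chunks;
}

// Browser-only part (assumption): each worker receives one chunk and posts
// back a single result message. `workerUrl` is a hypothetical script path.
function projectInWorkers(rows, workerCount, workerUrl) {
  const chunks = splitIntoChunks(rows, workerCount);
  return Promise.all(
    chunks.map(
      (chunk) =>
        new Promise((resolve, reject) => {
          const worker = new Worker(workerUrl);
          worker.onmessage = (event) => {
            resolve(event.data);
            worker.terminate();
          };
          worker.onerror = (error) => {
            reject(error);
            worker.terminate();
          };
          worker.postMessage(chunk);
        })
    )
  );
}
```

Smaller chunks keep the memory needed by any single worker lower, at the cost of more messages between the page and the workers.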
Interaction and Workflow
The first thing to do is to load a dataset. The dataset can be selected in the bottom right corner. The test dataset is very small and intended for interface testing only. The autoInsurance dataset is available with 3000, 6000 or 9000 observations, and using the smallest is recommended. The chess dataset is a second dataset that I added for testing purposes, but I don't think it is easily understandable.
After a dataset is loaded, the calculation runs in the background. Once it has finished, the Voronoi diagram in the projection view is drawn and the dimensions view is updated with relevance values. The delta slider below the Voronoi diagram can be used to merge cells by distance. The dimensions view can be sorted by clicking the header. Hovering over a Voronoi cell shows all values in the cell. Differences in hue show the relevance, although this is easier to see in the grayscale rendering used when observations are activated.
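The exact merge rule behind the delta slider is not described here. As a minimal sketch of what merging by distance could look like, the example below groups projected values whose Euclidean distance is at most delta, chaining groups together (single linkage) with a small union-find. The function name mergeByDistance and the point format with x, y and label are assumptions for illustration.

```javascript
// Hypothetical sketch: group points that lie within `delta` of each other.
// Points are assumed to look like { label: "A", x: 0, y: 0 }.
function mergeByDistance(points, delta) {
  // Union-find: parent[i] is the representative of point i's group.
  const parent = points.map((_, index) => index);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };

  for (let a = 0; a < points.length; a++) {
    for (let b = a + 1; b < points.length; b++) {
      const distance = Math.hypot(
        points[a].x - points[b].x,
        points[a].y - points[b].y
      );
      if (distance <= delta) parent[find(a)] = find(b);
    }
  }

  // Collect the labels of each group.
  const groups = new Map();
  points.forEach((point, index) => {
    const root = find(index);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(point.label);
  });
  return [...groups.values()];
}
```

With delta set to 0 every value stays in its own cell, and larger values of delta merge progressively more cells.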
The dimensions view on the right side shows the attributes; clicking an attribute opens its associated values. The bar next to a value shows the number of observations with that value. The bar next to an attribute shows its relevance. Clicking a header of the table sorts the entries by that column. The color of a Voronoi cell matches the color of the attribute in the dimensions view.
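The value bars and the sorting by header can be sketched as follows. This assumes observations are plain objects keyed by attribute name; the function names countValues and sortRows and the column names value and count are made up for the example and need not match the project's code.

```javascript
// Hypothetical sketch: count how many observations carry each value
// of one attribute. Observations are assumed to be plain objects.
function countValues(observations, attribute) {
  const counts = new Map();
  for (const row of observations) {
    const value = row[attribute];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return [...counts].map(([value, count]) => ({ value, count }));
}

// Return a sorted copy of the table rows, ordered by the given column.
function sortRows(rows, column, descending = false) {
  const sign = descending ? -1 : 1;
  return [...rows].sort((a, b) => {
    if (a[column] < b[column]) return -sign;
    if (a[column] > b[column]) return sign;
    return 0;
  });
}

// Usage with made-up data:
const exampleObservations = [
  { gender: "F", state: "CA" },
  { gender: "M", state: "CA" },
  { gender: "F", state: "WA" },
];
const byCount = sortRows(countValues(exampleObservations, "state"), "count", true);
// byCount[0] is { value: "CA", count: 2 }
```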
Pressing the filter view button shows the filter view. Here, values of an attribute can be selected by clicking, which highlights them, and then merged. The area of the associated Voronoi cell is shown; it is NaN when the cell is too close to another cell and the d3.js polygon calculation fails. It is also possible to filter an attribute by clicking its checkbox. The user can also split values that were merged earlier. The operations are applied when clicking apply, which triggers a recalculation of the data.
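One way to picture what applying the operations does to the data before the recalculation is sketched below. It assumes that merging replaces the selected values with one combined value and that filtering an attribute drops it from every observation; both are assumptions, as are the name applyOperations and the shape of its arguments.

```javascript
// Hypothetical sketch: apply merge and filter operations to the observations.
// `merges` maps attribute -> { originalValue: mergedValue } (assumed format).
// `filteredAttributes` lists attributes to leave out (assumed meaning of "filter").
function applyOperations(observations, merges, filteredAttributes) {
  return observations.map((row) => {
    const result = {};
    for (const [attribute, value] of Object.entries(row)) {
      if (filteredAttributes.includes(attribute)) continue;
      const mapping = merges[attribute] || {};
      const isMerged = Object.prototype.hasOwnProperty.call(mapping, value);
      result[attribute] = isMerged ? mapping[value] : value;
    }
    return result;
  });
}

// Usage with made-up data: merge two education levels, drop the "id" attribute.
const prepared = applyOperations(
  [
    { id: "1", education: "Master" },
    { id: "2", education: "Doctor" },
    { id: "3", education: "College" },
  ],
  { education: { Master: "Master+Doctor", Doctor: "Master+Doctor" } },
  ["id"]
);
// prepared[0] is { education: "Master+Doctor" }
```

Because the function returns new objects and leaves the input untouched, splitting merged values again only requires rerunning it with a smaller merges mapping.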
Clicking show observations displays the projection of the data points. This is possible as soon as their calculation has finished, which usually takes longer than calculating the column projections. The observations are drawn as black circles. Clicking a bar in one of the bar charts colors the observations by the values of the associated attribute. Hovering over an observation shows its value.
How to Execute
This should work by opening index.html in a browser; however, security settings in Chrome may block the execution of local web workers.
The online version can be accessed at:
https://mwallinger.github.io/