For the past month I have been heads-down experimenting with ways of “turning analytics inside out” within a virtual data world. The initial approach was to build a simple “Hello Data World” that illustrated 2D visualizations augmented with a third (but useful) dimension. Using Unity3D with a single C# script, I was able to show “dancing coordinate attribute poles” for the classic Iris dataset. The vertical poles are the four normalized features (Petal/Sepal Length/Width) with the target Species in color. The lines cycled through the 150 instances/examples in the dataset. The next step was to dance the attribute poles in XZ space using PCA with force-graph dynamics, along with creative camera control.
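To make the pole mapping concrete, here is a minimal sketch in Python. The original was a Unity C# script, which is not shown here. The sketch assumes per-feature min-max normalization and uses only four real Iris rows (rows 1, 2, 51 and 101) instead of all 150. The species colors and the `max_height` scale are hypothetical choices. Each yielded frame corresponds to what one step of the animation would render.

```python
"""Sketch: map Iris features to normalized 'attribute pole' heights.

Assumptions: the rows below are a small excerpt of the Iris dataset
(rows 1, 2, 51 and 101), and the color map is a hypothetical choice.
The original prototype did this in a Unity C# script, not in Python.
"""

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

IRIS_EXCERPT = [
    ([5.1, 3.5, 1.4, 0.2], "setosa"),
    ([4.9, 3.0, 1.4, 0.2], "setosa"),
    ([7.0, 3.2, 4.7, 1.4], "versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "virginica"),
]

# Hypothetical color assignment (RGB in 0..1), one per species.
SPECIES_COLOR = {
    "setosa": (1.0, 0.0, 0.0),
    "versicolor": (0.0, 1.0, 0.0),
    "virginica": (0.0, 0.0, 1.0),
}


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def pole_frames(dataset, max_height=2.0):
    """Yield one frame per instance: four pole heights plus the species color."""
    heights = min_max_normalize([features for features, _ in dataset])
    for norm, (_, species) in zip(heights, dataset):
        yield {
            "poles": {name: round(h * max_height, 3)
                      for name, h in zip(FEATURES, norm)},
            "color": SPECIES_COLOR[species],
        }


if __name__ == "__main__":
    for frame in pole_frames(IRIS_EXCERPT):
        print(frame)
```

With the full dataset, the minimum and maximum would be taken over all 150 instances, so the pole heights of these four rows would differ from this excerpt.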
However, it became apparent that this initial ‘simple’ approach would result in an un-architected mess! In particular, the analytics requires interactive prototyping that is clearly separated from the rendering of results within the virtual space. Meanwhile, the data objects require the full richness of 3D virtual reality to explore the numerous possibilities for moving beyond 2D visualizations.
A Revised Approach
The revised approach follows the recipe: “Combine Python scikit-learn with Unity3D. And stir vigorously!” The analytics (at the depth of scikit-learn) must drive the rendering of the virtual data space, and the rendering must in turn reflect the complexities of the analysis results. For instance, a typical fit/predict sequence produces 5-10 different result structures. How can you ‘sense’ the nature and quality of those results? That question is the focus and challenge of this approach.
After surveying a wide variety of analytic tools and approaches, I concluded that Python scikit-learn is an excellent beachhead, with established bridges to other analytic tools (Jupyter notebooks, QlikView/Sense, RapidMiner), packages (NLP, Theano), platforms (Aster) and languages (R, Scala), along with scaling via cloud platforms (Spark MLlib). Further, scikit-learn is open source and the code of its algorithms is accessible, so data hooks could be placed within the algorithms to monitor intermediate stages of the analytic processing, such as the behavior of gradient descent. In other words, scikit-learn is the Swiss Army Knife for analytics!
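As an illustration of what such a data hook might look like, here is a minimal sketch of batch gradient descent for a one-variable linear fit with a monitoring callback. The `on_step` hook and the `fit_line_gd` function are hypothetical and are not part of scikit-learn's API. Within scikit-learn itself, a coarser but real option is to call `partial_fit` on incremental estimators such as `SGDClassifier` and inspect the model between calls.

```python
"""Sketch: a gradient-descent loop with a 'data hook' for monitoring.

The hook (`on_step`) is hypothetical, not a scikit-learn API. It exposes
the intermediate state a data world could render while the algorithm runs.
"""
from typing import Callable, List, Sequence, Tuple

# Hook signature: (iteration, updated w, updated b, loss before this update)
StepHook = Callable[[int, float, float, float], None]


def fit_line_gd(xs: Sequence[float], ys: Sequence[float],
                lr: float = 0.05, n_iter: int = 1000,
                on_step: StepHook = lambda *args: None) -> Tuple[float, float]:
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for i in range(n_iter):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
        on_step(i, w, b, loss)  # the "data hook": expose intermediate state
    return w, b


if __name__ == "__main__":
    history: List[Tuple[int, float]] = []
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
    w, b = fit_line_gd(xs, ys,
                       on_step=lambda i, w, b, loss: history.append((i, loss)))
    print(f"w={w:.3f} b={b:.3f} first loss={history[0][1]:.3f} "
          f"last loss={history[-1][1]:.6f}")
```

A data world could subscribe to `on_step` and animate the moving parameters or the shrinking loss, rather than only showing the final fit.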
The plan is to use scikit-learn as the template for exploring the combinations and flow of analytic methods. The scikit-learn map (cheat-sheet?) is a good starting point. For each of its sections, there are detailed tutorials (with code) illustrating the concepts. For example, the supervised learning tutorial uses the Iris dataset to illustrate the “curse of dimensionality”. Notably, most examples rely upon MatPlotLib, whose parameters are excellent clues to key attributes of analysis results. Also, the scikit-learn community is filled with supporting resources. In particular, the recent book Python Machine Learning by Sebastian Raschka does an excellent job covering the landscape. Finally, the latest Anaconda Python3 distribution is incredible (and stable), enabling a highly interactive, fine-grained development environment with Jupyter (formerly IPython) notebooks. We can actually document as we explore!
Unity3D is likewise an excellent beachhead, with bridges into a wide range of 3D visualization and virtual reality technologies and communities, all driven by a multi-billion-dollar video gaming industry. Unity3D has become a go-to development environment for high-end games, supported by a robust community (with an excellent asset store). We can take advantage of thousands of options for spawning, rendering and animating game objects of any structure and shape. Because the Unity scene for a data world must be procedurally generated (as opposed to manually built), C# is the language. To make life easier (and fun), there are many prototyping tools, like PlayMaker, which uses finite state machines (FSMs) to create complex behaviors in a modular fashion.
The plan is to use Unity to build virtual data worlds that turn the analytics inside-out, enabling a deeper understanding of the analytic processing and not just presenting the analytic results. We are not simply extending 2D visualizations with one additional dimension. In fact, MatPlotLib images can (and should) be brought in-world to complement the Unity-generated data objects. Further, the way that the scikit-learn code uses MatPlotLib gives valuable clues about the important aspects of analytic results. Later, discussions about using D3.js with three.js and WebGL will be pertinent, but only after various design alternatives are explored.
What are the Arrows?
In the smiley-face figure above, the arrows represent a bi-directional interface between analytics and rendering. The analytics pushes analysis results to the data world to give data objects appearance and behavior, while the rendering pushes events back to the analytics, such as the selection of data objects or the invocation of methods.
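One possible shape for the two directions of traffic is sketched below as JSON messages in Python. The message types (`update`, `event`), the field names and the `AnalyticsEndpoint` class are all hypothetical, since no wire format has been defined yet. A real version would carry these strings over a socket or queue rather than through direct function calls.

```python
"""Sketch: JSON messages for both directions of the analytics/rendering interface.

All message types, field names and classes here are hypothetical.
"""
import json
from typing import Any, Callable, Dict


def make_update(object_id: str, position, color, label: str) -> str:
    """Analytics -> rendering: give a data object placement and appearance."""
    return json.dumps({
        "type": "update",
        "id": object_id,
        "position": list(position),  # x, y, z in the data world
        "color": list(color),        # RGB in 0..1
        "label": label,
    })


def make_event(kind: str, object_id: str, **params: Any) -> str:
    """Rendering -> analytics: report a user action such as a selection."""
    return json.dumps({"type": "event", "kind": kind,
                       "id": object_id, "params": params})


class AnalyticsEndpoint:
    """Receives rendering events and dispatches them to registered handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[str, dict], str]] = {}

    def on(self, kind: str, handler: Callable[[str, dict], str]) -> None:
        self.handlers[kind] = handler

    def receive(self, raw: str) -> str:
        msg = json.loads(raw)
        if msg.get("type") != "event" or msg.get("kind") not in self.handlers:
            raise ValueError(f"unsupported message: {msg}")
        # Each handler answers with an update message for the rendering side.
        return self.handlers[msg["kind"]](msg["id"], msg["params"])


if __name__ == "__main__":
    endpoint = AnalyticsEndpoint()
    # Selecting an instance highlights it (a stand-in for real analysis).
    endpoint.on("select", lambda oid, p: make_update(oid, (0, 1, 0), (1, 1, 0), "selected"))
    print(endpoint.receive(make_event("select", "iris-42", button="left")))
```

Keeping both directions as plain JSON strings means the same messages could later travel over any transport without changing either side's logic.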
The plan is to explore the many alternatives for constructing this interface in both the early and later stages of development. The early stage should emphasize lightness and flexibility, while the later stage should emphasize robustness and scalability. An issue to investigate is whether to flow unstructured data (JSON with a publish/subscribe model), share a structured database (SQLite, MongoDB), or both. Technologies like Node.js and ZeroMQ are often mentioned on Reddit and similar forums. Hence, considerable exploration of alternatives is required.
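The shared-database option can be sketched with Python's built-in `sqlite3` module. The table layout, the `version` column used for change polling, and the coordinates are hypothetical placeholders, not computed results. In this sketch, the analytics side upserts data objects and the rendering side fetches only what changed since its last poll.

```python
"""Sketch: a shared SQLite store between the analytics and the rendering.

Table layout, column names and the version-polling scheme are hypothetical.
"""
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS data_objects (
    id      TEXT PRIMARY KEY,
    x REAL, y REAL, z REAL,
    species TEXT,
    version INTEGER NOT NULL
)
"""


def publish(conn, rows, version):
    """Analytics side: upsert data objects, stamping each with a version number."""
    conn.executemany(
        "INSERT OR REPLACE INTO data_objects (id, x, y, z, species, version) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(oid, x, y, z, sp, version) for oid, (x, y, z), sp in rows],
    )
    conn.commit()


def poll(conn, since_version):
    """Rendering side: fetch only the objects changed after `since_version`."""
    cur = conn.execute(
        "SELECT id, x, y, z, species, version FROM data_objects "
        "WHERE version > ? ORDER BY id",
        (since_version,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would let two processes share it
    conn.execute(SCHEMA)
    # Placeholder coordinates for illustration only.
    publish(conn, [("iris-1", (0.1, 0.0, 0.4), "setosa"),
                   ("iris-51", (0.8, 0.0, 0.2), "versicolor")], version=1)
    publish(conn, [("iris-51", (0.7, 0.0, 0.3), "versicolor")], version=2)
    print(poll(conn, since_version=1))
```

Polling a version column is the simplest scheme. A publish/subscribe channel would push the same changes instead of waiting to be asked, which is one of the trade-offs to explore.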
Suggestion for Moving Forward
This is a large project requiring a long-term effort by a variety of people with deep curiosity. I suggest an ongoing working group of 5-8 people who meet regularly (biweekly via GoToMeeting), generate code (managed at ImmersiveAnalytics’ GitHub), share prototypes (posted at the IA website), and write specifications (wiki at the IA website). My prototyping platform is a dual-monitor Win10 desktop with Anaconda Python3 with Jupyter notebooks and Unity3D 5.3 with C# and some PlayMaker. Because the IA community uses open-source licensing, the motivation to participate would be to develop prototypes (which could be commercialized), along with learning valuable new skills and having geeky fun.
If interested, let’s chat. Contact me via editors at ImmersiveAnalytics dot com.