I'm pretty sure most of you here are familiar with stem-and-leaf plots (if not, check out the stemgraphic companion brochure).
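For anyone who hasn't seen one, here is a minimal plain-Python sketch of a classic Tukey-style stem-and-leaf display. It is not stemgraphic's API. `stem_and_leaf` is a hypothetical helper, and the sketch assumes non-negative integers, with the stem being all digits but the last and the leaf being the last digit.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Hypothetical helper: build a text stem-and-leaf display.

    Assumes non-negative integers. Each value is split into a stem
    (all digits but the last) and a leaf (the last digit).
    """
    stems = defaultdict(list)
    for v in values:
        if v < 0 or int(v) != v:
            raise ValueError("this sketch only handles non-negative integers")
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(leaf)
    if not stems:
        return []
    lines = []
    # Include empty stems so gaps in the distribution stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


data = [12, 15, 21, 22, 22, 27, 34, 38, 51, 53]
for line in stem_and_leaf(data):
    print(line)
```

Each row reads as one stem plus its sorted leaves, so "2 | 1227" stands for 21, 22, 22 and 27. The empty "4 |" row shows the gap between the 30s and 50s, which a histogram with coarse bins could hide.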
What you might not know is that there is an open source EDA toolkit (first released in 2016) with full stem-and-leaf support: stemgraphic (http://stemgraphic.org). It includes a command line tool that can be used to analyze the distribution of data. It is also a very easy-to-use Python package for making graphical stem-and-leaf plots, usable from other programs or in a Jupyter notebook environment.
It scales numerical stem-and-leaf plots to large computing clusters (tackling billions of data points without problem; see the PyData 2016 video). Beyond Tukey's original stem-and-leaf plots of numerical values, as of version 0.5.0 (the current release is 0.5.3) it can also handle categorical data and even text (for NLP, language analysis, etc.).
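To give a feel for what a stem-and-leaf view of text can look like, here is a small plain-Python sketch. It assumes the first letter of each word is the stem and the second letter is the leaf, and counts leaves per stem. That mapping is an assumption for illustration, not a description of stemgraphic's internals, and `text_stem_and_leaf` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def text_stem_and_leaf(words):
    """Hypothetical helper: group words by first letter (stem) and
    count their second letters (leaves).

    Assumption: stem = first character, leaf = second character.
    Single-character words get an empty-string leaf. Case is folded
    so 'The' and 'the' land under the same stem.
    """
    table = defaultdict(Counter)
    for word in words:
        w = word.lower()
        if not w:
            continue
        table[w[0]][w[1:2]] += 1
    return table


text = "the quick brown fox jumps over the lazy dog then takes a quiet nap"
table = text_stem_and_leaf(text.split())
for stem in sorted(table):
    # '_' marks the empty leaf of a one-letter word
    leaves = " ".join(f"{leaf or '_'}:{n}" for leaf, n in sorted(table[stem].items()))
    print(f"{stem} | {leaves}")
```

The "t" row, for example, shows that "th" dominates that stem, which is the kind of pattern such a display makes easy to spot across a larger corpus.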
Additionally, the stemgraphic package includes support for stem-and-leaf heatmaps, comparisons of multiple heatmaps, radar plots (Levenshtein distance), stem-and-leaf and word counts as bar or donut charts, and 2D and 3D scatter plots to compare multiple text sources.
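For reference, the Levenshtein distance behind the radar plots is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal sketch of the standard two-row dynamic-programming computation. How stemgraphic maps these distances onto radar axes is not shown here, and `levenshtein` is a hypothetical helper rather than a stemgraphic function.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper: edit distance between a and b, computed
    with the standard dynamic-programming recurrence, keeping only
    two rows of the table in memory."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

"kitten" to "sitting" needs three edits (k→s, e→i, insert g), so the call prints 3.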
Documentation is available online and as a PDF.
Source code is on GitHub (feel free to star the project if you find it interesting), along with example notebooks.
I'd love to get feedback, comments, suggestions, requests for enhancements, etc.
Thank you kindly,
Francois
------------------------------
Francois Dion
Chief Data Scientist
Dion Research LLC
------------------------------