ASA Connect


What graphs do people really use in practice?

  • 1.  What graphs do people really use in practice?

    Posted 11-06-2019 13:26
    Hello everyone!

    I teach introductory statistics at a community college, and some of the topics we teach are how to construct a stem & leaf plot, a dot plot, boxplots, histograms... you get the idea. Since I don't practice actual statistics and data science in the "real world", I feel like a fraud telling my students what graphs are created in the "real world". Does anyone actually make a stem & leaf plot??

    My question to you....

    • Do researchers/others make stem and leaf plots to display real data for others to read in research/publication/reports/websites/etc? What fields use these graphs?
    • I have the same question for boxplots, histograms, and dot plots, too.
    • Alternatively, what types of graphs are more commonly used in the "real world" that maybe aren't taught in a beginning statistics class?

    Thanks for helping me make intro statistics more real-world and up to speed. :)

    Jennifer

    ------------------------------
    Jennifer Ward
    Clark College
    ------------------------------


  • 2.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 06:49
    Hi Jennifer,

    I work in the pharmaceutical and health science industries.  I have never created a stem and leaf plot, but we do use histograms, boxplots, bubble plots, dot plots, forest plots, line plots, and scatter plots with and without regression lines, extensively.  These are used for study reports, manuscripts, and administratively.

    Rebecca Hoagland
    Cota Enterprises

    ------------------------------
    Rebecca Hoagland
    Consulting Statistician
    Cota Enterprises
    ------------------------------



  • 3.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 07:20
    Hey Jennifer!
    Sounds like you're doing right by your students by asking the right questions! They're lucky to have someone so proactive.

    Below are two links to tutorials for the plotting libraries "matplotlib" and "seaborn", which are in the Python programming language.
    If you scroll through them, you'll see what a practitioner thought was important to highlight to other practitioners. (I figure that carries more weight than me just asserting which plots are used.)

    Without getting caught up in the code segments, just scroll through these tutorials to see the pictures of the plots people are using. I've added some commentary to confirm that the tutorials are representative of the needs of many practitioners.

    Could someone who uses R more please do the same for ggplot, etc.? Thanks

    Best wishes,
    Glen


    Matplotlib
    Link:             https://towardsdatascience.com/matplotlib-tutorial-learn-basics-of-pythons-powerful-plotting-library-b5d1b8f67596
    Highlights:   Line, scatter, histogram, barchart, 3D scatter
    Comments: I can confirm that I use most of these on a daily basis. To this I'd add box-whisker plots (with scatter plot overlaid) as a tool for measuring predictive performance of algorithms.

    Seaborn
    Link:            https://elitedatascience.com/python-seaborn-tutorial
    Highlights:  Density plots, really attractive color schemes (color differentiation is a big deal...check out Cynthia Brewer's work for more info http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html )
    Comments: The main difference between these and the matplotlib options above is the focus on (i) estimating where data are dense, and (ii) easy stratification of the data set so you can compare plots of different classes/categories within the same data set (e.g., the famous Iris data set).

    Bonus Link:    https://seaborn.pydata.org/
    Seaborn's own page is really fun to click around too!




    ------------------------------
    Glen Wright Colopy
    DPhil Student
    University of Oxford
    ------------------------------



  • 4.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 07:36
    Hi Jennifer,

    Fellow alum of a community college and fellow stats instructor here! Please let me thank you for all of the dedication and hard work you put into your class. I would not be who or where I am today without some excellent community college professors.

    To answer your questions, I have never used a stem-and-leaf plot, and every course I've taken since 2011 that covered them mentioned that no one uses them but taught them anyway. I am curious to hear if anyone uses them, because I'd love to see one in action.

    As a graduate student working in the federal government, I used bar graphs, box plots, and line graphs, with the occasional heatmap or dot plot. In classes I used histograms, line plots, and bar graphs. On the job as a statistician working on 'omics research, I primarily use box plots, PCA plots, and what are essentially combination line + scatter plots. I've linked some additional resources I've found helpful, and I hope you do too!

    Tracey Weissgerber et al., on Beyond Bar and Line Graphs
    https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128
    Tracey Weissgerber et al., on Transforming Data Visualization to Improve Transparency
    https://ahajournals.org/doi/pdf/10.1161/CIRCULATIONAHA.118.037777
    Her Twitter is also an excellent resource for graphing best practices

    Karl Broman on How to Display Data Badly
    https://www.biostat.wisc.edu/~kbroman/presentations/IowaState2013/graphs_combined.pdf
    Karl Broman's Top Ten Worst graphs
    https://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

    ------------------------------
    Jessica Randall, MPH
    Biostatistician | Emory Integrated Computational Core | Emory University
    ------------------------------



  • 5.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 08:00
    Edited by Gerald Belton 11-07-2019 08:01
    I have never used a stem and leaf plot in "real life." I do think there's some value in having students create them by hand, because it helps to solidify the relationship between the data and the visualization. The same can be said for dot plots. I think the ubiquity of the personal computer has made these obsolete. The point of a stem and leaf plot is that it is easy to create with paper and pencil!
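    To show how mechanical the paper-and-pencil construction is, here is a minimal Python sketch of a text stem-and-leaf display. It assumes non-negative integer data, uses the tens digit(s) as the stem and the units digit as the leaf, and the function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Text stem-and-leaf display for non-negative integers.

    Stem = value // 10 (tens and above), leaf = value % 10 (units digit).
    Empty stems are still printed so gaps in the data stay visible.
    """
    xs = sorted(int(v) for v in values)
    leaves = defaultdict(list)
    for x in xs:
        leaves[x // 10].append(str(x % 10))
    return [f"{stem:>3} | {''.join(leaves[stem])}"
            for stem in range(xs[0] // 10, xs[-1] // 10 + 1)]

# Hypothetical exam scores
for line in stem_and_leaf([12, 15, 21, 23, 23, 28, 34, 41, 45, 47]):
    print(line)
```

    Every original value can be read back from the display (the row `2 | 1338` is 21, 23, 23, 28), which is exactly the link between data and picture described above.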

    I use a lot of histograms, scatter plots, and box plots. And of course, bar charts for categorical data. You need to know your audience, though. A lot of management people don't know what a box plot is, so if you use one you have to be prepared to teach the user how to read it.
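    When teaching someone to read a box plot, it can help to show the handful of numbers behind it. A minimal sketch, assuming Tukey's common convention (box from Q1 to Q3, whiskers out to the most extreme points within 1.5 × IQR of the box, anything beyond drawn as an outlier); software packages differ in both the quartile rule and the whisker rule, and `box_plot_summary` is a hypothetical name.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Numbers behind a Tukey-style box plot.

    Quartiles use statistics.quantiles(method="inclusive"); other software
    may use a different quartile rule and draw a slightly different box.
    """
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme observations inside the fences
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

print(box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30]))  # hypothetical data
```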

    ------------------------------
    Gerald Belton
    North Carolina State Health Plan
    ------------------------------



  • 6.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 08:18
    Hi Jennifer,

    As a statistical consultant at a university who works with and tutors mainly graduate students, I have occasionally seen stem plots used in papers, including the side-by-side variant. I think graphing technology has made stem plots somewhat less appealing, but they still have the advantage of presenting every data value.

    Perhaps the biggest use is that the Explore command in SPSS provides both a histogram and a stem plot. Having both allows a check on the possibility that the binning algorithm for the histogram is not ideal. While I would be hesitant to put a stem plot in a paper or publication, they are useful for understanding.
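    The binning concern can be demonstrated without any plotting library. This sketch (hypothetical data and function name) counts the same ten values into bins of width 5 and of width 2; the gap between 3 and 7 vanishes in the wider bins, whereas a stem plot would list every value.

```python
import math
from collections import Counter

def bin_counts(data, width, origin=0.0):
    """Histogram counts for half-open bins [origin + k*width, origin + (k+1)*width).

    Returns (left_edge, count) pairs, including empty bins between the extremes.
    """
    counts = Counter(math.floor((x - origin) / width) for x in data)
    first, last = min(counts), max(counts)
    return [(origin + k * width, counts[k]) for k in range(first, last + 1)]

values = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]   # hypothetical data with a gap from 4 to 6
print(bin_counts(values, width=5))  # → [(0.0, 6), (5.0, 4)]  gap hidden
print(bin_counts(values, width=2))  # → [(0.0, 1), (2.0, 5), (4.0, 0), (6.0, 1), (8.0, 3)]
```

    Changing `origin` as well as `width` is a quick classroom exercise: both are arbitrary choices, and both can change the apparent shape.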

    Boxplots, scatter plots, dot plots, and line plots are all popular (perhaps because they are readily available) and make their way into papers, theses, and dissertations. There are also a fair number of less common visualizations (path and factor diagrams) and some unique ones.

    ------------------------------
    Daniel Coven
    Graduate Statistics Consultant
    Arizona State University
    Daniel.Coven@asu.edu
    ------------------------------



  • 7.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 08:28
    Hi Jennifer,

    Good question.

    Short Answer.
    In my work, I begin with exploratory data analysis that always includes box plots, bar charts/histograms, scatter plots, and sometimes pie charts. So, at the community college level, I think those are the best plots to focus on. Excel can generate all of these plots, as can statistical software like SAS, Stata, and R.

    More Info.
    For the next level of sophistication, I like to add a LOESS curve to scatter plots. LOESS is a smoothing technique and gives a rough picture of the trends in the data. I also use spaghetti plots, which involve taking a small random sample of observations and plotting the trajectories for that sample. On top of histograms, I like to add a kernel density plot, which is a curve that illustrates the shape of the histogram. Q-Q plots are helpful in checking goodness of fit to a normal distribution. For these more sophisticated plots, one would need statistical software like SAS, Stata, or R. These plots are probably beyond what one would learn in a community college class. If you would like SAS code to generate any of these plots, I will be happy to share it with you. I also have some Stata and R code for these graphics, but am most familiar with SAS.
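    For readers who want to see what a LOESS curve actually computes, here is a minimal pure-Python sketch of the core idea: at each x value, fit a straight line by weighted least squares to the nearest fraction `frac` of the points, using tricube weights that fall to zero at the edge of that neighbourhood. It omits the robustness iterations of Cleveland's full procedure, and the function name and `frac` default are assumptions; in practice the built-in LOESS in SAS, Stata, or R is the tool to use.

```python
import math

def loess(xs, ys, frac=0.5):
    """Degree-1 LOESS fitted values, without robustness iterations.

    For each x0, points closer than its k-th nearest neighbour
    (k = ceil(frac * n)) get positive tricube weights; a weighted
    least-squares line is fit and evaluated at x0.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (guard against 0)
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

xs = list(range(10))
ys = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12]   # hypothetical noisy trend
print([round(v, 2) for v in loess(xs, ys, frac=0.5)])
```

    A larger `frac` gives a smoother, flatter curve; a smaller one follows the data more closely.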

    Another useful plot is a word cloud.   This is used for text analysis and displays the most commonly used words in the largest font.
    JMP can create a word cloud, and there is also an R package.

    Hope this helps.  I have never used a stem and leaf plot on the job, nor seen one presented at a conference.

    ------------------------------
    Brandy Sinco, BS, MA, MS
    Statistician and Programmer/Analyst
    Michigan Medicine
    ------------------------------



  • 8.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 08:46
    Edited by Andrew Lane 11-07-2019 09:07
    Hi Jennifer,

    My background is industrial statistics and engineering. For students in an introductory stats class, I'm considering two related learning outcomes:

    1. The ability to construct and communicate graphical representations of data, for example in reports and presentations
    2. The ability to critically interpret and evaluate graphical representations of data

    For (1) I believe box plots, scatterplots, run charts (e.g. line graphs, time series), and histograms are by far the most widely useful. The first three especially can convey a lot of information (sometimes too much!) using grouping and color, and I believe they are often underutilized. I've seen countless examples of using more complicated graphs when one of those will do, and using overly simple or flat out wrong graphs. If not trained otherwise, people seem to think every graph should be a bar chart! I don't remember run charts being taught but they are used all over business, often incorrectly. Stem-and-leaf plots are a relic, and dot plots are simply an alternative to the other charts in certain situations.

    Number (2) is probably more important to the average student and citizen, but it is a painfully neglected topic. It seems that advances in technology have made beautiful and interesting data visualizations all the rage nowadays. It's a positive development for sure, but there are drawbacks - it's easier than ever to use dazzling displays to abuse statistics and lead observers to false conclusions.

    Check out the NY Times weekly column "What's Going On in This Graph?" for more:
    • It's geared towards educators and students
    • There are weekly webinars and live discussions
    • Teachers can get the graphs ahead of release to plan ahead
    • There are tons of collaborators, including some within ASA
    Hope I've helped!

    ------------------------------
    Andrew Lane
    ------------------------------



  • 9.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 09:06
    Hi Jennifer,
    In my opinion, stem-and-leaf plots are more suitable for small datasets, and our datasets are growing due to increasing electronic sources.    I wish John Tukey (1915-2000) were alive to see what is happening in exploratory data analysis today.
    We now conduct "visual analytics" to uncover the story behind the data. Each visual answers one question and leads to many more.
    With interactive software, we generate a visual composed of several graphic elements. The eDISH plot below is a scatter plot, programmed by Jeremy Wildfire, showing the relationship or correlation between two lab parameters. On the top and right, there are box plots showing their distributions. The dots are color coded for sex. The dots can also be sized to a third parameter. If you hover over the dots, you see individual patient profiles, such as study day. If you click on a dot, it connects all dots for that patient so you can see the time trend. Then there are horizontal and vertical reference lines to identify clinically meaningful outliers. From this plot, other graphs are generated to help a safety scientist understand all potential causes of elevated lab values, such as drug exposure, hepatitis, etc. This graph is used in pharma and by the FDA. For exploring the real world, we are focused on a sharable, replicable, visual story in real time using all available data.
                                                    
    [Figure: Scatter Plot of Two Lab Parameters]
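    The reference lines in such a plot split the subjects into four quadrants. A minimal sketch of that classification, assuming (this is not stated in the post) the usual eDISH (evaluation of Drug-Induced Serious Hepatotoxicity) setup: peak ALT and peak total bilirubin expressed as multiples of the upper limit of normal (ULN), with reference lines at the commonly used 3 × ULN and 2 × ULN thresholds. The function name, labels, and subjects are hypothetical.

```python
def edish_quadrant(peak_alt_xuln, peak_tbili_xuln, alt_cut=3.0, tbili_cut=2.0):
    """Quadrant of one subject, given peak values as multiples of ULN.

    Assumed cut-offs: ALT > 3 x ULN (vertical line), bilirubin > 2 x ULN
    (horizontal line). Values exactly on a line count as not elevated.
    """
    alt_high = peak_alt_xuln > alt_cut
    tbili_high = peak_tbili_xuln > tbili_cut
    if alt_high and tbili_high:
        return "both elevated (potential Hy's law)"
    if alt_high:
        return "ALT elevated only"
    if tbili_high:
        return "bilirubin elevated only"
    return "neither elevated"

# Hypothetical subjects: (id, peak ALT / ULN, peak total bilirubin / ULN)
subjects = [("S01", 1.2, 0.8), ("S02", 4.5, 0.9), ("S03", 5.1, 2.6), ("S04", 0.9, 2.3)]
for sid, alt, tbili in subjects:
    print(sid, edish_quadrant(alt, tbili))
```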

    Respectfully yours,
    Mike W. Colopy, PhD
    UCB Biosciences

    ------------------------------
    Mike Colopy
    Biostatistician
    UCB Biosciences
    ------------------------------



  • 10.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 10:16

    Try to base it on evidence. Leaf through several issues of several journals you hope your students will publish in later. That is a nice teaching project that produces data which need some diagrams themselves. And you would have to teach every diagram that comes up. Be wary of generalizing to other uses: what institutions produce on the fly or for internal use only may differ. I would hesitate to leaf through output produced by less professional authors; perhaps include just papers with at least two statisticians and two non-statisticians in the list of authors.

    Or you ask what should be done. There are lots of pie charts out there that are much derided by statisticians. Bar charts are misused to display means. You got off to the right start by naming a diagram for every scale type (nominal, ordinal, count, continuous, censored, log, ...). I would go on by the scale type of the independent variable. There is empirical research on what works (William Cleveland). How to do it: the Grammar of Graphics, ggplot2::, its cheat sheet, and the full list. As Tufte wrote: revise and edit.
    Stem-and-leaf is from the time when data and computer graphics were rare. It would be appreciated by a statistician trying to avoid mistakes before producing a violin plot for her reader. Similarly, statisticians may consume diagnostic plots like Q-Q plots, residual plots, leverage plots, the likelihood of the Box-Cox parameter, etc., before producing an overlay plot of a pastel scatter of raw data and solid regression lines. Are those all just scatterplots to you?
    For medicine, see e.g. Lang and Secic, or pharma recommendations. Forest plots have become indispensable, as have flow charts (as in the EQUATOR Network reporting guidelines).
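    A Q-Q plot is indeed a scatterplot of a particular pair of quantities, and its coordinates are easy to compute. A minimal sketch for a normal Q-Q plot, assuming the plotting positions (i - 0.5)/n; other software uses slightly different positions (for example Blom's), and `normal_qq_points` and the data are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """(theoretical N(0,1) quantile, observed value) pairs for a normal Q-Q plot.

    Plotting positions are (i - 0.5) / n. Roughly straight points suggest the
    data are close to normal; the line's slope and intercept reflect SD and mean.
    """
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()  # standard normal: mean 0, sd 1
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

for theo, obs in normal_qq_points([4.1, 5.0, 5.2, 5.9, 6.3, 7.8]):
    print(f"{theo:6.2f}  {obs}")
```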

    Looking forward to the other answers,



    ------------------------------
    Reinhard Vonthein
    Universitaet zu Luebeck
    ------------------------------



  • 11.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 10:46
    Speaking from 25 years' experience in the telecom industry, I don't recall ever using a stem-and-leaf plot or dot plot in a context other than for teaching purposes.

    Box plots and histograms are very common visualizations in the analytics which I publish for business purposes. I also use graphics from Statistical Process Control and Lean Six Sigma traditions such as scatter plots, Pareto charts and control charts.
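    For readers new to control charts, the limits on one common type, the individuals (X) chart, take only a few lines to compute. A minimal sketch, assuming individual measurements in time order and the conventional limits of mean ± 2.66 × average moving range; real SPC practice also applies run rules (such as the Western Electric rules), which are not shown, and the function names and data are hypothetical.

```python
from statistics import mean

def individuals_chart_limits(values):
    """(LCL, center line, UCL) for an individuals (X) chart.

    Limits are mean +/- 2.66 * average moving range, where 2.66 ~= 3 / 1.128
    (1.128 is the d2 bias-correction constant for moving ranges of size 2).
    """
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width

def out_of_control(values):
    """Indices of points outside the control limits (no run rules applied)."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

daily_minutes = [10, 12, 11, 13, 12, 11, 12, 30]   # hypothetical time-ordered metric
print(individuals_chart_limits(daily_minutes))
print(out_of_control(daily_minutes))  # → [7]
```

    In practice the limits are often computed from a stable baseline period and then applied to new data, rather than recomputed from data that include the suspect points.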

    As an alternative to control charts, I produce a lot of time series graphs which may combine bar charts and line charts. For example, I commonly display time series data with a bar chart showing volumes in a particular time frame overlaid with a line chart for a continuous metric calculated from the data whose volume is represented by the bar. This type of graphic is pretty popular among managers and executives in the business environment.

    ------------------------------
    James W. Miller, Ph.D.
    ------------------------------



  • 12.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 10:47
    Hi Jennifer,
    It is said that teaching without doing research is akin to going to confession without sinning -- you can do it, but it isn't as interesting. I am impressed that you are trying to remedy the situation through your question (which is doing research). I worry about how many other teachers in your position don't bother. The short answer to your question is: absolutely. A longer answer would include a long list of discoveries either abetted by graphic displays (like the structure of diabetes (3D scatterplot), the character of the weather (complex glyphs on a map), how a cat lands on its feet, why the space shuttle Challenger exploded on take-off) or graphs used to communicate discoveries to others (the French losses in the 1812 invasion of Russia, the character of the Great Migration of African Americans from the South after Reconstruction, the extent of cheating on exams).
    There are a number of terrific books on the subject filled with such examples (see those by Edward Tufte or by me as examples).
    Good luck and thanks for asking,
    Howard Wainer

    ------------------------------
    Howard Wainer
    Extinguished Research Scientist
    ------------------------------



  • 13.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 10:49
    It has been my experience that scientists (I assume this applies to non-scientists as well) use whatever software they have available to make graphs, often Excel. As others have mentioned, I have never seen a stem-and-leaf plot in practice, other than the "Unemployment Rates and Earnings by Educational Attainment" graph (https://www.bls.gov/emp/chart-unemployment-earnings-education.htm) that I have used from time to time when talking to high school students. While not technically a stem-and-leaf plot, it gets at the same idea.

    For question 2, I have used boxplots and histograms. Dot plots seem to be more of an educational tool; I have not seen them used in practice.

    For question 3, I would suggest that data scientists come up with all kinds of crazy-looking non-traditional graphs that in some cases are developed on the spot. Regarding education, I would suggest that covering the principles of graphics is important. For a given set of data, which graph is most appropriate? What are pitfalls to avoid when making or interpreting graphs? So, I think you can still cover the basic graphs (histogram, line plot, box plots, bar graphs, etc.). But pose complex situations with data and help students learn which types of graphs will better illustrate the point the author/researcher is trying to make. I like the idea of using "What's Going on in this Graph" to delve into the complexities of today's graphics and visual displays of data. 

    One challenge I experience many times is seeing a graphic of means and standard deviations across multiple groups. Individuals will place the groups across the x-axis, and the y-axis will represent the response. A dot marks the mean of each group, and error bars mark one standard deviation on either side of each mean. They then proceed to discuss where error bars overlap and where they do not, as if that indicated significant group differences. As we know, that is incorrect. Many times, these graph creators don't even know what the error bars represent (SD, SE, CI, etc.), but the error bars look nice and are a default in the graphics program they are using. So I would love for someone to teach students that they need to know what every aspect of their graph represents; otherwise they may be misleading their consumers. Another good one is the center line in box plots: does it represent the mean or the median? Does the length of the whiskers represent 2 IQR? 3 IQR? 95% CI? 2 SD? 3 SD? SE? People usually don't know and then cannot describe their data accurately. So understanding the features of graphics, and knowing the types of graphics that best illustrate the phenomena in your data, is what I think some of the focus should be. Good luck!
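    The point about knowing what the error bars represent can be made concrete: the same five numbers give three very different bar lengths. A minimal sketch; the function name and data are hypothetical, and `crit` is an assumption (1.96 is the large-sample normal value, while for small samples a t critical value such as about 2.776 for n = 5 is more appropriate).

```python
import math
from statistics import mean, stdev

def error_bar_half_widths(data, crit=1.96):
    """Half-widths of three different 'error bars' around the same sample mean.

    SD describes spread of individual observations; SE and the CI describe
    uncertainty about the mean itself.
    """
    sd = stdev(data)                  # sample standard deviation
    se = sd / math.sqrt(len(data))    # standard error of the mean
    return {"mean": mean(data), "SD": sd, "SE": se, "CI95": crit * se}

print(error_bar_half_widths([4, 8, 6, 5, 7], crit=2.776))
```

    Here the SD bar is sqrt(5) ≈ 2.24 times as long as the SE bar, so "the bars overlap" means something quite different depending on which one the software drew.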

    ------------------------------
    Jamis Perrett
    Bayer U.S.- Crop Science
    ------------------------------



  • 14.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 11:28
    Jennifer


    Thanks for posting, this is a great question.  I'm sure you'll get a lot of replies from people much better versed in the art and science of data visualization than I claim to be, but a few thoughts I had upon reading your post:

    1) in an introductory statistics course, there's probably some value in teaching these things even if they aren't commonly used in daily practice because they help students think about distributions of data.  You're correct that I never use a stem-and-leaf plot in my daily work, and I suspect most of us don't, but I still think it was a useful part of my "introduction to statistics" experience for thinking about how data are distributed.  So the takeaway here is, something may have a degree of educational value even if not commonly used in the real world.

    2) Some of the more 'advanced' graphics probably would be difficult to cover in an introductory statistics course because they have a very specialized use; for example, I work in medicine and make a lot of Kaplan-Meier survival curves, but would not advocate teaching that in an introductory statistics course if only because most students will not 'need' that unless they are going sufficiently deep into the field that they take a course on survival analysis (others may disagree on this sequencing of learning...)

    3) You'll find no shortage of takes on boxplots, histograms, and dotplots, but IMO much like the stem and leaf plot, I still think they are useful for introductory students to understand the strengths and limitations of each and to think about what one gets from each piece of information about a dataset.  A boxplot gives you some information about how data are distributed, but also leaves some out, and thinking about what you actually get from it versus what you don't can be useful as a thought exercise (perhaps showing them a boxplot, then a boxplot with the data points overlaid, and thinking about the pros and cons of each).  The same could be said for a histogram - show them a histogram, then show them a histogram with different bins, then show a dotplot with all of the datapoints, and think about what you get from each.  Does that make sense?  
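    On the Kaplan-Meier point in (2): even if interpreting the curve belongs in a later course, the product-limit calculation behind it is short. A minimal sketch, assuming right-censored data given as parallel lists of times and event indicators (1 = event observed, 0 = censored) and the usual convention that subjects censored at time t are still at risk at t; `kaplan_meier` and the data are hypothetical, and no confidence bands are computed.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates as [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Tally all observations tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk   # conditional survival past t
            curve.append((t, surv))
        at_risk -= deaths + censored       # censored at t leave after t
    return curve

print(kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]))
```

    Plotted as a step function starting at S = 1, these values are the Kaplan-Meier curve.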

    Just my thoughts!


    ------------------------------
    Andrew D. Althouse, PhD
    Assistant Professor of Medicine
    Center for Research on Health Care Data Center (CRHC-DC)
    Center for Clinical Trials & Data Coordination (CCDC)
    University of Pittsburgh School of Medicine
    200 Meyran Avenue, Suite 300
    Pittsburgh, PA 15213
    Email: ada62@pitt.edu
    Twitter: @ADAlthousePhD
    ------------------------------



  • 15.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 11:59

    First 3 rules of StatsClub:
    1. No pie charts
    2. No pie charts
    3. No pie charts

    I'd suggest you look at examples from:

    - ggplot2: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
    - plotly: https://plot.ly/r/
    - d3: https://github.com/d3/d3/wiki/Gallery

    These are commonly used modern data-viz tools with charts you'll see (or should see) in real-world technology and data-rich firms. They're also commonly used in graphics published by media firms like the New York Times, the Washington Post, and The Economist (among others).

    Anyway, I'd suggest you find examples of simple charts and have your students start by learning to produce them in Excel. Yes, I love Excel (I'm sure Google Sheets would have similar capabilities). It will only be able to create the simple ones, but it will get them started. Then they can take on creating more complicated ones in R or Python.

    I cannot overstate how powerful simple, clean, and modern looking charts can be in the "real world".  It engages people in conversation, which is the point, right?

    Good luck!



    ------------------------------
    Iyue Sung
    ------------------------------



  • 16.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 17:30
    re: the rules of StatsClub.

    NO 3-D graphs when 2-D graphs still serve perfectly well. No 3-D graphs for 2-dimensional problems, like the dreaded pie chart and bar charts with only 2 variables: 3-D totally misrepresents the relationships. Also, no cute images or icons like stacked cars or houses.

    These graphs may look sophisticated, but the job of graphs and charts is to realistically represent the relationships found in the data. All the fancy graphics often misrepresent the relationships.

    Function over form, steak over sizzle.

    ------------------------------
    Michael Mout
    MIKS
    ------------------------------



  • 18.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 12:55
    Edited by K. Wallace Todd 11-07-2019 12:56
    I work as a biostatistician in medical research and having graduated both undergrad and MSc in the last 5 years, I often wished I had a professor like you who asked questions that help prioritize the teaching of industry-used topics in the limited class time we had. Thanks so much!

    1. I've never used a stem and leaf plot. There are times I've seen rotated histograms (see: https://bit.ly/2WS0Jq5 ) to make comparisons between genders or disease groups by age group bins, which one could say are just a fancier version of the stem and leaf plot, but that's as close as I can think of.
    2. Histograms are crucial in what I do; playing with the bin sizes is often educational for me, to see both the overall data shape and the smaller valleys and peaks. Boxplots, histograms (and density plots), scatterplots, and spaghetti plots (various plots over units of time following the progress of a group or individual) are used frequently in exploratory data analysis and in presenting the shape of researchers' data to them and to the general community at large in the medical field I'm in. Most of my graphing life is either histograms or scatterplots by group with loess or linear regression lines mapped on top.
    3. I agree with most all the suggestions on types of graphs to learn in addition to those you've written. Just a few more:
      1. I would add extra emphasis on heatmaps, which are used extensively in genomics/genetics work (look up LD plot), and which also double as a great way to look at missing data, to see if the pattern of missingness by group is informative and important to account for or impute. Most entry-level stats classes don't introduce informative missingness, but as more analysts emerge from school without the higher-level statistics education where this would normally be covered, it might not be a bad idea to touch on it early on. See the VIM package in R (https://bit.ly/2NpWaQx). Heatmaps can also visually represent correlations, in which case they are called correlograms (https://bit.ly/33q3bXb).
      2. Violin plots combine the best of histograms and boxplots. So long as the student knows how to interpret them, they show the shape of the data (like a histogram) and also the IQR/median (like a boxplot). They also improve upon the boxplot since you can see the sample size you are drawing conclusions from.
      3. As more machine learning gets used, the basics of classification might be a good idea to visualize graphically. Something like a dendrogram is both cool to look at and is actually used as an early tool in supervised machine learning courses. This would just be an expansion of the decision tree into graph form, and you could mention random forests (also a building block of machine learning). This sets up your students for much more advanced concepts.
      4. https://www.data-to-viz.com/caveats.html This site discusses the cons of various charts and how to improve them, such as why the 3-D charts Microsoft users like to use can distort the perception of data (cough*tiltedpiechart*cough). I'd look through it to see if there are points on the basic graphs (boxplots, histograms, scatterplots, etc.) that would be important to add to a lecture.
      5. If you can take the time to show them some cool stuff that might encourage them to pursue further statistics education: spatial graphics are important in health research as we connect air pollution, heat, or mosquito migration patterns to various diseases. Go to a page of the Economist, show them its interactive graphics, and encourage them to pursue interactive or moving plots! Network diagrams are used in several fields to show connections between things, and are really just a variant of the process diagrams so frequently used in engineering.
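    Density plots and violin plots (point 2 above) both rest on a kernel density estimate. A minimal Gaussian KDE sketch; the default bandwidth uses the normal-reference rule of thumb 1.06 · s · n^(-1/5), which is one common choice among several (R's `density()` defaults to a different variant), and `kde_density` is a hypothetical pure-Python function, not a library API.

```python
import math
from statistics import stdev

def kde_density(data, bandwidth=None):
    """Return a function x -> Gaussian kernel density estimate at x.

    Default bandwidth: normal-reference rule 1.06 * s * n**(-1/5)
    (needs at least two data points when bandwidth is not given).
    """
    n = len(data)
    if bandwidth is None:
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of normal "bumps" with sd = bandwidth, centred at each observation
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

f = kde_density([1, 2, 2, 3, 7])   # hypothetical data
print([round(f(x), 3) for x in range(0, 10)])
```

    A violin plot draws this curve mirrored about a vertical axis, one violin per group, which is why it shows shape in a way a plain box does not.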
    Glen Colopy had asked for an R user to add things for the ggplot2 package, so I've added that and also other visualization packages used fairly frequently in my day to day work.
    ggplot2 (most well used graphics package): http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
    plotly (interactive graphics): https://plot.ly/r/
    ggvis (an expansion of ggplot2): https://ggvis.rstudio.com/ggvis-basics.html
    ggplot, ggvis, ggally, plotly, others: https://www.r-graph-gallery.com/
    many packages: https://datavizproject.com/

    ------------------------------
    K Todd
    ------------------------------



  • 19.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 13:24
    Your note is about individual modes of graphics and whether they should be prioritized in teaching statistics. You might want to look at sophisticated modern data graphics (the New York Times provides a good resource) and judge for yourself the extent to which dot plots and box-and-whisker plots remain in use. 

    I've noticed that many of the graphics featured in statistics books are not rooted in data. By this I mean something pretty specific:  Do axes refer to variables and can individual data points be shown along with the statistical annotations? 

    I think we would do well to trim down the number of gratuitously different modes of graphics found in  statistics books in favor of a simpler and more consistent framework.  My proposal for this, illustrated here, involves following some simple constraints:

    1. Both axes should refer to the values of variables. (The one-variable case, so dominant in statistics books, should be encountered rarely. And even then, it should be presented as if there were a second variable that just happens to take on one level.)
    2. We should always use the response/explanatory variable framework, with the response on the vertical axis.
    3. Every mark within the axes frame should be easily and unambiguously interpretable as part of a specific layer of graphics. I suggest three basic kinds of layers:
      1. Data layer in which each mark is one case (as in a scatter plot).
      2. Interval layers in which each mark has an I-beam shape (as with confidence intervals) or is a band (as in confidence bands on models). I also use "summary intervals," for instance the central 95% of data values.
      3. Density layers, in which each mark is a display of density. So-called "violin" plots are, in my view, superior to histograms (too much visual clutter, only one variable), density plots (only one variable, and the other axis is a quantity of no direct numerical interest), or box-and-whisker icons (they do work with the response/explanatory framework and can be effectively layered with data, but they distract attention from the data layer and aren't as rich as violin plots).
    I want students to be able to judge, without hesitation, what each axis refers to and whether a mark is about data or statistical annotation (intervals, violins). I think it's also helpful, in communicating about statistics, to move away from the point-estimate mark, such as showing the means of two groups. Showing the point estimate along with the confidence interval conveys no more meaningful information than showing the confidence interval alone, and showing only the interval discourages over-interpretation of small differences between point estimates.
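    The two kinds of interval layer described above can be computed per group before any plotting. A minimal sketch in Python (standard library only), assuming made-up response values for two levels of an explanatory variable: `summary_interval_95` returns the central 95% of the data values, and `mean_ci_95` returns an approximate confidence interval for the group mean using a normal multiplier of 1.96 (an assumption; a t multiplier is more appropriate for small groups). Both function names are hypothetical.

```python
import math
import statistics

def summary_interval_95(values):
    """Central 95% of the data values: the 2.5th and 97.5th percentiles."""
    # With n=40, statistics.quantiles returns 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def mean_ci_95(values):
    """Approximate 95% confidence interval for the mean.

    Assumes a normal approximation (multiplier 1.96); for small samples a
    t multiplier would give a somewhat wider interval.
    """
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical response values for two levels of an explanatory variable.
groups = {
    "A": [12.1, 13.4, 11.8, 14.0, 12.9, 13.1, 12.4, 13.8, 12.7, 13.3],
    "B": [13.0, 14.2, 12.6, 15.1, 13.9, 14.4, 13.2, 14.8, 13.7, 14.1],
}

for level, ys in groups.items():
    lo_s, hi_s = summary_interval_95(ys)
    lo_c, hi_c = mean_ci_95(ys)
    print(f"{level}: summary interval [{lo_s:.2f}, {hi_s:.2f}], "
          f"mean CI [{lo_c:.2f}, {hi_c:.2f}]")
```

    The summary interval describes where individual values fall, so it stays wide as the sample grows, while the confidence interval for the mean shrinks in proportion to 1/√n; drawing both as I-beams next to the data layer makes that difference visible.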


    ------------------------------
    Daniel Kaplan
    Macalester College
    ------------------------------



  • 20.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 13:27

    Stem and leaf plots: Essentially never. Next to impossible to convince clients to use them, and I don't use them either.

    Dot plots: The best way to display such data, but generally hard to sell to clients. (Ugly and less informative (stacked) bar charts are often requested for such data, but I push back on that.)

    Box plots: Occasionally, and even then only when the software produces them automatically, and only at the beginning of an analysis to get a quick, coarse look at the data.

    Histograms: Better to use nonparametric density plots, as such software is now readily available, one can overlay multiple nonparametric density plots on a single figure (which one shouldn't do with histograms, although software makes that readily available for histograms; yuk!), and a relatively smooth underlying density is what one is trying to estimate in the first place. Clients tend to feel more comfortable with histograms, but I push nonparametric density plots whenever I can (and, of course, when appropriate). Maybe it's the way histograms are taught: the vertical axis is almost always a count, even when comparing datasets with different sample sizes. (If you could fix that misconception, that would be great.)
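    The scaling point above is easy to show in code. A minimal sketch in Python (standard library only), with hypothetical function names and made-up samples: `density_histogram` divides each bin count by n times the bin width so the bars have total area 1, and `gaussian_kde` is a basic Gaussian kernel density estimate with a bandwidth you choose (the 0.5 used here is an assumption, not a recommendation). Because both are on a density scale, two samples of different sizes can share one vertical axis.

```python
import math

def density_histogram(values, lo, width, nbins):
    """Bar heights on a density scale: count / (n * width).

    Values outside [lo, lo + nbins * width) are not counted, so the bars
    have total area 1 only when every value falls inside the bins.
    """
    counts = [0] * nbins
    for v in values:
        k = math.floor((v - lo) / width)
        if 0 <= k < nbins:
            counts[k] += 1
    n = len(values)
    return [c / (n * width) for c in counts]

def gaussian_kde(values, bandwidth):
    """Return a function giving a Gaussian kernel density estimate at x."""
    n = len(values)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f(x):
        return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return f

# Two made-up samples of very different sizes.
small = [1.2, 2.5, 2.7, 3.1, 4.8]
large = [0.8, 1.5, 1.9, 2.2, 2.4, 2.6, 2.9, 3.0, 3.3, 3.4, 3.8, 4.1, 4.5, 5.2, 5.9]

for name, sample in [("small", small), ("large", large)]:
    heights = density_histogram(sample, lo=0.0, width=1.0, nbins=7)
    kde = gaussian_kde(sample, bandwidth=0.5)
    curve = [round(kde(x), 3) for x in range(7)]
    print(name, [round(h, 3) for h in heights], curve)
```

    Plotting tools such as R's `density()` or SciPy's `scipy.stats.gaussian_kde` choose a bandwidth automatically; the hand-rolled version only makes the arithmetic visible.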

    What is likely not taught in beginning statistics classes: Dynamic graphics are now more readily available and should be encouraged more. I use Mathematica, but lots of other software also provides such capabilities. This is especially useful for showing the effects of changing predictor values in a regression where there are several predictors.



    ------------------------------
    Jim Baldwin
    Retired
    ------------------------------



  • 21.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 17:56
    Okay, I've used stem-and-leaf plots, but only when I'm in a meeting without a computer at hand and want to see the distribution of a small table of data someone is displaying. For that, stem-and-leaf plots and other Tukey-style EDA graphics can be useful, but I admit to having used them much less in recent years.

    ------------------------------
    Bill Harris
    Data & Analytics Consultant
    Snohomish County PUD
    ------------------------------



  • 22.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 19:13
    Thank you for your insight, suggestions, resources, personal experience, and thoughts. It made my day to see that my question is an important one to be asking. 

    My takeaways from your feedback are...

    • Don't underestimate the convenience of Excel. I use other technology in class, but maybe I'm doing my students a disservice by not showing them how to use what is convenient...but I'll skip the pie charts. ;) (Thanks for the laugh Iyue!)
    • Stem and leaf plots are an "old technology", but they can serve the purpose of teaching students about data.
    • I shouldn't throw out the baby with the bathwater. While we intro stats teachers teach graphs, we (I) should add the caveat: "Let's use these tools (graphs) to build your skills, but here are also 'real' graphs that you might see in the future."
    • I think I'd like to focus more on interpretation of data and not test students as much on the construction of graphs. I sat in on Tuesday's StatPREP webinar, where I saw one of Danny Kaplan's Shiny Apps that showed violin plots, and was inspired to use applets and technology in teaching "fancier" graphs. (Thanks Danny!)
    • I love What's Going On in This Graph and need to use it more often! (Thanks Andrew!)
    • I need to assess whether students can read each and every part of a simpler graph, and then move on to more sophisticated graphs. Or, if they don't understand what's shown in the graph, can they clearly articulate their question? (Thanks Jamis!)
    • THANK YOU for all your links and resources for further investigation! I think a dive into published research to survey what graphs are displayed would be a fun excursion for my students, and also for myself! 
    Keep the replies coming! :D I value the diversity in everyone's replies. 

    Jennifer

    ------------------------------
    Jennifer Ward
    Clark College
    ------------------------------



  • 23.  RE: What graphs do people really use in practice?

    Posted 11-07-2019 19:34
    Hey Jennifer,
    Glad to help!
    Your takeaway points summarize the answers very well, so you're definitely focusing on the right things.
    Best,
    Glen Wright Colopy


  • 24.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 10:03
    The limitations of any graphic or basic statistic are a good thing to add. Here's an interesting site that shows how very different distributions can yield the same statistics and graphics. https://www.autodeskresearch.com/publications/samestats
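    The same point predates that site: Anscombe's quartet (1973) is four small data sets with almost identical means, regression slope, and correlation, yet very different scatter plots. A minimal Python sketch (standard library only) that computes those summaries from the quartet's published values; `summary` is a hypothetical helper name. Only plotting the four sets reveals a roughly linear cloud, a curve, a line with one outlier, and a single high-leverage point.

```python
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(x, y):
    """Means, least-squares slope, and Pearson correlation for paired data."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return {"mean_x": mx, "mean_y": my, "slope": sxy / sxx, "r": sxy / (sxx * syy) ** 0.5}

for name, (x, y) in quartet.items():
    s = summary(x, y)
    print(name, {k: round(v, 2) for k, v in s.items()})
```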

    Andy Brendler

    ------------------------------
    Andrew Brendler
    ------------------------------



  • 25.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 11:03
    Jennifer: In practice there are often many questions under consideration, each providing a p-value. If you rank (sort) the p-values from smallest to largest and plot them against the integers 1, 2, 3, ..., you get a p-value plot. Using a p-value plot, you can judge any particular finding in the context of the other findings. If you see a roughly 45-degree line, that is evidence for a uniform distribution (no effect). Stan Young
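    A p-value plot needs only sorting. A minimal Python sketch (standard library only), with hypothetical function names and simulated p-values: `pvalue_plot_points` pairs each rank with its sorted p-value, and `uniform_reference` gives k/(m + 1), the expected value of the k-th smallest of m independent uniform p-values. When every null hypothesis is true, the sorted p-values track that straight reference line; a run of p-values far below it at the low ranks points to real effects.

```python
import random

def pvalue_plot_points(pvalues):
    """(rank, p) pairs with p-values sorted from smallest to largest."""
    return list(enumerate(sorted(pvalues), start=1))

def uniform_reference(m):
    """Expected k-th smallest of m independent Uniform(0, 1) p-values: k / (m + 1)."""
    return [k / (m + 1) for k in range(1, m + 1)]

# Simulated data: 950 p-values with no effect (uniform draws) plus
# 50 hypothetical very small p-values standing in for real effects.
random.seed(2019)
null_ps = [random.random() for _ in range(950)]
effect_ps = [random.random() * 0.001 for _ in range(50)]

points = pvalue_plot_points(null_ps + effect_ps)
ref = uniform_reference(len(points))
for (k, p), r in list(zip(points, ref))[:5]:
    print(k, round(p, 5), round(r, 5))  # observed p vs. no-effect reference
```

    Passing `points` to any scatter-plot function, with `ref` drawn as a line, gives the plot described above.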

    ------------------------------
    Sidney Young
    Retired
    ------------------------------



  • 26.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 12:28
    Several years ago, I was asked to review some books for the Journal of Biopharmaceutical Statistics and one of the things I did for my own benefit was a stem and leaf diagram of the years of publication of all the items in the book's bibliography. At least one of the books was stuck in the 1990s and earlier with pretty much all of its references.

    I would not publish a stem and leaf diagram, but it was the fastest way for me to estimate the median year of publication as well as the first and third quartiles.
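    A stem-and-leaf display of integer data such as publication years is easy to generate. A minimal Python sketch (standard library only), assuming non-negative integer values and a made-up list of years (not the bibliography described above); `stem_and_leaf` is a hypothetical name. The stem is the value with its last digit dropped and the leaf is that last digit, so the sorted leaves let you count in to the median and quartiles. The standard library's `statistics.quantiles` uses its own quartile convention, which can differ slightly from counting by hand.

```python
import statistics
from collections import defaultdict

def stem_and_leaf(values):
    """Lines of a stem-and-leaf display for non-negative integers.

    Stem = all digits but the last; leaf = the last digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for s in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(d) for d in stems.get(s, []))
        lines.append(f"{s:>4} | {leaves}")
    return lines

# Made-up publication years, for illustration only.
years = [1978, 1985, 1988, 1991, 1992, 1994, 1995, 1995, 1997, 1998, 1999, 2001, 2003]

print("\n".join(stem_and_leaf(years)))
print("median:", statistics.median(years))
print("quartiles:", statistics.quantiles(years, n=4))
```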

    By the way, I have also used a tallying system invented by John Tukey, described at 

    https://www.johndcook.com/blog/2008/03/08/tukey-tallying/

    Very useful if you are counting something that hasn't been computerized yet.
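    The tally described in that post builds each group of ten as a square: four dots at the corners, then four lines for the sides, then two diagonals. A minimal Python sketch, with a hypothetical function name, that converts a count into complete ten-blocks plus the dots, sides, and diagonals of the unfinished block, which is a handy way to check a hand tally.

```python
def tukey_tally(count):
    """Split a count into Tukey-tally pieces.

    Returns (complete_blocks, dots, sides, diagonals): each complete block
    stands for 10, and the last three describe the unfinished block, where
    marks 1-4 are corner dots, 5-8 are sides, and 9-10 are diagonals.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    blocks, rest = divmod(count, 10)
    dots = min(rest, 4)
    sides = min(max(rest - 4, 0), 4)
    diagonals = max(rest - 8, 0)
    return blocks, dots, sides, diagonals

for n in (3, 7, 9, 10, 27):
    print(n, tukey_tally(n))
```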

    Given the ease with which computers can generate graphs, I think that much of what John Tukey did is a historical curiosity that makes sense only when you want a quick analysis without entering a bunch of numbers into a computer. He did a really cool thing called the Tukey resistant line that I love, but which I would never teach. Of course, I would definitely teach his boxplots.
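    For readers curious about the resistant line: in its common three-group form, the points are sorted by x and split into left, middle, and right thirds; the slope comes from the medians of the outer thirds, and the intercept averages the three groups' median-based intercepts. A minimal single-pass Python sketch (standard library only), assuming distinct x values; the group-size rule for n not divisible by 3 is one common convention, and fuller implementations usually iterate on the residuals to refine the slope. `resistant_line` and `group_sizes` are hypothetical names.

```python
import statistics

def group_sizes(n):
    """Sizes of the left, middle, and right thirds (one common convention)."""
    k, r = divmod(n, 3)
    if r == 0:
        return k, k, k
    if r == 1:
        return k, k + 1, k  # the extra point goes to the middle group
    return k + 1, k, k + 1  # two extra points go to the outer groups

def resistant_line(x, y):
    """Single-pass three-group resistant line; returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired observations")
    pts = sorted(zip(x, y))
    nl, nm, _ = group_sizes(len(pts))
    groups = [pts[:nl], pts[nl:nl + nm], pts[nl + nm:]]
    # Median x and median y within each third (they need not come from the same point).
    meds = [(statistics.median(p[0] for p in g), statistics.median(p[1] for p in g))
            for g in groups]
    (xl, yl), (xm, ym), (xr, yr) = meds
    slope = (yr - yl) / (xr - xl)
    intercept = statistics.mean([yl - slope * xl, ym - slope * xm, yr - slope * xr])
    return intercept, slope

x = list(range(1, 10))
y = [2 * xi + 1 for xi in x]
y_outlier = y[:-1] + [100]  # one wild value in the right third
print(resistant_line(x, y))          # (1.0, 2.0)
print(resistant_line(x, y_outlier))  # still (1.0, 2.0)
```

    The outlier leaves the fit unchanged because only group medians enter the calculation, which is the "resistant" property; a least-squares line would be pulled toward the wild value.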

    ------------------------------
    Stephen Simon, blog.pmean.com
    Independent Statistical Consultant
    P. Mean Consulting
    ------------------------------



  • 27.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 21:21
    Jennifer and Others:

    Just one additional point should be made concerning this prolonged thread: we should not simply lump these graphics together, as just about all of them involve displaying data, but one in particular involves summarizing data.

    With respect to Stat 101, a stemplot is still a worthwhile graphic, since it is meant for displaying a relatively small data set and it has students look at the data while sorting it (maybe without technology the first time, which I sometimes refer to as the scrap-work stage, before using the TI-84 for the sorting). Then a boxplot follows almost naturally to summarize the data.

    Also, a stemplot can lead to a histogram and bin widths by a simple 90° counterclockwise rotation.
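    The rotation idea can be made concrete: counting the leaves on each stem gives exactly the bar heights of a histogram whose bins are one stem wide. A minimal Python sketch (standard library only), assuming integer data and a stem unit of 10; `stem_counts` and `column_chart` are hypothetical names and the scores are made up. The text chart draws the counts as upright columns, which is what the stemplot looks like after the 90° counterclockwise turn.

```python
from collections import Counter

def stem_counts(values, stem_unit=10):
    """(bin start, count) for each stem, including empty stems.

    These counts are the bar heights of a histogram with bins
    [s * stem_unit, (s + 1) * stem_unit).
    """
    counts = Counter(v // stem_unit for v in values)
    return [(s * stem_unit, counts.get(s, 0)) for s in range(min(counts), max(counts) + 1)]

def column_chart(counts):
    """Text columns: one '#' per observation, stacked upward."""
    top = max(c for _, c in counts)
    return [" ".join("#" if c >= level else "." for _, c in counts)
            for level in range(top, 0, -1)]

scores = [52, 61, 67, 71, 75, 78, 88]  # made-up exam scores
bins = stem_counts(scores)
print("\n".join(column_chart(bins)))
print(" ".join(str(start // 10) for start, _ in bins))  # stems along the bottom
```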

    Thus, Tukeyism is alive and still fairly well, as long as we keep the big data of data science at bay...

    ------------------------------
    David Bernklau
    (David Bee on Internet)
    ------------------------------



  • 28.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 09:30
    Edited by Matthew Brenneman 11-08-2019 09:30
    Hi Jennifer,

    I appreciate your desire to make your course material relevant to your students. For an intro stats class (which typically looks at one or two random variables), I would echo what most have said, with one caveat. In stats there is no "official party line". Students want nice, simple answers, but what drives a professional data analyst's decision is the PURPOSE of the study. This is actually the more important lesson for students to learn, IMO. For example, if my students are working with a data set on one quantitative variable that is not large, and they want to determine whether the distribution is skewed or symmetric, a boxplot is usually the better plot (because it shows the grosser features of the distribution, and your eye doesn't fixate on a slight tail). On the other hand, if you're looking for outliers, BOTH plots are useful (sometimes boxplots show points that are "technically" outliers but in reality are part of the long tail of a skewed distribution). I say this because stats, like all disciplines, has "fashions" and new methods that people sometimes adopt because they're trendy.
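    The "technically an outlier" issue comes from the usual boxplot rule, which flags points more than 1.5 interquartile ranges beyond the quartiles. A minimal Python sketch (standard library only), with hypothetical function names, applying that rule to simulated right-skewed (exponential) data and to normal data. For a large exponential sample about 4.8% of perfectly ordinary tail values land beyond the upper fence (the exact tail probability is (1/4)·3^(-1.5)), versus about 0.7% for normal data. Software packages compute quartiles in slightly different ways, so the fences can differ a little between programs.

```python
import random
import statistics

def tukey_fences(values, k=1.5):
    """Lower and upper boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flagged(values, k=1.5):
    """Values a standard boxplot would draw as individual outlier points."""
    lo, hi = tukey_fences(values, k)
    return [v for v in values if v < lo or v > hi]

random.seed(11)
skewed = [random.expovariate(1.0) for _ in range(5000)]
normal = [random.gauss(0.0, 1.0) for _ in range(5000)]
for name, data in [("exponential", skewed), ("normal", normal)]:
    share = len(flagged(data)) / len(data)
    print(f"{name}: {share:.1%} flagged")
```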

    So a good working list of plots that are used in the "real world" might be:

    • One categorical (cat) var: bar charts, pie charts (not a fan), or Pareto charts (if you have many categories).
    • Two cat vars: side-by-side bar charts
    • One quant var: histogram and boxplot (sometimes a dot plot if the data set is small, say n < 20)
    • Two quant vars: scatterplot
    • One quant and one binary cat: a tough choice, but usually side-by-side boxplots win :)
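    The list above works as a small lookup table. A minimal Python sketch, with hypothetical names and type labels ('cat', 'binary', 'quant'), encoding those suggestions; pie charts are left out per the author's preference, and the n < 20 cutoff for dot plots is the rough guide given above, not a fixed rule.

```python
SUGGESTIONS = {
    ("cat",): "bar chart (Pareto chart if there are many categories)",
    ("cat", "cat"): "side-by-side bar charts",
    ("quant",): "histogram and boxplot",
    ("quant", "quant"): "scatterplot",
    ("binary", "quant"): "side-by-side boxplots",
}

def suggest_plot(var_types, n=None):
    """Suggest a plot for one or two variables, following the list above."""
    key = tuple(sorted(var_types))  # order of the variables does not matter
    if key == ("quant",) and n is not None and n < 20:
        return "dot plot (small data set), or histogram and boxplot"
    if key not in SUGGESTIONS:
        raise ValueError(f"no suggestion for variable types {key}")
    return SUGGESTIONS[key]

print(suggest_plot(["quant", "binary"]))
print(suggest_plot(["quant"], n=12))
```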
    So, with the exception of the stem-and-leaf plot (which, as others pointed out, is not used a great deal), the fundamentals will serve your students well. Hope that helps, and good luck.

    ------------------------------
    Matthew Brenneman
    Instructor of Mathematics & Statistics
    Embry-Riddle Aeronautical University
    ------------------------------



  • 29.  RE: What graphs do people really use in practice?

    Posted 11-08-2019 16:26
    As a practicing statistician for over 30 years, I can say that I have never used a stem & leaf plot in any of my presentations/reports/publications. However, they do appear in output from various statistical software (e.g., SAS procedures). Thus, it may be useful to introduce them in your course so that students will understand how to interpret them. Box plots (or box-and-whisker plots) are very useful for concisely displaying the distribution of data by segment. Since these types of charts are more difficult for a non-statistically savvy audience to understand, I typically reserve them for published articles or white papers. For presentations of statistical findings to general audiences, I tend to use histograms, stacked bar charts, and line graphs.

    ------------------------------
    James Kostecki
    Director, Research & Analytics
    College of DuPage
    ------------------------------