PDF Summary: Better Data Visualizations, by Jonathan Schwabish
Below is a preview of the Shortform book summary of Better Data Visualizations by Jonathan Schwabish.
1-Page PDF Summary of Better Data Visualizations
When presenting information, clarity is paramount. In the guide Better Data Visualizations, author Jonathan Schwabish underscores the art of effective data depiction through visually appealing and easily interpretable charts and graphics.
Schwabish elaborates on fundamental principles, exploring techniques and strategies for creating compelling visualizations tailored to diverse audiences and data types. The guide delves into aspects like eliminating clutter, choosing appropriate visuals, and conveying insights with thoughtful annotations. Schwabish also emphasizes streamlining visualizations in alignment with your brand's aesthetic guidelines—ensuring consistency across an organization.
(continued)...
Avoid charts with dual axes and instead standardize the data to improve the clarity of comparisons.
Schwabish warns that charts with dual vertical axes often lead to confusion and the dissemination of misinformation. He presents multiple techniques for clearly differentiating data that employ different units of measurement.
Dual-axis charts intertwine two distinct variables plotted on different scales, which can lead readers to perceive meaningful intersections that are merely coincidental (p. 143). Schwabish suggests a range of alternatives, including aligning separate charts along a common axis, normalizing the data by calculating indices or rates of change, and trying other visual formats such as connected scatterplots.
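One of the normalization tactics mentioned above, converting each series to an index, can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the book: the function name `index_to_base`, the base value of 100, and the two sample series are all assumptions made for the example.

```python
def index_to_base(values, base_position=0, base_value=100.0):
    """Rescale a series so the observation at base_position equals base_value.

    Indexing puts series measured in different units on one shared scale,
    so they can be drawn against a single vertical axis.
    """
    base = values[base_position]
    if base == 0:
        raise ValueError("the base observation must be non-zero")
    return [base_value * v / base for v in values]


# Hypothetical series measured in different units (dollars vs. people).
sales_usd = [200.0, 220.0, 250.0]
headcount = [40, 42, 50]

sales_index = index_to_base(sales_usd)
headcount_index = index_to_base(headcount)

print(sales_index)      # both series now start at 100
print(headcount_index)
```

Once both series start from the same base, they can share one axis, and any crossing of the lines reflects a difference in relative growth rather than an accident of how two scales were aligned.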
Various methods for depicting the distribution of data
Evaluating whether data distributions should be represented using histograms, box plots, or violin diagrams.
Schwabish delves into how data variability and uncertainty can be depicted, highlighting statistical graphics such as histograms, box plots, and violin plots. Although these charts are effective, they rest on statistical concepts, so the audience may need additional explanation and some initial instruction to read them.
Histograms segment data into intervals, showing how frequently values fall in each one and revealing any asymmetry, multiple peaks, or evenness in the distribution (p. 179). Box-and-whisker plots summarize the distribution by showing the median, quartiles, and range: a central box spans the quartiles, and lines (the whiskers) extend outward from the box's edges. Violin charts show the estimated density of the distribution, giving a more detailed view than the simpler box-and-whisker plot.
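The numbers behind these two chart types can be sketched with Python's standard library. The example below is an illustration under stated assumptions, not the book's procedure: the helper names, the bin width, and the sample data are hypothetical, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several common conventions.

```python
import statistics


def histogram_counts(values, bin_width, start=0):
    """Count observations in equal-width bins [left, left + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin holding v
        left = start + k * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sample with one unusually large value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

print(histogram_counts(data, bin_width=2))  # bin left edge -> frequency
print(five_number_summary(data))            # values a box plot would draw
```

The histogram keeps the shape of the distribution, including the isolated bin that holds the value 9, while the five-number summary compresses the same data into the few values that a box-and-whisker plot displays.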
Choosing to present a synthesized visual representation of data or to display the raw data using plots like strip, beeswarm, or raincloud.
Schwabish also covers visualization methods that show individual observations, offering an alternative to the aggregate summaries found in histograms, violin charts, and box-and-whisker plots. He discusses strip plots, beeswarm plots, and raincloud plots, emphasizing their ability to reveal the variation within a distribution.
Strip plots exhibit data points along a single axis, which helps in spotting clusters and outliers, but they may become cluttered as the amount of data grows. Beeswarm plots utilize a technique to spread out data points, preventing them from obscuring each other, which improves visibility and offers a comprehensive view of the data distribution. Raincloud plots combine the visualization of data distribution akin to violin charts with detailed displays akin to strip plots, facilitating an in-depth examination of both the overarching tendencies and individual data points, which assists in pinpointing outliers and repeated patterns.
Understanding perceptual considerations for interpreting visual uncertainty
Jonathan Schwabish emphasizes the importance of illustrating uncertainty within data visualizations to enhance both transparency and trustworthiness, particularly when sharing outcomes derived from statistical analysis or predictive models. He explores various techniques for visually depicting statistical confidence.
Uncertainty can be visually encapsulated by utilizing bands or lines that envelop a central line. Graphs frequently employ thicker lines or brighter hues to highlight the key value, whereas the boundaries of an estimate are usually represented by lines or shaded areas that indicate its limits. In stripe plots, the intensity of the color corresponds to the level of certainty, illustrating the different levels of uncertainty. Finally, fan charts employ varying shades to depict the increasing levels of uncertainty over time.
Charts that display geographical data
Acknowledging the limitations inherent in maps that employ color shading to depict data, and exploring various types of cartograms.
Schwabish acknowledges the inherent appeal of maps, which let readers find specific areas and understand their spatial relationships. He cautions, however, that color-coded maps can inadvertently imply a connection between the size of a geographic region and the data value, even when no such relationship exists.
Employing color or shading on choropleth maps can naturally indicate geographic distributions, but it also has the potential to cause misunderstandings. For instance, the significance of a larger geographic area could be misleadingly overemphasized, despite the data it encompasses being of minimal relevance. Jonathan Schwabish explores the potential of cartograms, which alter the proportions of geographical regions to represent statistical data, as a possible solution.
Choosing suitable and uniform color schemes for data related to geography is essential.
Jonathan Schwabish underscores the importance of selecting color palettes with care, particularly in the representation of geographical data. He advises employing a gradient of colors that transitions from pale hues for smaller quantities to richer hues for larger quantities, aiding in the avoidance of misleading visual links.
He also recommends utilizing tools like Adobe Color, Color Brewer, Colour Lovers, and Design Seeds to develop unique and attractive color palettes.
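The pale-to-rich gradient described above can be sketched as a simple interpolation between two endpoint colors. This is a minimal illustration, not how any of the named tools builds its palettes: the function names and the two endpoint hex codes are assumptions, and plain RGB interpolation does not guarantee perceptually even steps.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to a tuple of three integers in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three integers in 0..255 to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_palette(light, dark, steps):
    """Interpolate linearly in RGB from a pale color to a rich one."""
    if steps < 2:
        raise ValueError("a sequential palette needs at least two steps")
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    palette = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the pale end, 1.0 at the rich end
        mixed = tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))
        palette.append(rgb_to_hex(mixed))
    return palette


# Hypothetical endpoints: a very pale blue and a deep navy.
for color in sequential_palette("#f7fbff", "#08306b", steps=5):
    print(color)
```

Assigning the first color to the smallest class of values and the last to the largest keeps the visual ordering aligned with the data ordering, which is the point of a sequential scheme.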
Understanding the principles of using symbols to represent proportions, illustrating the concentration of points to indicate density, and portraying the layout of transportation systems on maps.
Schwabish explores a range of methods for representing geographic data that extend past the usual area-focused techniques. He highlights the distinctive visual characteristics of various types of data representations, including maps that depict quantities with dots, maps that illustrate movement, and symbols scaled according to value.
Shapes like circles or squares are used to represent data values on proportional symbol maps, with the area they occupy determining their size. Individual point symbols on dot density maps illustrate either single or multiple data values, effectively highlighting areas of geographic concentration. In flow maps, the thickness of the arrows often indicates the magnitude of movement between various locations.
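Because the area of a proportional symbol carries the value, a circle's radius has to grow with the square root of the value rather than with the value itself. The sketch below illustrates that arithmetic; the function name `symbol_radius` and the default maximum radius are assumptions made for the example.

```python
import math


def symbol_radius(value, max_value, max_radius=20.0):
    """Radius of a circle whose AREA is proportional to value.

    Area grows with the square of the radius, so the radius is scaled by
    the square root of the value's share of the maximum.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# A value one quarter of the maximum gets half the radius,
# and therefore one quarter of the area.
largest = symbol_radius(100, 100)
quarter = symbol_radius(25, 100)
print(largest, quarter)
print((math.pi * quarter ** 2) / (math.pi * largest ** 2))
```

Scaling the radius linearly instead would make a value four times larger appear sixteen times larger in area, exaggerating differences between places.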
Visual depictions serve to highlight relationships and associations.
Communicating association between multiple variables with scatterplots, bubble plots, parallel coordinates, and radar charts
Schwabish explores various methods to illustrate relationships and connections, underscoring the necessity of conveying and comprehending these links with clarity and precision. He discusses scatterplots, bubble plots, parallel coordinates, and radar charts, outlining their strengths and limitations.
Scatterplots can illustrate the nature of the relationship between two variables (positive, negative, or nonexistent) by plotting observations within a coordinate system formed by perpendicular axes. Bubble plots use varying circle sizes to represent an additional dimension of data. Parallel coordinates plots draw a line for each observation across several vertical axes, one per variable, to show how the variables relate. Finally, radar charts, also known as spider charts, arrange variables along spokes that emanate from a central point to show how they are interconnected.
Utilizing scales along with clarifying notes helps to reduce inaccuracies and recognize the natural limitations present in graphical representations.
Schwabish warns of the potential for distortion and the inherent perceptual difficulties associated with circular charts such as radar charts. Although such visual presentations may engage viewers, they can hinder accurate comparisons and often require an abundance of labels and explanations for the audience to fully understand them.
Circular configurations may result in inaccuracies as they present difficulties in our ability to precisely assess areas and angles, thereby hindering accurate comparisons. Jonathan Schwabish emphasizes the importance of using a consistent scale, highlighting specific data points, and adding clear labels and annotations to improve comprehension.
Exploring various visual representations such as network and tree diagrams to illustrate complex data and hierarchical relationships.
Schwabish illustrates the effectiveness of network and tree diagrams in depicting intricate connections and tiered systems, particularly when the information encompasses patterns of movement, links, or classifications. He underscores the necessity of making intentional choices about the layout and design to achieve clarity in visual presentations and avoid clutter.
In network diagrams, the relationships between nodes are depicted through the use of connecting lines, and techniques like hive plots are employed to emphasize specific patterns within the network. Tree diagrams visually depict the structure of hierarchy, beginning with a primary root and branching out to include subsequent nodes and ending with terminal leaves. Visualizations known as word trees display the frequency with which words are combined in a text, offering a graphical representation of their semantic relationships.
Visual representations that illustrate how separate elements contribute to the entirety.
Recognizing the challenges and constraints associated with the traditional pie chart, which can hinder comprehension and cognitive processing.
Schwabish responds to the frequent criticisms of pie charts by explaining their visual shortcomings and providing principles for determining when their utilization is suitable. While pie charts are widely understood and simple in design, their use can hinder precise comparisons and may lead to misleading interpretations if they include numerous segments.
He underscores that the prevalent critique arises from our limited ability to accurately assess and differentiate angles, particularly in the context of multiple segments within pie charts. Schwabish advises designers to ideally limit pie charts to three or four categories and to always start the sequence at the top center.
Identifying the circumstances in which donut charts are suitable and acknowledging instances where they might not be advisable.
Schwabish underscores the significance of careful implementation when using donut charts, which are characterized by a prominent central gap. Although pie charts with a hole in the center might appear more aesthetically pleasing, they can exacerbate the challenges of deciphering visual information.
Charts with a donut design offer the advantage of a central space that can be used for additional details or labels. However, if readers judge the values in a pie chart mainly by the angle at its center, removing the center takes away that reference point and could make comparisons harder still. Schwabish advises analyzing the data thoroughly and considering alternative chart types before settling on a donut chart.
Exploring various techniques like treemaps, Voronoi diagrams, and sunburst charts to illustrate the organization and connections in data based on their relative sizes.
Schwabish presents a variety of visualization techniques to depict parts of a whole, offering alternatives that are more accurate and engaging than the traditional pie chart. He emphasizes the utility of different graphical representations like treemaps, along with Voronoi tessellations, for handling complex hierarchical structures and large amounts of data.
Treemaps partition a space into rectangles sized according to value, which makes them well suited to comparing parts of hierarchically structured data and offers an alternative to the pie chart. Sunburst diagrams use concentric rings to represent the levels of a hierarchy, much like treemaps but laid out radially. Finally, Voronoi diagrams partition a space into distinct, non-overlapping zones, offering a different view of spatial relationships and of how separate elements relate to the whole.
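The core idea of a treemap, giving each item a rectangle whose area matches its share of the total, can be sketched with the simplest possible layout. This is an illustrative assumption rather than the book's method: the function name `slice_layout` and the canvas size are hypothetical, the example handles a single level of hierarchy, and practical treemap tools typically use more elaborate layouts that keep rectangles closer to square.

```python
def slice_layout(values, x=0.0, y=0.0, width=100.0, height=60.0):
    """Split a rectangle into vertical slices with areas proportional to values.

    Returns one (x, y, width, height) tuple per value, in input order.
    """
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("values must be non-negative with a positive total")
    rects = []
    cursor = x
    for v in values:
        w = width * v / total        # this item's share of the full width
        rects.append((cursor, y, w, height))
        cursor += w                  # the next slice starts where this one ends
    return rects


# Hypothetical category totals.
for rect in slice_layout([50, 30, 20]):
    print(rect)
```

Because every slice has the same height, comparing areas here reduces to comparing widths, which readers tend to judge more accurately than the angles of a pie chart.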
Other Perspectives
- While bar charts are flexible, they can oversimplify complex data, potentially leading to misinterpretation or loss of nuance.
- The use of less common visual representations may require a steeper learning curve for the audience, which could hinder the immediate understanding of the data presented.
- The recommendation to avoid clutter in bar graphs by using small multiples or dot plots might not be suitable for all audiences or data types, as these alternatives can also become cluttered or confusing.
- While line charts are effective for visualizing trends, they can be less effective for displaying discrete data points or distributions.
- The use of color to simplify data assumes that all viewers will interpret color in the same way, which may not account for color vision deficiencies.
- Standardizing data to avoid dual axes can sometimes oversimplify or distort the relationships between different data sets.
- Histograms, box plots, and violin diagrams, while effective, may not be the best choice for all data types, and their complexity can be a barrier to understanding for some audiences.
- Strip plots, beeswarm plots, and raincloud plots, while offering detailed views, can become overwhelming with large datasets and may not effectively communicate the main message.
- Visualizing uncertainty is important, but the methods for doing so can be misinterpreted by audiences unfamiliar with statistical concepts.
- Cartograms, while addressing some issues with traditional maps, can be confusing due to their distortion of familiar geographic shapes.
- The use of uniform color schemes in geographical data visualization may not account for cultural differences in color perception and symbolism.
- Proportional symbol maps, dot density maps, and flow maps each have their own limitations, such as potential overcrowding or misinterpretation of scale.
- Scatterplots and bubble plots are powerful, but they can be misleading if the relationship between variables is not linear or if there is a lot of data overlap.
- Radar charts, despite their visual appeal, can be difficult to interpret and compare across multiple variables.
- Network and tree diagrams can become overly complex and hard to follow, especially with large amounts of data or many nodes and connections.
- Pie charts, while criticized for their limitations, can be effective in certain contexts, such as when the data has a small number of categories with clear differences in size.
- Donut charts, despite their criticisms, can be more engaging and may encourage viewers to focus on the data rather than the chart type.
- Treemaps and sunburst charts can be confusing if viewers are not familiar with hierarchical data representations, and they can suffer from clutter with large datasets.
- Voronoi diagrams, while offering a unique perspective, can be abstract and difficult for general audiences to interpret without significant explanation.
Creating a consistent method for graphically representing data.
This section explains why a consistent approach to data visualization matters, both for individuals and across organizations. The author stresses that following an established set of style conventions ensures consistency and helps build a distinctive brand identity.
Creating a collection of stylistic guidelines
Developing and assessing various color palettes.
The author emphasizes the importance of developing and applying color palettes that are not only visually striking but also align with the company's branding approach. He provides detailed guidance on developing color palettes, selecting shades, and making sure they are perceivable by those with color vision deficiencies.
He outlines five essential color schemes for presenting data: binary, sequential, diverging, categorical, and highlighting, each suited to different kinds of data and the relationships within them. Schwabish recommends exploring various color schemes and evaluating their transparency, legibility, and cultural connotations with the help of digital tools such as Adobe Color, Color Brewer, Colour Lovers, and Design Seeds.
Selecting appropriate fonts and establishing a hierarchy of typographic significance.
Schwabish underscores the necessity of choosing fonts that not only improve legibility but also align with the corporate brand. He recommends avoiding the routine choice of conventional typefaces and promotes the investigation of unique alternatives available across various operating systems.
He underscores the importance of establishing a consistent typographic hierarchy, which involves selecting distinct fonts or styles for different chart elements such as titles, legends, and annotations. Schwabish also recommends fonts whose digits are consistently sized, so that numbers line up and are easier to read and compare in tables.
Establishing optimal methods for transferring images between various platforms
Schwabish emphasizes the need to prepare data visuals carefully so that they keep their clarity and accuracy when shown in contexts other than the one in which they were created. He provides guidance on selecting appropriate file formats while taking into account the platform, the intended use, and the visual characteristics of the image.
He examines the advantages and disadvantages of bitmap versus vector-based imagery. Pixel-based images become less clear when enlarged, whereas vector images, which are made from geometric shapes, retain their sharpness regardless of size. Schwabish recommends choosing file formats like PDF and SVG that are optimally designed for both print and web use, in addition to being compatible with multiple other platforms.
Other Perspectives
- While consistency in data visualization is important, too rigid an adherence to a single style can stifle creativity and innovation.
- Recognized style conventions may not always be the best fit for every type of data or audience, and flexibility can sometimes lead to better understanding.
- Color palettes that align with company branding may not always be the most effective for data representation, as branding colors are not necessarily designed for data clarity and differentiation.
- The five essential color schemes mentioned are not exhaustive and may not cover all types of data visualization needs.
- Digital tools for color scheme selection may not always provide the best options for all users, as they can be limited by their algorithms and available color ranges.
- The recommendation to avoid conventional typefaces may not take into account the accessibility benefits that familiar fonts can provide to a wider audience.
- Establishing a typographic hierarchy is useful, but it can become overly complex, leading to confusion rather than clarity if not implemented carefully.
- Fonts with consistently sized digits are important, but other aspects of font design, such as x-height and character spacing, are also crucial for readability.
- The focus on maintaining clarity and accuracy across platforms is important, but the process can be resource-intensive and may not be feasible for all organizations.
- The recommendation for file formats like PDF and SVG may not consider the limitations or specific requirements of certain platforms or user needs.
- Vector-based imagery is not always superior to bitmap; for instance, complex vector images can be more resource-intensive to render than bitmaps.
- The emphasis on digital tools and platforms may overlook the value of traditional, hand-drawn visualizations or physical models in certain contexts.
Grasping the impact of perception and cognition on data visualization is essential.
This section delves into the fundamental rules that govern how we visually interpret data, underscoring the importance of adhering to these rules to produce impactful data visualizations. Schwabish clarifies that understanding these principles can guide design decisions and enhance communication through imagery, leading to graphics that engage and have a lasting impact on viewers.
Utilizing fundamental concepts of how we see to improve the impact of visual displays.
Employing the concepts of Gestalt, such as proximity, similarity, enclosure, closure, continuity, and connection, is essential for the effective presentation of data.
Schwabish outlines the essential guidelines that govern our comprehension and classification of visual data, commonly referred to as the principles of Gestalt in visual perception. He emphasizes six core tenets that enhance the design of a range of charts, which in turn boosts transparency and assists in making the information clearer for viewers.
He delves into the tenets of visual grouping, addressing the concept of proximity by placing elements near each other; similarity by linking elements with shared characteristics; enclosure by creating a shared identity for elements within a boundary; closure by our inclination to perceive incomplete shapes as whole; continuity by recognizing uninterrupted paths; and connection by establishing visual ties between elements. By rigorously adhering to these guidelines, creators can successfully direct attention to highlight the essential components and relationships inherent in the data.
Understanding the way audiences decode visual information influenced by their initial, subconscious perception.
Jonathan Schwabish delves into the idea of "preattentive processing," a mechanism by which the brain rapidly and unconsciously assesses certain visual elements. Designers can guide the audience's attention to key details by thoughtfully incorporating elements like hue and scale into their chart designs.
He demonstrates the concept with a simple exercise (p. 25) in which he challenges readers to identify the highest values in a table. When the table is presented in a plain format, the task requires visual scanning and conscious effort. Applying color and varying its intensity makes the relevant values stand out at once, demonstrating how preattentive features facilitate instant recognition.
Other Perspectives
- While understanding perception and cognition is important, it's also crucial to consider the context in which data visualizations are used, as different contexts may require different visualization strategies.
- The principles of Gestalt are foundational, but they are not the only theories that can inform effective data visualization; other psychological and design principles can also be relevant.
- Strict adherence to Gestalt principles might not always result in the most effective data presentation; sometimes breaking these rules can lead to more innovative or impactful visualizations.
- Overemphasis on visual grouping principles might lead to oversimplification of complex data, potentially obscuring important nuances.
- The concept of preattentive processing is useful, but it's also important to consider the cognitive load on viewers; too many visual cues can be overwhelming and counterproductive.
- Relying heavily on elements like hue and scale might not be suitable for all audiences, particularly those with visual impairments such as color blindness.
- Preattentive features can indeed facilitate instant recognition, but they can also lead to misinterpretation of data if not used carefully and in balance with other design considerations.
The book explores the representation of different data types, such as time-series, spatial, and part-to-whole data.
This section explores the depiction of different data forms, outlining the specific challenges and techniques for their accurate and insightful portrayal. Schwabish emphasizes the necessity of considering both the characteristics of the data and the audience's understanding when choosing the right method for data visualization.
Understanding the different types of data and their corresponding measurements.
Identifying the nature of the data as either numerical or categorical and discerning its classification as categorical, sequential, or diverging.
Schwabish conducts an in-depth examination of different data types and scales, clarifying their traits and how these influence the choice of suitable methods for visualizing data. He clarifies the appropriate visual representations for each category, differentiating data that can be measured and expressed numerically, referred to as quantitative, from data that is descriptive and cannot be quantified, referred to as qualitative.
Additionally, he distinguishes discrete data, which consists of whole numbers, from continuous data, which can be broken down into smaller and more precise measurements, and explains how each is used in different visual representations. He underscores the importance of understanding the levels of data measurement: nominal (unordered categories), ordinal (ordered categories), interval (ordered values with consistent differences but no true zero point), and ratio (ordered values with a true zero point, which makes ratios between values meaningful). By analyzing these characteristics of the data, creators can select appropriate visualization methods and keep their depictions accurate.
Recognizing and effectively communicating the subtleties related to potential biases and data uncertainty.
Recognizing potential biases across various datasets.
Schwabish underscores the importance of conveying to the audience any potential biases and inherent uncertainties that may be present in the data. He encourages those who develop visualizations to thoroughly scrutinize the origins of the data, diligently examining how it was gathered and the foundational presumptions associated with it.
He emphasizes the likelihood of inaccuracies in self-reported data, citing details from the American Community Survey conducted by the U.S. Census Bureau (p. 188). He details multiple reasons respondents may give incorrect information, ranging from deliberate falsehoods to reliance on rough calculations or guesses. Acknowledging these constraints lays the groundwork for a deeper understanding of the data and more accurate visual representations.
Choosing appropriate methods to depict the inherent uncertainty in statistical models and calculations.
The author underscores the importance of employing accurate graphical components to depict statistical ambiguity, thus improving comprehension and preventing confusion. He delves into a variety of visual representations that communicate uncertainty, emphasize intervals that indicate the degree of statistical certainty, showcase gradients, and portray probability distributions, all the while elucidating their visual properties and the underlying statistical principles.
He explains that error bars are graphical elements attached to bars or lines that represent the range of statistical uncertainty (p. 190). Charts that display confidence intervals use lines or shaded regions to show the range of potential values between the estimated upper and lower limits. In gradient charts, deeper shades signify higher levels of certainty, while lighter tones mark the values that are less certain. Finally, fan charts show how uncertainty changes over time using color bands of varying saturation. By applying these techniques, chart creators give readers what they need to assess the reliability of the presented data and to understand the range of possible results.
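The upper and lower limits that an error bar or confidence band displays have to be computed before they can be drawn. The sketch below shows one common way to do so for a sample mean. It is an illustration under stated assumptions, not the book's procedure: the function name and sample are hypothetical, and the interval uses a normal approximation with z = 1.96 for roughly 95% coverage, whereas small samples usually call for a t-distribution multiplier instead.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean of a sample.

    Returns (lower, mean, upper) using mean +/- z * standard error,
    where the standard error is the sample standard deviation / sqrt(n).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean, mean + margin


# Hypothetical measurements.
lower, mean, upper = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean = {mean:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

An error bar would be drawn from `lower` to `upper` around the plotted mean, and a fan or gradient chart could repeat the calculation with several values of `z` to produce nested bands.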
Other Perspectives
- While Schwabish emphasizes the importance of considering the audience's understanding, it can be argued that overly simplifying complex data to match audience capacity might lead to misinterpretation or loss of critical nuances.
- The classification of data into numerical or categorical might overlook mixed data types that possess characteristics of both, which could require more complex visualization strategies.
- Schwabish's approach to visual representations may not account for the evolving nature of data visualization tools and techniques that could offer new ways of depicting data beyond traditional methods.
- The focus on distinguishing between discrete and continuous data might not fully capture the challenges in visualizing high-dimensional data that doesn't fit neatly into these categories.
- The emphasis on understanding levels of data measurement is important, but it might not address the practical difficulties in applying these concepts to real-world data that is messy and doesn't conform to ideal types.
- Recognizing potential biases is crucial, but the text may not acknowledge the inherent biases in the selection and design of visualization methods themselves, which can also shape audience perception.
- The discussion on conveying uncertainties might not consider the cognitive load on the audience; too much emphasis on uncertainty can lead to confusion or skepticism about the data's reliability.
- The use of error bars and confidence intervals is standard, but there is a risk that these visual elements are misinterpreted by a general audience unfamiliar with statistical concepts.
- Fan charts and other methods to depict uncertainty might not be universally understood, potentially leading to misinterpretation of the data's implications.
- The text may not address the limitations of visual representations in capturing the dynamic nature of data in an increasingly digital and interactive world.