ggplot2 tutorial: Box Plots
Learn more about ggplot2 in R at: https://www.datacamp.com/courses/data...

In this chapter we'll take a look at our first class of specialised plots, which I like to call statistical plots. Actually, all the plots we've made so far can be called statistical plots, since data visualisation operates at the intersection of design and statistics. However, in contrast to the plots we've seen previously, these are better suited to an academic audience. We're going to look at two very common plot types for visualising the distribution of continuous data: box plots and density plots. We'll return to box plots in the case study, where we develop a new stats layer to produce a two-dimensional box plot.

Box plots were originally described by John Tukey in his 1977 classic text "Exploratory Data Analysis", so they are actually not that old. He described them as a way of visualising the so-called 5-number summary. Let's begin with a continuous variable. Typical descriptive statistics for location and spread are the mean and standard deviation, as shown here. This is OK if our data is normally distributed, which we dealt with in the best practices chapter of the last course. Recall that the mean and standard deviation may not be the best representation of the data, in particular if we have extreme values that result in a skewed distribution. That's because the mean and standard deviation are not robust, so they'll be influenced by extreme values.

The robust alternative uses the median, which is the second quartile, as the measure of location, and the inter-quartile range (IQR), which is the difference between the third and first quartiles, as the measure of spread. This is what is shown in a box plot, which you can think of as the robust equivalent of showing the mean and standard deviation. The difference is that here our summary consists of 5 values (the minimum, first quartile, median, third quartile and maximum), so we get a better impression of the distribution of the data set.
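To make the robustness argument concrete, here is a minimal sketch in plain Python rather than R, using made-up numbers that are not the data from the video. The helper name `five_number_summary` and the choice of the "inclusive" quartile method are assumptions for illustration; other software may interpolate quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Hypothetical helper for illustration. The "inclusive" method is an
    assumption; quartile conventions vary between software packages.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Made-up example data, not the data set used in the video.
data = [2, 3, 4, 5, 6, 7, 8, 9, 10]
# Same data, but with the maximum replaced by an extreme value.
skewed = data[:-1] + [100]

for label, values in (("original", data), ("with extreme value", skewed)):
    minimum, q1, median, q3, maximum = five_number_summary(values)
    print(label)
    print("  mean:", statistics.mean(values), " sd:", round(statistics.stdev(values), 2))
    print("  median:", median, " IQR:", q3 - q1)
    print("  5-number summary:", (minimum, q1, median, q3, maximum))
```

In this sketch the single extreme value moves the mean from 6 to 16, while the median (6) and the IQR (4) do not change at all. Only the maximum in the 5-number summary reflects the new value.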
We get an impression of the skew because each of the four segments, the two outer whiskers and the two parts of the inner box, reflects 25% of the data.

There is a further subtle point to box plots, and that is the ability to show extreme values as distinct features. An extreme value is defined as one falling more than 1.5 times the IQR below Q1 or above Q3. This threshold is called the fence and is shown here as a dotted blue line. This is one definition of an extreme value. We can also change this threshold, although typically there is no reason to do so. However, you should be aware that the fences can be adjusted and that the default depends on the software.

To understand how the fence works, we'll use the same data set we've been working on so far, but we'll pull the maximum value, coloured blue on our dot plot, further and further away from its starting position, causing the data set to slowly become positively skewed. Any values that surpass the fence will be drawn as a dot. The box plot is also updated: the whisker is drawn up to the highest observation within the fence. It doesn't matter how many values are outside the fence, or how far away they are. They all get represented as dots. The whiskers never extend beyond the fence.

If you're not clear about how the plot is drawn, you can actually get a very wrong view of the data's distribution. Many people who commonly use or encounter box plots are not aware that the mid-line is the median and not the mean. Nor are they aware that the dots represent extreme values that are part of the data, and therefore should not be disregarded. So use these plots with caution! OK, let's take a look at the box plot geom in ggplot2 and some other problems you may encounter with box plots in the exercises.
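The fence rule described above can be sketched in a few lines. This is plain Python rather than ggplot2 code, the data is made up, and the function name `box_plot_stats` and its `coef` parameter are hypothetical names chosen for illustration. The "inclusive" quartile method is also an assumption, so plotting software may give slightly different quartiles.

```python
import statistics


def box_plot_stats(values, coef=1.5):
    """Compute fences, whisker ends and extreme values for a box plot.

    Hypothetical helper for illustration. `coef` is the multiple of the
    IQR that sets the fences; the quartile method is an assumption.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - coef * iqr
    upper_fence = q3 + coef * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Everything beyond a fence is drawn as an individual dot.
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Made-up data: pull the maximum further and further from the rest.
base = [2, 3, 4, 5, 6, 7, 8, 9]
for new_max in (10, 12, 14, 20):
    stats = box_plot_stats(base + [new_max])
    print(new_max, stats)
```

With these numbers the fences sit at -2 and 14. While the maximum is 10, 12 or 14, the upper whisker simply follows it. Once it reaches 20 it crosses the fence, is reported as an extreme value, and the whisker falls back to 9, the highest observation still inside the fence.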