Русские видео

Сейчас в тренде

Иностранные видео


Скачать с ютуб Python tutorial: Cumulative Distribution Functions в хорошем качестве

Python tutorial: Cumulative Distribution Functions 8 лет назад


Если кнопки скачивания не загрузились НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием, пожалуйста напишите в поддержку по адресу внизу страницы.
Спасибо за использование сервиса ClipSaver.ru



Python tutorial: Cumulative Distribution Functions

Learn about empirical cumulative distribution functions: https://www.datacamp.com/courses/stat... We saw in the last video the clarity of bee swarm plots. However, there is a limit to their efficacy. For example, imagine we wanted to plot the county-level voting data for all states east of the Mississippi River and all states west. We make the swarm plot as before, but using a DataFrame that contains all states, with each classified as being east or west of the Mississippi. The bee swarm plot has a real problem. The edges have overlapping data points, which was necessary in order to fit all points onto the plot. We are now obfuscating data. So, using a bee swarm plot here is not the best option. As an alternative, we can compute an empirical cumulative distribution function, or ECDF. Again, this is best explained by example. Here is a picture of an ECDF of the percentage of swing state votes that went to Obama. A x-value of an ECDF is the quantity you are measuring, in this case the percent of vote that sent to Obama. The y-value is the fraction of data points that have a value smaller than the corresponding x-value. For example, 20% of counties in swing states had 36% or less of its people vote for Obama. Similarly, 75% of counties in swing states had 50% or less of its people vote for Obama. Let's look at how to make one of these from our data. The x-axis is the sorted data. We need to generate it using the NumPy function sort, so we need to import Numpy, which we do using the alias np as is commonly done. The we can use np.sort() to generate our x-data. The y-axis is evenly spaced data points with a maximum of one, which we can generate using the np.arange() function and then dividing by the total number of data points. Once we specify the x and y values, we plot the points. By default, plt.plot() plots lines connecting the data points. To plot our ECDF, we just want points. To achieve this we pass the string '.' and the string 'none' to the keywords arguments marker and linestyle, respectively. As you remember from my forceful reminder in an earlier video, we label the axes. Finally, we use the plt.margins() function to make sure none of the data points run over the side of the plot area. Choosing a value of 0.02 gives a 2% buffer all around the plot. The result is the beautiful ECDF I just showed you. We can also easily plot multiple ECDFs on the same plot. For example, here are the ECDFs for the three swing states. We see that Ohio and Pennsylvania were similar, with Pennsylvania having slightly more Democratic counties. Florida, on the other hand, had a greater fraction of heavily Republican counties. In my workflow, I almost always plot the ECDF first. It shows all the data and gives a complete picture of how the data are distributed. But don't take my word for how great ECDFs are. You can see for yourself in the exercises!

Comments