How Your Systems Keep Running Day After Day: Resilience Engineering as DevOps
John Allspaw, CTO/Researcher, Adaptive Capacity Labs
DOES17 San Francisco, DevOps Enterprise Summit
https://events.itrevolution.com/us/

My goal today is twofold. One, I'm intending to challenge you. I'm hoping to provoke new thoughts and new questions in your mind. If, by chance, any of these new questions give rise to some anxiety, I want to assure you that that's quite normal. Don't worry. We'll get to some sort of resolution at the end, though the anxiety may remain. Before we get started, you'll notice I changed the title of the talk to How Your Systems Keep Running Day After Day, because that's really the general gist of this.

Before we get started, I want to start with something. Can everybody read this? I don't know why everybody's laughing. I want to ask... Don't worry, it's rhetorical, because I have the microphone. Is this dangerous? Well, at the very least, my expectation is that you'd say that it depends, right? Much like Nicole was saying earlier. All right, let's take another one, a little bit more complicated. What we see here is a diff, a change. This change is to an HTML comment: it changes the case of the K. Is this dangerous? Would your answer change if I told you that this is for a load balancer health check? Okay, so let's get started. The point of both of those is that all work is contextual. "It depends" is an answer we give quite a lot, and that's important. We'll come back to this.

Here's a slide about me. I won't spend too much time on it. Here are some of the places I've worked, things I've written, and places I've studied. As Gene mentioned, I gave this talk; I just want to point out that the last time I felt so strongly about the topics I'm about to talk about was 2009, when I gave that talk with Paul Hammond. What I want to talk about is new. It is different, and I feel very, very strongly about it.
Another piece that might be relevant is my degree in Human Factors and System Safety. My thesis was Trade-Offs Under Pressure: Heuristics and Observations Of Teams Resolving Internet Service Outages. This helps set the stage a little bit, I guess. I don't want you to worry too much about this.

Some of you may have heard of what's called the Stella Report; I want to give you a high-level overview of it. I'll put the link up later. At a high level, this report is the result of a year-long project by a consortium of industry partners: IBM, Etsy, and IEX, a trading exchange in Manhattan. Over this year, folks from the Ohio State University Cognitive Systems Engineering Lab, David Woods, Richard Cook, and a number of others, looked deeply at an incident in each of those organizations. Even though those organizations differ from a funding, resourcing, market, and population standpoint, the researchers found six themes that were common across all of them. Certainly the results are quite important, but what matters most is how that research was done, and I want you all to take a look at that a little later. Just as a bit of a cliffhanger, the themes are: postmortems as recalibration, which I'm going to talk a little bit about; blameless versus sanctionless; controlling the cost of coordination; visualizations; strange loops; and something I want to pique your interest in, dark debt. Okay, so that's the Stella Report.

Here are my main points. One, we have to start taking human performance seriously in this industry. If we don't, we will continue to see brittle systems with ever-increasing impacts on our businesses and on society. Two, we can do this by looking at incidents in ways that go beyond what we currently do in postmortems, post-incident reviews, after-action reviews, or whatever the hell you'd call them.
Number three is that there do exist methods and approaches from the study of resilience in other domains, but they require real commitment to pursue. I'm going to talk about this. Doing this is both necessary and difficult, but it will prove to be a competitive advantage for businesses who do it well. That's the high level...