Intelligent Log Analysis and Real-time Anomaly Detection @ Salesforce - Andrew Torson
Application performance monitoring is one of the key DevOps and SRE duties for any large cloud-based organization, and Salesforce is one of the largest such organizations. The volume of application logs being collected is enormous and may surpass tens of gigabytes per second for large SaaS operations. Identifying a degrading performance trend, detecting an application anomaly, or performing a root-cause analysis is a needle-in-a-haystack problem that many companies tackle only retroactively and on a case-by-case basis, typically knowing what they want to find and commonly relying on ad-hoc offline queries in Splunk/ElasticSearch or some Big SQL tools.

Real-time, proactive log monitoring requires a different approach: a stack of unsupervised statistical analytics and ML models that augments and classifies the ingested log data at medium-cardinality scopes to identify performance problems as they start to unfold. These methods cannot be too complex, as statistics need to be continuously re-evaluated online: trend and anomaly detection is a moving-target problem with volatile, spiky dynamics. Moreover, there are many definitions of anomalies and performance degradation events, because each software service has unique performance pitfalls, especially when it comes to root-cause analysis.

At Salesforce, we decided to build a public-cloud-hosted, elastic, multi-tenant real-time log analysis and anomaly detection platform so that each tenant can easily onboard their own anomaly models and log monitoring analytics and scale them independently based on their log volume processing requirements. Apache Flink is the engine that powers our platform: most of the intelligent monitoring can be performed at scale using the time-windowing features of Flink, with more complex ML models served through Flink connected streams. Flink also provides basic multi-tenancy support through Application Master sessions running on top of elastic compute cluster managers such as YARN, Kubernetes, and Mesos; this allows us to provide per-tenant elasticity and billing cost estimation, often saving tens of thousands of dollars in monthly compute costs. In this talk, I'll provide a high-level end-to-end overview of our platform and show how we deliver the results via alerting services for DevOps monitoring and via real-time BI tools for performance engineering insights.
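To make the "windowed statistics at medium-cardinality scopes" idea concrete, below is a minimal, hypothetical sketch of the kind of Flink DataStream job the talk alludes to: log-derived latency events are keyed by service, aggregated over tumbling event-time windows, and flagged when a window's statistics drift. The class names (LatencyEvent, LatencyAnomalyJob), the in-memory sample data, and the simple spread-vs-mean threshold are illustrative assumptions, not the actual Salesforce models; only the Flink DataStream calls (keyBy, window, process) are standard API.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class LatencyAnomalyJob {

    // Hypothetical parsed log record: one request-latency observation per log line.
    public static class LatencyEvent {
        public String service;
        public long timestampMs;
        public double latencyMs;
        public LatencyEvent() {}
        public LatencyEvent(String service, long timestampMs, double latencyMs) {
            this.service = service;
            this.timestampMs = timestampMs;
            this.latencyMs = latencyMs;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real platform this would be a high-volume log source (e.g. Kafka);
        // a few in-memory events keep the sketch self-contained and runnable.
        DataStream<LatencyEvent> logs = env
            .fromElements(
                new LatencyEvent("search", 1_000L, 120.0),
                new LatencyEvent("search", 2_000L, 130.0),
                new LatencyEvent("search", 3_000L, 900.0)) // latency spike
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<LatencyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, ts) -> event.timestampMs));

        // Medium-cardinality scope: key by service, aggregate over 1-minute tumbling
        // event-time windows, and emit an alert when the window's latency spread is
        // large relative to its mean (a deliberately simple, illustrative rule).
        logs.keyBy(e -> e.service)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .process(new ProcessWindowFunction<LatencyEvent, String, String, TimeWindow>() {
                @Override
                public void process(String service, Context ctx,
                                    Iterable<LatencyEvent> events, Collector<String> out) {
                    double sum = 0, sumSq = 0;
                    long n = 0;
                    for (LatencyEvent e : events) {
                        sum += e.latencyMs;
                        sumSq += e.latencyMs * e.latencyMs;
                        n++;
                    }
                    double mean = sum / n;
                    double std = Math.sqrt(Math.max(0, sumSq / n - mean * mean));
                    if (n > 1 && std > 0.5 * mean) {
                        out.collect(String.format(
                            "ANOMALY service=%s window=[%d,%d) mean=%.1fms std=%.1fms n=%d",
                            service, ctx.window().getStart(), ctx.window().getEnd(),
                            mean, std, n));
                    }
                }
            })
            .print(); // in production this would feed an alerting or BI sink instead

        env.execute("latency-anomaly-sketch");
    }
}
```

In this sketch the per-window statistics are cheap and continuously re-evaluated, matching the constraint that online trend and anomaly detection cannot afford heavyweight models; more complex ML scoring would be attached via connected streams rather than inside the window function.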