Topic Modeling Theme Discovery for Customer and Brand Insights
In this lecture on Topic Modeling (Theme Discovery) for Customer and Brand Insights, we move beyond sentiment polarity and focus on a different question that matters in marketing practice: what are customers actually talking about. Using a Colab notebook and a streaming sample of Yelp-style review text, we build an end-to-end workflow that turns raw customer language into interpretable themes that can be monitored over time and connected to brand and customer decisions.

We begin by defining a conservative preprocessing policy designed for topic discovery, including Unicode and whitespace normalization, lightweight HTML removal, and stable placeholder replacements for patterns like URLs, user mentions, and numbers. After quick sanity checks on document length, duplicates, and high-frequency tokens, we establish two classic reference points: LSA (TF-IDF + SVD) and LDA (Count + LDA). The goal is not to “pick a winner,” but to build intuition for how themes can be surfaced, what assumptions each method makes, and what their outputs look like in practice.

Next, we shift to the modern topic discovery pipeline: Represent → Organize → Represent Topics → Insights. We represent documents as dense vectors (TF-IDF reduced via SVD and an optional non-transformer embedding baseline), then use dimensionality reduction (UMAP) to reveal local structure and make grouping easier. For clustering, we highlight why density-based methods like HDBSCAN are effective for messy real-world text: they can discover a variable number of clusters and assign ambiguous documents to an outlier group instead of forcing weak topics.

To turn clusters into human-readable topics, we implement c-TF-IDF (class-based TF-IDF): we concatenate all documents within each cluster into a “cluster-document,” compute TF-IDF across cluster-documents, and extract the most distinctive keywords per cluster.
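The c-TF-IDF step just described can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the lecture's notebook code: the function name `ctfidf_keywords`, the regex tokenizer, the toy reviews, and the smoothed IDF weighting are all assumptions, and the notebook may use a different weighting or a library implementation.

```python
import math
import re
from collections import Counter, defaultdict


def ctfidf_keywords(docs, labels, top_n=3, outlier_label=-1):
    """Return {cluster_label: [top keywords]} via TF-IDF over cluster-documents.

    Hypothetical helper: the weighting below (relative term frequency times a
    smoothed IDF) is one common choice, assumed here for illustration.
    """
    # Concatenate the tokens of every document in a cluster into one
    # "cluster-document". Documents in the outlier group are skipped.
    cluster_tokens = defaultdict(list)
    for doc, label in zip(docs, labels):
        if label == outlier_label:
            continue
        cluster_tokens[label].extend(re.findall(r"[a-z]+", doc.lower()))

    counts = {c: Counter(tokens) for c, tokens in cluster_tokens.items()}
    n_clusters = len(counts)

    # Document frequency: in how many cluster-documents does each term occur?
    df = Counter()
    for counter in counts.values():
        df.update(counter.keys())

    keywords = {}
    for c, counter in counts.items():
        total = sum(counter.values())
        scores = {
            term: (n / total) * (math.log((1 + n_clusters) / (1 + df[term])) + 1)
            for term, n in counter.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[c] = ranked[:top_n]
    return keywords


# Toy data (made up): two clusters plus one outlier labeled -1.
docs = [
    "the pizza was great and the crust was crispy",
    "great pizza and great crust",
    "the wait was long and the service was slow",
    "slow service and a long wait",
    "parking lot",
]
labels = [0, 0, 1, 1, -1]
topics = ctfidf_keywords(docs, labels)
print(topics)
```

Terms shared by every cluster (such as "the" or "and") receive the lowest IDF, so cluster-specific words rise to the top of each keyword list even without a stop-word list.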
We then attach exemplar documents selected from the cluster center so topics can be labeled consistently and used in business settings. The lecture culminates in a reusable Topic Dictionary artifact and document-level topic assignments that can be carried forward into downstream analyses (including upcoming brand mapping work).

Instructor: Dr. Hyunhwan “Aiden” Lee, CSULB College of Business
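The exemplar selection and Topic Dictionary described above might look roughly like the following sketch. It assumes "cluster center" means the mean of a cluster's document vectors and that exemplars are the documents closest to it in Euclidean distance; the function name `build_topic_dictionary`, the dictionary fields, and the toy 2-D vectors are hypothetical and not taken from the lecture notebook.

```python
import math
from collections import defaultdict


def build_topic_dictionary(vectors, labels, docs, k=1, outlier_label=-1):
    """Return {cluster_label: {"size": int, "exemplars": [str, ...]}}.

    Assumption: exemplars are the k documents nearest the cluster centroid.
    """
    # Group document indices by cluster, ignoring the outlier group.
    members = defaultdict(list)
    for i, label in enumerate(labels):
        if label != outlier_label:
            members[label].append(i)

    topic_dictionary = {}
    for label, idxs in members.items():
        dim = len(vectors[idxs[0]])
        # Centroid = coordinate-wise mean of the cluster's vectors.
        centroid = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Sort members by distance to the centroid; ties go to the lower index.
        closest = sorted(idxs, key=lambda i: (math.dist(vectors[i], centroid), i))[:k]
        topic_dictionary[label] = {
            "size": len(idxs),
            "exemplars": [docs[i] for i in closest],
        }
    return topic_dictionary


# Toy 2-D document vectors (made up); in practice these would be the reduced
# embeddings, and the labels would come from the clustering step.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
labels = [0, 0, 0, 1, 1, -1]
docs = [
    "pizza was fine",
    "best crust in town",
    "great pizza and crust",
    "slow service",
    "long wait for a table",
    "parking lot",
]
topic_dictionary = build_topic_dictionary(vectors, labels, docs)
print(topic_dictionary)
```

Keeping exemplar text next to each topic's size gives a human labeler concrete reviews to read when naming a topic, which is what makes the labels repeatable across analysts.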