TF-IDF and Text Mining

Term Frequency/Inverse Document Frequency (TF-IDF) considers both the frequency of a term within a document and the frequency of the term across documents. Term frequency is the number of times term t appears in document d. Inverse document frequency is a logarithmic quantity that grows as a term appears in fewer documents. Together they identify documents with frequent occurrences of rare terms; a lower value means the term is less unique across documents (a common word, for instance).

Components of TF-IDF include: CountVectorizer, which computes word counts. It looks at the words and creates a matrix of where it sees them, and we can provide it with stop words that it should not count. TfidfTransformer computes IDF, and then TF-IDF scores. TfidfVectorizer does all of these steps at once.

TF-IDF creates a sparse matrix. When you have a matrix with a lot of zeros and/or empty values, it is more efficient to store just the cells that hold data, and what the data are; the sparse format records the locations where nonzero data appear.

For an example, I went to LinkedIn and grabbed the "What you will do" text from job postings. I also copied the "About" section from my own LinkedIn profile. The corpus is the body of data we are going to investigate, and I read each of these files into our corpus. Next, I read the corpus into a CountVectorizer and look at the word matrix that results. I reduce the vocabulary with stop_words and lemmatization: we can find words with similar stems in the matrix, then watch them consolidate through lemmatization. I add tokenizer=LemmaTokenizer() to our CountVectorizer constructor. If we lemmatize and use stop words together, the stop-word filter may not find words that have already been transformed, so I take stop_words="english" out of CountVectorizer and apply the stop list inside the lemmatizer instead (self.stopwords).

After preparing the data, I use TfidfTransformer to compute the scores and print the term matrix. A low number indicates that the term merely appears; a higher number indicates that it appears frequently in the document. I then use TF-IDF to create a sparse matrix.
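The arithmetic behind those steps can be sketched in plain Python. This is a minimal illustration of the TF and log-IDF computation that CountVectorizer and TfidfTransformer perform internally (scikit-learn's actual TfidfTransformer uses a smoothed variant and L2 normalization, so its numbers differ); the tiny stop list and the sample corpus are made up for the example.

```python
import math
from collections import Counter

# Tiny illustrative stop list -- sklearn's stop_words="english" is much larger.
STOP_WORDS = {"the", "a", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase and split, dropping stop words (the counting step)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(corpus):
    """Return one {term: tf-idf score} dict per document.

    tf  = raw count of term t in document d
    idf = log(N / df) + 1, where df = number of documents containing t
    """
    docs = [Counter(tokenize(text)) for text in corpus]
    n = len(docs)
    df = Counter()
    for counts in docs:
        df.update(counts.keys())  # each document counted once per term
    return [
        {t: tf * (math.log(n / df[t]) + 1) for t, tf in counts.items()}
        for counts in docs
    ]

# Hypothetical stand-ins for the "What you will do" job-posting texts.
corpus = [
    "design and build data pipelines",
    "build machine learning models",
    "write reports for the sales team",
]
scores = tf_idf(corpus)
# "build" appears in two of the three documents, so it scores lower
# than "pipelines", which appears in only one.
```

Storing only the nonzero `{term: score}` entries per document mirrors why the real implementation returns a sparse matrix: most terms never occur in most documents.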
Once we have our dataset, we can use it for predictive marketing. TruncatedSVD uses linear algebra to compress the dataset and find the top features. This creates a list of lists, where each document is represented by a row. Consider these business cases: matching employees to jobs, mining data for job salaries, and mining prices for inflation readings. To demonstrate, I add my resume to the corpus and see which jobs are a good match for it. To look at job pay, I make a separate collection for pay, and I include that as a parameter when I read the job posting text. I treat this as a target I am trying to predict in the regression analysis. #TfidfTransformer #TruncatedSVD #textmining #python
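One simple way to score resume-to-job matches over TF-IDF vectors is cosine similarity; the source does not specify its matching method, so this is a stand-in sketch, and the job titles and term weights below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical TF-IDF vectors: three job postings and one resume.
jobs = {
    "data engineer": {"python": 1.4, "pipelines": 2.1, "sql": 1.4},
    "ml engineer":   {"python": 1.4, "models": 2.1, "training": 2.1},
    "sales analyst": {"reports": 2.1, "sales": 2.1, "excel": 2.1},
}
resume = {"python": 1.4, "pipelines": 2.1, "models": 2.1}

# Rank job postings by similarity to the resume; the best match is the
# posting whose term weights overlap the resume's the most.
best = max(jobs, key=lambda title: cosine_similarity(jobs[title], resume))
```

For the salary use case, the same document rows (e.g. the compressed TruncatedSVD features) would serve as inputs to a regression model, with pay as the target.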