Take 2

Creating a text summarization program in Python within Jupyter Notebook usually involves libraries such as NLTK, spaCy, or Hugging Face transformers (for pre-trained models such as BART or T5). Here's a step-by-step guide to building a basic text summarizer.

Steps:

1. Install Required Libraries
Install the libraries used below. transformers also needs a deep-learning backend such as PyTorch:
pip install nltk transformers torch
Inside a Jupyter cell, %pip install nltk transformers torch installs them into the notebook's active kernel.

2. Choose a Summarization Approach
Extractive summarization: selects key sentences verbatim from the text.
Abstractive summarization: generates new, shorter sentences with a pre-trained sequence-to-sequence model such as BART or T5.

3. Implementation
Below is example code for both approaches.

---

Code

1. Extractive Summarization with NLTK
This method scores each sentence by summing the document-wide frequencies of its non-stopword words, then keeps the highest-scoring sentences.

# Import required libraries
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import Counter

# Download NLTK data (run only once)
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by sent_tokenize/word_tokenize in recent NLTK releases
nltk.download('stopwords')

# Sample text
text = """
Text summarization is the process of creating a short and coherent version of a longer document.
There are two main types of summarization: extractive and abstractive.
Extractive methods involve selecting sentences directly from the document, while abstractive methods generate new sentences based on the content of the original text.
Summarization has various applications, including document summarization, news summarization, and more.
"""

# Tokenize sentences
sentences = sent_tokenize(text)

# Tokenize words, dropping stopwords and punctuation
stop_words = set(stopwords.words('english'))
words = word_tokenize(text.lower())
filtered_words = [word for word in words if word.isalnum() and word not in stop_words]

# Calculate word frequency
word_freq = Counter(filtered_words)

# Score each sentence by summing the frequencies of its words
sentence_scores = {}
for sentence in sentences:
    for word in word_tokenize(sentence.lower()):
        if word in word_freq:
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_freq[word]

# Select the top 2 sentences, then restore their original order
top_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:2]
summary_sentences = [s for s in sentences if s in top_sentences]
summary = " ".join(summary_sentences)

print("Summary:")
print(summary)

---

2. Abstractive Summarization with Transformers (BART)
This method uses a pre-trained sequence-to-sequence model to write a new summary. The summarization pipeline's default model is sshleifer/distilbart-cnn-12-6, a DistilBART model fine-tuned on CNN/DailyMail news articles; naming it explicitly avoids the "No model was supplied" warning and keeps results stable if the default changes.

# transformers must be installed (see step 1)
from transformers import pipeline

# Load the summarization pipeline (the first run downloads the model weights)
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Sample text
text = """
Text summarization is the process of creating a short and coherent version of a longer document.
There are two main types of summarization: extractive and abstractive.
Extractive methods involve selecting sentences directly from the document, while abstractive methods generate new sentences based on the content of the original text.
Summarization has various applications, including document summarization, news summarization, and more.
"""

# Generate the summary; max_length and min_length count tokens, not words
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)

print("Summary:")
print(summary[0]['summary_text'])

---

Output Examples

Extractive Summarization (word-frequency sums favour the longer third and fourth sentences):
Summary:
Extractive methods involve selecting sentences directly from the document, while abstractive methods generate new sentences based on the content of the original text. Summarization has various applications, including document summarization, news summarization, and more.

Abstractive Summarization (exact wording depends on the model and library version; a run may produce something like):
Summary:
Text summarization creates concise versions of longer documents using extractive or abstractive methods, which have applications in various fields.
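To see the frequency-scoring idea from the NLTK example without downloading any NLTK data, here is a minimal sketch of the same extractive technique using only the Python standard library. It is an illustration under stated assumptions, not the author's code: the regex sentence splitter and the short STOP_WORDS set are stand-ins for sent_tokenize and NLTK's stopword list, and the names split_sentences, content_words, summarize and SAMPLE_TEXT are hypothetical. It returns the chosen sentences in document order and adds an optional normalize flag (also an addition) that averages instead of sums, so long sentences are not automatically favoured.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a tiny stopword list standing in for nltk.corpus.stopwords.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "in", "is", "it", "more", "of", "on", "or", "that", "the", "there",
    "this", "to", "was", "while", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(sentence: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def summarize(text: str, num_sentences: int = 2, normalize: bool = False) -> str:
    """Frequency-based extractive summary, returned in original sentence order."""
    sentences = split_sentences(text)
    # Document-wide frequency of each content word.
    freq = Counter(w for s in sentences for w in content_words(s))

    scored = []
    for index, sentence in enumerate(sentences):
        words = content_words(sentence)
        score = sum(freq[w] for w in words)
        if normalize and words:
            # Optional extra: average instead of sum, to offset long-sentence bias.
            score /= len(words)
        scored.append((score, index))

    # Keep the highest-scoring sentences, then put them back in document order.
    best = sorted(scored, key=lambda pair: pair[0], reverse=True)[:num_sentences]
    keep = sorted(index for _, index in best)
    return " ".join(sentences[i] for i in keep)


SAMPLE_TEXT = (
    "Text summarization is the process of creating a short and coherent version "
    "of a longer document. There are two main types of summarization: extractive "
    "and abstractive. Extractive methods involve selecting sentences directly from "
    "the document, while abstractive methods generate new sentences based on the "
    "content of the original text. Summarization has various applications, including "
    "document summarization, news summarization, and more."
)

if __name__ == "__main__":
    print("Summary:")
    print(summarize(SAMPLE_TEXT))
    print("Normalized summary:")
    print(summarize(SAMPLE_TEXT, normalize=True))
```

With the default settings this sketch picks the third and fourth sample sentences, the same pair the NLTK version selects; with normalize=True it picks the second and fourth instead, which shows how strongly raw frequency sums reward sentence length. The regex splitter also breaks after abbreviations such as "e.g.", one reason to prefer NLTK's Punkt tokenizer on real text.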
