Take 2
/*
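Building a text summarizer in Python inside a Jupyter Notebook typically relies on libraries such as NLTK, spaCy, or Hugging Face transformers (for pre-trained sequence-to-sequence models such as BART or T5; BERT on its own is an encoder and does not generate text). This guide walks through a basic extractive summarizer and a basic abstractive one: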
Steps:
1. Install Required Libraries
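Install the libraries with pip (inside a notebook cell, use %pip instead of pip). The transformers pipeline also needs a deep-learning backend such as PyTorch. spaCy is optional, since the examples in this guide do not use it.
pip install nltk spacy transformers torch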
2. Choose a Summarization Approach
Extractive Summarization: Extracts key sentences from the text.
Abstractive Summarization: Generates new, shorter text that paraphrases the source, using a pre-trained sequence-to-sequence model such as BART or T5.
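One way to make the distinction concrete: every sentence of an extractive summary appears verbatim in the source, whereas an abstractive summary usually contains sentences that do not. The check below is only an illustrative sketch. `normalize` and `is_extractive` are hypothetical helper names, and whitespace is collapsed so that line wrapping does not affect the comparison.

```python
def normalize(s):
    """Collapse runs of whitespace so wrapped lines compare equal."""
    return " ".join(s.split())


def is_extractive(summary_sentences, source):
    """Return True if every summary sentence occurs verbatim in the source."""
    source_norm = normalize(source)
    return all(normalize(s) in source_norm for s in summary_sentences)


source = """Text summarization is the process of creating a short and coherent
version of a longer document. There are two main types of summarization:
extractive and abstractive."""

# A sentence copied from the source versus one that rewords it.
copied = ["There are two main types of summarization: extractive and abstractive."]
reworded = ["Summarization shortens documents in two main ways."]

print(is_extractive(copied, source))    # True
print(is_extractive(reworded, source))  # False
```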
3. Implementation
Below is example code for both approaches.
---
Code
1. Extractive Summarization with NLTK
This method selects important sentences from the text based on word frequency.
# Import required libraries
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from collections import Counter
# Download NLTK data (run only once)
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data expected by newer NLTK releases
nltk.download('stopwords')
# Sample text
text = """
Text summarization is the process of creating a short and coherent version of a longer document.
There are two main types of summarization: extractive and abstractive. Extractive methods involve
selecting sentences directly from the document, while abstractive methods generate new sentences
based on the content of the original text. Summarization has various applications, including
document summarization, news summarization, and more.
"""
# Tokenize sentences
sentences = sent_tokenize(text)
# Tokenize words and remove stopwords
stop_words = set(stopwords.words('english'))
words = word_tokenize(text.lower())
filtered_words = [word for word in words if word.isalnum() and word not in stop_words]
# Calculate word frequency
word_freq = Counter(filtered_words)
# Score sentences based on word frequency
sentence_scores = {}
for sentence in sentences:
    for word in word_tokenize(sentence.lower()):
        # Only words that survived stop-word filtering contribute to the score
        if word in word_freq:
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_freq[word]
# Select top sentences
summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:2]
summary = " ".join(summary_sentences)
print("Summary:")
print(summary)
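The same frequency-based scoring can be wrapped in a reusable function. The sketch below is a variant under stated assumptions, not the code above: it uses regular expressions instead of NLTK's tokenizers, a tiny hand-written stop-word set (`STOP_WORDS`, hypothetical and far from complete), and it returns the selected sentences in their original order rather than ordered by score.

```python
import re
from collections import Counter

# Small illustrative stop-word set; NLTK's English list is much larger.
STOP_WORDS = {"a", "an", "and", "are", "from", "has", "is", "more",
              "of", "on", "the", "there", "while"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def content_words(sentence):
    """Lowercased alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, num_sentences=2):
    """Pick the highest-scoring sentences by summed word frequency."""
    sentences = split_sentences(text)
    word_freq = Counter(w for s in sentences for w in content_words(s))
    scores = [sum(word_freq[w] for w in content_words(s)) for s in sentences]
    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = "Cats sleep a lot. Cats and dogs are common pets. Birds sing."
print(summarize(sample, num_sentences=1))  # Cats and dogs are common pets.
```

Because the scores are summed, longer sentences tend to win. Dividing each score by the number of content words in the sentence is one common adjustment.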
---
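2. Abstractive Summarization with Transformers (BART)
This method uses a pre-trained sequence-to-sequence model to generate the summary. When no model is named, the pipeline downloads a default checkpoint (a distilled BART model at the time of writing) on first use.
# Import the pipeline helper from the transformers library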
from transformers import pipeline
# Load the summarization pipeline
summarizer = pipeline("summarization")
# Sample text
text = """
Text summarization is the process of creating a short and coherent version of a longer document.
There are two main types of summarization: extractive and abstractive. Extractive methods involve
selecting sentences directly from the document, while abstractive methods generate new sentences
based on the content of the original text. Summarization has various applications, including
document summarization, news summarization, and more.
"""
# Generate summary
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print("Summary:")
print(summary[0]['summary_text'])
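The `max_length` and `min_length` arguments are measured in model tokens, so fixed values such as 50 and 25 may not suit much shorter or longer inputs. One rough way to scale them is sketched below. `length_bounds` is a hypothetical helper, its ratios are arbitrary defaults, and it counts whitespace-separated words as a stand-in for tokens, which only approximates what the model's tokenizer produces.

```python
def length_bounds(text, max_ratio=0.5, min_ratio=0.25, floor=5):
    """Return (max_length, min_length) scaled to the input's word count."""
    n_words = len(text.split())
    # Never go below a small floor, so very short inputs still get a valid range.
    max_len = max(floor, int(n_words * max_ratio))
    # Keep min_length at least 1 and strictly below max_length.
    min_len = max(1, min(int(n_words * min_ratio), max_len - 1))
    return max_len, min_len


sample = " ".join(["word"] * 80)  # stand-in for an 80-word document
max_len, min_len = length_bounds(sample)
print(max_len, min_len)  # 40 20

# The bounds could then be passed to the pipeline call shown above, e.g.
# summarizer(text, max_length=max_len, min_length=min_len, do_sample=False)
```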
---
Output Examples
Extractive Summarization:
Summary:
Extractive methods involve selecting sentences directly from the document, while abstractive methods generate new sentences based on the content of the original text. Summarization has various applications, including document summarization, news summarization, and more.
Abstractive Summarization (the exact wording depends on the model and library version):
Summary:
Text summarization creates concise versions of longer documents using extractive or abstractive methods, which have applications in various fields.
---
*/