How can Python be used to perform a word frequency analysis on text?

To perform word frequency analysis on text using Python, follow these steps:

  1. Open the text file and read its contents.
with open("text.txt", "r") as file:
    text = file.read()
  2. Tokenize the text.
import re

# Remove punctuation (anything that is not a word character or whitespace)
text = re.sub(r'[^\w\s]', '', text)

# Split the text into a list of words
words = text.split()
  3. Count the frequency of each word.
from collections import Counter

word_count = Counter(words)
  4. Sort the words by frequency and print them.
for word, count in word_count.most_common():
    print(word, count)
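
To see what these steps produce without needing a file on disk, here is a minimal sketch that runs the same steps on an in-memory string. The sample sentence and the variable names `sample`, `cleaned`, and `top_two` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of text.txt
sample = "The cat sat. The dog sat, too!"

# Same steps as above: strip punctuation, split on whitespace, count
cleaned = re.sub(r'[^\w\s]', '', sample)
words = cleaned.split()
word_count = Counter(words)

# most_common(n) returns only the n most frequent (word, count) pairs
top_two = word_count.most_common(2)
print(top_two)  # [('The', 2), ('sat', 2)]
```

Words with equal counts come back in the order they were first seen, which is why "The" is listed before "sat".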

The complete code is as follows:

import re
from collections import Counter

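# Read the whole file into a single string
with open("text.txt", "r") as file:
    text = file.read()

# Remove punctuation, then split on whitespace to get the words
text = re.sub(r'[^\w\s]', '', text)
words = text.split()

# Count how many times each word occurs
word_count = Counter(words)

# Print the words from most to least frequent
for word, count in word_count.most_common():
    print(word, count)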

Make sure to replace "text.txt" in the code with the actual path to your text file.
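
Note that the code above is case-sensitive, so "The" and "the" are counted as different words. If that is not what you want, one possible variant is sketched below. The function name `word_frequencies` and its `top_n` parameter are hypothetical, chosen for this example, and the tokenizer is swapped for `re.findall` with `\w+` instead of the `re.sub` plus `split` approach used above.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first, ignoring case.

    Hypothetical helper for illustration; top_n=None returns every word.
    """
    # Lowercase first so "The" and "the" are counted as the same word;
    # \w+ matches runs of letters, digits and underscores
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top_n)

print(word_frequencies("The cat and the hat. THE END!", top_n=1))
# [('the', 3)]
```

One difference to be aware of: with `\w+`, a word such as "don't" is split into "don" and "t", whereas the original approach deletes the apostrophe and yields "dont". Pick whichever behavior suits your text.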
