<!DOCTYPE html>
<html lang="en">
<head>
<!-- META TAGS-->
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Wordcloud Analysis ~ Tanmay's Portfolio</title>
<meta name="description" content="Tanmay's Career Portfolio・タンマイのキャリアポートフォリオ">
<meta name="twitter:image" content="photos/Helloname.jpeg">
<meta property="og:image" content="photos/Helloname.jpeg">
<!-- All the important imports-->
<link rel="icon" type="image/x-icon" href="photos/Favicon.png">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Abel">
<link href='https://fonts.googleapis.com/css?family=Rubik' rel='stylesheet'>
<link rel="stylesheet" href="https://fonts.googleapis.com/earlyaccess/kokoro.css">
<link href="https://fonts.googleapis.com/css?family=Noto+Sans+JP" rel="stylesheet">
<link rel="stylesheet" href="css_folder/styles.css">
<link rel="stylesheet" href="https://unpkg.com/aos@next/dist/aos.css" />
<link rel="stylesheet" href="css_folder/wordcloud_styles.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.4.0/styles/atom-one-dark.min.css">
</head>
<body class="fade-in-text">
<!-- NAV BAR-->
<header class="nav-bar">
<div id="navitem"><h3><a href="index.html">Home</a></h3></div>
<div id="navitem"><h3><a href="#top">Top of Page</a></h3></div>
</header>
<!--Title-->
<h1 class="articletitle">Japan Wikipedia and WordCloud Project </h1>
<h3 class="tottori_japan_title">Click here for the explanation in Japanese</h3>
<div class="data_intro">
<p class="intro">Did you know that you can build striking visualisations from large amounts of text using Python? Its libraries make this possible with very little code.
</p>
<img class="data_intro_photo" src="photos/onsei_ninshiki_smartphone.png" alt="Photo_of_man_speaking_on_phone" >
</div>
<div class="intro_div">
<p class="intro_desc">
The current project pulls a short description from Wikipedia and builds a wordcloud from that text. We will use the Wikipedia and Wordcloud packages to create a clear visual representation of the words obtained.
</p>
<p class="intro_desc_jp">Description in Japanese</p>
<p id="date">Made on 1st April 2023</p>
</div>
<!-- Article photo-->
<img src="photos/osaka_castle.jpeg" alt="Photo of Osaka Castle" class="tottoristation">
<!-- Article Question -->
<h2 class="wordcloud_body question">What is a Wordcloud?</h2>
<p class="wordcloud_body answer">
A word cloud is a collection, or cluster, of words depicted in different sizes. The bigger and bolder the word appears, the more often it’s mentioned within a given text and the more important it is.
</p>
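<p class="wordcloud_body">To make the idea concrete, here is a minimal sketch, using only the Python standard library, of how word counts could be turned into font sizes. The sample sentence, the <code>font_sizes</code> helper and the 10 to 60 point range are assumptions made for illustration; the Wordcloud package uses its own scaling and layout rules.</p>

```python
from collections import Counter
import re


def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its count.

    Hypothetical helper for illustration only.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    low = min(counts.values())
    span = max(top - low, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


sample = "Japan is an island country. Japan has many islands, and Japan has mountains."
sizes = font_sizes(sample)

# Show the three largest words first
for word, size in sorted(sizes.items(), key=lambda item: -item[1])[:3]:
    print(f"{word}: {size:.0f}pt")
```

<p class="wordcloud_body">The most frequent word gets the largest size and words that appear only once get the smallest, which is the visual effect a word cloud relies on.</p>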
<p class="wordcloud_body wordcloud_question">To give some context, let's say I want to learn about the country of <a href="https://en.wikipedia.org/wiki/Japan">Japan</a>, understand what makes it special, and present that in an easy-to-read way. What can we do?</p>
<p class="wordcloud_body body_padding">One way is to create a wordcloud that surfaces the relevant keywords for the text or topic. We can automate the process with a short Python script.</p>
<p class="wordcloud_body body_padding">I have tried to solve this problem using the following steps:
<ol class="wordcloud_body projectlist">
<li>Identifying the information needed for the project and finding it on the internet (in this case 'Japan')</li>
<li>Downloading the information using Python libraries</li>
<li>Cleaning the data and removing the stop words</li>
<li>Compiling the wordcloud</li>
</ol>
</p>
<p class="wordcloud_body body_padding" id="gettingstarted">Let's get started<br>
We first pick the article that will supply the text for the word cloud. For this project we will take the data from the Wikipedia page of <a href="https://en.wikipedia.org/wiki/Japan">Japan</a>.
</p>
<p class="wordcloud_body">We will then prepare all the libraries necessary for this project.<br>
⚠️ I am using Anaconda on my PC, so most of the packages for the project are already installed. The ones this project relies on include:
<ol class="wordcloud_body projectlist">
<li><a href="https://pypi.org/project/wikipedia/">Wikipedia Package</a></li>
<li><a href="https://pypi.org/project/wordcloud/">Wordcloud Package</a></li>
</ol>
</p>
<p class="wordcloud_body">We will install the Pillow package separately:</p>
<pre>
<code>
# Run this line in a terminal, not inside Python:
#   pip install pillow
from wordcloud import WordCloud, STOPWORDS
import wikipedia
from PIL import Image
</code>
</pre>
<p class="wordcloud_body">With the help of the Wikipedia package, we can fetch the summary of a topic directly with a single Python call.<br>
We also load the default stop words that come with the Wordcloud package.
</p>
<pre>
<code>
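# Default stop words bundled with the wordcloud package
stop_w = set(STOPWORDS)
# Fetch the plain-text summary of the "Japan" article
info = wikipedia.summary("Japan")
print(info)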
</code>
</pre>
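<p class="wordcloud_body">To illustrate why stop words matter, the sketch below counts the most frequent words with and without a stop-word filter. The <code>SAMPLE_STOP_WORDS</code> set, the <code>top_words</code> helper and the sample text are hypothetical stand-ins for this demo; the project itself uses the much larger <code>STOPWORDS</code> set from the Wordcloud package.</p>

```python
from collections import Counter
import re

# Tiny hand-picked stop-word set, an assumption for this demo only.
SAMPLE_STOP_WORDS = {"the", "is", "a", "an", "of", "in", "and", "it"}


def top_words(text, stop_words=frozenset(), limit=3):
    """Return the most frequent words, skipping anything in stop_words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [word for word in words if word not in stop_words]
    return Counter(kept).most_common(limit)


text = (
    "Japan is an island country in East Asia. "
    "The capital of Japan is Tokyo, and Tokyo is the largest city in Japan."
)

print("With stop words:   ", top_words(text))
print("Without stop words:", top_words(text, SAMPLE_STOP_WORDS))
```

<p class="wordcloud_body">Without the filter, filler words such as "is" compete with the real keywords; with it, only topic words remain.</p>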
<p class="wordcloud_body">The output summary will look something like this in the terminal:</p>
<pre>
<code>
Japan (Japanese: 日本, Nippon or Nihon, and formally 日本国, Nihonkoku) is an island country in East Asia. It is situated in the northwest Pacific Ocean and is bordered on the west by the Sea of Japan, extending from the Sea of Okhotsk in the north toward the East China Sea, Philippine Sea, and Taiwan in the south. Japan is a part of the Ring of Fire, and spans an archipelago of 14,125 islands covering 377,975 square kilometers (145,937 sq mi); the five main islands are Hokkaido, Honshu (the "mainland"), Shikoku, Kyushu, and Okinawa. Tokyo is the nation's capital and largest city, followed by Yokohama, Osaka, Nagoya, Sapporo, Fukuoka, Kobe, and Kyoto.
Japan is the eleventh most populous country in the world, as well as one of the most densely populated and urbanized. About three-fourths of the country's terrain is mountainous, concentrating its population of almost 125 million on narrow coastal plains. Japan is divided into 47 administrative prefectures and eight traditional regions. The Greater Tokyo Area is the most populous metropolitan area in the world, with more than 37.2 million residents.
Japan has been inhabited since the Upper Paleolithic period (30,000 BC), though the first written mention of the archipelago appears in a Chinese chronicle (the Book of Han) finished in the 2nd century AD. Between the 4th and 9th centuries, the kingdoms of Japan became unified under an emperor and the imperial court based in Heian-kyō. Beginning in the 12th century, political power was held by a series of military dictators (shōgun) and feudal lords (daimyō) and enforced by a class of warrior nobility (samurai). After a century-long period of civil war, the country was reunified in 1603 under the Tokugawa shogunate, which enacted an isolationist foreign policy. In 1854, a United States fleet forced Japan to open trade to the West, which led to the end of the shogunate and the restoration of imperial power in 1868.
In the Meiji period, the Empire of Japan adopted a Western-modeled constitution and pursued a program of industrialization and modernization. Amidst a rise in militarism and overseas colonization, Japan invaded China in 1937 and entered World War II as an Axis power in 1941. After suffering defeat in the Pacific War and two atomic bombings, Japan surrendered in 1945 and came under a seven-year Allied occupation, during which it adopted a new constitution and began a military alliance with the United States. Under the 1947 constitution, Japan has maintained a unitary parliamentary constitutional monarchy with a bicameral legislature, the National Diet.
Japan is a developed country and a great power. It is a member of numerous international organizations, including the United Nations, G20, OECD, and the Group of Seven. Its economy is the world's third-largest by nominal GDP and the fourth-largest by PPP, with its per capita income ranking at 36th highest in the world. Although Japan has renounced its right to declare war, the country maintains Self-Defense Forces that rank as one of the world's strongest militaries. After World War II, Japan experienced record growth in an economic miracle, becoming the second-largest economy in the world by 1972 but has stagnated since 1995 in what is referred to as the Lost Decades. Japan has the world's highest life expectancy, though it is experiencing a population decline. A global leader in the automotive, robotics and electronics industries, the country has made significant contributions to science and technology. The culture of Japan is well known around the world, including its art, cuisine, film, music, and popular culture, which encompasses prominent manga, anime and video game industries.
</code>
</pre>
<p class="wordcloud_body">⚠️ The colour highlighting in the output above is added by this page's code formatter; the actual terminal output is plain text.</p>
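<p class="wordcloud_body">The summary mixes English words with Japanese script, numbers and punctuation. If you would like to clean the text yourself before plotting, one possible approach is sketched below. The <code>clean_text</code> helper is a hypothetical name, and keeping only the letters a to z is a deliberately blunt assumption: it also strips accented letters such as the "ō" in "shōgun".</p>

```python
import re


def clean_text(text):
    """Lowercase the text and keep only the letters a-z and single spaces."""
    text = text.lower()
    # Replace digits, punctuation and non-ASCII characters with spaces
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into one space and trim the ends
    return re.sub(r"\s+", " ", text).strip()


raw = "Japan (Japanese: 日本, Nippon or Nihon) spans 14,125 islands."
print(clean_text(raw))
```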
<p class="wordcloud_body body_padding">
We are almost done. Now it is time to plot the wordcloud!
<br>
The Wordcloud package automates the layout and drawing, so a simple but impactful wordcloud can be created as follows:
</p>
<pre>
<code>
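# Build the cloud from the summary text, skipping the stop words
word_cloud = WordCloud(stopwords=stop_w, width=800, height=800).generate(info)
# Convert to a Pillow image and open it in the default image viewer
img = word_cloud.to_image()
img.show()
# Save the cloud as a PNG file
word_cloud.to_file('wordcloud_tottori.png')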
</code>
</pre>
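<p class="wordcloud_body">If you want more control over which words appear, one option is to count the words yourself and pass the counts to the Wordcloud package through its <code>generate_from_frequencies</code> method. This is a sketch under assumptions, not part of the original project: <code>count_words</code>, <code>render_cloud</code>, the minimum word length and the output filename are all hypothetical choices.</p>

```python
from collections import Counter
import re


def count_words(text, stop_words, min_length=3):
    """Count words, ignoring stop words and very short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(
        word for word in words
        if word not in stop_words and len(word) >= min_length
    )


def render_cloud(frequencies, path):
    """Draw a word cloud from precomputed counts and save it as an image."""
    # Imported here so count_words can be used without the package installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=800)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


frequencies = count_words(
    "Japan is an island country. Tokyo is the capital of Japan.",
    stop_words={"the", "is", "an", "of"},
)
print(frequencies.most_common(2))
# To draw the image (hypothetical filename):
# render_cloud(frequencies, "wordcloud_from_counts.png")
```

<p class="wordcloud_body">Counting first makes it easy to inspect or adjust the numbers before anything is drawn, for example to drop a word that dominates the picture.</p>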
<p class="wordcloud_body">☀️The final output will look something like this!</p>
<img src="photos/wordcloud_tottori.png" alt="Wordcloud of Japan" class="wordcloud_body japan_wordcloud">
<p class="wordcloud_body">
Word Clouds are a powerful way to visualise what a text tries to convey about a topic.
</p>
<h2 class="wordcloud_body body_padding conclusion">What did we learn from this wordcloud?
</h2>
<p class="wordcloud_body">The wordcloud brings out the keywords related to the topic of 'Japan'.
Looking at the keywords, we can see that the text takes a broad, holistic view of Japan. At the same time, many of the
keywords point to Japan's history and geography.
</p>
<h2 class="ending">Thank you very much for going through the project!</h2>
<a class="wordcloud_repo" href="https://github.com/happygoluckycodeeditor/review-wordcloud">Check Project files</a>
<!-- Footer of the page-->
<footer class="foot">
<img src="photos/logo2.png" class="logo2">
<p>©Tanmay Bagwe</p>
<p>Thank you for visiting</p>
<p>Any Questions? Contact me <a href="mailto:tanmay.bagwe.tb@gmail.com">here!</a></p>
</footer>
<!-- Adding AOS Script (Global settings)-->
<script src="https://unpkg.com/aos@next/dist/aos.js"></script>
<script>
AOS.init({
once: true,
});
</script>
<!-- Adding Highlight.js Script-->
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.4.0/highlight.min.js"></script>
<script>hljs.highlightAll();</script>
</body>
</html>