# Pisceans and October Babies More Likely to Become Poets. Scraping Wikipedia Reveals All


This is the second in a series of articles looking into whether when you were born affects your future. In the previous article I looked at Nobel laureates, who are, of course, drawn from a range of fields. Now it is time to focus on just one discipline: poetry. I have again used ScraperWiki to scrape Wikipedia, this time working from its list of poets and extracting each person's date of birth.

## The Findings

The dates of birth were collated, and frequency charts were constructed from them for the poets' months of birth and star signs. The findings are illustrated below.

### Distribution of Months of Birth Among Poets

There is not much difference between the months, although October and June stand out as having the most and the fewest poets respectively. The difference between October and June is `2.99%`. This is interesting because the result for poets born in June is the inverse of the result for Nobel laureates, where those born in June were the most likely to become a laureate.

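| Month of Birth | Frequency | Percent |
|----------------|-----------|---------|
| January        | 44        | 8.21    |
| February       | 46        | 8.58    |
| March          | 49        | 9.14    |
| April          | 40        | 7.46    |
| May            | 42        | 7.84    |
| June           | 37        | 6.90    |
| July           | 47        | 8.77    |
| August         | 46        | 8.58    |
| September      | 45        | 8.40    |
| October        | 53        | 9.89    |
| November       | 44        | 8.21    |
| December       | 43        | 8.02    |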

Sample Size: 536
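
The percentages and the `2.99%` spread quoted above follow directly from the frequencies in the table. Here is a minimal Ruby sketch of that arithmetic, with the counts copied from the table. The names `MONTH_FREQUENCY` and `percentages` are illustrative and are not taken from the scraper or the view.

```ruby
# Frequencies copied from the month-of-birth table above.
MONTH_FREQUENCY = {
  'January' => 44, 'February' => 46, 'March' => 49, 'April' => 40,
  'May' => 42, 'June' => 37, 'July' => 47, 'August' => 46,
  'September' => 45, 'October' => 53, 'November' => 44, 'December' => 43
}.freeze

# Convert raw counts into percentages of the whole sample.
def percentages(frequency)
  total = frequency.values.sum
  frequency.transform_values { |count| 100.0 * count / total }
end

percent = percentages(MONTH_FREQUENCY)
highest = percent.max_by { |_month, value| value }
lowest  = percent.min_by { |_month, value| value }

puts format('Highest: %s (%.2f%%)', *highest)
puts format('Lowest: %s (%.2f%%)', *lowest)
puts format('Difference: %.2f percentage points', highest[1] - lowest[1])
```

Strictly speaking the gap is in percentage points: it is the difference between two shares of the same sample, not a relative change.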

### Distribution of Star Signs Among Poets

As with Nobel laureates, the differences between star signs among poets at first seem to be much greater than the differences between the months in which they were born. We can see from the graph that Pisceans are more likely to be poets than Geminis and Taureans. However, if you look more closely at Pisces at the top and Gemini at the bottom, you will find that the difference is `2.8%`, which is actually less than the difference between the highest and lowest months of birth.

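| Star Sign   | Frequency | Percent | Dates                      |
|-------------|-----------|---------|----------------------------|
| Aries       | 44        | 8.21    | 21 March - 19 April        |
| Taurus      | 39        | 7.28    | 20 April - 20 May          |
| Gemini      | 38        | 7.09    | 21 May - 20 June           |
| Cancer      | 44        | 8.21    | 21 June - 22 July          |
| Leo         | 47        | 8.77    | 23 July - 22 August        |
| Virgo       | 51        | 9.51    | 23 August - 22 September   |
| Libra       | 49        | 9.14    | 23 September - 22 October  |
| Scorpio     | 43        | 8.02    | 23 October - 21 November   |
| Sagittarius | 44        | 8.21    | 22 November - 21 December  |
| Capricorn   | 42        | 7.84    | 22 December - 19 January   |
| Aquarius    | 42        | 7.84    | 20 January - 18 February   |
| Pisces      | 53        | 9.89    | 19 February - 20 March     |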

Sample Size: 536
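
Assigning a star sign to each date of birth only needs the date ranges in the table above. The following is a rough sketch of how that lookup might be done in Ruby. The names `STAR_SIGN_STARTS` and `star_sign` are hypothetical, and this is not necessarily how the star-sign view for this article was written.

```ruby
require 'date'

# Start dates ([month, day]) for each sign, taken from the table above and
# listed in calendar order. Capricorn spans the new year.
STAR_SIGN_STARTS = [
  [[1, 20],  'Aquarius'],
  [[2, 19],  'Pisces'],
  [[3, 21],  'Aries'],
  [[4, 20],  'Taurus'],
  [[5, 21],  'Gemini'],
  [[6, 21],  'Cancer'],
  [[7, 23],  'Leo'],
  [[8, 23],  'Virgo'],
  [[9, 23],  'Libra'],
  [[10, 23], 'Scorpio'],
  [[11, 22], 'Sagittarius'],
  [[12, 22], 'Capricorn']
].freeze

def star_sign(date)
  key = [date.month, date.day]
  # Find the latest sign whose start date is on or before the given date.
  match = STAR_SIGN_STARTS.reverse.find { |start, _name| (start <=> key) <= 0 }
  # Dates before 20 January belong to Capricorn, which began the previous December.
  match ? match[1] : 'Capricorn'
end

puts star_sign(Date.parse('1795-10-31'))
```

Comparing `[month, day]` pairs as arrays keeps the lookup independent of the year, so leap years need no special handling.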

## Problems With The Study

The problems with this study are similar to those for the article about Nobel laureates.

• The distribution of months of birth and star signs could simply reflect the usual distribution of births for that population, and should therefore be compared with that of non-poets.
• Quite a few of the poets didn't have an accurate date of birth, and were therefore excluded.
• The sample size is relatively small.
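
One rough way to gauge how much the first and last of these problems matter is a chi-squared goodness-of-fit test, which compares the observed counts with the counts expected under some baseline. The sketch below assumes a baseline in which births are spread evenly over the days of the year. That is only a stand-in, and real birth data for the relevant populations would be a better comparison. The names `OBSERVED`, `DAYS_IN_MONTH` and `chi_squared` are illustrative, and `19.675` is the standard 5% critical value for 11 degrees of freedom.

```ruby
# Observed month-of-birth counts, January to December, from the table
# earlier in the article.
OBSERVED = [44, 46, 49, 40, 42, 37, 47, 46, 45, 53, 44, 43].freeze

# Days per month, counting February as 28.25 to allow for leap years.
# ASSUMPTION: births are spread evenly across the days of the year.
DAYS_IN_MONTH = [31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31].freeze

# Chi-squared statistic: the sum of (observed - expected)^2 / expected,
# where the expected counts are proportional to the given weights.
def chi_squared(observed, weights)
  total = observed.sum
  weight_total = weights.sum.to_f
  observed.zip(weights).sum do |count, weight|
    expected = total * weight / weight_total
    (count - expected)**2 / expected
  end
end

CRITICAL_VALUE_5_PERCENT = 19.675 # chi-squared, 11 degrees of freedom

statistic = chi_squared(OBSERVED, DAYS_IN_MONTH)
puts format('Chi-squared statistic: %.2f', statistic)
puts "Exceeds the 5% critical value? #{statistic > CRITICAL_VALUE_5_PERCENT}"
```

A statistic below the critical value would mean that the month-to-month differences are no larger than chance alone could plausibly produce in a sample of this size.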

## Conclusion

Unlike Nobel laureates, the distribution of birth periods among poets seems to be about the same for star signs and months of birth. It seems that if you are a Piscean or born in October then you are more likely to be a poet, and if you are a Gemini or born in June then you are less likely.

## Commissions

This series highlights the power of scraping the web to extract this sort of statistic. Given the time, the work could be extended to increase confidence in the data and draw more accurate conclusions. If you would like to commission vLife Systems to create a scraper that will extract data from websites or other data sources of interest to you, please get in touch via email: info@vlifesystems.com.

## Scrapers and Views

I have given examples of the code used to scrape Wikipedia in the previous article. For this article I will just reference the scrapers and provide code for one of the views instead.

### Poets' Month of Birth View

The view is again written in Ruby, with HTML and JavaScript embedded within an ERB template. The JavaScript is used to create a graph with Google Charts.

``````
require 'date'
require 'erb'
sourcescraper = 'https://scraperwiki.com/scrapers/poets_dob/'

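class PopulationStats
  # Expose the results, which the view code at the bottom reads.
  attr_reader :frequency, :percent, :population_size

  def initialize(population, field)
    @population = population
    @frequency = calc_frequency(field)
    @population_size = calc_population_size
    @percent = calc_percent
  end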

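  # Count how many people were born in each month.
  def calc_frequency(field)
    frequency = Hash.new(0)
    @population.each do |person|
      begin
        mob = Date::MONTHNAMES[Date.parse(person[field]).month]
        frequency[mob] += 1
      rescue
        # Skip records whose date of birth is missing or cannot be parsed.
      end
    end
    frequency
  end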

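  # Convert each frequency into a percentage of the population.
  def calc_percent
    percent = {}
    @frequency.each do |variable, freq|
      percent[variable] = 1.0 * freq / @population_size * 100
    end
    percent
  end

  # The population size counts only records with a usable date of birth.
  def calc_population_size
    population_size = 0
    @frequency.each { |_term, freq| population_size += freq }
    population_size
  end
end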

MONTH_NAMES = Date::MONTHNAMES[1..12]

# Return [month, value] pairs in calendar order.
def sort_stats(stats)
  MONTH_NAMES.collect do |month|
    [month, stats[month]]
  end
end

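# NOTE: the loader <script> tag, the load/callback calls, and the DataTable and
# chart construction lines are missing from this listing. They are filled in
# below following the Google Charts quick-start pattern with its current loader.
# The column labels and the PieChart type (suggested by the comments) are
# assumptions.
PAGE_TEMPLATE = "
<script type='text/javascript' src='https://www.gstatic.com/charts/loader.js'></script>
<script type='text/javascript'>

  // Load the Visualization API and the piechart package.
  google.charts.load('current', {'packages':['corechart']});

  // Set a callback to run when the Google Visualization API is loaded.
  google.charts.setOnLoadCallback(drawChart);

  // Callback that creates and populates a data table,
  // instantiates the pie chart, passes in the data and
  // draws it.
  function drawChart() {

    // Create the data table.
    var data = new google.visualization.DataTable();
    data.addColumn('string', 'Month');
    data.addColumn('number', 'Percent');
    data.addRows([
% population_percent.each do |stat|
      ['<%= stat[0].capitalize %>', <%= stat[1] %>],
% end
    ]);

    // Set chart options
    var options = {'title':'Months of Birth for Poets',
                   'width':650,
                   'height':500};

    // Instantiate and draw our chart, passing in some options.
    var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
    chart.draw(data, options);
  }
</script>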

<h2>The Distribution of Months of Birth Among Notable Poets</h2>

<div>
This was obtained by scraping Wikipedia&#039;s
<a href='http://en.wikipedia.org/wiki/List_of_poets'>List of poets</a>.
Then the page of each person listed was scraped to see when they were born.
</div>

<!--Div that will hold the pie chart-->
<div id='chart_div'></div>

<div>
<table>
  <tr>
    <th style='text-align: left;'>Month of Birth</th>
    <th>Frequency</th>
    <th>Percent</th>
  </tr>
% term_index = 0
% population_freq.each do |stat|
  <tr>
    <td><%= stat[0] %></td>
    <td><%= stat[1] %></td>
    <td><%= sprintf('%.2f', population_percent[term_index][1]) %></td>
  </tr>
%   term_index += 1
% end
</table>

<strong>Sample Size:</strong> <%= population_size %>
</div>
"

# Attach the scraper's datastore and read every date of birth from it.
ScraperWiki.attach("poets_dob")

data = ScraperWiki.select(
  "dob from poets_dob.swdata"
)

pop_stats = PopulationStats.new(data, 'dob')
population_freq = sort_stats(pop_stats.frequency)
population_percent = sort_stats(pop_stats.percent)
population_size = pop_stats.population_size

# The third argument, '%', makes ERB treat lines starting with % as Ruby code.
puts ERB.new(PAGE_TEMPLATE, 0, '%').result(binding)
``````
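
The `'%'` passed to `ERB.new` in the view is what lets lines beginning with `%` be run as Ruby rather than written to the output. Below is a minimal stand-alone sketch of that behaviour. It uses the keyword form `trim_mode:`, which Ruby 2.6 and later prefer over the positional arguments used above. The template and the `rows` data are made up for illustration.

```ruby
require 'erb'

# With trim_mode '%', a line starting with % is executed as Ruby, while
# <%= ... %> tags insert the value of an expression into the output.
TEMPLATE = <<~HTML
  <table>
  % rows.each do |month, count|
  <tr><td><%= month %></td><td><%= count %></td></tr>
  % end
  </table>
HTML

def render(rows)
  # `rows` is visible to the template through this method's binding.
  ERB.new(TEMPLATE, trim_mode: '%').result(binding)
end

html = render([['January', 44], ['February', 46]])
puts html
```

The `%` must be the first character on the line, which is why the squiggly heredoc is used here to strip the leading indentation.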
