# Does When You Were Born Affect Your Chance of Becoming a Nobel Laureate? Scraping Wikipedia to Find Out


There has been a lot of talk in the UK recently about whether when you were born affects your schooling. Lots of teachers have noticed that pupils born at the end of the summer often struggle compared with those born in the autumn, which makes sense because the latter group are almost a year older when they start school than the former. However, teachers are not the only ones who think that when you were born affects your future. Astrologers base much of what they do on when a person was born.

To see whether when you were born does affect your future, I have decided to look at various groups over a series of articles. For this, the first article, I am beginning with Nobel laureates, as they come from many different countries, represent excellence in their fields and are reasonably well documented on Wikipedia. The data for the study was collected by using ScraperWiki to build a series of scrapers and views that go through the List of Nobel laureates and find the date of birth for each person.

## The Findings

The dates of birth were collated and from this frequency charts were constructed for the months of birth and star signs of the Nobel laureates. The findings are illustrated below.

### Distribution of Months of Birth Among Nobel Prize Winners

There is not too much difference here between the months, although June does stand out as having noticeably more Nobel laureates than any other month: it is `3.75` percentage points ahead of January and `4.55` ahead of the lowest month, February.

February is interesting because it would be expected to be a bit lower, since it has fewer days. Comparing February's shortest length with the longest months shows that it can have as little as `90%` of the days (`28/31 = 0.9032`). This accounts for part of its low percentage: taking March, with 31 days, and scaling its figure by that ratio gives `7.74%` (`8.57*0.9032`). February's actual `5.89%` is still below that, so it is a little low, but not markedly so.
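
The adjustment can be reproduced in a few lines of Ruby. This is only a sketch of the arithmetic: the variable names are illustrative, and the two percentages are copied from the table below.

```ruby
# Percentages taken from the month-of-birth table.
march_percent    = 8.57
february_percent = 5.89

# February at its shortest (28 days) relative to a 31-day month.
day_ratio = 28.0 / 31

# What March's share would shrink to if March only had 28 days.
february_expected = (march_percent * day_ratio).round(2)

puts "Day ratio: #{day_ratio.round(4)}"
puts "Expected February share: #{february_expected}%"
puts "Actual February share: #{february_percent}%"
puts "Shortfall: #{(february_expected - february_percent).round(2)} percentage points"
```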

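| Month of Birth | Frequency | Percent |
| --- | --- | --- |
| January | 50 | 6.69 |
| February | 44 | 5.89 |
| March | 64 | 8.57 |
| April | 57 | 7.63 |
| May | 63 | 8.43 |
| June | 78 | 10.44 |
| July | 64 | 8.57 |
| August | 68 | 9.10 |
| September | 66 | 8.84 |
| October | 69 | 9.24 |
| November | 60 | 8.03 |
| December | 64 | 8.57 |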

Sample Size: 747
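
For readers who want to check the table, the percentages can be recomputed from the frequencies. The sketch below hard-codes the counts from the table; the `MONTH_COUNTS` hash and the other names are illustrative and do not come from the original views.

```ruby
# Frequencies copied from the month-of-birth table.
MONTH_COUNTS = {
  'January' => 50, 'February' => 44, 'March' => 64, 'April' => 57,
  'May' => 63, 'June' => 78, 'July' => 64, 'August' => 68,
  'September' => 66, 'October' => 69, 'November' => 60, 'December' => 64
}

# The sample size is the sum of the monthly frequencies.
total = MONTH_COUNTS.values.sum

# Share of laureates born in each month, as a percentage to two decimal places.
percents = MONTH_COUNTS.transform_values { |count| (count * 100.0 / total).round(2) }

highest = percents.max_by { |_month, percent| percent }
lowest  = percents.min_by { |_month, percent| percent }

puts "Sample size: #{total}"
percents.each { |month, percent| puts format('%-10s %5.2f%%', month, percent) }
puts "Highest: #{highest.join(' ')}, lowest: #{lowest.join(' ')}"
```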

### Distribution of Star Signs Among Nobel Prize Winners

The variation in the distribution of star signs among Nobel laureates seems to be greater than the variation across the months in which they were born. Gemini and Libra clearly stand out from the others, particularly when compared with Capricorn and Aquarius; the greatest difference, between Libra and Capricorn, is `5.09` percentage points.

| Star Sign | Frequency | Percent | Dates |
| --- | --- | --- | --- |
| Aries | 64 | 8.57 | 21 March - 19 April |
| Taurus | 61 | 8.17 | 20 April - 20 May |
| Gemini | 78 | 10.44 | 21 May - 20 June |
| Cancer | 72 | 9.64 | 21 June - 22 July |
| Leo | 54 | 7.23 | 23 July - 22 August |
| Virgo | 72 | 9.64 | 23 August - 22 September |
| Libra | 80 | 10.71 | 23 September - 22 October |
| Scorpio | 56 | 7.50 | 23 October - 21 November |
| Sagittarius | 60 | 8.03 | 22 November - 21 December |
| Capricorn | 42 | 5.62 | 22 December - 19 January |
| Aquarius | 47 | 6.29 | 20 January - 18 February |
| Pisces | 56 | 7.50 | 19 February - 20 March |

Sample Size: 747

## Problems With The Study

• The distribution of months of birth and star signs could simply reflect the usual distribution of births in the populations the laureates come from, and should therefore be compared with that of non-prize-winners.
• The list is dominated by Europeans. Many laureates will therefore have shared similar conditions, such as seasonal weather patterns, although school terms differ between countries.
• A few of the laureates didn't have an accurate date of birth, and were therefore excluded.
• The sample size is relatively small.
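
The first point can be made concrete with even a crude baseline. The sketch below assumes, purely for illustration, that births are spread evenly across the days of the year, and compares the observed counts with the counts that assumption would predict. Real birth records for the relevant populations would be a much better baseline; the even-spread assumption and all the names in the code are illustrative and not part of the original study.

```ruby
# Observed frequencies from the month-of-birth table.
OBSERVED = {
  'January' => 50, 'February' => 44, 'March' => 64, 'April' => 57,
  'May' => 63, 'June' => 78, 'July' => 64, 'August' => 68,
  'September' => 66, 'October' => 69, 'November' => 60, 'December' => 64
}

# Month lengths, counting February as 28.25 days to allow for leap years.
DAYS_IN_MONTH = {
  'January' => 31, 'February' => 28.25, 'March' => 31, 'April' => 30,
  'May' => 31, 'June' => 30, 'July' => 31, 'August' => 31,
  'September' => 30, 'October' => 31, 'November' => 30, 'December' => 31
}

total_births = OBSERVED.values.sum
total_days   = DAYS_IN_MONTH.values.sum

# ASSUMPTION: births are spread evenly over days, so each month's
# expected count is proportional to its length.
expected = DAYS_IN_MONTH.transform_values { |days| total_births * days / total_days }

expected.each do |month, count|
  puts format('%-10s observed %3d  expected %5.1f', month, OBSERVED[month], count)
end
```

Any gap between the two columns only suggests a month worth investigating; it does not replace a comparison against real birth data.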

## Conclusion

There does seem to be some variance between the birth periods and, interestingly, this seems to be more pronounced for star signs than for months of birth. In particular, in this sample Geminis and Libras, or people born in June, stand out as being more likely to receive a Nobel Prize, whereas Capricorns and Aquariuses, or people born in January or February, are less likely to receive one.

## Commissions

This study highlights the power of scraping the web to extract this sort of statistic. Given the time, it could be extended to increase confidence in the data and draw more accurate conclusions. If you would like to commission vLife Systems to create a scraper which will extract data from websites or other data sources of interest to you, please get in touch via email: info@vlifesystems.com.

## Scrapers and Views

For those interested, links to the views and the code for the scrapers are listed below. The code is current at the time of writing, but may have changed since, so please go to the original source to see the latest versions.

### Nobel Prize Winners Names and Wiki Urls

This was the first stage. The scraper was used to compile a database of Nobel laureates and links to their pages on Wikipedia. The original scraper is to be found on ScraperWiki: Nobel Prize Winners Names and Wiki Urls

```ruby
require 'nokogiri'

# The ScraperWiki module is supplied by the ScraperWiki environment.

# Fetch the Wikipedia list of Nobel laureates
html = ScraperWiki.scrape("http://en.wikipedia.org/wiki/Nobel_prize_winners")

# Map each laureate's Wikipedia URL to their name. Keying on the URL
# means a laureate who appears more than once is only stored once.
winners = {}
doc = Nokogiri::HTML(html)
doc.css('table.wikitable td span.fn a').each do |a|
  name = a.inner_text
  wiki_url = a.attribute('href')
  absolute_url = "http://wikipedia.org#{wiki_url}"
  winners[absolute_url] = name
end

# Save data to database
winners.each do |url, name|
  data = {
    'url' => url,
    'name' => name
  }
  ScraperWiki.save_sqlite(['url'], data)
end
```
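
One detail in the scraper above is turning the relative `href` values found on the list page into absolute URLs, which it does by string interpolation. A possible alternative, sketched below, is `URI.join` from Ruby's standard library, which resolves a link against the page it came from. The sample `href` is an assumed example of the form Wikipedia uses, not output from the scraper.

```ruby
require 'uri'

# Page the links were scraped from (same URL as the scraper above).
list_page = 'http://en.wikipedia.org/wiki/Nobel_prize_winners'

# ASSUMPTION: an example relative href, as found in an <a> tag on that page.
href = '/wiki/Marie_Curie'

# Resolve the relative link against the page URL.
absolute_url = URI.join(list_page, href).to_s

puts absolute_url
```

Resolving against the list page keeps each link on the same host as that page (`en.wikipedia.org`) rather than the bare `wikipedia.org` domain.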

### Nobel Prize Winners' DOB

The next stage was to scrape the Wikipedia page of each person and get their Date of Birth. The original scraper is to be found on ScraperWiki: Nobel Prize Winners' DOB

```ruby
require 'date'
require 'nokogiri'

module StarSign
  # Dates from: http://my.horoscope.com/astrology/horoscope-sign-index.html
  # 2012 is used as the reference year because it is a leap year, so a
  # birthday of 29 February still parses once its year has been replaced.
  STAR_SIGN_DATES = {
    'aries' =>       ['21 March 2012', '19 April 2012'],
    'taurus' =>      ['20 April 2012', '20 May 2012'],
    'gemini' =>      ['21 May 2012', '20 June 2012'],
    'cancer' =>      ['21 June 2012', '22 July 2012'],
    'leo' =>         ['23 July 2012', '22 August 2012'],
    'virgo' =>       ['23 August 2012', '22 September 2012'],
    'libra' =>       ['23 September 2012', '22 October 2012'],
    'scorpio' =>     ['23 October 2012', '21 November 2012'],
    'sagittarius' => ['22 November 2012', '21 December 2012'],
    'capricorn' =>   ['22 December 2012', '19 January 2013'],
    'aquarius' =>    ['20 January 2012', '18 February 2012'],
    'pisces' =>      ['19 February 2012', '20 March 2012']
  }

  def star_sign
    # Move the date into the reference year so that only month and day matter
    compare_date = Date.parse(self.to_s.sub(/^\d+-/, "2012-"))
    STAR_SIGN_DATES.each do |sign, dates|
      if compare_date >= Date.parse(dates[0]) &&
         compare_date <= Date.parse(dates[1])
        return sign
      end
    end
    # 1-19 January ends up here: capricorn spans the year boundary, so its
    # January days never match once moved into the reference year
    return 'capricorn'
  end
end

# Make star_sign available on every Date
class Date
  include StarSign
end

class DOBScraper

  def initialize(dob_database)
    ScraperWiki.attach(dob_database)
    @population = ScraperWiki.select(
      "name, url from nobel_prize_winners_names_and_wiki_urls.swdata
       order by name"
    )
    # Name of the last laureate saved, so an interrupted run can resume
    @last_saved_name = ScraperWiki.get_var('last_saved_name')
  end

  def dump_dob(name, dob, star_sign)
    data = {
      'name' => name,
      'dob' => dob,
      'star_sign' => star_sign
    }

    ScraperWiki.save_sqlite(['name'], data)
    ScraperWiki.save_var('last_saved_name', name)
    @last_saved_name = name
  end

  def extract_dob(person)
    name, url = person['name'], person['url']
    begin
      html = ScraperWiki.scrape(url)
    rescue StandardError => error
      puts "Error: #{error} (url: #{url})"
      return # nothing to parse if the page could not be fetched
    end

    doc = Nokogiri::HTML(html)
    doc.css('table.infobox th').each do |th|
      if th.inner_text == "Born"
        born = th.parent.at('td').inner_text
        # Take the text up to and including the first year from 1600 to 1999
        dob = born.scan(/.*?1[6789]\d\d/).first
        begin
          star_sign = Date.parse(dob).star_sign
          dump_dob(name, dob, star_sign)
        rescue StandardError => error
          puts "Error: #{error} dob: #{dob} (name: #{name} url: #{url})"
        end
      end
    end
  end

  # True if this person was already handled in an earlier, interrupted run
  def skip_person?(name)
    return false unless @last_saved_name
    name_index = @population.find_index { |winner| winner['name'] == name }
    last_saved_index = @population.find_index { |winner| winner['name'] == @last_saved_name }
    # Once the last person in the list has been saved nobody is skipped,
    # so the next run starts again from the beginning
    last_saved_index >= name_index && last_saved_index != @population.size - 1
  end

  def scrape
    @population.each do |person|
      unless skip_person?(person['name'])
        extract_dob(person)
      end
    end
  end
end

dob_scraper = DOBScraper.new('nobel_prize_winners_names_and_wiki_urls')
dob_scraper.scrape
```
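
The two core steps of the scraper, pulling a date out of the infobox text and mapping it to a star sign, can be tried without ScraperWiki. The sketch below reuses the scraper's regular expression on a hand-written `Born` cell and, as an alternative to the reference-year approach, compares `[month, day]` pairs so that the year never matters. The sample text, the `SIGN_STARTS` table and the `sign_for` helper are illustrative names, not part of the original code.

```ruby
require 'date'

# First day of each sign as [month, day], in calendar order
# (taken from the star sign table earlier in the article).
SIGN_STARTS = [
  [[1, 20],  'aquarius'],
  [[2, 19],  'pisces'],
  [[3, 21],  'aries'],
  [[4, 20],  'taurus'],
  [[5, 21],  'gemini'],
  [[6, 21],  'cancer'],
  [[7, 23],  'leo'],
  [[8, 23],  'virgo'],
  [[9, 23],  'libra'],
  [[10, 23], 'scorpio'],
  [[11, 22], 'sagittarius'],
  [[12, 22], 'capricorn']
]

# Return the sign whose start date is the latest one on or before the date.
def sign_for(date)
  key = [date.month, date.day]
  match = SIGN_STARTS.reverse.find { |start, _sign| (start <=> key) <= 0 }
  # Dates before 20 January belong to the capricorn period that began in December.
  match ? match.last : 'capricorn'
end

# ASSUMPTION: a hand-written example of the text in an infobox "Born" cell.
born = "7 November 1867 Warsaw"

# Same pattern as the scraper: everything up to the first year from 1600 to 1999.
dob_text = born.scan(/.*?1[6789]\d\d/).first
dob = Date.parse(dob_text)

puts "#{dob_text} -> #{sign_for(dob)}"
```

Because only the month and day are compared, a birthday of 29 February needs no special handling in this version.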

### Nobel Prize Winners' Star Sign and Month of Birth Views

To visualise the results of the scraping I created a couple of views. I have decided not to include the code for these here as they are quite long and would be better off linked to. They can again be found on ScraperWiki: Nobel Prize Winners MOB and Nobel Prize Winners' Star Signs
