(Edit: Sorry about that - I clicked submit before I finished writing the post. And it's now 11 years, not 10. Anyway...)
Hi folks. I thought you might be interested in a little "back-burner" project I've been working on.
You might have seen a few of my previous threads on yearbook statistics: here and here. I thought these tables were sort of useful, but sadly they were time-consuming to produce as I was typing them in by hand.
I decided to investigate whether I could scan tables from yearbooks and then convert them to Excel worksheets with an OCR program. I discovered that this was possible, and it worked pretty well (though it is still a somewhat manual process).
Therefore, I can now present to you, at the bottom of this post, downloads for yearbook statistics from 2002 to 2012 in Excel table format.
I plan to work back through more years in the coming weeks/months as time allows.
To explain, what you have below is all the data for all countries given in the yearbooks for the last 11 years. However, these are not just "crude scans". As you might be aware, there are some countries that are "dropped" (merged?), while others are "added" (gained independence?). For example, the Tajikistan statistics stopped in 2007.
This is where I could do with your help. For some countries, it is possible to recalculate the figures when we know which component parts of a statistic changed. This usually allows "like-for-like", year-on-year comparisons. (Sorry - this maybe sounds more complicated than it is.)
For example, the "USA" statistics up until 2006 did not include Alaska or Hawaii. In 2007, Alaska was included in the USA statistic. In 2011, Hawaii was added to the USA statistic. Because we know this, we can add Alaska to the USA statistic for years before 2007, and likewise we can add Hawaii for years before 2011. This allows us to have a new row with a "like-for-like" comparison year-on-year, and to produce a fair graph, for example. I was able to do the same with the Israel/Palestine figures for 2011/2012.
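For anyone who prefers to see the idea as code, here is a minimal Python sketch of the "like-for-like" adjustment. The numbers are invented purely for illustration (they are not real yearbook figures), and the function name and data layout are my own assumptions, not how the spreadsheets are actually built. It also assumes a part only has its own row in the years when it is not already counted in the parent.

```python
# Hypothetical figures, invented for illustration only (not real yearbook data).
# Each part has its own row only in the years before it was folded into "USA".
raw = {
    "USA":    {2005: 100, 2006: 102, 2007: 110, 2010: 113, 2011: 120},
    "Alaska": {2005: 5, 2006: 6},                    # separate until 2006
    "Hawaii": {2005: 3, 2006: 3, 2007: 4, 2010: 4},  # separate until 2010
}


def like_for_like(raw, parent, parts):
    """Add each separately listed part back into the parent's figure
    for every year in which that part still had its own row."""
    adjusted = {}
    for year, value in raw[parent].items():
        # A part with no row for this year contributes 0, because it is
        # assumed to be counted inside the parent figure already.
        extra = sum(raw[part].get(year, 0) for part in parts)
        adjusted[year] = value + extra
    return adjusted


usa_adjusted = like_for_like(raw, "USA", ["Alaska", "Hawaii"])
print(usa_adjusted)
```

The adjusted row can then be graphed across all years without an artificial jump in the years where a part was folded in.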
The above scenarios are examples of where countries change - and are possibly relatively simple compared to the ones below. Now my own geo-historical knowledge isn't great, so I wondered if any of you could help me work out whether I can do the same for other countries as I have done for the USA and Israel/Palestine. It might not be possible for all of them - Yugoslavia might get tricky, for instance. This is the list of countries I need information for, specifically, what country they merged with or gained independence from:
- Belau - removed in 2005
- Comoros - removed in 2004
- Montenegro - introduced in 2007
- Saint Barthelemy - introduced in 2012
- Saint Martin (watch spelling with "Saint Maarten"!) - introduced in 2012
- South Sudan - introduced in 2012
- Tajikistan - removed in 2007
- Timor-Leste - introduced in 2012
- Yugoslavia, F. R. - removed in 2002
Oh, and lastly the downloads.
The first file has tables presented pretty much as the yearbook presents them. Each year is in a separate tab at the bottom of the spreadsheet. Now while these are possibly "interesting", they don't make it very easy to compare "year-by-year" statistics or create graphs, unless you use complicated cross-sheet formulas (ain't nobody got time for that!). Hopefully you will find the second file more useful.
Yearbook Stats 2002 - 2012:
http://depositfiles.com/files/t4xhx9n5l
MIRROR: http://www.fileflyer.com/view/J2XhWAb
This second .xls file has each statistic field in a separate tab with each year in a column, and is far more useful for year-by-year graphs etc.
Yearbook Stats by Field 2002 - 2012:
http://depositfiles.com/files/aj02x7lej
MIRROR: http://www.fileflyer.com/view/J2XhWAb
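If you would rather reshape the year-per-tab layout yourself, the rearrangement into field-per-tab can be sketched in a few lines of Python. This is only an illustration: the nested-dictionary layout, the field names and the numbers are all made up, and it says nothing about how you would actually read the .xls files.

```python
# Hypothetical layout: year -> country -> field -> value (invented numbers).
by_year = {
    2011: {"Montenegro": {"Field A": 10, "Field B": 2}},
    2012: {"Montenegro": {"Field A": 12, "Field B": 3},
           "South Sudan": {"Field A": 7, "Field B": 1}},
}


def pivot_by_field(by_year):
    """Rearrange into field -> country -> year -> value, so each field
    has one row per country and one column per year."""
    by_field = {}
    for year, countries in by_year.items():
        for country, fields in countries.items():
            for field, value in fields.items():
                by_field.setdefault(field, {}).setdefault(country, {})[year] = value
    return by_field


by_field = pivot_by_field(by_year)
print(by_field["Field A"]["Montenegro"])
```

Countries that only appear in some years (like South Sudan in this made-up data) simply end up with fewer year columns.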
Here are the scans of the actual pages if anyone cares to inspect/compare:
http://www.fileflyer.com/view/pSkAqAj ( 5.98 MB PDF )
MIRROR: http://depositfiles.com/files/vjos5a7ws ( 5.98 MB PDF )
As a disclaimer, I should mention that I don't expect these tables to be perfect - there will almost certainly still be errors that I haven't spotted. They were scanned by hand on a flatbed scanner and put through an OCR program, which does sometimes misread characters. If you do find any numbers that are wrong, please PM me and I will fix them and re-upload with a "changeset log" for everyone to see. One last thing: do ignore the grid references in brackets next to the country names - I couldn't be bothered to remove them! I do hope you find the tables useful - I'm excited to see what you can do with them.