Archive for the ‘Statistics’ Category

Forgotten Photos

Posted by striatic in Statistics

Patrick Peccatte of the incredible PhotosNormandie has followed up on his metadata statistics for all Commons institutions by providing statistics on photos in The Commons which have received little attention. If you’re interested in helping to add tags or comments to any of these “forgotten” photos, his new post contains a comprehensive list of links to photos of this kind.

Patrick searched 18,290 photos across all 19 Commons institutions, finding that 5,123 had not received any comments, notes, or tags from Flickr members. That’s fully 28% of the total collection. While that seems like a large percentage, the underlying numbers are more informative. The Commons collection is very large in many respects, but it is still small enough that certain outliers skew the average significantly, as we will soon discover.

Let’s take a look at the numbers institution by institution, examining how many photos out of each collection have not received any tags, comments, or notes. [Data collected on February 11, 2009]

Large Collections

Library of Congress – 44 out of 5,421 (0.8%)
Brooklyn Museum – 167 out of 2,554 (7%)
Smithsonian Institution – 327 out of 1,414 (23%)
Powerhouse Museum Collection – 336 out of 1,101 (30%)
New York Public Library – 561 out of 1,300 (43%)

The Commons’ largest and oldest contributor, The Library of Congress, has had tremendous success in attracting attention and metadata from Flickr members. Less than 1% of their collection on Flickr goes without comments or tags from Flickr members. The Brooklyn Museum has had comparable success. These institutions demonstrate that it is possible to maintain large collections while virtually no photos fall through the cracks. The Smithsonian Institution is also above average, although less obviously so.

The Powerhouse Museum has a primarily regional focus (Australia), which sets it apart from the other large collections, and it falls slightly below the average rate of Flickr member contributions. The NYPL is relatively new to The Commons and has uploaded many photos in a short period. It may require time before the Flickr community discovers and interacts with these photos.

Mid-Sized Collections

State Library of New South Wales – 1 out of 250 (0.4%)
George Eastman House – 60 out of 592 (10%)
Nationaal Archief – 141 out of 590 (24%)
Library of Virginia – 93 out of 314 (30%)
Musée McCord Museum – 86 out of 236 (36%)

These collections, between 200 and 1,000 photos in size, show a wide range of activity. The State Library of New South Wales behaves like some of the smaller, more concentrated collections in The Commons. George Eastman House has a broad focus, more like the Library of Congress and Brooklyn Museum, with comment/tag rates to match. The Nationaal Archief is about average, but had Flickr member tagging disabled until very recently.

Like the Powerhouse Museum, two regionally focused collections fall below the average. Musée McCord Museum focuses on Canadian history, and The Library of Virginia focuses on the state of Virginia.

Small Collections

Imperial War Museum – 0 out of 10 (0%)
Australian War Memorial – 1 out of 42 (2.4%)
National Galleries of Scotland – 8 out of 107 (7.4%)
National Media Museum – 16 out of 130 (12%)
National Library of New Zealand – 36 out of 161 (22%)
National Maritime Museum – 48 out of 191 (25%)
State Library of Queensland – 83 out of 152 (55%)

These institutions are pretty much all above average. The State Library of Queensland provides an exception but is so new to Flickr that it almost shouldn’t be in this list.

Smaller collections concentrate activity, and fewer of their photos are missed by Flickr members.

Non-English Collections

Bibliothèque de Toulouse – 378 out of 652 (58%)
Biblioteca de Arte–Fundação Calouste Gulbenkian – 2,745 out of 3,073 (89%)

60% of all untagged and uncommented Commons photos are from these two institutions, which are both from non-English-speaking countries. The outlier statistics from Biblioteca de Arte–Fundação Calouste Gulbenkian require a bit of context, however. Unlike most Commons institutions, The Biblioteca uploads photos with a thorough set of tags, applied by library staff. It may be that these photos don’t need as much metadata from Flickr members, and thus receive less.
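As a rough check on how much these two collections move the overall figure, the per-institution counts listed above can be summed with and without them. The following is a minimal Python sketch, not part of Patrick’s analysis; the names COLLECTIONS, NON_ENGLISH, and untagged_rate are illustrative.

```python
# (untagged photos, total photos) per institution, copied from the lists above.
COLLECTIONS = {
    "Library of Congress": (44, 5421),
    "Brooklyn Museum": (167, 2554),
    "Smithsonian Institution": (327, 1414),
    "Powerhouse Museum Collection": (336, 1101),
    "New York Public Library": (561, 1300),
    "State Library of New South Wales": (1, 250),
    "George Eastman House": (60, 592),
    "Nationaal Archief": (141, 590),
    "Library of Virginia": (93, 314),
    "Musée McCord Museum": (86, 236),
    "Imperial War Museum": (0, 10),
    "Australian War Memorial": (1, 42),
    "National Galleries of Scotland": (8, 107),
    "National Media Museum": (16, 130),
    "National Library of New Zealand": (36, 161),
    "National Maritime Museum": (48, 191),
    "State Library of Queensland": (83, 152),
    "Bibliothèque de Toulouse": (378, 652),
    "Biblioteca de Arte–Fundação Calouste Gulbenkian": (2745, 3073),
}

# The two institutions grouped under "Non-English Collections".
NON_ENGLISH = {
    "Bibliothèque de Toulouse",
    "Biblioteca de Arte–Fundação Calouste Gulbenkian",
}


def untagged_rate(names):
    """Pooled share of photos with no tags, comments, or notes."""
    untagged = sum(COLLECTIONS[name][0] for name in names)
    total = sum(COLLECTIONS[name][1] for name in names)
    return untagged / total


everyone = set(COLLECTIONS)
others = everyone - NON_ENGLISH

# Share of all untagged photos that come from the two outlier collections.
outlier_share = (
    sum(COLLECTIONS[name][0] for name in NON_ENGLISH)
    / sum(untagged for untagged, _ in COLLECTIONS.values())
)

print(f"All institutions:       {untagged_rate(everyone):.1%}")
print(f"Without the two above:  {untagged_rate(others):.1%}")
print(f"Outlier share of total: {outlier_share:.1%}")
```

With the listed figures, the pooled rate falls from roughly 28% across all 19 institutions to roughly 14% across the other 17, and the two collections account for about 61% of the untagged photos. The listed counts sum to 5,131 untagged photos, slightly more than the 5,123 quoted earlier, so these percentages are approximate.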


The analysis presented here is very simplistic, and reaches for only the most basic conclusions.

Smaller collections become easily saturated with tags and comments, but very large collections are also capable of similar saturation. Regionally focused institutions have challenges drawing activity through the entirety of their collections if they grow beyond a certain size, and institutions from non-English-speaking nations seem to have even greater challenges in this regard.

Commons Metadata Statistics

Posted by striatic in Statistics

Patrick Peccatte of the incredible PhotosNormandie has just published an article that provides metadata statistics for all Commons institutions. The article also includes detailed information regarding how each institution uses machine tags and photo descriptions, so if you want all the details, be sure to check out the Google translation of the original article.

Here are the statistics relating to comments, tags, and notes. The institutions are displayed in the order in which they joined The Commons. Links are also provided to the photo at the top of each category within an institution. These are useful for discovering photos that have received a lot of attention. [data collected between February 7 and 8, 2009]

Library of Congress, Washington, DC, United States

Launched on 16 January 2008, currently has 5,421 photos in 5 sets.
11,675 comments, for an average of 2.15 per photo. Max = 133
75,143 tags, for an average of 13.86 per photo. Max = 72
2,712 notes, for an average of 0.50 per photo. Max = 33

Powerhouse Museum, Sydney, Australia

Launched on 7 April 2008, currently has 1,101 photos in 27 sets.
1,464 comments, for an average of 1.33 per photo. Max = 97
4,619 tags, for an average of 4.20 per photo. Max = 34
305 notes, for an average of 0.28 per photo. Max = 19

Brooklyn Museum, New York, United States

Launched on 28 May 2008, currently has 677 Commons images in 6 sets.
[The following statistics were re-collected on February 21]
1,508 comments, for an average of 2.23 per photo. Max = 107
4,875 tags, for an average of 7.20 per photo. Max = 65
373 notes, for an average of 0.55 per photo. Max = 20

Smithsonian Institution, Washington, DC, United States

Launched on 16 June 2008, currently has 1,403 photos in 12 sets.
1,468 comments, for an average of 1.05 per photo. Max = 68
5,687 tags, for an average of 4.05 per photo. Max = 43
238 notes, for an average of 0.17 per photo. Max = 19
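
The per-photo averages above are simply each count divided by the number of photos in the collection. The short Python sketch below recomputes them from the reported totals; the names REPORTED and per_photo_averages are illustrative and do not come from Patrick’s article.

```python
# (photos, comments, tags, notes) as reported above for each institution.
REPORTED = {
    "Library of Congress": (5421, 11675, 75143, 2712),
    "Powerhouse Museum": (1101, 1464, 4619, 305),
    "Brooklyn Museum": (677, 1508, 4875, 373),
    "Smithsonian Institution": (1403, 1468, 5687, 238),
}


def per_photo_averages(photos, comments, tags, notes):
    """Return (comments, tags, notes) per photo, rounded to two decimals."""
    return tuple(round(count / photos, 2) for count in (comments, tags, notes))


for name, figures in REPORTED.items():
    comments, tags, notes = per_photo_averages(*figures)
    print(f"{name}: {comments:.2f} comments, {tags:.2f} tags, "
          f"{notes:.2f} notes per photo")
```

Note that a mean says nothing about spread: a handful of heavily tagged photos (see the Max values) can lift the average while many photos in the same collection receive nothing.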

Tags per Commons Photo

Posted by striatic in Statistics

Indicommons Chief of Development David Wilkinson recently investigated the distribution of tags across the Flickr Commons, creating the following graph from the data he accumulated.


Since almost every institution adds its own name as a tag to every photo it uploads, almost every Commons photo possesses at least one tag. This accounts for the spike on the far left. The second spike, at three tags, is probably due to institutions like the Library of Congress adding a couple of institution-specific “machine tags” to every photo they upload.

With this knowledge we can assume that many of the photos in the Commons with 3 or fewer tags have not been tagged by a Flickr member. Perhaps 2,500 or more of the 12,000 or so photos in Commons have not received any “member” tags. At around 20%, these untagged photos represent a sizable percentage of the Commons collection.

While the relatively large number of untagged photos in the collection is unfortunate, the graph also indicates that when Flickr members turn their attention to tagging photos, they add a significant number of tags. The graph’s curve crests at 9 or 10 tags, more than enough to thoroughly describe the visual contents of each image. Many photos receive even more tags than that. Indeed, David’s analysis was spurred by Shelley Mannion’s recent remark on Twitter that the Library of Congress had reached Flickr’s 75-tag-per-photo limit on certain uploads.

The following 15 photos from the Library of Congress collection possess 70 or more tags:

Library of Congress

Spreading this wealth of metadata seems to depend on connecting the untagged Commons photos with tag-happy Flickr members, who are clearly very industrious, in an effort to prevent photos from falling through the cracks and remaining entirely untagged.