User:Maximilianklein/submission

From Wikimania 2013 • Hong Kong
Submission no.
Title of the submission

Authority Addicts: The Rise of Authority Control in Wikipedia

Type of submission

Presentation

Author of the submission

Maximilian Klein

Country of origin

w:USA

Affiliation

w:OCLC

E-mail address

isalix@gmail.com

Username

w:User:Maximilianklein

Personal homepage or blog

notconfusing.com hangingtogether.org

Abstract

Librarians have long known the benefits of Authority Control (AC) in organizing information; it disambiguates between the multiple names of a single entity, or the similar names of multiple entities. Since 2009, Wikipedias have been taking up the practice themselves. Today 0.75 million articles include AC, added through crowdsourced drives and algorithmic bots. The migration of AC to Wikidata in some ways makes Wikidata an Authority File itself, bridging multiple language versions of articles and multiple AC identifiers. Will Wikidata make traditional AC efforts obsolete? While that may be possible, it is worth collaborating with each other for the goals of accuracy and coverage.

Detailed proposal

Librarians have long known the benefits of Authority Control (AC) in organizing information; it disambiguates between the multiple names of a single entity, or the similar names of multiple entities. Disambiguation, as a term, has been popularized by its prominent use in Wikipedias. In 2009 German Wikipedia launched a large drive to disambiguate further by implementing AC by hand, which now covers nearly all applicable instances in the encyclopedia. Gadget-based, computer-assisted editing made inroads at Wikimedia Commons by adding AC to creator pages. In 2012 your presenter gained community approval for, and programmed a bot to reciprocate, more than 250,000 links from VIAF (the Virtual International Authority File, an aggregation standard) in English Wikipedia.

The coming of Wikidata is an impetus to rethink Wikipedians' approach to AC, both in scope and in necessity. On the topic of scope, most Wikipedians have simplified AC to applying it only to disambiguated names. However, some Authority Files, such as the German w:GND, attempt to classify all concepts of human knowledge, and use seven additional categories, such as Corporate names, Geographic places, and, most generally, Subject. An early attempt to blanket Wikidata with GND was rejected by the community, since a classification that describes most entities as subjects is understandably lacking. Yet there is still a use for classifying non-subject entities like Geographic places using AC, and for searching out more fitting classification schemes.

In the Person realm of AC, to which we're accustomed, we can merge AC from multiple Wikipedias, with the advantage of refining and expanding each language’s AC and spreading AC to all Wikipedias. That is already happening with further bots written by your presenter and others. This confluence of centralized and editable data has important implications for the future. Wikidata bridges multiple Authority Files with multiple related Wikipedia pages, and thus becomes an Authority File in its own right. Combine that with the power of crowdsourcing for extensibility, and you have a compelling competitor for a new de facto standard in Authority Files. Yet its acceptance as such will rely on the willingness of both librarians and the general public to treat it as ‘authoritative’. Treating crowdsourcing as an authoritative source may seem far-fetched, but similar scoffing was proven wrong when Wikipedia itself grew to mainstream recognition. Librarians may likewise be unhappy about using Wikidata as an authority file, but its open and free nature could be tempting in a future of increasing complexity and financial constraints.

What would Wikidata's adoption in AC mindshare mean for traditional AC? First, traditional Authority Files would still provide the necessary data for Wikidata until Wikidatians found legal and technical ways to duplicate all the underlying data on the other side of the canonical URI. Secondly, they would retain the support of the more hardcore users, by being up to 15% more accurate by some measures. Thirdly, they provide coverage of concepts that are not deemed notable by Wikipedia standards. The Library and Wikidata communities should codify their symbiosis to preempt this disruptive force. Traditional Authority Files should take requests from Wikidata, treating it as a legitimate source. Conversely, each Wikidata item should be supported by another Authority File, grounding it in the larger world.

Track
  • Cultural and Educational Outreach
Length of presentation/talk

25 Minutes

Language of presentation/talk

English

Will you attend Wikimania if your submission is not accepted?

Yes

Slides or further information (optional)
Special requests

None


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Blue Rasberry (talk) 16:21, 21 April 2013 (UTC)
  2. Ocaasi (talk) 17:27, 30 April 2013 (UTC)
  3. Add your username here.