Difference between revisions of "Submissions:2014/Measuring Editor Collaborativeness With Economic Modelling"

From WikiConference North America
 
(7 intermediate revisions by 4 users not shown)
Line 27: Line 27:
   
 
;Abstract ''(at least 300 words to describe your proposal)'':
 
  +
Even though Wikipedia is a vanguard of collaboration, Wikipedians unfortunately have access to few tools for performance metrics - most notably “Edit Count”. The Wikimedia Foundation's WikiMetrics cohort notion <ref>Wikimetrics. [http://metrics.wmflabs.org/] </ref> has begun the search for higher-level metrics, but hasn't yet answered “how” users work together.
In our performance-driven world we care deeply about quantifying our contributions to Wikis, and yet we remain addicted to the ''Edit Count'' metric despite all its shortcomings. Smarter metrics have been proposed, such as counting hours spent editing or the survival rate of a user's contributed text. We investigated a method from Macroeconomics which considers the “exports” of a User - the portfolio of articles they have contributed to. The results had an unforeseen consequence: rather than offering another measure of individual performance, they suggest a way to measure editor collaborativeness.
 
   
  +
This presentation will outline new methods for measuring and understanding editor collaborativeness. We borrow from economic modelling, where recent insights into economic competitiveness make an analogous measure of editor collaborativeness possible. A recent stream of research in Macroeconomics has shown simple techniques for predicting GDP with very little information <ref>Hidalgo, Hausmann, The Building Blocks of Economic Complexity [http://chidalgo.com/Papers/HidalgoHausmann_PNAS_2009.pdf] </ref> <ref>Caldarelli et al. Firm Grounds. [https://www.ncbi.nlm.nih.gov/pubmed/23094044] </ref>, which can be translated into the wiki realm. Using only the data of which countries export which products (not even how much of each product), one can quickly predict GDP rankings. Here we re-purpose the algorithm, so that editors are countries, articles are products, and GDP is “Total Labour Hours” (an edit count derivative <ref>Halfaker, Geiger, Using Edit Sessions [http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf]</ref>).
In Macroeconomics the assumption is that the best countries produce the best products, and the best products are those produced by the fewest countries (the hardest to produce). Therefore our problem of ''ranking user performance, based on article portfolio'' rests on answering the twin question of ''ranking article quality, based on contributor performances''. It is possible to solve these two questions simultaneously, and the solution is similar to Google's PageRank algorithm. Specifically, we gather the user-article “matrix” of a Category (see the Figure below of ''Category:Feminist Writers''). Then we produce the editor and article rankings, and compare them to two ground-truth rankings. For editors our ground-truth is "Labour Hours", which is derived from the editor's contribution history. For articles our ground-truth is a mix of 5 measures of article text (citations per sentence, number of images, etc.).
 
   
  +
By constructing a relation between editors and the articles they have touched, we are able to produce an entirely new perspective on Wikipedia (see the Figure below). The simplicity of this model can help us to quickly and easily determine which categories of articles are more likely to be hostile and power-user dominated, and which are more egalitarian and collaborative.
To get an intuition for the method, consider these telling extremes. The best editors in ''Category:Military history of the US'' - a category known for being very competitive - are characterized by heavy investment in touching many articles in the category. At the other end, the editors in ''Category:Sexual acts'' - a taboo subject where much editing could be considered perverse - are characterized by divesting from touching many articles in the category.
 
   
  +
Incredibly, this borrowed method works even better for Wikis than Economies. Where the maximum achievable correlation in Macroeconomics is about 0.42, we can achieve correlations of up to 0.91. However, the real innovation comes from two factors which are tweaked to optimize the model:
The correlation between our produced rankings and the ground-truth rankings relies on two factors in our model, termed α and β. These determine the ''importance'' of the high quality articles in an editor's portfolio and, conversely, of highly invested editors in an article's contribution history. When both α and β are optimized to maximize the ranking correlations, we find correlations between 0.46 and 0.91 between the model and ground-truth metrics (see the Table below). By finding the optimizing values of α and β we learn how strongly a category is characterized by highly invested editors, or by highly developed articles. Taken together, these let us talk about the collaborativeness of a Category - how close it is to featuring highly divested editors and yet highly developed articles.
 
  +
  +
* ''importance'' of the high quality articles in an editor's contribution portfolio
  +
* (conversely) the ''importance'' of highly-invested editors in an article's contribution history.
  +
  +
These variables can range independently, and characterize our notion of "collaborativeness".
  +
  +
Collaborativeness is determined from the edit patterns of editors. Do they edit many articles? How well developed are the articles they edit? Consider these telling extremes:
  +
 
* The best editors in ''Category:Military history of the US''&mdash;a category known for being very competitive&mdash;are characterized by emphasizing investment in touching many articles in the category. Less collaborative.
  +
* On the other end, the editors in ''Category:Sexual acts''&mdash;a taboo subject where much editing could be considered perverse&mdash;are characterized by not touching many articles in the category. More collaborative.
  +
  +
We hope to receive critique on whether our algorithmic notion of ''collaborativeness'' is in line with community opinion. Additionally, we hope to receive requests for different datasets to analyze in future research.
  +
  +
== References ==
  +
<references />
  +
</div>
  +
----
   
   
Line 44: Line 61:
   
 
;Slides or further information (optional):
 
  +
[[File:Feminist writers triangle matrix.png|600px|A triangular matrix from Wikipedia data]]
[[File:Wiki econ stats.png]]
 
   
 
[[File:Wiki econ stats.png|600px|A rendering of a latex table]]
[[File:Category-Feminist writerstriangle matrix corrected.png]]
 
   
 
;Special request as to time of presentations: <!-- (for example - can not present on Saturday) -->
 
Line 56: Line 73:
 
'''If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes. (<nowiki>~~~~</nowiki>).'''
 
   
  +
# [[User:Rhododendrites|Rhododendrites]] ([[User talk:Rhododendrites|talk]]) 09:52, 10 April 2014 (EDT)
  +
# [[User:Geraldshields11|Geraldshields11]] ([[User talk:Geraldshields11|talk]]) 08:52, 27 May 2014 (EDT)
 
# ''Add your username here.''
 
   
[[Category:Submissions]]
+
[[Category:Submissions/2014‎]]

Latest revision as of 01:13, 31 August 2016

Title of the submission

Measuring Editor Collaborativeness With Economic Modelling

Themes (Proposal Themes - Community, Tech, Outreach, GLAM, Education)

Community - presents a way to characterise editors.

Type of submission (Presentation Types - Panel, Workshop, Presentation, etc)

Presentation

Author of the submission

Max Klein

E-mail address

isalix@gmail.com

Username

w:User:Maximilianklein

US state or country of origin

California

Affiliation, if any (organization, company etc.)
Personal homepage or blog

[5]

Abstract (at least 300 words to describe your proposal)

Even though Wikipedia is a vanguard of collaboration, Wikipedians unfortunately have access to few tools for performance metrics - most notably “Edit Count”. The Wikimedia Foundation's WikiMetrics cohort notion [1] has begun the search for higher-level metrics, but hasn't yet answered “how” users work together.

This presentation will outline new methods for measuring and understanding editor collaborativeness. We borrow from economic modelling, where recent insights into economic competitiveness make an analogous measure of editor collaborativeness possible. A recent stream of research in Macroeconomics has shown simple techniques for predicting GDP with very little information [2] [3], which can be translated into the wiki realm. Using only the data of which countries export which products (not even how much of each product), one can quickly predict GDP rankings. Here we re-purpose the algorithm, so that editors are countries, articles are products, and GDP is “Total Labour Hours” (an edit count derivative [4]).
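
The country-product data translates into a binary editor-article matrix: a cell is 1 if the editor has touched the article at all, regardless of how many edits were made. The following is a minimal sketch of that construction, not the exact pipeline used in this work. The function name build_binary_matrix, the editor names, and the article titles are all hypothetical.

```python
def build_binary_matrix(edits):
    """Turn (editor, article) edit records into a 0/1 matrix.

    Rows are editors, columns are articles. Repeated edits by the same
    editor to the same article collapse to a single 1, mirroring the
    "which countries export which products" data (no quantities).
    """
    edits = list(edits)
    editors = sorted({editor for editor, _ in edits})
    articles = sorted({article for _, article in edits})
    row = {name: i for i, name in enumerate(editors)}
    col = {title: j for j, title in enumerate(articles)}
    M = [[0] * len(articles) for _ in editors]
    for editor, article in edits:
        M[row[editor]][col[article]] = 1
    return editors, articles, M


# Hypothetical edit log for one category (made-up names).
edits = [
    ("Ana", "A1"), ("Ana", "A1"), ("Ana", "A2"), ("Ana", "A3"),
    ("Bo", "A1"), ("Bo", "A2"),
    ("Cy", "A1"),
]
editors, articles, M = build_binary_matrix(edits)

# Diversity: how many articles each editor touched (row sums).
diversity = [sum(r) for r in M]
# Ubiquity: how many editors touched each article (column sums).
ubiquity = [sum(c) for c in zip(*M)]
```

Sorting rows by diversity and columns by ubiquity is what gives such a matrix its roughly triangular shape when the data is nested, as in this toy example.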

By constructing a relation between editors and the articles they have touched, we are able to produce an entirely new perspective on Wikipedia (see the Figure below). The simplicity of this model can help us to quickly and easily determine which categories of articles are more likely to be hostile and power-user dominated, and which are more egalitarian and collaborative.

Incredibly, this borrowed method works even better for Wikis than Economies. Where the maximum achievable correlation in Macroeconomics is about 0.42, we can achieve correlations of up to 0.91. However, the real innovation comes from two factors which are tweaked to optimize the model:

  • importance of the high quality articles in an editor's contribution portfolio
  • (conversely) the importance of highly-invested editors in an article's contribution history.

These variables can range independently, and characterize our notion of "collaborativeness".
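
The abstract does not state how the two factors enter the model, so the sketch below assumes one simple form in the spirit of the biased-walk approach of reference [3]: an exponent alpha discounts articles by how many editors touched them, and an exponent beta discounts editors by how many articles they touched. The function name scores, the parameter names, and the sign convention are assumptions for illustration, not the authors' implementation.

```python
def scores(M, alpha, beta):
    """Score editors and articles from a binary editor-article matrix.

    Assumed form (illustrative only):
      editor_score[e]  = k_e ** -beta  * sum_a M[e][a] * k_a ** -alpha
      article_score[a] = k_a ** -alpha * sum_e M[e][a] * k_e ** -beta
    where k_e is the number of articles editor e touched and k_a is the
    number of editors who touched article a. Every row and column of M
    must contain at least one 1, otherwise k is zero and the negative
    power is undefined.
    """
    n_editors, n_articles = len(M), len(M[0])
    k_e = [sum(row) for row in M]
    k_a = [sum(col) for col in zip(*M)]
    editor_score = [
        k_e[e] ** -beta
        * sum(M[e][a] * k_a[a] ** -alpha for a in range(n_articles))
        for e in range(n_editors)
    ]
    article_score = [
        k_a[a] ** -alpha
        * sum(M[e][a] * k_e[e] ** -beta for e in range(n_editors))
        for a in range(n_articles)
    ]
    return editor_score, article_score


# Toy triangular matrix: 3 editors (rows) by 3 articles (columns).
M = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
editor_scores, article_scores = scores(M, alpha=1.0, beta=0.0)
```

With both exponents at zero the editor score reduces to the plain count of articles touched. Raising alpha gives more credit for articles that few others have edited, and raising beta penalizes editors who spread themselves across many articles.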

Collaborativeness is determined from the edit patterns of editors. Do they edit many articles? How well developed are the articles they edit? Consider these telling extremes:

  • The best editors in Category:Military history of the US—a category known for being very competitive—are characterized by emphasizing investment in touching many articles in the category. Less collaborative.
  • On the other end, the editors in Category:Sexual acts—a taboo subject where much editing could be considered perverse—are characterized by not touching many articles in the category. More collaborative.
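
Which of these extremes a category resembles is read off from the values of the two factors that best reproduce the ground-truth ranking. A grid search is one plausible way to find them. This sketch is a simplification under stated assumptions: it scores editors with an assumed alpha/beta form, compares only the editor ranking (not the article ranking) against labour hours using Spearman rank correlation, and uses made-up labour-hour figures. All function names and the grid values are hypothetical.

```python
def editor_scores(M, alpha, beta):
    # Assumed scoring form (illustrative): discount articles by ubiquity
    # with exponent alpha, and editors by diversity with exponent beta.
    k_e = [sum(row) for row in M]
    k_a = [sum(col) for col in zip(*M)]
    return [
        k_e[e] ** -beta * sum(m * k ** -alpha for m, k in zip(M[e], k_a))
        for e in range(len(M))
    ]


def ranks(values):
    # Average ranks (1-based), so that ties share the same rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation = Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant ranking carries no information
    return cov / (vx * vy) ** 0.5


def grid_search(M, ground_truth, grid):
    # Try every (alpha, beta) pair and keep the best correlation found.
    best = (float("-inf"), None, None)
    for alpha in grid:
        for beta in grid:
            rho = spearman(editor_scores(M, alpha, beta), ground_truth)
            if rho > best[0]:
                best = (rho, alpha, beta)
    return best


M = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]   # toy editor-article matrix
labour_hours = [40.0, 12.5, 3.0]        # hypothetical ground truth
best_rho, best_alpha, best_beta = grid_search(
    M, labour_hours, grid=[0.0, 0.5, 1.0, 1.5, 2.0]
)
```

On real data the fitted pair would differ from category to category, and it is that pair, rather than the correlation itself, that places a category on the spectrum between the two extremes above.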

We hope to receive critique on whether our algorithmic notion of collaborativeness is in line with community opinion. Additionally, we hope to receive requests for different datasets to analyze in future research.

References

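  1. Wikimetrics. http://metrics.wmflabs.org/
  2. Hidalgo, Hausmann, The Building Blocks of Economic Complexity. http://chidalgo.com/Papers/HidalgoHausmann_PNAS_2009.pdf
  3. Caldarelli et al. Firm Grounds. https://www.ncbi.nlm.nih.gov/pubmed/23094044
  4. Halfaker, Geiger, Using Edit Sessions. http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf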


Length of presentation/talk (see Presentation Types for lengths of different presentation types)
75 Minutes

Preferred 30 mins to fit into a thematic session, but could talk longer.

Will you attend WikiConference USA if your submission is not accepted?

Yes, if I receive a travel scholarship as well.

Slides or further information (optional)

A triangular matrix from Wikipedia data

A rendering of a latex table

Special request as to time of presentations


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes. (~~~~).

  1. Rhododendrites (talk) 09:52, 10 April 2014 (EDT)
  2. Geraldshields11 (talk) 08:52, 27 May 2014 (EDT)
  3. Add your username here.