Difference between revisions of "2019/Grants/A bot to add reference support to Wikidata statements"

From WikiConference North America
Line 6:
* My publications: https://scholar.google.ca/citations?user=u25grGjf85sC&hl=en
* My Wikimania 2019 presentations: https://commons.wikimedia.org/wiki/Category:Wikimania_2019_sessions_of_Houcemeddine_Turki
 +
* My Wikimedia profile: https://meta.wikimedia.org/wiki/User:Csisc
 +
As a Wikimedian, I began contributing to Wikipedia in 2009 by adding non-stub articles and improving reference support in French and English Wikipedia. Thanks to this effort, I was a semi-finalist of WikiCup 2015, and my work has been featured on the main page of English Wikipedia several times. I was also among the founding members of Wikimedia TN User Group and launched several initiatives and projects to enhance the coverage of Tunisia-related topics linked to science, cultural heritage, and sociolinguistics. I was also a coordinator of The Wikipedia Library for two years and tried to provide full access to paywalled scholarly resources to Wikimedia communities. In 2017, I shifted my interest to Wikidata, and I retired from Wikipedia in 2019. I was aware that Wikidata can be useful for a variety of real-life applications, including health, industry, science, and linguistics. That is why I have worked since 2018 on developing applications of Wikidata in medicine, contributing to Wikidata by adjusting biomedical knowledge and adding new types of entities, particularly in the context of the COVID-19 pandemic. Currently, I am vice-chair of Wikimedia TN User Group, a board member of Wikimedia and Libraries User Group, and an active member of Wikimedia Medicine. Over the last few months, I have also been working to co-found the first Wikimedia research unit in Tunisia under the supervision of the University of Sfax. This research unit is called "Data Engineering and Semantics" and will include all the research scientists of the University of Sfax working on Wikimedia-related research topics.
 
|geography=Worldwide
|type=Technology
Line 16: Line 18:
 
|scalability=Of course, the bot can later evolve so that it can add references to Wikipedia articles.
|people=* I am an editor of Wikidata with over 100,000 edits. https://xtools.wmflabs.org/ec/www.wikidata.org/Csisc
 −
* I am a published author in Computational Linguistics and I have the required skills to build the bot
+
* I am a published author in Computational Linguistics and Biomedical Informatics. I have full mastery of the Wikidata data model and have published several research papers about it: https://www.sciencedirect.com/science/article/abs/pii/S1532046419302114 and https://ieeexplore.ieee.org/document/8308319/
 +
* I have the skills required to build the bot, including Python and HTML. I am currently co-founding a research unit at the University of Sfax dedicated to Wikimedia research, and I have the capacity to apply NLP techniques and data mining tools to this work.
 
|inclusiveness=This bot can reduce deletion rates in Wikipedia and Wikidata. New editors will be more encouraged to keep contributing to Wikipedia and Wikidata when they find that their work has been fixed.
|challenges=* Internet connectivity matters: We will use a high-speed internet connection option (4G).

Revision as of 01:38, 21 May 2020


Title:

A bot to add reference support to Wikidata statements

Name:

Houcemeddine Turki

Wikimedia username:

Csisc

E-mail address:

turkiabdelwaheb@hotmail.fr

Resume:

Born on May 24, 1994, Houcemeddine Turki is a long-term Wikimedian and a medical student at the University of Sfax, Tunisia. He is also a published researcher in Computational Linguistics, Scientometrics, and Biomedical Informatics.

As a Wikimedian, I began contributing to Wikipedia in 2009 by adding non-stub articles and improving reference support in French and English Wikipedia. Thanks to this effort, I was a semi-finalist of WikiCup 2015, and my work has been featured on the main page of English Wikipedia several times. I was also among the founding members of Wikimedia TN User Group and launched several initiatives and projects to enhance the coverage of Tunisia-related topics linked to science, cultural heritage, and sociolinguistics. I was also a coordinator of The Wikipedia Library for two years and tried to provide full access to paywalled scholarly resources to Wikimedia communities. In 2017, I shifted my interest to Wikidata, and I retired from Wikipedia in 2019. I was aware that Wikidata can be useful for a variety of real-life applications, including health, industry, science, and linguistics. That is why I have worked since 2018 on developing applications of Wikidata in medicine, contributing to Wikidata by adjusting biomedical knowledge and adding new types of entities, particularly in the context of the COVID-19 pandemic. Currently, I am vice-chair of Wikimedia TN User Group, a board member of Wikimedia and Libraries User Group, and an active member of Wikimedia Medicine. Over the last few months, I have also been working to co-found the first Wikimedia research unit in Tunisia under the supervision of the University of Sfax. This research unit is called "Data Engineering and Semantics" and will include all the research scientists of the University of Sfax working on Wikimedia-related research topics.

Geographical impact:

Worldwide

Type of project:

Technology

What is your idea?

My idea is to create a bot that processes news feeds and open-source search engines to find references for unsupported statements in Wikidata.
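
As a minimal sketch of the first step such a bot would need, the Python snippet below lists the statements of a Wikidata entity that carry no references. It assumes the entity is available in the standard Wikibase JSON layout, where each statement under "claims" may have a "references" list. The function name and the sample entity (including the "Q0" identifiers) are hypothetical and used only for illustration; this is not the bot's actual code.

```python
from typing import Iterator, Tuple


def iter_unreferenced_statements(entity: dict) -> Iterator[Tuple[str, str]]:
    """Yield (property_id, statement_id) for every statement without references."""
    for property_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            # An unsupported statement either lacks the "references" key
            # or carries an empty list.
            if not statement.get("references"):
                yield property_id, statement["id"]


# Hypothetical, hand-written entity fragment (not real Wikidata content).
sample_entity = {
    "id": "Q0",
    "claims": {
        "P31": [
            {
                "id": "Q0$aaaa",
                "references": [
                    {
                        "snaks": {
                            "P854": [
                                {
                                    "snaktype": "value",
                                    "property": "P854",
                                    "datavalue": {
                                        "value": "https://example.org/source",
                                        "type": "string",
                                    },
                                }
                            ]
                        }
                    }
                ],
            }
        ],
        "P279": [
            {"id": "Q0$bbbb"},                    # no "references" key
            {"id": "Q0$cccc", "references": []},  # empty reference list
        ],
    },
}

unreferenced = list(iter_unreferenced_statements(sample_entity))
print(unreferenced)
```

Only the two P279 statements are reported here, because the P31 statement already has a reference URL attached.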

Why is it important?

Wikidata statements that are not supported by references are not trustworthy enough to be relied on. Adding accessible reference URLs to them will make it possible to verify their accuracy and, consequently, to enhance the quality of the Wikidata database.
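
To illustrate what "adding a reference URL" could look like in data terms, here is a hedged sketch that builds the snaks of a reference made of a reference URL (property P854) and a retrieval date (property P813) in the Wikibase JSON layout. The function name and the example URL are hypothetical, and the snippet only builds the structure; it does not write anything to Wikidata.

```python
from datetime import date

REFERENCE_URL = "P854"  # Wikidata property "reference URL"
RETRIEVED = "P813"      # Wikidata property "retrieved"


def build_reference_snaks(url: str, retrieved_on: date) -> dict:
    """Return reference snaks (URL plus retrieval date) keyed by property ID."""
    return {
        REFERENCE_URL: [
            {
                "snaktype": "value",
                "property": REFERENCE_URL,
                "datavalue": {"value": url, "type": "string"},
            }
        ],
        RETRIEVED: [
            {
                "snaktype": "value",
                "property": RETRIEVED,
                "datavalue": {
                    "value": {
                        "time": retrieved_on.strftime("+%Y-%m-%dT00:00:00Z"),
                        "timezone": 0,
                        "before": 0,
                        "after": 0,
                        "precision": 11,  # 11 means day precision
                        "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
                    },
                    "type": "time",
                },
            }
        ],
    }


# Hypothetical source URL, for illustration only.
snaks = build_reference_snaks("https://example.org/article", date(2020, 5, 21))
print(snaks[RETRIEVED][0]["datavalue"]["value"]["time"])
```

Recording the retrieval date next to the URL lets a later reader judge how fresh the source check was.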

Is your project already in progress?

I have already developed Python code to retrieve references for biomedical statements from PubMed Central. The principle of the algorithm is explained in https://www.jclinepi.com/article/S0895-4356(17)31073-9/abstract and in https://linkinghub.elsevier.com/retrieve/pii/S1532-0464(18)30094-7
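
The existing code is not reproduced here. As a rough illustration only, the sketch below builds a PubMed Central search URL for the NCBI E-utilities ESearch endpoint from the labels of a statement's subject and object. The query strategy (requiring both labels to co-occur) and the function name are assumptions made for this example, not the algorithm described in the papers above.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_search_url(subject_label: str, object_label: str, max_results: int = 5) -> str:
    """Build an ESearch URL that looks for both labels in PubMed Central."""
    # Assumption: a candidate reference mentions both labels as exact phrases.
    term = f'"{subject_label}" AND "{object_label}"'
    params = {
        "db": "pmc",            # search PubMed Central
        "term": term,
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap the number of returned IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical statement: "COVID-19" has symptom "fever".
url = build_pmc_search_url("COVID-19", "fever")
print(url)
```

The returned URL can then be fetched with any HTTP client; the record IDs in the response would still need to be checked against the statement before being used as references.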

How is it relevant to credibility and Wikipedia? (max 500 words)

Finding references for Wikidata statements will improve the quality of Wikidata-based, bot-generated Wikipedia articles, particularly in the context of the COVID-19 pandemic.

What is the ultimate impact of this project?

  • Reducing the number of unsupported Wikidata statements
  • Improving the reference support for Wikipedia articles

Could it scale?

Of course, the bot can later evolve so that it can add references to Wikipedia articles.

Why are you the people to do it?

  • I am an editor of Wikidata with over 100,000 edits. https://xtools.wmflabs.org/ec/www.wikidata.org/Csisc
  • I am a published author in Computational Linguistics and Biomedical Informatics. I have full mastery of the Wikidata data model and have published several research papers about it: https://www.sciencedirect.com/science/article/abs/pii/S1532046419302114 and https://ieeexplore.ieee.org/document/8308319/
  • I have the skills required to build the bot, including Python and HTML. I am currently co-founding a research unit at the University of Sfax dedicated to Wikimedia research, and I have the capacity to apply NLP techniques and data mining tools to this work.

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

This bot can reduce deletion rates in Wikipedia and Wikidata. New editors will be more encouraged to keep contributing to Wikipedia and Wikidata when they find that their work has been fixed.

What are the challenges associated with this project and how will you overcome them?

  • Internet connectivity matters: We will use a high-speed internet connection option (4G).
  • Legal concerns: We will use openly licensed tools and materials.
  • Large-scale data to process: We will buy a high-performance personal computer.

How much money are you requesting?

7500 TND (2,579.09 USD)

How will you spend the money?

  • 500 TND (171.94 USD) will be used to purchase a high-speed internet connection for the project.
  • 7000 TND (2,407.15 USD) will be used to purchase a high-performance personal computer.

How long will your project take?

6 months

Have you worked on projects for previous grants before?

https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Csisc/SPARQL:_Be_connected_to_Wikidata