Wikipedia talk:Wikipedia Signpost/2024-04-25/Recent research


Discuss this story

Wikipedians are too careful to believe the results of convenience sampling. -SusanLesch (talk) 14:21, 25 April 2024 (UTC)[reply]

Huh, can you explain in more detail why you characterize the sampling method used by this survey as "convenience sampling"? That term is most often used for methods that rely on a grossly unrepresentative population (say, surveying a class of US college students to draw conclusions about all humans). But "people who access the Wikipedia website within a given timespan" is a pretty reasonable proxy for "Wikipedia users" (in the general sense).
For context: Recruitment of survey participants via banners or other kinds of messages on the Wikipedia website itself is kind of the state of the art in this area. (It has also been used in numerous editor and reader surveys conducted by the Wikimedia Foundation.) It forms the basis of, e.g., many of the most-cited results on the gender gap among Wikipedia editors. Yes, it comes with various biases (which, as already indicated in the review, one can try to correct after the fact using various means; see e.g. our earlier coverage here of an important 2012 paper which did this regarding editors, "Survey participation bias analysis: More Wikipedia editors are female, married or parents than previously assumed", and the WMF's "Global Gender Differences in Wikipedia Readership" paper also listed in this issue). But so does any other method (door-knocking, cold-calling landline telephones, etc. - and regarding phone surveys, these biases have become much worse in the last decade or so, at least in the US, as political pollsters have found out).
In sum, it's fine to call out specific potential biases in such surveys (e.g. I have been reminding people for over a decade now that - per the aforementioned 2012 paper - one of the best available estimates for the share of women editors in the US is 22.7% as of 2008, considerably higher than various other numbers floating around). But dismissing their results entirely strikes me as a nirvana fallacy.
Regards, HaeB (talk) 19:25, 25 April 2024 (UTC) (Tilman)[reply]
Hi, Mr./Dr. Bayer, thank you for your enthusiastic defense. Your sample size is admirable. Maybe our difficulty is in defining terms. I use the term convenience to describe samples created at the convenience of the researcher, to include self-selected participants. The latter is the problem here. I have no knowledge of statistics to share, only the admonition from a former professor that convenience surveys are the weakest sort. It's pretty simple: I never do surveys. My sister always does. The same caveat applied when Elon Musk asked whether he should step down as head of Twitter. His answer looks legitimate and scientific all the way down to one decimal point. I promise to read your article and all of its sources in detail (which I have not had a chance to do) after my editing chores are done. -SusanLesch (talk) 13:55, 26 April 2024 (UTC)[reply]
I still sense a lot of confusion here.
Your sample size is admirable. - Not sure what you mean by the possessive pronoun here, I was not involved at all with this survey.
Maybe our difficulty is in defining terms. - If you were using the term "convenience sampling" in a different meaning than the established one, it would have been good to clarify that from the beginning.
to include self-selected participants - It sounds like you are referring to the mundane fact that participation in the survey was voluntary, which is the case for almost all large-scale social science surveys (and even legally compulsory surveys like the US census have great trouble achieving a 100% response rate and avoiding undercounting). Again, while this might cause participation biases, these can be examined and to some extent handled (see above). It's not a valid reason for dismissing such empirical results out of hand.
I am also very unclear about the relevance of your sister and Elon Musk to this conversation, except perhaps that the latter's social media use illustrates the dangers of shooting off snarky one-sentence remarks based on a very incomplete understanding of the topic being discussed. In any case, I appreciate your intention to now actually read the Signpost story that you have been commenting on.
Regards, HaeB (talk) 21:00, 26 April 2024 (UTC)[reply]
Mr./Dr. Bayer, I don't have your fancy vocabulary, nor am I being snarky (nor was Mr. Musk, who asked an honest question). This discussion has become so unpleasant that I no longer wish to read your sources' methodology. The sampling your article describes leads us away from high grade information. -SusanLesch (talk) 16:54, 27 April 2024 (UTC)[reply]
It is great that we have some new good survey data about the community. It is ridiculous that they are not available under an open licence as open data, and that such a big survey was done without the WMF cooperating and/or ensuring the data will be available. This is something for the mentioned white paper on best research practices to consider, actually. --Piotr Konieczny aka Prokonsul Piotrus| reply here 00:57, 26 April 2024 (UTC)[reply]
I am a bit confused about what you are referring to.
It is ridiculous that they are not available under an open licence as open data - the dataset is available (it's how I was able to create the graphs for this review, after all), and licensed under CC BY-SA 4.0.
such a big survey was done without WMF cooperating with this - judging from the project's page on Meta-wiki, the team extensively cooperated with the Wikipedia communities where the survey was to be run (and also invited feedback from some WMF staff who had previously run related surveys). Plus they followed best practices by creating this public project page on Meta-wiki in the first place (actually on your own suggestion, it seems?), something even some WMF researchers unfortunately forget on occasion. What's more, the team also notified the research community in advance on the Wiki-research-l mailing list.
Regards, HaeB (talk) 03:46, 26 April 2024 (UTC)[reply]
PS: Also keep in mind that the Wikimedia Foundation has so far not been releasing any datasets from its somewhat comparable "Community Insights" editor surveys. (At least that is my conclusion based on a cursory search and this FAQ item; CCing TAndic and KCVelaga to confirm.) So I am unsure why you are confident that a collaboration with WMF would have been ensuring the data will be available.
PPS: To clarify just in case, I entirely agree with you on the principle that (sanitized) replication data for such surveys should be made available as open data.
Regards, HaeB (talk) 04:08, 26 April 2024 (UTC)[reply]
@HaeB what you write in PPS is pretty much what I meant. Reading the Signpost article gave me the impression this is not the case here (This dataset paper doesn't contain any results from the survey itself. And from the communications around it (including the project's page on Meta-wiki at Research:Surveying readers and contributors to Wikipedia) it is not clear whether and when the authors or others are planning to publish any analyses themselves. Hence we are taking a quick look ourselves at some topline results below (note: these are taken directly from the "filtered" dataset published by the authors, without any weighing by language or other debiasing efforts).) I gather that something is available, but not as much as it should be. As for PS, yes, WMF is hardly a paragon of virtue in this regard either, and it is worth complaining about it too. WMF should be a paragon here, and should be both showcasing and enforcing best practices. Piotr Konieczny aka Prokonsul Piotrus| reply here 01:48, 28 April 2024 (UTC)[reply]
Hi @HaeB, thanks for the ping and sharing analysis of this survey data! I'm confirming that we don't release the Community Insights data under open access, as the FAQ states, because we don't have the resourcing to do so (though we are open to working with Affiliates and Researchers under NDA).
To shine a bit of a light on at least my understanding of why we don't do this: typically the most interesting data in Community Insights is the demographic data, which also happens to be the most sensitive. As procedures for data re-identification have become more sophisticated (cf. Rocher et al. 2019), survey techniques used for decades for deidentification and anonymization have fallen behind (Evans et al. 2023). This becomes even more complex as lots of data about Wikimedians is open and queryable, and thus provides secondary datasets to potentially use for identification.
One current approach to deidentification is differential privacy, which can increase plausible deniability about participation in a survey (ibid.) by shifting data around within the dataset, but this requires resourcing to do it right and increases confidence intervals, which then require larger sample sizes. However, active editors are a finite and relatively small population (compared to, say, the country-level populations for the European Social Survey or American Community Surveys), and with the tools we have to reach them while maintaining data integrity, increasing sample sizes is currently not possible. Another approach would be to do more heavy-handed data suppression and grouping (e.g. the US Census ACS data suppression procedure, which suppresses 1-year sample data for any geographic area or group with fewer than 65,000 eligible participants), which would cause discrepancies in independent analyses and remove most variables of research interest.
While the Community Insights data may seem quite trivial from a US perspective, we also have to think about a whole world of laws and possibilities where it may not be trivial ("unknown unknowns"). In essence, before we release any data for open access, we want to be extra careful about the privacy implications of that data – because, once it's out there, it's out there forever.
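As an aside for readers unfamiliar with the technique mentioned above: the "plausible deniability" that differential privacy provides can be illustrated with a minimal randomized-response sketch. This is only a toy illustration of the general idea, not anything the WMF actually uses; the epsilon value and simulated "yes" rate are invented for the example.

```python
import math
import random

def randomized_response(true_answer: bool, epsilon: float) -> bool:
    """Report the true answer with probability e^eps / (e^eps + 1),
    otherwise report its flip. Each individual's report is deniable."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_answer if random.random() < p_truth else not true_answer

def debias_proportion(reported_yes_rate: float, epsilon: float) -> float:
    """Recover an estimate of the true 'yes' rate from the noisy reports,
    inverting observed = p*true + (1-p)*(1-true)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return (reported_yes_rate + p - 1) / (2 * p - 1)

# Simulate: 30% of 100,000 respondents truly answer "yes" (invented numbers).
random.seed(0)
n, true_rate, eps = 100_000, 0.30, 1.0
reports = [randomized_response(random.random() < true_rate, eps)
           for _ in range(n)]
estimate = debias_proportion(sum(reports) / n, eps)
print(round(estimate, 2))
```

Note how the noise is recoverable in aggregate but widens the confidence interval, which is exactly the sample-size problem described above.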
Regarding the analyses above and questions others have had about the quality of the data, my instinct from a brief look at the age distribution (specifically the modal age categories of 18-24 and 65+) is that the survey attrition between who started and who got to the demographic questions at the end of a rather long survey may be biased towards people who have more time (in this case, potentially college students and retirees). Based on the CentralNotice banner being displayed to both logged-in and logged-out users, I would assume that the sample is primarily readers rather than contributors (there are more logged-out than logged-in people by a large margin), and the gender data more closely resembles that of the Wikimedia Readers Survey recently conducted by @YLiou (WMF).
I'm less worried about the sampling bias (as it was technically randomly sampled after the initial 100% display, though different sampling rates and changes within wikis create some complications for calculating sampling error estimates) than the non-response bias (different response rates from different types of users conditional on being sampled), which could be introduced both during the survey banner display and again in attrition on who is willing to respond to any individual item and go through completing the entire survey. Weights could help the data be more representative – I would at least consider applying weights at the wiki level based on the Wiki comparison data, and potentially by geographic data on Wiki Stats. All of that said, regardless of the limitation of whether the data should be used for population estimates, it can still be very useful for in-group analyses (e.g. comparing demographics on a sentiment question), and it's nice to see it published for open use. - TAndic (WMF) (talk) 15:09, 13 May 2024 (UTC)[reply]

It would be interesting (at least to me) to see the results/analyses of the following questions from the survey:

  • As a reader, overall, how much time do you spend on Wikipedia searching for information, reading articles, etc., on average:
  • Do/did you discuss Wikipedia with...
  • Would you say that Wikipedia has ever contributed to changing your opinion on a political subject?
  • Overall, in your opinion, for the following areas, does Wikipedia have a neutral presentation of the various admissible points of view?
  • In your opinion, are the following statements about the development of Wikipedia true or false?
  • If Wikipedia disappeared, it would be, for you:
  • Do you know any Wikipedia contributors among your friends, family, or professional contacts?
  • What "hinders you" from contributing, or contributing more?

Anyway, thanks for creating those graphs and sharing some of the topline results! Some1 (talk) 00:08, 27 April 2024 (UTC)[reply]

HaeB: The issue with the survey is that the sample is non-random, so the results cannot be relied upon. It is not impossible that the self-selected participants represent a valid sample of the population, but there is no assurance that this is so. Very often, such a sample turns out to be skewed. Chiswick Chap (talk) 11:31, 28 April 2024 (UTC)[reply]
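The skew Chiswick Chap describes is easy to demonstrate with a toy simulation: even if the banner reaches everyone uniformly, unequal willingness to respond distorts the sample. The group sizes and response propensities below are invented purely for illustration:

```python
import random

random.seed(42)

# Toy population: 25% editors, 75% readers-only (invented proportions).
population = ["editor" if random.random() < 0.25 else "reader"
              for _ in range(100_000)]

# Suppose editors are five times as likely to answer a voluntary survey
# (invented response rates).
propensity = {"editor": 0.10, "reader": 0.02}

respondents = [p for p in population if random.random() < propensity[p]]
editor_share = sum(1 for p in respondents if p == "editor") / len(respondents)
print(round(editor_share, 2))
```

The expected editor share among respondents is 0.025 / (0.025 + 0.015) ≈ 0.62, far above the true 25%, which is why such results need either external validation or the kind of post-hoc debiasing discussed elsewhere in this thread.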

HaeB: I've found at least two attempts to randomize content (which might be easier than randomizing users). That they exist suggests that "state of the art" remains RCTs.

  • The first is by Aaron Halfaker of Microsoft, formerly WMF.[1]
  • The second I daresay had a hilarious hypothesis.[2]

References

  1. ^ Halfaker, A.; Kittur, A.; Kraut, R.; Riedl, J. (October 2009). A jury of your peers: quality, experience and ownership in Wikipedia. 5th International Symposium on Wikis and Open Collaboration. Association for Computing Machinery (ACM). pp. 1–10. doi:10.1145/1641309.1641332 – via Penn State. For our analysis, we used a random sample of approximately 1.4 million revisions attributed to registered editors (with bots removed) as extracted from the January, 2008 database snapshot of the English version of Wikipedia made available by the Wikimedia Foundation.
  2. ^ Thompson, Neil; Hanley, Douglas (February 13, 2018). "Science Is Shaped by Wikipedia: Evidence From a Randomized Control Trial". MIT Sloan Research Paper No. 5238-17. Social Science Research Network. doi:10.2139/ssrn.3039505. From 2013-2016 we ran an experiment to ascertain the causal impact of Wikipedia on academic science. We did this by having new Wikipedia articles on scientific topics written by PhD students from top universities who were studying those fields. Then, half the articles were randomized to be uploaded to Wikipedia, while the other half were not uploaded.

-SusanLesch (talk) 13:41, 4 May 2024 (UTC)[reply]

  • The abstract says, "The survey includes 200 questions about..." and the instructions to respondents say, "It will take you from 10 to 20 minutes to complete it." The survey questions are in a form linked from the meta page but not easy to browse, as they are locked in that interface. It is not clear to me which respondents were served which questions, but obviously a study design that poses 200 questions in 10 minutes needs explanation. There is no paper, so there is currently no way to understand the methods, right? Bluerasberry (talk) 16:45, 30 April 2024 (UTC)[reply]
    The survey questions are in a form linked from the meta page but not easy to browse - but they are also reproduced in the codebook.
    There is no paper - excuse me? The dataset paper mentioned and cited in the review does discuss methods. It is not clear to me which respondents got served which questions - Tables 1 and 2 in the paper provide detailed information about how many respondents got how far in the survey in which language (we should be so lucky to get that level of detail in every report about a Wikipedia survey).
    As for the duration, that's an interesting question, but honestly this wouldn't be the first survey to make over-optimistic promises about how long it takes to complete it.
    Regards, HaeB (talk) 13:45, 1 May 2024 (UTC)[reply]