mastodon dataset ethics email

a guest
Jan 12th, 2020
To:
committee.etico@unimi.it, matteo.zignani@unimi.it, christian.quadri@unimi.it, alessia.galdeman@studenti.unimi.it, sabrina.gaito@unimi.it, rossi@di.unimi.it

Subject:
Privacy problems with paper "Mastodon Content Warnings" public dataset

Text:
Hello,

I am writing regarding the recently published paper "Mastodon Content Warnings: Inappropriate Contents in a Microblogging Platform" by Zignani, Quadri, Galdeman, Gaito, and Rossi.

It is my opinion that the publication of the associated dataset (currently available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/R1HKVS) is a significant failure of research ethics.

This corpus of posts was scraped and published with grossly insufficient regard for the consent and privacy of Mastodon users. Prior scraping of Mastodon servers by the Internet Archive and others has been met with widespread outcry among users. Additionally, the platform is intentionally designed so that old posts are not easily searchable in bulk. The scraping and publishing of posts is both an invasion of the personal privacy of individual users and a major violation of community norms.

While the authors claim that this dataset was anonymized, this claim is trivially shown to be false. The usernames in the URI associated with every single post in the dataset are not obfuscated at all. Additionally, no attempt has been made to obfuscate personally identifiable information, including usernames, within the content of each post.

I myself am a user of this platform. A simple text search for my username returned many dozens of results within this dataset. At no point did I consent to the publication of any of this content outside of the platform itself. Any claims that this dataset has been anonymized are simply not true.

While nothing can be done about the private information contained in the hundreds of copies of this dataset that have already been downloaded, de-publication and deletion of this data are necessary measures to remedy this breach of ethics and privacy.

I look forward to your responses.

Thank you,