February 14-18, 2022! #LoveData22
Data is for everyone! Wait ... data are for everyone? Either way, Love Data Week 2022 is about how different folks use data. If you haven't participated before, Love Data Week is the international celebration of data. This year the event is focused on the people side of data. What does data look like in different disciplines? How about biases in data... who is "in" the data and who is invisible?
Learn more at https://myumi.ch/ICPSRldw2022events.
In honor of International Love Data Week, the CCSS Executive Director and Data Science Fellows reflect on the power of data. Read their insights and reflections below on data justice, data-caused headaches and data-caused highs, and using data to undo harassment on social media.
By: Remy Stewart, PhD Candidate, Sociology
As much as I love your uniqueness, text data, you’ve certainly given me a lot of headaches. When I started to specialize as a researcher in natural language processing (NLP), the use of computational tools to analyze written language, I really didn’t understand the roller-coaster relationship I was getting into. The highs are delightful, and the lows are hair-pullingly frustrating. Time and time again, though, I’m so glad that I’m able to work through our challenges together.
I originally became interested in text as data since I saw it as a wonderful opportunity to draw from my mixed-methods research background by quantitatively analyzing language to uncover emergent qualitative themes and patterns expressed within writing. I was enthralled by the potential to incorporate innovations from data science and machine learning with classic social science methods to make new discoveries regarding human behavior and social systems. Digging into real text-based research, however, made me quickly learn the difference between the romanticism of exciting research powered by digital text data versus what it would entail to make said discoveries actually happen.
Text sourced from online platforms such as social media sites that I use within my work is often filled with unwanted noise such as HTML code, random characters, or excessive punctuation, and it therefore requires a lot of data cleaning. Natural language processing methods are often computationally complex to run, and I can’t even count the number of times my hours-long analysis ended up crashing in the end. What makes me keep coming back to text data, however, is just how wonderful it is when an analysis comes together despite all the inevitable setbacks and bugs in the code along the way. I’ve been able to learn about how individuals who use digital platforms navigate personal experiences, politics, social inequality, and community through the insights embedded in their writing. Text data and the methods that allow me to computationally process and analyze it make me lucky enough to be able to understand these perspectives from a much larger number of voices than what I’d ever be able to read just on my own.
The data we explore in the social sciences are often representations of common human experiences. I personally believe that text data is a particularly special mirror of our social world, as it’s a snapshot of a person’s own thoughts and ideas expressed through their writing. For that, text data, I promise to keep working through our rough patches together with you. I know it always ends up being so worth it in the end.
P.S. – Interested in sparking up your own special relationship with text-based data? Check out my Natural Language Processing 101 workshop this March. You can learn more about it here!
By: Aishat Sadiq, PhD Student, Human Ecology
Growing up, “where are you from?” was the most anxiety-inducing question you could have asked me. My two Nigerian parents and I traipsed across the Caribbean and the States in my early years. My formative years were spent in predominantly white magnet schools in Texas. In name, in culture, and in race, I was othered. However, this sense of otherness drove me to build community with similarly displaced and marginalized individuals, no matter how different our adversities were. Through finding answers in every place and individual I have developed a connection with, each part of my identity is now more wholly defined in its relation to my person.
James Baldwin once said, “if I love you, I have to make you conscious of the things you don’t see.” Today, I thank you, Data Justice. Thank you for being more than words on a page, for showing me that science is made up of ALL the people who contribute to it. Thank you for being a witness to those who live within the intersections of our society. Thank you for sharing our vulnerability, our bodies, our knowledge, and our experiences with the world with humanity and care.
At times, I grow frustrated with the progress we have yet to make. As another close woman of color recently told me about a different injustice, “It’s not the best, but it’s better than nothing and it’s better than putting myself in harm just to try and prove that there’s goodness in the world.” Change moves at the pace of relationships and relationships move at the pace of trust. That change is going to take time and sustainable progress, so who am I to rush it? It’s not in my power, nor is it my duty. Because of you and the thoughtful researchers who work with you, I hope that the harm between the scientific community and marginalized people begins to mend. I hope, through time and trust, we are able to build a table for all of us to equitably share from.
Moving forward, I plan to grow deeper in my relationship with you and learn how I can better myself as a researcher and global citizen.
By: Aspen Russell, PhD Student, Information Science
Our journey began with an audacious goal: to solve harassment on Twitch, a major live streaming platform. With the brash mind of an undergraduate and an abnormal training (computer science and gender studies), I started trying to find you. There were a myriad of questions that overloaded me. On a platform of billions of comments, millions of users, and thousands upon thousands of live streamers, where do I begin? Who matters, how many matter, and what features matter? You proved elusive. At the time, there weren’t many examples of how to contact or work with you, let alone academic examples of your true nature. This led me down a path to various GitHub repositories and suspect developer forums. Eventually I found a way to contact you: an unofficial application programming interface (API) made by a European hobbyist. It allowed me to ask for your help. Comments, users, streamer IDs, timestamps, emojis, and more. All fractured. I chose to collect it all. I got a small sample of popular women streamers and an equivalent sample of men streamers. Based on my intuitive sense of how the internet handles gender, I dove in.
Truth be told, I wish the state I found you in was… cleaner. But I finally had someone to work with! There were so many remnants of your old home. HTML, converted emoji icons, and other strange symbols around and in between comments. Some of these issues were fixed by leaning on the hand-crafted packages created by the community around me (at the time, it felt like magic), others were annoyingly manual. Investigating word frequency, common word misspellings, stop words, and more. Trial and error. All of these steps made your form more manageable and legible. They also brought meaning and context to the almost 200,000 comments. I felt like I really understood the dynamics at play. From there I was able to refine and filter highly specific information: the most unique words for each streamer. The results were disturbing but expected. The popular women were primarily receiving comments that disparaged their bodies and abilities, whereas the men’s comments focused on entertainment value and skill.
What our journey revealed to me was the beauty in the mess. I reset my expectations, took each step iteratively, and learned to keep the limitations in mind. Imperfections are perfectly manageable if you know where they come from and why they aren’t addressed. I was able to take the fractured whole of something, the comments and metadata from sixteen streamers, and learn something from it all. Something that would be helpful to community health researchers, developers, and policy makers. I chose to use the computational approach to refine a sample for interpretive research. To me, the beauty of data-informed research is that data can reveal interesting phenomena, test a hypothesis, complement qualitative findings, and reveal spaces for qualitative inquiry. Regardless, the necessity for interpretation is clear. Data are not neutral. Our acquisition, alteration, and interpretation of them form knowledge.
Interested in using APIs for social science research? Check out my workshop series here!
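For readers curious what cleaning and word-frequency steps like the ones described in this letter can look like, here is a minimal Python sketch. It is not the pipeline used in the study above. The tiny stop-word list, the function names (`clean_comment`, `tokenize`, `distinctive_words`), the sample comments, and the simple "share of a word's total occurrences" score for picking distinctive words are all illustrative assumptions.

```python
import html
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use fuller ones.
STOP_WORDS = {"the", "a", "an", "is", "so", "and", "to", "you", "at", "this"}

TAG_RE = re.compile(r"<[^>]+>")    # leftover HTML tags such as <b> or </span>
WORD_RE = re.compile(r"[a-z']+")   # runs of lowercase letters and apostrophes


def clean_comment(raw):
    """Decode HTML entities, drop HTML tags, and lowercase the text."""
    text = html.unescape(raw)      # e.g. "&amp;" becomes "&"
    text = TAG_RE.sub(" ", text)
    return text.lower()


def tokenize(raw):
    """Split a raw comment into words, skipping stop words and punctuation."""
    return [w for w in WORD_RE.findall(clean_comment(raw)) if w not in STOP_WORDS]


def distinctive_words(comments_by_streamer, top_n=3):
    """Rank each streamer's words by how concentrated they are in that chat.

    A word's score is its count for one streamer divided by its count across
    all streamers. Ties go to the more frequent word, then alphabetical order.
    """
    counts = {
        streamer: Counter(w for comment in comments for w in tokenize(comment))
        for streamer, comments in comments_by_streamer.items()
    }
    total = Counter()
    for streamer_counts in counts.values():
        total.update(streamer_counts)

    ranked = {}
    for streamer, c in counts.items():
        ordered = sorted(c, key=lambda w: (-c[w] / total[w], -c[w], w))
        ranked[streamer] = ordered[:top_n]
    return ranked


# Made-up sample comments, for illustration only.
sample = {
    "streamer_a": ["<span>Great aim!</span> great game", "great aim &amp; movement"],
    "streamer_b": ["<b>Funny</b> stream, great game", "so funny"],
}
print(distinctive_words(sample, top_n=2))
```

A score like this favors words that appear in one streamer's chat and rarely elsewhere, which is one simple way to surface the "most unique" vocabulary before reading the comments themselves closely.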
By: Kimberly Williamson, PhD Candidate, Information Science
Hey. I’m writing this letter because I think it’s finally time we talk about some things. I love you. But data… you don’t love me back the way you should. For years, I’ve said with excitement “I love data!” And I DO! I love all of the insights and decisions I can make with data! The stories we can tell together! But data, you don’t reciprocate that love. You can describe an entire society, but historically, you haven’t always included people like me. So how can I keep loving data when it does not love me back enough to represent me? How can I reconcile loving the possibilities data offer, when data doesn’t include everyone in that future? I’ve dedicated my professional career to understanding you better and leaning on you whenever my ideas needed support. I wish I had noticed earlier that our relationship was one-sided, but alas, here we are.
I’ve asked and asked, “Why can’t you see me like you see others?” How have I invested so much in you, but continue to see how I and other minoritized individuals get lifted out of a story? I’ve seen parts of you dropped because there wasn’t enough of a group to bother remembering the count. Why is the solution to ignore entire populations? I know that you have given me much, I mean c’mon, I study Information Science, I don’t hide my love for data. But somewhere along the way, I let you become a part of me and never demanded that I be a part of you. You see, relationships are mutual, reciprocal, a give and take. I’ve learned so much from you, but did I teach you too? Did I expect that you know me as much as I know you? I suppose I did expect it, but when you let me down I thought, well, [shrug] it’s hard to know me. That’s okay, I guess you tried. But I was wrong.
Because now I know, this love is one-sided. It turns out, I have all the power! I can be creative in how I collect data. I can find new methodologies that value tiny populations and reflect the complexities of our identities. I can make data see everyone. And you, and all the other researchers, can too. The first step is loving yourself and acknowledging that you belong in the data and should be included. I swallowed the old adage about “statistical significance” and forgot that those standards were created by unimaginative scholars eager to maintain the value of the “majority.” But no more! I love my complexities enough to know that we need to account for that experience in the stories we tell with data. The second step is sharing this commitment. Talking to scholars, researchers, data storytellers, and anyone else who will listen. Collaborating and cultivating interdisciplinarity to fill the holes in our data that we have long complacently accepted. And finally, applying innovative new solutions to your data. Loving your data by making it better. So, you can do it, you can love your data too! But take the steps for your data to love everyone.
By: Claudia von Vacano, Executive Director, Cornell Center for Social Sciences
As I reflect upon the tumultuous love affairs you have had with others, I can’t help but think that you are a thorny rose and filled with contradictions.
No one can deny your beauty—when you show yourself through visualizations, intricate networks that boggle the mind—you are most attractive. The patterns that you allow to be revealed are utilitarian on the most basic level but also mind-blowing, and they lead to exciting results. I love the way you hide parts of yourself until I ask you the right sets of questions, and then you reveal the true nature of who you are. In a word, you are “complex,” but I can think of many other words to describe you: “multifaceted” and even “two-faced.”
You do anger me at times, when you allow yourself to be manipulated. To be honest, I dislike your name: plural but singular, very misleading, with an air of finality and objectivity when you are truly illusive.
Well, in this letter, I aim to set you on the right path. From now on, Data, I want you to be open, to be equitably distributed and designed, to be ethical, and to promote justice. To be honest, you are pretty biased, and frequently you allow people to say problematic things with your backing.
You will reveal who you represent and why across different datasets and platforms. You will be designed in inclusive ways that minimize bias and maximize transparency. You will respect privacy and give yourself back to the people you drew your information from. I will not permit you to aggravate social biases, sexism, or racism. But, most importantly, you will address social injustices, not perpetuate and accentuate them.
Together, there are no limits to what we can do with a solid ethical foundation. We can introduce sound theoretical frameworks to challenge your assumptions. We can measure complex variables on a continuous interval spectrum and augment your potential through supervised deep learning models. As we decompose constructs of interest, we can model and solve major problems confronting our world.
But, Data, as others have told you, you will need to take a long hard look at yourself.
All my love,
P.S. Read about using data to measure hate speech.