The Importance of Data Ethics

Imagine this scenario: you’re in a bustling grocery store, maneuvering a cart filled with snacks and study supplies for the week. You spot your favorite brand of chips on the shelf. As you reach out to grab it, you notice a surveillance camera 📹 pointing at you.

Suddenly, it’s not just about you picking up a bag of chips anymore. It’s about what that surveillance camera observes – the way you move, the items you pick, your outfit, and perhaps even your facial expressions. Picture all this information being collected, stored, and maybe even shared without your express consent or awareness. How does that make you feel?

Now, let’s translate that feeling into the digital world, where data about our behaviors, preferences, interests, and even friendships is collected on a far larger scale. We’re here today to talk about data collection, a fundamental part of any data project. But we’re going beyond the ‘what’ and the ‘how.’ We’re delving into the ‘should’ 🤔.

Today, we’re going to discuss why you, me, and all of us should care about the ethics of data collection. Just like that surveillance camera in the grocery store, data collection has implications for our privacy, safety, freedom, and dignity. Whether you’re an aspiring programmer, an entrepreneurial student, a social media user, or just someone who enjoys online gaming, this is a conversation that affects us all 🗣️.

 

Why is Data Ethics Important?

Let’s think about it this way: imagine you’re on a basketball team, and your coach has a video camera. They record every game and practice, promising to use the footage to help you improve your shots. But what if they started sharing those videos online without your permission? Or used them to make fun of you? That would feel unfair, right? That’s why data ethics is important. It’s the difference between using your “game footage” to help you or harm you.

Here’s what we need to keep in mind:

  • Respect for privacy: Just like you wouldn’t want your coach sharing videos without permission, we shouldn’t collect, store, or use personal data without asking first. It’s like knocking on someone’s door before entering their room. An essential part of data collection is ensuring that Personally Identifiable Information (PII) is only gathered and stored with explicit consent.
    • PII refers to any data that could potentially identify a specific individual. This can include straightforward information like a person’s name, physical address, email address, or telephone number. More sensitive PII can also include social security numbers, driver’s license numbers, bank account numbers, passport numbers, and biometric data like fingerprints or retinal patterns.
  • Accountability: If your coach promises to only use the videos to help you, they should keep that promise. Accountability means that those who collect and handle data are responsible for their actions. In the data world, we need to make sure we only use the data the way we said we would, and that we track who has access to the data, what they’re doing with it, and why.
  • Accuracy and integrity: Ethical issues such as data fabrication or falsification can undermine the trustworthiness of the data and any conclusions drawn from it. Imagine if your coach only showed videos of you missing the shots or edited the video so that some of the shots were not yours. That wouldn’t be the complete truth, would it? If data is collected unethically (e.g., bias in sampling, inappropriate data handling procedures), it can lead to distorted results and unreliable insights.
  • Legal compliance: Just like there are rules in a basketball game, there are laws and regulations we need to follow when collecting, storing, and processing data. Following them reduces legal risk, and failing to handle data ethically can also lead to reputational damage and loss of trust among consumers or users. Legal issues can arise in areas such as:
    • Privacy and confidentiality: Privacy laws in many jurisdictions require consent for data collection and give individuals certain rights over their data, such as the right to access, rectify, delete, and port it.
    • Data security: Organizations have legal obligations to protect the security of data, especially PII and sensitive personal data.
    • Data ownership: The question of who owns data can bring up legal issues, particularly when data is shared between different entities.
    • Data accuracy: Individuals have the right to have inaccurate personal data corrected.
    • Bias and discrimination: If data or data analysis techniques result in unfair or discriminatory outcomes, legal issues can arise.
    • Data monetization and consumer protection: There are ethical and legal considerations when data is used for profit.
  • Fairness and bias: Imagine if your coach only focused on players with blue shoes. That would be unfair, right? Similarly, when collecting data, we should make sure we consider everyone, not just a select few. Unethical practices in data collection and preparation, such as over-sampling a particular group or not adequately considering the diversity of the population, can introduce bias into the dataset. This bias can then skew results and make the conclusions less accurate or even misleading.
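
To make the accountability point a bit more concrete, here is a minimal Python sketch of an access log that records who touched the data, what they did, and why. The `AccessLog` class, its method names, and the sample entries are all made up for illustration; a real project would more likely rely on the audit features of whatever storage system it uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessLog:
    """Hypothetical append-only record of who used the data, how, and why."""
    entries: list = field(default_factory=list)

    def record(self, who: str, action: str, purpose: str) -> dict:
        # Refuse to log an access that has no stated purpose.
        if not purpose.strip():
            raise ValueError("every access needs a stated purpose")
        entry = {
            "who": who,
            "action": action,
            "purpose": purpose,
            "when": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry

    def by_person(self, who: str) -> list:
        # Everything one person has done with the data.
        return [e for e in self.entries if e["who"] == who]


# Illustrative entries based on the basketball analogy.
log = AccessLog()
log.record("coach", "viewed game footage", "review shooting form")
log.record("assistant", "exported clips", "prepare practice drills")
```

Requiring a purpose for every entry is a design choice in this sketch: it nudges whoever uses the data to state why, which is the question accountability is meant to answer.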

 

How Do We Use Ethics When Dealing with Data?

  • Seek informed consent: We get explicit consent before collecting data, and we make sure subjects understand the purpose of the collection and how their data will be used, stored, and processed.
  • Minimize data: We only gather what we need, reducing the risk of infringing on individuals’ privacy.
  • Maintain data quality: We use accurate methods for data collection and make sure the data is reliable and valid. This can involve cross-checking the data, ensuring the sample is representative, and keeping track of any transformation applied to the data.
  • Secure the data: Just like locking the door to protect valuables, we protect data from those who aren’t supposed to see it. When storing data, we anonymize it to help protect individual privacy. This can involve removing PII or using techniques like differential privacy.
  • Be transparent: We’re clear about what we’re doing with the data. We maintain a thorough record of how the data is collected, stored, and prepared, and we make this documentation available to all stakeholders.
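
As a rough illustration of minimizing and securing data, the Python sketch below keeps only the fields a project needs and replaces an email address with a keyed hash. The field names, the sample record, and the secret key are all hypothetical. Note that a keyed hash is pseudonymization rather than full anonymization: anyone holding the key could re-link the IDs, so the key itself must be protected.

```python
import hashlib
import hmac

# Hypothetical list of the only fields this project actually needs.
NEEDED_FIELDS = {"grade", "hours_per_week", "favorite_genre"}


def minimize(record: dict) -> dict:
    """Drop every field that is not on the needed list."""
    return {k: v for k, v in record.items() if k in NEEDED_FIELDS}


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Made-up raw response containing PII.
raw = {
    "name": "Sam Lee",
    "email": "sam@example.com",
    "grade": 10,
    "hours_per_week": 6,
    "favorite_genre": "puzzle",
}
key = b"replace-with-a-real-secret"  # placeholder, not a real key

clean = minimize(raw)
clean["respondent_id"] = pseudonymize(raw["email"], key)
```

The stored `clean` record no longer contains the name or email, but repeat responses from the same person can still be matched through `respondent_id`.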

 

Ways to Avoid Common Data Ethics Mistakes

  • Lack of informed consent:
    • This could lead to legal issues and damage to reputation.
    • Countermeasure: Develop clear and concise consent forms and ensure participants understand them before they give their consent.
  • Over-collection of data:
    • This could lead to privacy concerns.
    • Countermeasure: Apply data minimization principles and only collect data that is necessary.
  • Poor data security:
    • This can lead to data breaches.
    • Countermeasure: Use robust anonymization techniques and avoid collecting unnecessary sensitive information. Regularly review and update your data security measures, use encryption, and limit who has access to the data.
  • Lack of transparency:
    • This can lead to a loss of trust from participants and the public.
    • Countermeasure: Be clear about your methods and use open-source tools when possible to allow for external scrutiny.
  • Ignoring bias:
    • This can skew results and lead to unfair outcomes. 
    • Countermeasure: Ensure the data collection process is comprehensive and representative of all relevant groups. Stratified sampling can help ensure adequate representation from each group.
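
For a sense of what stratified sampling can look like in practice, here is a small Python sketch that splits a population into groups and draws the same fraction from each one. The `stratified_sample` function, the `grade` field, and the made-up list of students are assumptions for illustration, not a prescribed method.

```python
import random
from collections import defaultdict


def stratified_sample(people, key, fraction, seed=None):
    """Draw roughly the same fraction from every group defined by `key`."""
    rng = random.Random(seed)

    # Split the population into groups (strata).
    groups = defaultdict(list)
    for person in people:
        groups[person[key]].append(person)

    sample = []
    for members in groups.values():
        # Take at least one person per group, and never more than the group has.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: 200 students spread evenly across grades 9 to 12.
students = [{"id": i, "grade": 9 + i % 4} for i in range(200)]
sample = stratified_sample(students, key="grade", fraction=0.1, seed=42)
```

Because each grade is sampled separately, no single grade can dominate the sample just because its students happened to respond first.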

Remember, as data detectives, it’s our responsibility to handle data correctly. After all, good ethics make a great detective! 

 

Leveling Up with Data: Emily’s Ethical Data Quest on Video Gaming Habits

The moment her Statistics teacher announced the end-of-semester data project, Emily knew exactly what her focus would be: video gaming habits among high school students. As an avid gamer herself, Emily was curious to uncover patterns and trends among her peers.

She started by crafting a comprehensive survey that included questions about favorite genres, time spent gaming, preferred platforms, and even the potential impact of gaming on school performance. But before hitting the ‘send’ button on her survey link, Emily paused. She remembered the lessons from her teacher on data ethics.

First, she needed to respect her peers’ privacy. Adding a preamble to her survey, she explained that all responses would remain anonymous and no personally identifiable information would be requested. She assured her respondents that their answers wouldn’t be shared outside the scope of the project.

Knowing she was accountable for the data she collected, Emily created a secure Google account dedicated solely to her project. This would ensure that only she had access to the responses, keeping her promise of privacy intact.

Emily was also mindful of accuracy and integrity. She checked and double-checked her questions, ensuring they were unbiased and clear. She also decided to include an optional field for any additional comments or experiences her classmates might want to share.

To ensure legal compliance, she reviewed her school’s code of conduct regarding surveys and research involving fellow students. She made sure to follow the school’s guidelines to the letter, also asking her teacher to review her approach before proceeding.

Emily was aware of the potential for bias in her data. To ensure fairness, she invited all students in her grade to participate, rather than only her circle of fellow gamers, giving her a broader, more representative sample.

Once the responses started rolling in, Emily was meticulous about storing and preparing the data. She removed any potentially identifying information – like IP addresses – and stored the data on a password-protected, encrypted drive.

Emily’s project was a success. Her presentation highlighted intriguing trends about gaming among high schoolers, contributing to ongoing conversations about balancing screen time and study. But more importantly, she had successfully navigated the complex path of data ethics, demonstrating how the right approach to data collection can lead to trustworthy and valuable insights.

Emily’s journey was not just about the thrill of unraveling the mysteries hidden within the numbers. It was about a commitment to ethical data handling, a lesson that, like a video game’s high score, she would carry with her long after the project’s end.

