Data stewardship is described by DAMA International (a global data management community) as the accountability and responsibility for data and processes that ensure effective management and use of data assets. According to DAMA, the stewardship of data “can be formalized through job titles and descriptions, or it can be a less formal function driven by people trying to help an organization get value from its data.”
Data stewards manage and oversee an organization’s data assets so that the business has high-quality data that is consistently and easily accessible. Here’s a chart from TU Delft that gives an at-a-glance overview of the components of data stewardship:
What is data stewardship?
The role of data steward has only emerged within the last decade, as data has become steadily more central to our technical world (and, by extension, to every other part of our world). Today, both the quotidian and the long-term strategic decision-making of many companies is informed largely, or even entirely, by data insights. And the need for data begets further need for data: data is used to build and improve technologies such as machine learning and artificial intelligence systems, and these systems in turn depend on ever more data in order to function. In addition to being responsible for inventorying data, understanding how to access it, and keeping track of when it’s needed, data stewards can also be responsible for identifying and articulating ways to use data to create a competitive edge in the market. Data is everywhere, and the more technology advances, the more necessary it is that we are able to manage and maintain our understanding of it and its vast array of uses, as depicted in the graphic below:
Data stewards are responsible for understanding the nuances of data usage and security policies as set by enterprise data governance initiatives. Data stewards are also responsible for enforcing these policies, and for acting as intermediaries between a company’s IT department and its business departments. As such, data stewards must possess a skill set with both technical and business components, and must be well versed in disciplines such as data warehousing, storage concepts, enterprise strategy, programming, and data modeling. Soft skills like communication and collaboration also come in handy for data stewards.
In some organizations, data stewards are full-time roles located within business units, while other organizations assign data stewardship responsibilities to employees who already have other duties; stewardship, in this case, is the icing rather than the whole cake. Two components of data stewardship are coordination and correction: the former involves tracking how data moves within the organization, while the latter is about interpreting and enforcing the organization’s internal policies on data usage.
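Coordination often comes down to being able to answer "where did this dataset come from?" The sketch below is a minimal illustration, assuming lineage is recorded as a simple mapping from each dataset to the datasets it is built from. The dataset names and the `upstream_sources` helper are hypothetical and not part of any particular tool.

```python
from collections import deque

# Hypothetical lineage record: each dataset maps to the datasets it is derived from.
LINEAGE: dict[str, list[str]] = {
    "quarterly_report": ["sales_clean", "hr_headcount"],
    "sales_clean": ["crm_export", "pos_transactions"],
    "hr_headcount": ["hr_system"],
    "crm_export": [],
    "pos_transactions": [],
    "hr_system": [],
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every dataset that feeds, directly or indirectly, into `dataset`."""
    seen: set[str] = set()
    queue = deque(lineage.get(dataset, []))  # breadth-first walk over parent datasets
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; this also prevents infinite loops on cycles
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(upstream_sources("quarterly_report", LINEAGE)))
```

Running it lists every source behind `quarterly_report`, which tells a steward which upstream datasets have to be trustworthy for the report itself to be trustworthy.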
Businesses also rely on data stewards for many other functions, including but not limited to:
- Complying with industry regulations
- Reducing risk (data security and privacy)
- Improving processes related to data quality and metadata management
- Managing data from a variety of sources
- Defining policies and processes for data within the organization
- Defining roles and responsibilities for data
- Improving data documentation
- Performing various types of data analytics more accurately and efficiently
- Guaranteeing the quality of data gathered, stored, and used
- Documenting the rules of data collection, storage, and use
- Sharing, protecting, defining, archiving, accessing, and synchronizing data
- Remediating data-related issues and problems
- Operationalizing data governance
- Ensuring that the right people have access to the right data (private, corporate, sensitive, etc.) at the right time
- Helping to create processes and procedures for data collection, storage, use, and security
- Making consistent use of data management as an effective resource
Why do we need to steward our data?
It’s impossible to explain why data stewardship is important without emphasizing why data itself is important. Data is, essentially, information collected to support an organization’s decision-making and strategy. The Council on Quality and Leadership has compiled a list of reasons why data is an essential component of our modern world and of what data can do for an organization, including: improving people’s lives; helping companies make informed decisions; keeping molehills from turning into mountains; helping companies achieve desired results; finding solutions to problems; backing up a company’s arguments; keeping companies from making decisions based on guesswork; making their approaches more strategic; understanding what’s working; keeping track of baselines, benchmarks, and goals; making the most of financial assets; and accessing all possible resources (the Council’s original article explains in more detail how data helps companies achieve each of these aims).
If your data is sensitive (and most organizations hold swathes of data they’d certainly consider sensitive), you probably don’t want it falling into the wrong hands. In fact, even if your data isn’t sensitive, you likely still don’t want it falling into the wrong hands. When organizations play fast and loose with their data, data can get lost. Say a human resources employee is writing a report on a recent round of raises given to selected employees. If that data is copied onto a portable device such as an external hard drive, and the employee misplaces it, this sensitive data is now available to anyone who finds that drive. Had the information been entrusted to a data steward’s care, the steward would have controlled who could access that data and when.
Data stewards set up data security specifications and manage data sources for data warehouses. They analyze data, look for problems and glitches, and recommend ways a company can improve the quality of its data. A data steward effectively takes ownership of a company’s data, and with it authority over the data’s distribution: who receives it, where, and when. The data a company uses is often as important as the product it sells. Would you leave the door to your grocery shop unlocked all night for thieves of cauliflower, organic strawberries, and gluten-free bread to take their pick? Then how can you justify leaving your data unstewarded?
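Looking for problems and glitches is often done with simple, rule-based checks. The sketch below is a minimal example, assuming records arrive as Python dictionaries; the HR field names, the `profile_quality` function, and the three checks it runs (missing values, rule violations, duplicate keys) are illustrative choices, not a standard or a specific product’s API.

```python
from collections import Counter
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical HR records; field names and values are illustrative only.
SAMPLE_ROWS: list[Record] = [
    {"employee_id": 1, "department": "Sales", "salary": 52000},
    {"employee_id": 2, "department": "", "salary": 61000},
    {"employee_id": 2, "department": "HR", "salary": -10},
]


def is_blank(value: Any) -> bool:
    """Treat None and empty strings as missing values."""
    return value is None or value == ""


def profile_quality(
    records: list[Record],
    key: str,
    required: list[str],
    rules: dict[str, Callable[[Any], bool]],
) -> dict[str, Any]:
    """Count missing values, rule violations, and duplicate keys."""
    total = len(records)
    missing: Counter = Counter()
    invalid: Counter = Counter()
    for record in records:
        for field in required:
            if is_blank(record.get(field)):
                missing[field] += 1
        for field, is_valid in rules.items():
            value = record.get(field)
            # Only present values are validated; absence is already counted as missing.
            if not is_blank(value) and not is_valid(value):
                invalid[field] += 1
    key_counts = Counter(r.get(key) for r in records if not is_blank(r.get(key)))
    return {
        "rows": total,
        # Share of rows where each required field is filled in (1.0 when there are no rows).
        "completeness": {
            f: (total - missing[f]) / total if total else 1.0 for f in required
        },
        "invalid": dict(invalid),
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
    }


if __name__ == "__main__":
    report = profile_quality(
        SAMPLE_ROWS,
        key="employee_id",
        required=["department", "salary"],
        rules={"salary": lambda s: s >= 0},
    )
    print(report)
```

On the sample rows, this flags one blank department, one negative salary, and a repeated `employee_id`, which are exactly the kinds of findings a steward would turn into quality recommendations.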
How is data stewardship related to data governance?
The Data Governance Institute defines data governance as “a system of decisions, rights, and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
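The "who can take what actions with what information, and when" part of that definition can be modeled as a set of explicit rules. The sketch below is a minimal illustration, assuming a deny-by-default policy and a simple daily time window; the `Rule` class, roles, dataset names, and hours are hypothetical, and it does not model the definition’s "under what circumstances, using what methods" clauses.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Rule:
    """One decision right: who may take which action on which data, and when."""
    role: str      # who
    action: str    # what action, e.g. "read" or "update"
    dataset: str   # with what information
    start: time    # when: start of the permitted daily window (inclusive)
    end: time      # when: end of the permitted daily window (inclusive)


# Hypothetical policy; roles, datasets, and hours are illustrative only.
POLICY: list[Rule] = [
    Rule("hr_analyst", "read", "salary_changes", time(8, 0), time(18, 0)),
    Rule("data_steward", "read", "salary_changes", time.min, time.max),
    Rule("data_steward", "update", "salary_changes", time.min, time.max),
]


def is_allowed(role: str, action: str, dataset: str, at: datetime, policy: list[Rule]) -> bool:
    """Deny by default; allow only if a rule matches who, what, which data, and when."""
    moment = at.time()
    return any(
        rule.role == role
        and rule.action == action
        and rule.dataset == dataset
        and rule.start <= moment <= rule.end
        for rule in policy
    )


if __name__ == "__main__":
    print(is_allowed("hr_analyst", "read", "salary_changes", datetime(2024, 3, 4, 9, 30), POLICY))
    print(is_allowed("hr_analyst", "read", "salary_changes", datetime(2024, 3, 4, 22, 0), POLICY))
```

Denying by default means anything not explicitly granted is refused. This simple window check does not handle windows that wrap past midnight.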
Essentially, data governance is an umbrella term for the practices and processes meant to ensure the formal management of an organization’s data assets. These practices include data stewardship, data quality management, and others. Good data governance helps organizations establish control over data assets, including the methods, technologies, and processes for managing data correctly. Data governance also covers security and privacy, integrity, usability, integration, compliance, availability, and the roles and responsibilities involved in managing an organization’s internal and external data flows.
Data governance is all about understanding what, exactly, your data means, and where that data is stored. It should help an organization’s leaders comprehend all of that organization’s data assets. Additionally, by assigning permissions and procedures to different people, data governance allows an organization to hold its employees accountable and understand who is responsible for which specific sets of data.
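Recording what each dataset means, where it lives, and who answers for it is the job of a data catalog. The sketch below is a minimal, in-memory version, assuming one accountable owner per dataset; the `CatalogEntry` fields, dataset names, storage locations, and owners are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str       # dataset identifier
    meaning: str    # what the data means
    location: str   # where the data is stored
    owner: str      # who is accountable for it ("" if nobody has been assigned)


# Hypothetical catalog; names, locations, and owners are illustrative only.
CATALOG: dict[str, CatalogEntry] = {
    entry.name: entry
    for entry in [
        CatalogEntry("salary_changes", "Approved pay adjustments per employee",
                     "warehouse.hr.salary_changes", "Head of HR"),
        CatalogEntry("pos_transactions", "Point-of-sale line items",
                     "warehouse.sales.pos_transactions", "Head of Retail"),
        CatalogEntry("web_clickstream", "Raw website page-view events",
                     "lake.raw.clickstream", ""),
    ]
}


def datasets_owned_by(owner: str, catalog: dict[str, CatalogEntry]) -> list[str]:
    """List the datasets a given person or role is accountable for."""
    return sorted(name for name, entry in catalog.items() if entry.owner == owner)


def unowned(catalog: dict[str, CatalogEntry]) -> list[str]:
    """Flag accountability gaps: datasets with no assigned owner."""
    return sorted(name for name, entry in catalog.items() if not entry.owner.strip())


if __name__ == "__main__":
    print(datasets_owned_by("Head of HR", CATALOG))
    print(unowned(CATALOG))
```

The `unowned` check is the accountability question in code form: any dataset it returns is one that nobody is currently responsible for.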
Data governance also helps organizations standardize their data and make their processes and findings more consistent and traceable. Data quality describes how useful and complete data is, while data governance establishes where the data is and who is responsible for it. Governance improves data quality because you can’t systematically improve data unless you know where it lives and who has the authority to change it.
Some (but not all) of the benefits of data governance are as follows:
- Lower costs in other areas of data management
- More accurate regulatory compliance procedures
- Greater revenue growth
- Greater operational efficiency
- Greater ability to standardize data systems, data policies, data procedures, and data standards
- Increased value of an organization’s data
Which employees need to understand data stewardship principles?
Everyone needs to understand data stewardship principles. Most organizations have amassed plenty of data, but without a shared, organization-wide understanding of what data and data stewardship are and how they help, employees are liable to interpret and use data incorrectly, which defeats the purpose entirely and can have serious consequences.
Different platforms use different definitions to describe data: one platform within an organization might use an acronym that means something entirely different on another platform. Each platform generates messages in its own vocabulary, which makes it easy for employees to end up with incomplete or inaccurate information. Understanding data stewardship is a key component of data literacy, the ability to read, write, and communicate data in context.
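A shared business glossary is one common way to keep such acronyms from colliding. The sketch below is a minimal illustration, assuming definitions are keyed by (platform, term); the platform names and the `resolve` and `ambiguous_terms` helpers are hypothetical, though "CR" and "PO" are real examples of acronyms that mean different things to different teams.

```python
# Hypothetical glossary: the same acronym means different things on different platforms.
GLOSSARY: dict[tuple[str, str], str] = {
    ("marketing_dashboard", "CR"): "conversion rate",
    ("it_ticketing", "CR"): "change request",
    ("finance_erp", "PO"): "purchase order",
    ("engineering_tracker", "PO"): "product owner",
    ("finance_erp", "FY"): "fiscal year",
}


def resolve(term: str, platform: str, glossary: dict[tuple[str, str], str]) -> str:
    """Look up what a term means on a specific platform."""
    try:
        return glossary[(platform, term)]
    except KeyError:
        raise KeyError(f"{term!r} is not defined for platform {platform!r}") from None


def ambiguous_terms(glossary: dict[tuple[str, str], str]) -> dict[str, set[str]]:
    """Return the terms that carry more than one meaning across platforms."""
    meanings: dict[str, set[str]] = {}
    for (_platform, term), definition in glossary.items():
        meanings.setdefault(term, set()).add(definition)
    return {term: defs for term, defs in meanings.items() if len(defs) > 1}


if __name__ == "__main__":
    print(resolve("CR", "it_ticketing", GLOSSARY))
    print(ambiguous_terms(GLOSSARY))
```

Listing the ambiguous terms gives stewards a concrete starting point: each one either needs a single agreed definition or must always be qualified by its platform.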
If you want to be a truly data-driven company (and you should, because decisions grounded in well-stewarded data beat guesswork), all of your employees should have basic knowledge of what data science, data engineering, and data analysis mean, as well as how they help move an organization toward its goals. All employees should have a working understanding of data and data stewardship across the board: what’s what, where’s what, and who owns what.