In short, a series of major technological changes drove up the volume, variety, and velocity of big data. Around 2011 the term “Data Engineer” started to crop up in the circles of new data-driven companies such as Facebook and Airbnb. Sitting on mountains of potentially valuable real-time data, software engineers at these companies needed to develop tools to handle all that data quickly and correctly.
The term “data engineering” evolved to describe a role that moved away from using traditional ETL tools and developed its own tools to handle the increasing volumes of data. As big data grew, “data engineering” came to describe a kind of software engineering that focused deeply on data – data infrastructure, data warehousing, data mining, data modeling, data crunching, and metadata management.
Why the Critical Need for Data Engineering Now?
By now you have probably heard or read about Gartner’s estimate back in 2017 that 85% of big data projects fail. This was largely due to a lack of reliable data infrastructure: data could not be trusted enough to base key business decisions on it. Fast forward to 2019 and things had not improved. The CTO of IBM said that 87% of data science projects never make it into production. Gartner reiterated its prediction, now putting the failure rate at 80% of projects. A New Vantage report produced similar stats.
So why is this?
Over the last decade, most companies have completed a digital transformation. This has produced enormous volumes of new and more complex types of data, arriving at a higher frequency. While it was already apparent that Data Scientists were needed to make sense of it all, it was less apparent that someone needed to organize this data and ensure its quality, security, and availability so that the Data Scientists could do their jobs.
So in the early days of big data analytics, Data Scientists were very often expected to build the infrastructure and data pipelines they needed to do their work. This was not necessarily within their skill set or what they expected of the job. As a result, data modeling was often done incorrectly, and Data Scientists duplicated each other’s work and used data inconsistently. These kinds of issues prevented companies from extracting optimal value from their data projects, so the projects failed. It also led to a high rate of Data Scientist turnover that still exists today.
Today, with the onslaught of completed corporate digital transformations, the Internet of Things, and the race to become AI-driven, it is crystal clear that companies need Data Engineers in abundance to provide the foundation for successful data science initiatives.
This is why we will continue to see the role of Data Engineers grow in importance and breadth. Companies need teams of people whose sole focus is to process data in a way that allows them to extract value from it.
What is the Relationship and Difference between Data Scientists and Data Engineers?
Much has been written about the relationship between these two roles, so we’ll be brief. In the past, companies thought that they could get away with having Data Scientists do the work of Data Engineers. This is what has caused much of the “unicorn effect” and the shortage in Data Scientist recruitment.