Data Integrity: The Foundation for Trustworthy AI/ML Outcomes and Confident Business Decisions

Apr 28, 2023

Editor’s note: Tendü Yoğurtçu, PhD is a speaker for ODSC East 2023 this May 9th-11th. Be sure to check out her talk, “Power trusted AI/ML Outcomes with Data Integrity,” there!

Due to the tsunami of data available to organizations today, artificial intelligence (AI) and machine learning (ML) are increasingly important to businesses seeking competitive advantage through digital transformation. That’s why over 75% of enterprises prioritize AI and ML over other IT initiatives, and 94% of business leaders believe AI is critical to success over the next five years.

But before AI/ML can contribute to enterprise-level transformation, organizations must first address the problems with the integrity of the data driving AI/ML outcomes. The truth is, companies need trusted data, not just big data. According to a Data Trends survey, almost half of newly created data has at least one critical error. That’s why any discussion about AI/ML is also a discussion about data integrity.

Let’s explore the elements of data integrity, and why they matter for AI/ML.

As critical data flows across an organization from various business applications, data silos become a major obstacle. Silos, missing data, and errors make data management tedious and time-consuming, and they stand in the way of ensuring that data is accurate and consistent before AI/ML can use it. Silos also keep relevant data out of advanced analytics, which often introduces bias in AI.

Silos need to be broken down, and data needs to be integrated, standardized, deduplicated, and validated. Only then can it be considered of high enough quality to feed AI/ML pipelines. These are critical steps in ensuring businesses can access the data they need for fast and confident decision-making. Just as data quality is critical for AI, AI is critical for ensuring data quality and for reducing data preparation time through automation.
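The four steps named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the two source lists, the field names, the email pattern, and the choice of email as the deduplication key are all hypothetical.

```python
import re

# Hypothetical records from two siloed systems (illustrative data only).
crm_records = [
    {"email": " Ana@Example.com ", "name": "Ana Ruiz", "postal_code": "02139"},
    {"email": "bo@example.com", "name": "Bo Chen", "postal_code": ""},
]
billing_records = [
    {"email": "ana@example.com", "name": "ana ruiz", "postal_code": "02139"},
    {"email": "not-an-email", "name": "Cy Doe", "postal_code": "10001"},
]

# Deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(record):
    """Trim whitespace and normalize case so equivalent values compare equal."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
        "postal_code": record["postal_code"].strip(),
    }


def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    if not EMAIL_PATTERN.match(record["email"]):
        problems.append("invalid email")
    if not record["postal_code"]:
        problems.append("missing postal code")
    return problems


def prepare(*sources):
    """Integrate sources, standardize, deduplicate on email, then validate."""
    seen = set()
    clean, rejected = [], []
    for source in sources:                  # integrate: walk every silo
        for raw in source:
            record = standardize(raw)       # standardize
            if record["email"] in seen:     # deduplicate (first record wins)
                continue
            seen.add(record["email"])
            problems = validate(record)     # validate
            if problems:
                rejected.append((record, problems))
            else:
                clean.append(record)
    return clean, rejected


clean, rejected = prepare(crm_records, billing_records)
```

Keeping rejected records alongside the reasons they failed, rather than silently dropping them, makes it possible to see how much data was excluded before it reaches a model.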

Data quality also works hand in hand with data governance. Trust in data comes from knowing how the data was prepared, where it came from, whether it can be audited, and how rights to it are managed.

In addition, enriching the organization’s internal data with third-party data for critical context can make that data more meaningful, improve insights, and reduce bias in AI outcomes.

Whether it’s an insurance company leveraging location for better underwriting or risk assessment, a financial services organization enriching transactions for validation and accurate merchant assignment, or a telecommunications company optimizing 5G rollouts and creating new services, there’s one essential commonality: location data.

How does this all tie into AI/ML? Location and other third-party data assets add critical context and improve the outcomes of data models and predictions.
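As a rough sketch of what adding context can look like in practice, the snippet below joins internal records to a third-party dataset by postal code. Everything in it is an assumption made for illustration: the policy records, the wildfire risk scores, and the `enrich` helper are hypothetical and do not represent any specific product or data provider.

```python
# Hypothetical internal records (illustrative values only).
policies = [
    {"policy_id": "P-1", "postal_code": "95969"},
    {"policy_id": "P-2", "postal_code": "02139"},
    {"policy_id": "P-3", "postal_code": "99999"},
]

# Hypothetical third-party dataset: a wildfire risk score per postal code.
wildfire_risk_by_postal_code = {"95969": 0.9, "02139": 0.1}


def enrich(records, lookup, key, new_field):
    """Return copies of records with one extra field taken from a lookup table.

    Records with no match get None, so coverage gaps stay visible
    instead of being filled with a guessed default.
    """
    enriched = []
    for record in records:
        value = lookup.get(record[key])
        enriched.append({**record, new_field: value})
    return enriched


enriched = enrich(
    policies, wildfire_risk_by_postal_code, "postal_code", "wildfire_risk"
)

# Records the third-party data could not cover.
unmatched = [r["policy_id"] for r in enriched if r["wildfire_risk"] is None]
```

Tracking the unmatched records matters because gaps in third-party coverage can themselves skew a model if they cluster in particular regions.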

For instance, in environmental, social, and governance (ESG) initiatives, automating the ESG data supply chain and recommending data enrichment, such as wildfire data, demographic data, or datasets for underrepresented groups, is essential to helping organizations remove bias from their data.

Furthermore, data enrichment can help ensure that AI algorithms are trained on diverse data, reducing the risk of bias. Adding datasets for underrepresented groups can help ensure that AI algorithms are not perpetuating any preexisting biases. For example, if an organization is using AI to make hiring decisions, it’s important that the algorithm is trained on a diverse range of resumes to avoid perpetuating biases based on race, gender, or other factors.
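One simple way to spot where enrichment might be needed is to measure how each group is represented in the training data. The sketch below assumes a hypothetical `group` attribute and an arbitrary 20% threshold; both are placeholders, and a real fairness review would go well beyond counting rows.

```python
from collections import Counter


def representation_report(records, attribute, min_share=0.2):
    """Compute each group's share of the records and flag low-share groups.

    min_share is an illustrative threshold, not a recommended standard.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {
        group: {
            "count": count,
            "share": count / total,
            "underrepresented": count / total < min_share,
        }
        for group, count in counts.items()
    }


# Hypothetical training set: 6 records from group A, 3 from B, 1 from C.
training_records = (
    [{"group": "A"}] * 6 + [{"group": "B"}] * 3 + [{"group": "C"}] * 1
)

report = representation_report(training_records, "group")

# Groups that may need additional data before training.
flagged = sorted(g for g, stats in report.items() if stats["underrepresented"])
```

In this example only group C falls below the threshold, which would make it a candidate for enrichment with additional data.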

In conclusion, data integrity is essential for successful AI implementations, for making informed decisions, and for achieving success in today’s data-driven world. By breaking down data silos, ensuring data quality, and incorporating location intelligence and data enrichment, organizations can improve data integrity, reduce bias, and derive trusted business insights from AI and ML.

About the author/ODSC East speaker:

Tendü Yoğurtçu, Ph.D., is the Chief Technology Officer (CTO) at Precisely. In this role, she directs the company’s technology strategy and innovation, leading all product research and development programs.

Prior to becoming Chief Technology Officer, Tendü served as General Manager of Big Data for Syncsort, the precursor to Precisely, leading the global software business for Data Integration, Hadoop, and Cloud. She previously held several engineering leadership roles at the company, directing the development of the Integrate family of products.

Tendü has over 25 years of software industry experience, with a focus on Big Data and Cloud technologies. She has also spent time in academia, working as a Computer Science Adjunct Faculty Member at the Stevens Institute of Technology.

In 2019, Tendü was named CTO of the Year at the prestigious Women in IT Awards, and in 2018 was recognized as an Outstanding Executive in Technology by Advancing Women in Technology (AWT).

Tendü received her Ph.D. in Computer Science from Stevens Institute of Technology, NJ, a Master of Science in Industrial Engineering, and a B.S. in Computer Engineering from Bosphorus University in Istanbul.

Originally posted on OpenDataScience.com



Written by ODSC - Open Data Science