There’s a lot of data out there these days. Thanks to the Internet of Things, it seems like everything is collecting data – your smartphone, your laptop, maybe even everyday appliances in your home, such as your toaster. But what use is all of this data? In recent years, companies have turned to data analytics powered by Artificial Intelligence (AI) to take advantage of big data. These AI applications seek to make sense of it all through computationally intensive data analytics, driving novel insights which can be used to improve anything from our day-to-day lives to company executives’ decisions.
But what does all of this data really mean, and how can we make sure the insights we glean from it are accurate enough to drive decision-making?
At this chaotic new intersection of AI and Big Data, one fact is becoming clear: data quality matters. AI is only as good as the data it is given. Whether you have a small repository or troves of data, ensuring that the data is of high quality should always be paramount.
As you will learn in this article, the quality and relevance of data are vital to AI systems. Organizations can use the most state-of-the-art algorithms, but if the data fed into them is questionable, they will still fail to produce real, actionable insights.
The Problem: Garbage In, Garbage Out
When it comes to AI-powered analytics, the old adage from computer science rings true: “garbage in, garbage out.” Data quality is even more important than data volume. If you use bad data, the algorithm will churn out incorrect results. If you don’t give the algorithm all of the data it needs to be successful, the end result is not going to bring you the useful insights you are expecting. That’s why good quality data is so important.
The opposite also holds: with better data you can get better, more actionable insights from AI algorithms. The benefits of obtaining the best quality data include better decision-making and improved productivity. Decision-making improves because algorithms fed high-quality data produce more accurate output, which both decreases risk and makes the system more efficient. When outputs are reliable, there is less guesswork, so you can be more confident that your algorithms are finding the right answers to drive your organization forward.
AI in the Recruitment World: Why the Data Matters
Recruitment is a great example because the data available on networking sites such as LinkedIn is so limited. So far, AI in recruitment has delivered little value, and companies that purport to use AI typically don’t have enough information for predictive models to work well. As a result, what these companies can deliver is limited because the core data is flawed.
Let’s say an AI system combs through LinkedIn profiles to look for the best diabetes researchers at top institutions in the United States who have specific expertise in a given research protocol. For example, if you are looking for diabetes researchers to work at your pharmaceutical company, you may want to find people skilled in performing preclinical studies of candidate drugs to lower blood glucose levels in laboratory animals. But because websites such as LinkedIn rely on user input, and these biotech professionals are busy, it is unlikely that they have listed every assay and laboratory method in which they are skilled on their LinkedIn profile, if they even have one. Many people in this type of time-consuming work may be too busy to develop a significant web presence for themselves, which makes recruiting this type of talent exceptionally difficult. As a result, finding preclinical diabetes researchers for your pharmaceutical company this way would be close to impossible. AI applied to the incomplete, user-reported data typically found on LinkedIn can only be as insightful as that data allows. Therefore, one must look to other data sources and solutions to obtain good quality data for recruitment.
This problem is not unique to the biotech industry. The same issues can be observed across all industries, job functions, and professions. People are too lazy or too busy to fill out their profiles, or they don’t know the technologies and skills to list. Therefore, networking sites such as LinkedIn tend to lack the types of data that AI models need to deliver the candidates that recruiters are seeking.
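To make the idea of an incomplete profile concrete, here is a minimal sketch of how a recruiter might measure how much usable information a set of profiles actually contains before handing it to a model. The field names, the sample profiles, and the 0.75 threshold are all assumptions made for illustration; they are not taken from LinkedIn or from any vendor’s schema.

```python
# Hypothetical fields a recruiter might need for this search.
# These names are illustrative and do not come from any real site's schema.
REQUIRED_FIELDS = ("title", "employer", "skills", "methods")


def completeness(profile: dict) -> float:
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(1 for name in REQUIRED_FIELDS if profile.get(name))
    return filled / len(REQUIRED_FIELDS)


def usable_profiles(profiles: list, threshold: float = 0.75) -> list:
    """Keep only profiles whose completeness meets the (assumed) threshold."""
    return [p for p in profiles if completeness(p) >= threshold]


# Made-up sample data: one fully filled-in profile and one sparse profile.
profiles = [
    {
        "name": "Researcher A",
        "title": "Research Scientist",
        "employer": "Example University",
        "skills": ["in vivo pharmacology"],
        "methods": ["glucose tolerance test"],
    },
    {
        "name": "Researcher B",
        "title": "Scientist",
        "employer": "",
        "skills": [],
        "methods": [],
    },
]

for p in profiles:
    print(p["name"], completeness(p))

print(len(usable_profiles(profiles)), "of", len(profiles), "profiles are usable")
```

If most profiles fall below whatever threshold you choose, no algorithm applied to them is likely to surface the candidates you are looking for.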
The Solution: When It Comes to Data, Focus on Quality as Much as Quantity
It’s important for companies to question the claim that “multiple sources of data” automatically make the output better. Remember that, as discussed above, even a single high-quality data source can yield much more insight than many channels of incomplete data. Rather than taking such claims at face value, ask your vendors the following questions:
How much data is being collected?
What are the sources of said data?
How frequently is the data aggregated and refreshed?
How do the data sources help the AI model deliver what we need?
How does this solution address my specific issue?
Recruiters should ask themselves additional questions about potential AI-based recruiting software vendors, such as:
Will the data that is being collected truly help me evaluate candidates’ qualifications, interest, or availability?
Is there enough good-quality data on each candidate to enable the vendor to deliver on their promises?
Remember that any company claiming to build its AI recruiting solution on LinkedIn profiles is likely not delivering the insights and efficiency (in other words, the results) that you need. On its own, LinkedIn is too incomplete a data source to power AI for efficient and effective hiring decisions.
Conclusion
There is a lot of data out there for recruiters, but what matters most about this data is not its volume but its quality. Vendors who claim to use AI on incomplete or poor sources of data are likely not giving customers the decision-making insights needed for successful hiring. Better data, while it can be more time-consuming, costly, or complex to obtain, leads to cost savings and higher productivity in the long run. With better data driving AI solutions, staff can spend less time validating and fixing data, or trying to extrapolate missing data. This leaves more time for staff to focus on their core company objectives, including better and more targeted marketing, while the computer does the ‘thinking,’ so to speak. Therefore, the benefits of obtaining good quality data far outweigh the disadvantages.