Start Collecting More Data & Enriching It to Have Better Results with Predictive AI
Predictive AI relies on training algorithms with data, and the data an algorithm actually uses is often not your raw data but a set of "features" derived from it. Let's explore why you want to collect and enrich your data to make this process more impactful.
For context, the process to create features from your existing data involves tasks like:
Changing the shape of the data so a single column becomes many columns or many rows are aggregated into a single cell
Handling empty or errant values by filling in a default or a value calculated from your other data
Resolving outliers, the edge or exceptional cases that fall outside of your normal business processes and that you would not want to model
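To make those three tasks concrete, here is a minimal sketch in plain Python. The transaction records, the field names, the choice of the median as the fill value, and the 1000.0 cap are all assumptions made for illustration; your own data and business rules would differ.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw transactions; field names and values are illustrative only.
transactions = [
    {"customer_id": 1, "channel": "web", "amount": 20.0},
    {"customer_id": 1, "channel": "store", "amount": None},
    {"customer_id": 2, "channel": "web", "amount": 35.0},
    {"customer_id": 2, "channel": "web", "amount": 40.0},
    {"customer_id": 3, "channel": "store", "amount": 25.0},
    {"customer_id": 3, "channel": "web", "amount": 5000.0},
]

AMOUNT_CAP = 1000.0  # assumed ceiling for a normal order


def build_features(rows, amount_cap=AMOUNT_CAP):
    """Return one feature dict per customer from raw transaction rows."""
    known = [r["amount"] for r in rows if r["amount"] is not None]
    fill_value = median(known)  # calculated default for empty amounts
    channels = sorted({r["channel"] for r in rows})

    features = defaultdict(lambda: {"total_spend": 0.0, "order_count": 0,
                                    **{f"orders_{c}": 0 for c in channels}})
    for r in rows:
        # Handle empty values, then cap outliers at the assumed ceiling.
        amount = fill_value if r["amount"] is None else r["amount"]
        amount = min(amount, amount_cap)

        # Aggregate many rows into one row per customer.
        row = features[r["customer_id"]]
        row["total_spend"] += amount
        row["order_count"] += 1
        # Spread the single channel column into one count column per channel.
        row[f"orders_{r['channel']}"] += 1
    return dict(features)


features = build_features(transactions)
```

Each customer ends up as a single row of numeric features. Capping is only one way to treat an outlier; dropping the row or flagging it for review may suit your business better.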
Even with those activities, you still need to have data to work with. Without a variety of data to use, you end up limited in what you can do, and your predictive AI models may not perform as well as you would hope.
Some industries have a head start collecting more data because they need it from a regulatory or compliance perspective.
For example, many businesses in financial services are required to know a customer's name, address, birthdate, and many other details that can be used for feature engineering.
However, other industries operate with much less data, which makes it harder to start working with predictive AI.
For example, imagine you only have first and last names, an email address, and transaction history to rely upon. Many other details that are valuable when modeling are missing, such as your customers' geographic location, demographic details, and behavioral or psychographic elements.
You can start to fill these gaps by collecting more information using:
First party data, whereby you're collecting information directly from your customers, donors, etc. by asking for it or requiring it to engage with your organization
Third party data, whereby you're enriching your existing data by augmenting it with, or comparing it to, research done by outside firms
Both approaches can be used together, although you must appreciate that:
First party data is more likely to be accurate and useful to you
Third party data can accelerate your time to market for projects around analytics and predictive AI
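One way to use both sources together is to let third party values fill only the gaps in your first party records and to keep track of where each value came from. The sketch below assumes both datasets are keyed by email address; the records, field names, and the enrich helper are hypothetical and stand in for whatever your systems and vendor actually provide.

```python
# Hypothetical first party records keyed by email; None marks a detail never collected.
first_party = {
    "ana@example.com": {"first_name": "Ana", "postal_code": "30301", "age_band": None},
    "raj@example.com": {"first_name": "Raj", "postal_code": None, "age_band": None},
}

# Hypothetical third party append file keyed the same way.
third_party = {
    "ana@example.com": {"postal_code": "30305", "age_band": "35-44"},
    "raj@example.com": {"postal_code": "60614", "age_band": "25-34"},
}


def enrich(first, third):
    """Fill gaps in first party records with third party values, recording the source."""
    enriched = {}
    for key, record in first.items():
        extra = third.get(key, {})
        merged, sources = {}, {}
        for field in sorted(set(record) | set(extra)):
            own = record.get(field)
            if own is not None:
                # First party values win whenever they exist.
                merged[field], sources[field] = own, "first_party"
            elif extra.get(field) is not None:
                # Otherwise fall back to the third party value.
                merged[field], sources[field] = extra[field], "third_party"
            else:
                merged[field], sources[field] = None, None
        enriched[key] = {"values": merged, "sources": sources}
    return enriched


enriched = enrich(first_party, third_party)
```

Keeping the source next to each value lets you replace a third party value later, once the customer or donor gives you the detail directly.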
TL;DR: You likely need more data from your customers or donors before exploring predictive AI; third party data is fastest, first party data is best.