
Evidence-Based AI

Over the next several months, we expect the buzz around Generative AI will increasingly turn into a conversation around what we at AI Bloks call “Evidence-Based AI.” As businesses look for practical applications of Generative AI that drive real outcomes and value, we believe two key questions sit at the core of most enterprise use cases:

  1. What is the knowledge base that the Generative AI will use as the foundation for the generation of its answer; and
  2. How can we systematically assess and audit the factual reliability of the generated AI output?

In short, both of these questions come down to a matter of evidence. 

Part of what is so remarkable about technologies like ChatGPT is their fluency and ability to wax poetic (sometimes literally) about any number of obscure topics. This remarkable fluency rests on the sheer breadth of the training materials, e.g., most of the English-language Internet (!), and the enormous depth and breadth of the model (175 billion parameters), which can capture a lot of information and, at times, resemble our friend who “knows enough to be dangerous” (sometimes literally).

If even Google can be surprised in a high-stakes public demo of the truthiness problem of these models, how can a business deploy these technologies to do real work?

As wonderful as the open Internet and tools like Wikipedia are for knowledge, for most companies the real work happens with contextual, private knowledge – policies, customer contracts, product specifications, financial documents, training materials. Without the ability to intelligently draw on these resources as a knowledge base, it will be difficult to build powerful enterprise generative AI systems.

We believe that an automated Evidence-based AI system will be a multi-stage process for the foreseeable future, and comprise at least six foundational steps: 

  1. Inquiry -> Research – using state-of-the-art, industry- and domain-fine-tuned AI embedding models applied to private contextual knowledge (combined with public information sources) to automatically retrieve and assemble a set of prioritized “research materials” that form the wider knowledge base for the given topic;
  2. Research -> Evidence – even the best semantic retrieval mechanisms may surface more material than the relatively narrow aperture of the generative model’s context window can accept. So, a critical capability is needed to prioritize, sub-sample, aggregate and batch information from the “research materials” into an “evidence base” that the model can review;
  3. Evidence-based Generative Model Inference – remember this formula: Evidence + Prompt + Instruction + Model. This is a winning recipe for systematically combining evidence and instructions with the prompt/query when invoking the generative model;
  4. Post-Processing -> Fact-Checking – once the model output is received, it needs to be cross-referenced through a variety of techniques, both rules-based checks and supporting models, to confirm the key factual points in the output. It also needs to be confirmed that the model drew from the evidence rather than from its wider, incidental pseudo-knowledge base;
  5. Build Deliverables – to create real productivity benefit, the elements need to be integrated and assembled over multiple passes through the model, combining AI outputs and analyses with evidence into larger, more meaningful work-product deliverables; and
  6. Record-keeping & Audit – every call into a model should be captured and tracked, along with the evidence, the query, and the details of the model invoked. Over time, this becomes a critical artifact for audit processes and, just as importantly, for gaining confidence in the accuracy of the process – and for driving systematic improvements where required.
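The first two steps – retrieval and evidence assembly – can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: it assumes embedding vectors have already been computed for the query and for each knowledge-base chunk, and it uses a crude word count in place of a real tokenizer.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunk_vecs, chunks, top_k=20):
    """Rank knowledge-base chunks by semantic similarity to the query."""
    scored = sorted(
        zip(chunks, (cosine(query_vec, v) for v in chunk_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]

def pack_evidence(ranked, budget_tokens=2000):
    """Greedily fit the highest-ranked chunks into the model's context budget."""
    evidence, used = [], 0
    for text, _score in ranked:
        cost = len(text.split())  # crude token estimate; swap in a real tokenizer
        if used + cost > budget_tokens:
            continue
        evidence.append(text)
        used += cost
    return evidence
```

The greedy packing step is the simplest possible answer to the “narrow aperture” problem in step 2; a production system would also de-duplicate, aggregate, and batch material across multiple model calls.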
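Step 3’s formula (Evidence + Prompt + Instruction + Model) amounts to a prompt-assembly routine. A hypothetical sketch follows – the section labels and source-numbering convention are illustrative assumptions, not a prescribed format:

```python
def build_model_input(evidence, prompt, instruction):
    """Combine evidence, instruction, and the user's query into one model input."""
    # Number each evidence chunk so the model can cite it in its answer.
    context = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(evidence)
    )
    return (
        f"{instruction}\n\n"
        f"EVIDENCE:\n{context}\n\n"
        f"QUESTION: {prompt}\n"
        "Answer using only the evidence above, and cite sources by number."
    )
```

The closing directive – answer only from the evidence, with citations – is what makes the downstream fact-checking step tractable.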
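Step 4’s fact-checking can take many forms; one simple rules-based baseline is to measure how much of each answer sentence’s vocabulary is actually covered by the evidence. This is a deliberately naive sketch (the 0.6 threshold and word-overlap metric are assumptions), meant to flag sentences for closer review rather than to settle factuality:

```python
import re

def evidence_support(answer, evidence, threshold=0.6):
    """Flag answer sentences whose content words are not covered by the evidence."""
    evidence_words = set(re.findall(r"[a-z0-9]+", " ".join(evidence).lower()))
    report = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if not words:
            continue
        coverage = len(words & evidence_words) / len(words)
        # (sentence, fraction of its words found in evidence, supported?)
        report.append((sentence, coverage, coverage >= threshold))
    return report
```

Sentences that fail the check are candidates for the model having drawn on its incidental pseudo-knowledge rather than the supplied evidence; a stronger version would use an entailment model instead of word overlap.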
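Step 6 is straightforward to prototype: each model call is captured as an append-only JSON line recording the query, the evidence (plus a hash for tamper-evidence), the model invoked, and the output. The field names here are illustrative:

```python
import hashlib
import json
import time

def audit_record(model_name, prompt, evidence, output, log_path=None):
    """Capture query, evidence, model identity, and output for later audit."""
    record = {
        "timestamp": time.time(),
        "model": model_name,
        "prompt": prompt,
        # Hash the evidence so later audits can detect any alteration.
        "evidence_sha256": hashlib.sha256("\n".join(evidence).encode()).hexdigest(),
        "evidence": evidence,
        "output": output,
    }
    if log_path:
        with open(log_path, "a") as f:
            f.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Because every record carries the evidence and the model details together, the log doubles as a dataset for measuring process accuracy over time.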

At AI Bloks, we apply these concepts in our generative AI products. We offer the fastest no-code generative AI data pipelines to get started with this process – and to start realizing value right away – while fine-tuning for your industry domain and preserving the flexibility to integrate new models as they become available.