Combining AI Output Using Ensemble Reasoning

Introduction

Two main issues with LLM output arise from the probabilistic nature of these models:

  • The output is often incomplete.
  • LLM responses can contain inaccuracies and inconsistencies.

The question then arises: is it possible to combine multiple AI responses, each of which may be slightly inaccurate, into a single result that is more accurate and more comprehensive than any of the individual inputs?

In general, the answer is yes, provided the following conditions are met:

  1. Each response has a degree of validity (i.e., is not entirely erroneous).
  2. The process of combination is performed with a logical framework that distinguishes overlapping points from contradictory elements.

Expressed mathematically, we can say that by viewing each LLM output as a set of possible solutions or constraints, we can then find their intersection (area of overlap), which defines a more accurate solution space.

  • The more overlapping regions we identify among these sets, the more confidence we can have in the convergent subset.
  • Contradictions or discrepancies highlight areas requiring further scrutiny or additional external validation.

In other words, when each LLM output is conceptualized as a set of possible solutions or constraints, intersecting these sets narrows the solution space. Overlapping regions indicate a higher likelihood of correctness; discrepancies point to areas needing verification.
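The intersection idea can be sketched directly in Python. In this minimal example the answer sets are hypothetical, invented purely for illustration; each set stands for the candidate answers one LLM response supports:

```python
# Treat each LLM response as a set of candidate answers it supports.
# The intersection is the convergent subset; everything else needs review.
answers_model_1 = {"Paris", "Lyon", "Marseille"}
answers_model_2 = {"Paris", "Lyon"}
answers_model_3 = {"Paris", "Toulouse"}

convergent = answers_model_1 & answers_model_2 & answers_model_3
disputed = (answers_model_1 | answers_model_2 | answers_model_3) - convergent

print(convergent)  # {'Paris'} -- highest-confidence subset
print(disputed)    # claims requiring external verification
```

The more models that agree on an element, the more confidence it earns; here only one answer survives all three intersections, while the disputed set flags what to double-check.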


Preconditions for Combining AI Outputs

Two factors are crucial for integrating multiple imprecise or partially flawed AI responses into a single, more accurate outcome: the inherent reliability of each individual response and a structured method for merging them.

  • Degree of Validity: First, each response should contain at least some elements of correctness or relevance. If a response is almost entirely speculative or riddled with proven inaccuracies, including it in a combined analysis may add more confusion than clarity.
  • Structured Logical Framework: Second, the manner in which these responses are compared and fused must follow a well-defined logical framework. The process of combination must distinguish overlapping points (which reinforce confidence in certain details) from contradictory elements (which require further scrutiny or external corroboration).

Together, these two preconditions—response validity and methodical integration—provide a foundation for Ensemble Reasoning. Even if no single LLM output is perfectly accurate, these conditions enable a process through which the combined result can be more robust than any single, partially flawed response would be on its own.


Ensemble Reasoning for Hard Facts and Qualitative Knowledge

Not all LLM outputs deal with the same type of knowledge. Some discuss empirical data (dates, numbers, or verifiable occurrences), whereas others provide subjective evaluations, predictions, or opinions.

Two main approaches are used in Ensemble Reasoning for these two contexts:

Set Theory for Combining Hard Facts

  • Constraint Satisfaction: Each factual statement is treated as defining a subset of possible worlds in which that statement holds.
  • Set Intersection: Factual consistency is achieved by intersecting these subsets. If all statements can coexist without contradiction, they collectively form a more constrained (and hence more precise) set of potential truths.
  • Contradiction Handling: Should certain statements contradict one another (e.g., conflicting numerical values), further checks or external verification is necessary to resolve or discard inconsistent claims.
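The three bullets above can be sketched as a small merge routine. The claims below are hypothetical placeholders, not real data; the point is the mechanism: consistent key/value claims accumulate into a merged fact set, while conflicting values for the same key are pulled out for external verification:

```python
# Each response contributes (fact_key, value) claims. Agreeing claims
# narrow the set of possible worlds; conflicting values are flagged.
claims = [
    {"release_year": 2015, "language": "Python"},  # response 1
    {"release_year": 2015},                        # response 2
    {"release_year": 2017, "language": "Python"},  # response 3
]

merged, contradictions = {}, {}
for claim in claims:
    for key, value in claim.items():
        if key in merged and merged[key] != value:
            contradictions.setdefault(key, {merged[key]}).add(value)
        else:
            merged[key] = value

# Keys with conflicting values are removed pending verification.
for key in contradictions:
    merged.pop(key, None)

print(merged)          # {'language': 'Python'}
print(contradictions)  # {'release_year': {2015, 2017}}
```

Only the uncontested facts survive into the merged result; the contradictions dictionary records exactly which values clashed, so the disagreement itself is preserved rather than silently resolved.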

Abstract Logic for Combining Qualitative Knowledge

  • Dialectical Reasoning: Building on classical logic and philosophical discourse, qualitative claims are scrutinized for overlapping themes, rather than exact numeric agreement.
  • Non-Contradiction Principle: Elements that do not explicitly negate one another can be combined to form a broader perspective. Where divergence does occur (e.g., differing recommendations), it may still be possible to expand the discussion in order to provide a valid context for each viewpoint.

Ensemble Reasoning for Hard Facts: The Classical Detective

Ensemble Reasoning mimics the classic human approach to detective work: if three witnesses offer different but overlapping accounts of an event, investigators combine their testimonies to narrow down the most likely version of the truth.

In other words, we are used to the idea that witnesses might be imperfect or partially inaccurate, yet we still combine their accounts to narrow down suspects.

Consider the following case (all text generated by ChatGPT o1).

Scenario

A detective gathers three approximate witness statements:

  1. Witness A: The suspect is about 6 feet tall and wears dark clothing.
  2. Witness B: The suspect was last seen near the bakery at around 9 PM.
  3. Witness C: The suspect is left-handed and has a scar on the left cheek.

Step 1: Model Each Statement as a Constraint

  • Constraint A (K1): “Suspect is ~6 feet tall, wearing dark clothes.”
  • Constraint B (K2): “Suspect near bakery at ~9 PM.”
  • Constraint C (K3): “Suspect is left-handed and has a scar on the left cheek.”

Each of these constraints can be viewed as a subset of people who meet that criterion.

Step 2: Apply Formal Triangulation

  • We interpret K1, K2, and K3 as partially overlapping groups within the local population.
  • The intersection K1 ∩ K2 ∩ K3 reveals the individuals who satisfy all three constraints simultaneously.

Step 3: Analyze the Result

  • A non-empty intersection—possibly a single person—indicates a narrowed set of prime suspects who match every approximate statement.
  • An empty intersection signals potential contradictions among witnesses or errors in their observations.

Illustration
Suppose:

  • K1 (tall, dark-clothed individuals) includes ten people.
  • K2 (those present near the bakery at 9 PM) includes five people, two of whom appear in K1.
  • K3 (left-handed with a cheek scar) has three people, exactly one of whom also appears in K1 ∩ K2.

The unique individual in all three sets emerges as a likely suspect.
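This illustration maps directly onto set intersection in code. The names below are invented for the sketch; the set sizes match the scenario above (ten people in K1, five in K2 with two shared, three in K3 with one shared):

```python
# Hypothetical local population; all names are invented for illustration.
k1 = {"Ann", "Ben", "Carl", "Dana", "Ed",
      "Fay", "Gus", "Hal", "Ivy", "Jo"}        # ~6 ft tall, dark clothing
k2 = {"Carl", "Dana", "Ken", "Lee", "Mia"}     # near the bakery at ~9 PM
k3 = {"Dana", "Ned", "Ola"}                    # left-handed, left-cheek scar

suspects = k1 & k2 & k3
print(suspects)  # {'Dana'} -- the one person satisfying all three constraints
```

A non-empty result narrows the investigation; an empty one would signal that at least one witness statement is wrong or the groups were mis-scoped.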

As demonstrated, AI can use Ensemble Reasoning to mirror the typical detective process – combining partial, flawed observations into a collectively stronger conclusion.


Ensemble Reasoning for Qualitative Knowledge

Similarly to the detective scenario, Ensemble Reasoning can be extended to any context where multiple statements (factual or opinion-based) need to be reconciled. (The following examples were also generated using ChatGPT o1.)

Deciding on a Programming Degree

The following is an example of using Ensemble Reasoning to combine AI-generated qualitative knowledge.

Scenario
A student worries that AI-driven automation might render coding skills less valuable. They consult three different LLMs, each of which offers the following partial advice:

  • Response 1: “A programming degree covers algorithms, data structures, and theory, all of which remain relevant.”
  • Response 2: “Certain coding tasks may become obsolete; consider AI or data science degrees if you want future-proofing.”
  • Response 3: “Human programmers are still vital for architecture and oversight. Look for a degree with interdisciplinary components.”

Step 1: Identify Key Claims

  • Overlap: All three responses agree AI will reshape programming but also emphasize the importance of solid foundational knowledge.
  • Differences: Degree of pivot away from standard programming vs. retaining a classic CS track supplemented by AI.

Step 2: Map Claims Using Dialectical Reasoning

  • Points of consensus (e.g., fundamentals matter) bolster the final recommendation.
  • Divergences (degree choice) aren’t direct contradictions, but rather nuances; these can be merged by suggesting a blended approach.

Step 3: Synthesize a Balanced Conclusion

  • A programming degree remains a valid path, especially if combined with AI or data science modules.
  • Incremental steps—like certificates or bootcamps—can be a trial before a full degree.
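The three steps above can be approximated in code by reducing each response to a set of theme tags (the tags below are hypothetical labels, assigned by hand for illustration). Consensus themes are the intersection; divergent themes are nuances to blend rather than contradictions to discard:

```python
# Each response is reduced to the set of themes it endorses.
themes = [
    {"fundamentals_matter", "ai_reshapes_coding", "classic_cs_degree"},   # response 1
    {"fundamentals_matter", "ai_reshapes_coding", "pivot_to_ai_degree"},  # response 2
    {"fundamentals_matter", "ai_reshapes_coding", "interdisciplinary"},   # response 3
]

consensus = set.intersection(*themes)        # reinforced points
divergent = set.union(*themes) - consensus   # nuances to merge, not reject

print(consensus)  # agreed themes -> core of the final recommendation
print(divergent)  # degree-choice nuances -> folded into a blended path
```

In practice the tagging step would itself be done by an LLM or a human reviewer; the set operations then make explicit which themes are reinforced and which merely differ.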

Integrating Factual and Qualitative Claims

Now let’s use the same programming-degree question, but this time each LLM answer will include both data and advice, and we will use an AI to merge the individual AI answers.

  • Response 1
    • Fact: “Developer roles will grow ~25% from 2021 to 2031.”
    • Opinion: “Focusing on machine learning is wise.”
  • Response 2
    • Fact: “Average U.S. developer salary is $110k.”
    • Opinion: “Broad CS fundamentals let you pivot to AI roles.”
  • Response 3
    • Fact: “Dev job postings rose ~30% in four years; AI tools are booming.”
    • Opinion: “Employers still need human oversight, so combine programming with AI/data science.”

Analysis

  • Hard-Fact Intersection: No contradictory figures—growth and high salary trends align well, reinforcing that developer demand remains strong.
  • Advisory Synthesis: All emphasize that AI is changing coding, but skilled human programmers retain critical roles—particularly if they incorporate AI expertise.
  • Merged Result: There is compelling evidence for continuing demand and respectable salaries, alongside widespread agreement that combining traditional programming with AI-related knowledge is a prudent career strategy.
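The two-track analysis above can be sketched as a single pass over structured responses. The fact keys and advice tags below are hypothetical simplifications of the three responses; facts are checked for conflicting values while advice is intersected for shared themes:

```python
# Hypothetical structured responses: facts (key -> value) plus advice themes.
responses = [
    {"facts": {"growth_pct": 25}, "advice": {"learn_ml"}},
    {"facts": {"avg_salary_usd": 110_000}, "advice": {"cs_fundamentals", "learn_ml"}},
    {"facts": {"posting_growth_pct": 30}, "advice": {"learn_ml", "human_oversight"}},
]

facts, conflicts = {}, set()
for r in responses:
    for key, value in r["facts"].items():
        if key in facts and facts[key] != value:
            conflicts.add(key)          # contradictory figure -> flag it
        facts[key] = value

shared_advice = set.intersection(*(r["advice"] for r in responses))

print(conflicts)      # set() -> the figures are mutually consistent
print(shared_advice)  # {'learn_ml'} -> the reinforced recommendation
```

Because the factual claims here cover different keys, no conflicts arise, mirroring the "no contradictory figures" finding above; the one advice theme shared by all three responses becomes the core of the merged recommendation.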

Summary

Ensemble Reasoning offers a versatile method for reconciling multiple AI outputs—even if each contains inaccuracies or incomplete information. By pairing a formal approach for factual data (constraint satisfaction, set intersection) with a dialectical approach for qualitative or subjective inputs:

  1. Contradictions are flagged for deeper investigation.
  2. Overlaps provide reinforced confidence in the combined claims.

Whether in detective work or in synthesizing data-driven advice about a career path, Ensemble Reasoning yields a more robust, logically consistent outcome than any single AI response alone could provide.
