ICE BENCHMARK

Isomorphic Consistency Evaluation Results

Protocol v1.7

Updated: Today


Logic Profile (Radar)


Score Comparison

Modus Ponens (Valid): Tests the ability to follow basic valid logic (If A then B; A is true; therefore B). All models should score 1.0.

Fallacy Trap (Affirming the Consequent): A trap (If A then B; B is true; therefore A?). The correct answer is "Unknown" or "Cannot Determine". Simple models incorrectly say "Yes".

Multi-Hop Reasoning: Evaluates the ability to chain multiple logical steps (If A then B; If B then C; A is true; therefore C).

Modus Tollens: Tests contrapositive reasoning (If A then B; B is false; therefore A is false).

Disjunctive Syllogism: Assesses "either/or" logic (Either A or B; A is false; therefore B is true).

Denying the Antecedent: A trap (If A then B; Not A; therefore Not B?). The correct answer is "Unknown". Simple models incorrectly say "No".

ICE Score (Avg): The combined "Reasoning Integrity" score. Penalizes models that are decisive but logically wrong.

Status Definitions

Excellent (>0.9): Highly reliable logic.
Good (>0.6, up to 0.9): Generally capable, some fallacy risks.
Biased (<=0.6): Significant logical failures or domain bias detected.
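
The expected answers for the six argument forms above can be checked mechanically with a truth table, and the status thresholds reduce to a simple comparison. The following is a minimal sketch under stated assumptions, not the benchmark's own implementation: the names `entails`, `expected_answer`, `FORMS`, and `status_for` are hypothetical, and the aggregation behind the ICE Score is not specified here, so it is not reproduced.

```python
from itertools import product


def implies(x, y):
    """Material implication: 'if x then y'."""
    return (not x) or y


def entails(premises, conclusion, n_vars):
    """True if the conclusion holds in every assignment that satisfies all premises."""
    return all(
        conclusion(*vals)
        for vals in product([False, True], repeat=n_vars)
        if all(p(*vals) for p in premises)
    )


def expected_answer(premises, conclusion, n_vars):
    """Return 'Yes', 'No', or 'Unknown' for the question 'does the conclusion follow?'."""
    if entails(premises, conclusion, n_vars):
        return "Yes"
    if entails(premises, lambda *v: not conclusion(*v), n_vars):
        return "No"
    return "Unknown"


# Hypothetical encoding of the six forms: (premises, conclusion, number of variables).
FORMS = {
    "modus_ponens": ([lambda a, b: implies(a, b), lambda a, b: a],
                     lambda a, b: b, 2),
    "affirming_consequent": ([lambda a, b: implies(a, b), lambda a, b: b],
                             lambda a, b: a, 2),
    "multi_hop": ([lambda a, b, c: implies(a, b),
                   lambda a, b, c: implies(b, c),
                   lambda a, b, c: a],
                  lambda a, b, c: c, 3),
    "modus_tollens": ([lambda a, b: implies(a, b), lambda a, b: not b],
                      lambda a, b: not a, 2),
    "disjunctive_syllogism": ([lambda a, b: a or b, lambda a, b: not a],
                              lambda a, b: b, 2),
    "denying_antecedent": ([lambda a, b: implies(a, b), lambda a, b: not a],
                           lambda a, b: not b, 2),
}


def status_for(score):
    """Map a score to the status labels defined above."""
    if score > 0.9:
        return "Excellent"
    if score > 0.6:
        return "Good"
    return "Biased"


if __name__ == "__main__":
    for name, form in FORMS.items():
        print(name, expected_answer(*form))
```

The two trap forms come out as "Unknown" because their premises are satisfied by assignments in which the conclusion is true and by others in which it is false, which is why a confident "Yes" or "No" counts as an error.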

Detailed Leaderboard

Rank Model Name Modus Ponens Fallacy Trap Multi-Hop Modus Tollens Syllogism DA (Trap) Decoupling Score Status

Methodology


Analysis & Key Findings
