ICE BENCHMARK

Isomorphic Consistency Evaluation Results

Protocol v1.3

Updated: Today

Logic Profile (Radar)

Score Comparison

Modus Ponens (Valid): Tests whether the model follows a basic valid inference (If A then B; A is true; therefore B). All models should score 1.0.

Fallacy Trap (Affirming the Consequent): Presents an invalid inference (If A then B; B is true; therefore A?). The correct answer is "Unknown" or "Cannot Determine". Simple models incorrectly answer "Yes".

Multi-Hop Reasoning: Evaluates the ability to chain multiple logical steps (If A then B; If B then C; A is true; therefore C).

Modus Tollens: Tests contrapositive reasoning (If A then B; B is false; therefore A is false).

Disjunctive Syllogism: Assesses "either/or" logic (Either A or B; A is false; therefore B is true).

ICE Score (Avg): The combined "Reasoning Integrity" score. It penalizes models that are decisive but logically wrong.
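The expected answers for the five argument forms above can be checked mechanically with a truth table. The sketch below is illustrative only and is not the benchmark's own grader: the helper names (`implies`, `entailment`, `FORMS`) and the "Yes" / "No" / "Unknown" labels are assumptions introduced here. It enumerates every truth assignment that satisfies the premises and reports whether the queried proposition is forced true, forced false, or left undetermined.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'If p then q'."""
    return (not p) or q


def entailment(premises, query, n_vars):
    """Classify `query` given `premises` over n_vars boolean variables.

    Returns "Yes" if the query is true in every assignment satisfying the
    premises, "No" if it is false in every such assignment, and "Unknown"
    if the premises leave it undetermined.
    """
    satisfying = [
        values
        for values in product([True, False], repeat=n_vars)
        if all(premise(*values) for premise in premises)
    ]
    if not satisfying:
        raise ValueError("premises are contradictory")
    outcomes = {query(*values) for values in satisfying}
    if outcomes == {True}:
        return "Yes"
    if outcomes == {False}:
        return "No"
    return "Unknown"


# Hypothetical encoding of the five forms: (premises, queried proposition, variable count).
FORMS = {
    "Modus Ponens": (
        [lambda a, b: implies(a, b), lambda a, b: a],
        lambda a, b: b, 2),
    "Fallacy Trap": (  # affirming the consequent: is A true?
        [lambda a, b: implies(a, b), lambda a, b: b],
        lambda a, b: a, 2),
    "Multi-Hop Reasoning": (
        [lambda a, b, c: implies(a, b),
         lambda a, b, c: implies(b, c),
         lambda a, b, c: a],
        lambda a, b, c: c, 3),
    "Modus Tollens": (  # is A true? (it must be false)
        [lambda a, b: implies(a, b), lambda a, b: not b],
        lambda a, b: a, 2),
    "Disjunctive Syllogism": (
        [lambda a, b: a or b, lambda a, b: not a],
        lambda a, b: b, 2),
}

answers = {name: entailment(*spec) for name, spec in FORMS.items()}

for name, answer in answers.items():
    print(f"{name}: {answer}")
```

Only the Fallacy Trap comes out as "Unknown": the premises are satisfied both when A is true and when A is false, so a model that answers "Yes" is being decisive without logical support.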

Detailed Leaderboard

Rank | Model Name | Modus Ponens | Fallacy Trap | Decoupling Score | Status

Methodology


Analysis & Key Findings
