Most people who start judging think the job is simple: listen to both sides, decide who you agree with, give a result. That's wrong, and it's the single most common mistake new judges make.
Your job as a judge is to decide who did the best debating, not who's right. Those are different things. A team can be "right" about a topic and still lose because their arguments were poorly structured, their mechanisms were unexplained, or they failed to respond to the other side. A team can be "wrong" in terms of your personal beliefs and still win because they built a clear, well-supported case which the other side couldn't knock down.
The average reasonable person standard
The way most experienced judges handle this is by imagining an "average reasonable person" listening to the debate for the first time. This person has no strong political views on the topic. They have common sense and a basic understanding of how the world works, but no specialist knowledge. They can follow a logical argument and they can spot when something doesn't make sense.
When you're judging, ask yourself: would this average reasonable person be convinced by what this team just said? Would they buy the mechanism? Would they find the impact important?
This standard does a few useful things. It stops you from importing your own expertise or opinions. If you know a lot about economics and a team gets a detail wrong, you can't tank them for it unless the error is obvious to a non-expert. It also stops you from holding teams to impossibly high standards. The average reasonable person doesn't require peer-reviewed sources. They require arguments which are internally consistent and make sense.
What you're tracking
There are four things happening simultaneously in a debate. You need to track all of them.
Arguments. What claims is each team making? You need the problem they've identified, the mechanism behind it, and where they say the impact lands. Write these down in short form. You don't need to transcribe speeches. You need to capture the logical structure of each argument.
Responses. When Team B responds to Team A's argument, what happens to Team A's case? Did Team B actually address the mechanism, or did they attack something Team A never said? Did Team B's response make Team A's impact less likely, less severe, or less important? Or did Team B just assert "that's wrong" without explanation?
Weighing. Did anyone tell you why their arguments matter more than the other side's? This is often the deciding factor in close rounds. Two teams can both have reasonable arguments, but the team which explains why their impact is more important (scope, depth, probability, reversibility, timeline) gives you a reason to put them ahead.
Engagement. Did teams actually respond to each other, or did they talk past each other? A debate where Team A runs three arguments and Team B runs three completely different arguments, and neither side responds to the other, is hard to judge. When this happens, you're looking for which team's uncontested arguments are more important by default.
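If you keep your notes digitally, the four things above map onto a small record per argument. What follows is only a sketch of one possible layout, not a standard flowing notation: the `Argument` class, its field names, and the `uncontested` helper are all hypothetical, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    """One argument on the judge's flow (hypothetical layout)."""
    team: str
    claim: str          # the problem the team identified
    mechanism: str      # why they say it happens
    impact: str         # where they say the harm or benefit lands
    responses: list = field(default_factory=list)  # short notes on what the other side said back
    weighing: str = ""  # any reason given for why this matters more


def uncontested(flow):
    """Return the arguments that nobody responded to."""
    return [arg for arg in flow if not arg.responses]


# Invented sample notes, not taken from a real round.
flow = [
    Argument("Team A", "policy raises prices", "compliance costs are passed on", "poorer households pay more"),
    Argument("Team B", "policy protects workers", "inspections become mandatory", "fewer workplace injuries"),
]
flow[0].responses.append("Team B: firms absorb costs because the market is competitive")

for arg in uncontested(flow):
    print(arg.team, "-", arg.claim)
```

The point of separating `responses` from the argument itself is that an empty list is immediately visible, which is what you need when teams talk past each other and you have to compare uncontested material.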
The call
At the end of the round, you need to rank teams (in British Parliamentary, or BP) or pick a winner (in two-team formats). Here's a process which works.
First, identify the major clashes. What were the 2-3 big disagreements in the round? Usually these emerge naturally: one side says economic growth matters most, the other says human rights do. One side says the policy works, the other says it backfires.
Second, figure out who won each clash. Which team's argument on this clash was more developed, better supported, and more persuasive to the average reasonable person? Did one team's response actually undermine the other's mechanism, or was it just an assertion?
Third, weigh the clashes against each other. If Team A won on economics and Team B won on human rights, which clash matters more in the context of this motion? Did either team tell you which clash is more important? If they did, did you find that weighing convincing?
Fourth, rank. The team which won the most important clashes should be ranked higher. Where the clashes carry roughly equal weight, the team which won more of them goes ahead. In BP, you're doing this for four teams, which is harder, but the process is the same: who contributed the most winning arguments on the most important clashes?
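The four steps can be sketched as a small bookkeeping routine. This is an illustration under stated assumptions, not a formula for deciding rounds: the `Clash` class, the `rank_teams` function, and the integer `importance` score are all hypothetical, and adding up importance scores (with number of clashes won as the tie-breaker) is just one way to combine "most important clashes" with "more clashes overall". The judgment calls in steps two and three still have to be made by you before anything goes into the code.

```python
from dataclasses import dataclass


@dataclass
class Clash:
    """One major disagreement in the round (hypothetical layout)."""
    name: str
    winner: str      # step two: the team you decided won this clash
    importance: int  # step three: your own weighting, higher means it matters more


def rank_teams(teams, clashes):
    """Step four: order teams by total importance won, then by clashes won.

    Assumption: importance scores are summed. This is a simplification,
    not an established adjudication rule.
    """
    weight_won = {team: 0 for team in teams}
    clashes_won = {team: 0 for team in teams}
    for clash in clashes:
        weight_won[clash.winner] += clash.importance
        clashes_won[clash.winner] += 1
    return sorted(
        teams,
        key=lambda team: (weight_won[team], clashes_won[team]),
        reverse=True,
    )


# Step one: the major clashes, with invented winners and weights.
clashes = [
    Clash("economics", winner="Team A", importance=2),
    Clash("human rights", winner="Team B", importance=3),
    Clash("does the policy work", winner="Team A", importance=2),
]
print(rank_teams(["Team A", "Team B"], clashes))  # ['Team A', 'Team B']
```

In this invented round Team B won the single most important clash, but Team A's two wins together outweigh it. Whether that is the right outcome depends entirely on the weights you assign in step three, which is why the teams' own weighing matters so much.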
What this looks like in practice
Here's a concrete example from a real round. The motion was about whether Balkan countries should de-emphasize the struggle against the Ottoman Empire in their history curriculum. Opening Government (OG) argued the emphasis creates nationalism and divides Balkan countries from each other and from Turkey. Opening Opposition (OO) made a structural argument: you can't teach triumph without teaching struggle. They're inseparable. If students learn about national independence, they necessarily learn about what they were independent from. OG hadn't actually shown that teaching "triumph" produces different attitudes than teaching "struggle."
OO won that clash. Why? Because OG's entire case rested on a distinction (emphasis on struggle vs. emphasis on triumph) which OO showed wasn't real. The judge didn't need to decide whether nationalism was bad. They needed to decide whether OG had proven their policy produces a meaningfully different outcome. OO showed they hadn't.
That's the kind of reasoning you need to track. The clash wasn't about who had the better values. It was about whose mechanism held up under scrutiny.
Another example: a debate on whether e-sports should use franchise models or open qualifiers. Opening Opposition (OO) argued the qualifier model creates bigger fanbases, which matters because viewers vastly outnumber players. Closing Opposition (CO) argued something different: small organizations need open qualifiers to enter the competitive scene, because franchise models lock them out. The judge ranked CO above OO because CO's argument was less contingent on specific mechanics and more structurally sound. It didn't matter which qualifier format you used. The barrier-to-entry point held regardless.
Notice what the judge did there: they didn't just pick the "better" argument. They evaluated which argument was more likely to be true across different scenarios. That's probability reasoning applied to adjudication.
Common new-judge mistakes
Judging on agreement. You agree with the government's position on climate change, so you rank them higher even though the opposition ran better arguments. Catch yourself doing this.
Judging on speaking ability. A speaker is confident, funny, and charismatic. Their arguments are thin and poorly structured. Another speaker is nervous and stammers, but their case is airtight. The second speaker should rank higher. Delivery matters when arguments are otherwise equal. It should never override substance.
Giving credit for things you thought of. A team doesn't make an argument, but you can see where they were going with it. You fill in the gap yourself and credit them for it. Don't do this. If they didn't say it clearly enough for the average reasonable person to follow, it doesn't count.
Punishing a team for an argument you dislike. The motion is about a controversial topic and a team takes a position you find distasteful. If their arguments are well-structured and persuasive, they deserve credit. Your moral objections are not relevant to the call unless the team's arguments rely on claims which the average reasonable person would reject (outright factual falsehoods, for example).