I logged onto Smogon today and was disturbed to discover this thread, which proposes to ban a long list of Pokémon before the game is even released.
It's worth noting that the goals mentioned in the original post of that thread fly in the face of the ostensible philosophy of Smogon:
Smogon attempts to avoid bans as much as possible—only when it becomes very apparent that a Pokémon is far too powerful to be in line with a balanced metagame is it banished permanently from the standard arena.
They also fly in the face of Smogon's front page slogan: "Smogon is a Pokémon website and community specializing in the art of competitive battling." If the community chooses to accept the proposals in that thread, we will have to change the slogan to "Smogon is a web site dedicated to having a good time playing Pokémon with friends. No skarmbliss, no substitute, no vilopumes!"
This is not a mistake to be taken lightly. It is for all intents and purposes irreversible. As soon as you publish some long ban list, two things happen: (1) It creates a notion of certain Pokémon being intrinsically Uber or OU just because of their name. We saw this happen last generation not just with the standard metagame, but also with UU. It was eventually fixed with UU. However, the standard metagame resisted correction, largely due to tradition and a lack of desire to challenge established, but arbitrary, norms. (2) It makes it all too easy to ban a host more Pokémon, because the power level for ubers has already been fixed at a point that is not significant for the new game.
It's easy to say that we will just recognise big shifts in the metagame and adjust this "preliminary ban list" accordingly. Experience shows that doesn't happen. It took years for it to happen with UU this generation, thanks to some dedicated people with a basic grasp of competitive games. It never happened with the standard game.
Before we ruin generation five prior to its release, let's consider an objective review of tiering in generation four. We all know the cliche about letting history repeat itself.
In June 2007, DP tiers are discussed seriously for the first time. There is no consensus in the thread. Some people want to ban fewer Pokémon than others. Obi and AA argue for handling Pokémon like a real competitive game and not hashing out a long ban list based on nothing. Let's keep one thing in mind. In this generation, there are already enough ubers that there is a very playable uber metagame. It's not close to being as balanced as standard, but it's playable. How many changes do you think it will require to make it reasonably balanced? Just sprinkling a few moves here and there, and maybe adding a couple more Pokémon at the highest echelons of power.
One argument repeated in that thread is that people will play ubers anyway. That misses the point. It may be possible to have a balanced game with far fewer Pokémon banned. People today talk occasionally about making a "balanced ubers", but if this can be done, the result would be standard. This isn't just a linguistic argument over what to name the tiers. The tier that is identified as OU will get the most play, be the most explored, and generally be the focal point of competitive play.
Ultimately, the discussion in that thread was shut down with such characteristically solid arguments as "Can we please prevent the consensus that was building from being undone by a wall of text?" (Emphasis mine.) If you examine the thread, you will notice that there is nothing even vaguely resembling a consensus. Nonetheless, banning a long list of Pokémon early in the game's life went through and stuck until the end.
The power level defining ubers was never seriously reconsidered. We have a duty as competitive players to explore that power level properly, especially in the face of a new game. I have seen a lot of posts by people so confidently stating that not much will change. This couldn't be more wrong. The truth is you have no idea what effect subtle changes to move pools, move stats (e.g. power, PP), and new Pokémon will have on the relative quality of Pokémon. It doesn't take much to shift the game significantly, and deciding a ban list in advance will effectively blind you to it.
This mistake, made early in the history of DP, laid the foundation for all of the tiering debates to come. It is a mistake that should have been avoided. Banning only broken Pokémon, after plenty of play experience, would have taken years less than the process that actually ensued, and would not have been tainted by doubts about its legitimacy.
By November 2007, Shoddy Battle 1 had ladder functionality. The arbitrary Smogon ban list had not changed in that time. Unfortunately, that arbitrary ban list was already well ingrained, and any major change to it was impossible. Independent of Smogon, we (me, AA, obi, tenchi, and others) adopted a very minor testing scheme, involving a tournament to test Deoxys-S. One thing we learned from the tournament is that Swiss tournaments are too complex for most players in this community, at least without software support. More significantly, not a single person of the hundreds of people who had played in the tournament voiced a problem with Deoxys-S being unbanned.
Two weeks after the conclusion of the tournament, some notable Smogon members who were up to that point uninvolved with Official Server were so excited by our unbanning of Deoxys-S that they asked if I could unban Wobbuffet immediately, without the benefit of another tournament. It turned out that Wobbuffet was the next item on our list anyway, but we mulled over whether another tournament was worth it. Ultimately, in light of the fact that the previous tournament had failed to convince anybody of anything, we decided to forge ahead and unban Wobbuffet. The backlash was intense. No one wanted to test Wobbuffet. In public, I defended our move, but in private, I was quite upset with AA. I had put in hundreds of hours of work writing a Pokémon simulator, which was extremely popular and was the basis of competitive Pokémon on the internet at that point, and everybody hated me for some minor tier experimentation that wasn't even my idea. This was extremely grating.
I was so upset by the backlash that I attempted to devise a statistical argument for banning Wobbuffet. Unfortunately, it couldn't be done. Barely anybody even used Wobbuffet on the ladder. You could play the game as though Wobbuffet did not exist, and you would only lose the occasional match. In effect, this was not a broken Pokémon, because it didn't affect how you constructed your team at all, as far as ladder play was concerned. This never changed for the entirety of Official Server.
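The post does not say which statistics were examined. One simple measure consistent with the description is a usage rate: the fraction of ladder teams that include a given Pokémon. The following is only a sketch under that assumption; the function name `usage_rates` and the sample teams are made up for illustration and are not real ladder data.

```python
from collections import Counter


def usage_rates(teams):
    """Return the fraction of teams that include each Pokémon.

    `teams` is a list of teams, each a list of Pokémon names.
    A Pokémon is counted at most once per team.
    """
    counts = Counter()
    for team in teams:
        counts.update(set(team))
    total = len(teams)
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up sample data, NOT real ladder statistics.
teams = [
    ["Garchomp", "Blissey", "Skarmory"],
    ["Garchomp", "Gengar", "Wobbuffet"],
    ["Blissey", "Skarmory", "Gengar"],
    ["Garchomp", "Gengar", "Blissey"],
]
rates = usage_rates(teams)
print(rates["Wobbuffet"])  # 0.25
print(rates["Garchomp"])   # 0.75
```

A low usage rate does not prove anything on its own, but it is the kind of open, checkable number that the argument above relies on: if almost no team carries a Pokémon, team building can largely ignore it.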
The lesson learned here is that popular opinion cannot be ignored in tiering decisions. Strong feelings that a Pokémon is broken prevent it from being tested. In fact, the hatred for this Pokémon was so intense that any vote to ban it would easily have passed by a 2/3 supermajority, and probably by much more.
Smogon proper re-entered the tiering scene in March 2008. A process was devised to decide whether to ban Garchomp, a Pokémon that everybody knew was popularly disliked. The first Smogon attempt at a Pokémon banning system was the closest the community has come to a good process. It was very simple. Anybody who met simple rating and deviation checks on the ladder got to vote on whether Garchomp should be banned. A supermajority should have been required to ban the Pokémon, but ultimately, the process was still quite good. It didn't even take that long, and it directly measured opinion among competitive players.
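To make the shape of such a process concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the cutoff values `MIN_RATING` and `MAX_DEVIATION`, the `LadderEntry` fields, and the function names are hypothetical, not the numbers or tools Smogon actually used. The 2/3 supermajority is the requirement argued for in this post, not the rule of the actual Garchomp vote.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical cutoffs, chosen only for illustration.
MIN_RATING = 1600.0
MAX_DEVIATION = 60.0
SUPERMAJORITY = Fraction(2, 3)


@dataclass(frozen=True)
class LadderEntry:
    name: str
    rating: float
    deviation: float


def eligible_voters(ladder):
    """Return the names of players who pass the open rating/deviation check."""
    return {
        p.name
        for p in ladder
        if p.rating >= MIN_RATING and p.deviation <= MAX_DEVIATION
    }


def ban_passes(votes, voters, threshold=SUPERMAJORITY):
    """Decide a ban vote.

    `votes` maps a player name to True (ban) or False (keep).
    Ballots from players outside `voters` are ignored.
    An empty count never bans anything.
    """
    counted = [vote for name, vote in votes.items() if name in voters]
    if not counted:
        return False
    return Fraction(sum(counted), len(counted)) >= threshold


# Made-up ladder and ballots, NOT real data.
ladder = [
    LadderEntry("a", 1700, 40),
    LadderEntry("b", 1650, 55),
    LadderEntry("c", 1800, 90),  # rating fine, deviation too high
    LadderEntry("d", 1500, 30),  # deviation fine, rating too low
    LadderEntry("e", 1610, 59),
]
voters = eligible_voters(ladder)
votes = {"a": True, "b": True, "c": True, "d": False, "e": False}
print(sorted(voters))             # ['a', 'b', 'e']
print(ban_passes(votes, voters))  # True: 2 of the 3 counted ballots say ban
```

The point of the sketch is that both the filter and the tally are a few lines that anyone can read and rerun, which is what makes a process like this open to peer review.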
Unfortunately, things went very far downhill shortly after this. The next year was spent on entirely pointless "tests" because by its very design, so-called "Stage 2" was 100% pointless. Eventually, when Stage 3 rolled around, the results of Stage 2 were irrelevant.
Let's take a step back and think about the previous paragraph. A whole year was wasted by a process that was designed out of the box to be pointless. I want to make sure that is very clearly understood. Stage 2 was pointless. This is so important to understand because it is often bandied about that proper tiering processes take too long. In reality, poor decisions regarding the tiering process are what make it take too long. Unfortunately, DP was a case of the latter. A sane process would have been similar to Stage 3 from the start. Also important is that a sane process would have stopped at the design of the first test, and considered only a simple rating and deviation check.
Another way in which things went far downhill was the introduction of two extra metrics to filter the voter pool. First, voters had to submit "paragraphs" which were never published for public inspection, and which were arbitrarily used to decide who would vote. This measure alone ruined the system. Particularly ironic is the fact that in ruining the system, it was also made slower, and one big complaint is always how slow things are; this was the fault of the people making this complaint.
The second big mistake that was made around this time was the introduction of "suspect experience". This is a secret measure whose definition no one except three people knows. We were told repeatedly that it was good, and useful, but of course, since we couldn't see it, we had no idea. At this point, the process was devastated. Voters were excluded on completely mysterious grounds, both through paragraph submissions and a top secret formula that was a terrible idea, and remains a terrible idea.
As previously mentioned, the next year was a complete waste of time, and it was wasted entirely by the same people who complain that the process took too long. No one wanted the terrible process they devised, including paragraph submissions and voodoo formulas. Most people wanted a populist system like the first Garchomp test. This would have been the way to go.
The next substantive thing to happen wasn't until August 2009, when so-called "Stage 3" started. This represented a process similar to what the process should have been from the start. Particularly jarring was the way it had been designed to make the previous year's work useless. The flaw here was wasting the previous year; Stage 3 should have been the entire process. Stage 3 was still a mess though. My attempts to improve it slightly ended up wasting many dozens of hours of my time, and ultimately led to nothing, despite the large number of people who supported something along the lines I was proposing.
After Stage 3, things became even worse. After messing up immensely over the last year and a half, the wasted time was used as a reason to introduce another bad process. First of all, after messing up so badly, there should have been a major leadership change in tiering policy. How does it make any sense that after messing up badly you get a second chance? We have plenty of people far more capable of handling tiering than the people who handled it this generation. We need people with special skills. People who not only enter tournaments, but place well in them. People who engage with strategy and the community in Stark Mountain. People who have contributed to site content more recently than two or three years ago. People who are capable of putting in the technical work required to make processes a reality. It's time for other capable members of the community to set the direction for tiering policy.
The Smogon Council was a very bad idea. When it was first mentioned in #stark, I said in a private message that it was not even worth the time to argue with it, because no one would swallow it. Obviously, I was wrong. Smogon's culture of respect (people with status must be respected unconditionally) has prevented people from pointing out the obvious: that the Smogon Council was the worst idea since suspect experience. The Council was not even faster than a simple vote based on a simple rating/deviation metric. The Council consists of people handpicked by two people in a process based on nothing tangible and with no oversight. It's effectively no different from those two people banning Pokémon by fiat. It may be better than the previous process, but that's a low bar.
That brings us to today. Everybody knows the first process was a disaster. After all, the flaws with that first process are continually cited as the reason to introduce the council. This alone should raise eyebrows about the same people who designed that previous process having continuing influence on Pokémon policy. Although they don't realise it yet, they also messed up a second time with the "Smogon Council". Twice is more than enough chances. You may not agree with my personal position of not banning Pokémon before the game is released, but if there is one thing you should take away from the history of tiering in DP, it's that some new qualified people need to step up to the plate to spearhead tiering in the next generation. We should avoid banning things hastily. We have plenty of time to do it right. So long as we avoid developing a process as bad as paragraph submissions, top secret formulas, and other arbitrary delays and exclusions, we don't run the risk of wasting years this time. Such a working process is a simple vote with the only filter being a ladder statistic check.
The bottom line is that there is no justification for starting off the next generation with arbitrary bans. The DP ban list is already very long, and the next generation is only going to introduce more Pokémon of a similar level of power, or revise older Pokémon up to that level. Even the argument about saving time doesn't hold water, because, using a good process, we can balance the game far faster than was done this generation. The best process is a simple vote based on a completely open metric. This is efficient, fair, representative, and completely peer reviewable. Most importantly, we should not ban any Pokémon without having played the game for a while.