"No bans" Testing Proposal

cim

This is just an idea, feel free to throw it out and start over.

A very common criticism of the "no bans" approach is time consumption; tests traditionally take forever, so no bans metagames would waste lots of time! I've consistently argued that this doesn't have to be the case, to no avail. So to help give people an idea of what I'm thinking, I've decided I should write up a more specific proposal that outlines how we can save time and preserve a solid process.

The main assumptions I'm working under here are as follows.
  1. Better battlers give a more accurate tiering result. Thus a hard voter cutoff at an arbitrarily high ladder ranking will get us a pool of voters that balances out. I know I've doubted this idea in the past but it seems like the most fair way to do things now. We all remember the horror story of Shaymin, but with a fresh ladder reset that kind of thing shouldn't happen and the outliers should cancel out.
  2. Two weeks of play is enough to determine whether the most obvious ban candidates are overhyped or not. The first Shaymin-S ban and the current Heracross in UU test are my benchmarks here. By the end of two weeks, people have generally "calmed down" enough to judge whether the Pokemon they're talking about really are that bad. I don't think two weeks is enough for a full test of borderline cases, but it is enough to get the "obvious" stuff out of the way.
  3. Swift batch bans are okay initially with a supermajority of votes for the ban.

Now onto that, here's my proposed testing cycle:

  • Begin Week 1: Ladder opens, 2 week test begins
  • End Week 2: Eligible Voters Posted On Site
  • Week 3: One week to accept "nomination" and post your vote
  • End Week 3: Supermajority is banned, Majority on the chopping block
  • Begin Week 4: Ladder ranks reset, 3 week test begins
  • End Week 6: Eligible Voters Posted
  • Week 7: One week to accept "nomination" and post your vote
  • End Week 7: Supermajority is banned, Majority on the chopping block
  • Begin Week 8: Same as above continuously, but after the 5 week cycle repeating that so that a new ban list comes out every 6 weeks for easier organization with the usage stats / tier changes

With a very simple test using a single ladder, and simple arbitrary deviation requirements, what could go wrong?

Edit: As for a quick estimated "time for stability", I'd say it would only take 3 - 4 cycles to approach stability. This is based off of the UU process, but I'm giving people the benefit of the doubt and assuming OU is dramatically worse than UU ever was.
 
First off, thanks for posting and getting the ball rolling Chris. This is a good initiative.

Better battlers give a more accurate tiering result.

I actually think this is a pretty baseless assumption-- rather, it is simply incorrect. First off the word "accurate" is inappropriately used here, since there is nothing we are actually "measuring." There is no "correct answer" in tiering, so "accuracy" is a completely irrelevant concept here.

What we really accomplish by having people play, is to educate their opinions. Note that this in no way implies that the stronger players have better or more valuable opinions-- merely that we have forced the players to attain more knowledge that may sway their opinions.

Nowhere in this logic is there reason to believe that the opinions of stronger players are more valuable than those of weaker players.

I doubt this point will change the fact that we (Smogon) always turn only to the stronger players, but if we continue to choose to do so I want us to be sure we do so knowing there is really no logical reason for it.

Personally, I would make the voting much more open and allow more people to have a say. I want to have more faith in the community, and also value our members-- even those who aren't at the top of the ladder or in PR.



That out of the way, can we be more concrete as to what this timeline means? Maybe it's just me, but I am unsure of what you mean by the terms super majority, majority, and chopping block. Let's get some numbers on the board (in terms of defining majorities and rank cut offs), and say exactly what actions we will take. Let's define chopping block (ie. will be voted on the following week?).


I fully support Chris's view that this process can be done swiftly, efficiently, and should be done with minimum time wasted on "what ifs" and straight, cold-hearted use of the ban button.

You will notice I never used the word "test" in this whole post, as I will hold firm to the position that it is not testing when there is no correct solution-- we are merely aiming to have people make their opinions based on actual play.
 
Supermajority: 67% of the vote tally
Majority: 51% of the vote tally
"Chopping Block" - Needs one more majority vote

This is all referencing the late changes in DPP tiering that required a supermajority or two consecutive majority votes to rule something Uber. I feel this policy applies more now than it ever did back then.
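
To make those thresholds concrete, here is a minimal Python sketch of how one vote on one Pokemon could be classified. The function name `classify_vote`, the inclusive comparisons, and the rule that a Pokemon which fails to get a majority simply stays (and comes off the chopping block) are assumptions made for illustration, not settled policy.

```python
from fractions import Fraction

SUPERMAJORITY = Fraction(67, 100)  # 67% of the vote tally
MAJORITY = Fraction(51, 100)       # 51% of the vote tally


def classify_vote(ban_votes: int, total_votes: int,
                  on_chopping_block: bool = False) -> str:
    """Classify a single vote (hypothetical helper).

    Returns "banned", "chopping block", or "stays".
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    share = Fraction(ban_votes, total_votes)
    if share >= SUPERMAJORITY:
        # A supermajority bans immediately.
        return "banned"
    if share >= MAJORITY:
        # A second consecutive majority finishes the job.
        return "banned" if on_chopping_block else "chopping block"
    # Assumption: anything short of a majority stays in the tier.
    return "stays"
```

So 60 ban votes out of 100 puts a Pokemon on the chopping block the first time, and the same result in the next vote bans it.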

I use the terms "accurate" and "correct" a little less than literally; I mean this as in "as close to our ideal as we can get".
 
Thank you Chris, my apologies for not reading the relevant literature and bringing in that tedious terminology discussion. I just wanted to clarify.

I'd also point out though that no one knows (or can know) "what the ideal is" exactly, and that's probably at the heart of why this discussion has been so infuriating. :/ This is why I am extremely hesitant (rather opposed to) the notion that stronger players "know better" what the ideal is when the ideal is unknowable and ultimately based on opinion-- no real reason to limit to the opinions of only a few.
 
Here's an idea, just so we can be efficient with the Suspect ladder and everything. First time around, we have our "no-bans" metagame. After the test period is over, we'll accept nominations/eligible voters/maybe paragraphs and then the vote is held. If a supermajority is reached to ban (>2/3), then the suspect is shown the door straight away and leaves forever unless we decide to test it later (which is unlikely). If it ends up between 50% and 2/3, we then make use of the Suspect Ladder and do what we did with Mence. That way, we can please the people who voted Uber so they can finally have the OU they want, while still making sure we get it right. If it reaches another majority, it gets banned permanently. If it doesn't, we test it again. If the second time also doesn't yield a majority, then we re-implement it onto the OU ladder and the cycle continues. Here's a simple diagram of what I mean:

uberstesting.jpg


Enjoy.
 
How about "intelligent users" are more likely to produce an "intelligent tier." There is a reason everything major is decided by a small group of intelligent people.

Begin Week 8: Same as above continuously, but after the 5 week cycle repeating that so that a new ban list comes out every 6 weeks for easier organization with the usage stats / tier changes

Can you elaborate on this? I'm not quite sure what you're getting at.
 
I haven't had time to write a proper thread, but this is the type of process I would like to see:

1. The first N weeks of play are not part of any process. This not only gives time to play the game without the spectre of bans looming overhead, but it also gives us real data on the distribution of ladder ratings in an environment not involving testing. We actually have no idea what this distribution will look like, because ratings in Shoddy Battle 1 are bogus due to people using alts, so having this distribution is necessary to proceed.

2. After that, we proceed in M week cycles. At the end of a cycle, people exceeding rating metric A vote on pokemon to ban or clauses to implement or pokemon to unban. Then we ask people exceeding metric B, which is higher than A, first whether they want to implement any bans. If not (by simple majority), there are no bans this cycle. Otherwise, they vote on the top Q nominations proposed by the bigger group. Since we are talking about the highest level of power, it's commonly thought that these pokemon are "obviously broken", and hence, a 2/3 supermajority should be required to ban them.

3. If a pokemon that was nominated to be banned in one period is not banned, it cannot be nominated in the next period. This is to avoid pokemon which are merely controversial, rather than agreed to be broken, from dominating the Q nominations.

4. We repeat these cycles until the threshold becomes too high to sustain more bans. At this point, we have finished defining the first ban list. Now we move onto the next mode of play and repeat the process.

5. After X cycles in a single mode, the ratio required to ban becomes even higher. Allowing things to be banned at any time leads to a "if we can't deal with it, let's ban it" mentality. Eventually bans should become more difficult in a mode to prevent this from happening.


Constants I think would be appropriate are N=4, M=4, Q=5, X=6. A and B would be determined through analysis of the rating distributions we obtained in step (1).
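
Since the process is meant to be mechanical, one cycle of steps (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `run_cycle` and its argument shapes are hypothetical, "nominated" is read as "made the top Q shortlist", the 2/3 threshold is treated as inclusive, and the rising ratio from step (5) is left out.

```python
from collections import Counter
from fractions import Fraction

Q = 5                       # shortlist size suggested in the post
BAN_RATIO = Fraction(2, 3)  # supermajority needed to ban (assumed inclusive)


def run_cycle(nominations, wants_bans, ban_votes, blocked=frozenset()):
    """Run one voting cycle; all argument shapes are hypothetical.

    nominations: list of names, one entry per nomination from group A
    wants_bans:  list of bools from group B ("should we ban anything?")
    ban_votes:   dict mapping name -> list of bools from group B
    blocked:     names shortlisted but not banned in the previous cycle
    Returns (banned, blocked_next_cycle) as two sets.
    """
    # Step 3: names that failed last cycle cannot be nominated now.
    tally = Counter(name for name in nominations if name not in blocked)
    shortlist = [name for name, _ in tally.most_common(Q)]

    # Step 2: group B decides by simple majority whether to ban at all.
    if 2 * sum(wants_bans) <= len(wants_bans):
        return set(), set(shortlist)

    banned = set()
    for name in shortlist:
        votes = ban_votes.get(name, [])
        if votes and Fraction(sum(votes), len(votes)) >= BAN_RATIO:
            banned.add(name)
    # Anything shortlisted but not banned sits out the next cycle.
    return banned, set(shortlist) - banned
```

Feeding the second return value back in as `blocked` for the following cycle is all the bookkeeping that step (3) needs.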

An exact timetable will be published, with votes scheduled on days known months in advance. This means that people will know exactly when to log on to vote and can make plans accordingly, and as such, the votes can be open for only several days.

Also note that this process is completely mechanical. It literally requires no "steering" once we decide on the values of these constants, which we do before it is set in motion.
 
Is there a reason you want to limit the number of pokemon that can be voted on? Couldn't it be something like: the top A people can nominate any change to a ruleset, and if it gets seconded by someone else in A then it gets added to the vote. We may have to do an ST vote or something.

It also means people can vote on changes other than what pokemon to ban, like whether to ban certain moves or something. This could be dangerous, though; I think we would need a stronger explanation of the circumstances in which to make rule changes other than bans.

One other thing that hasn't been specified here, but I think should be the case, and may in fact be what you were intending, is that whatever testing is done should be done on the standard ladder. A separate suspect ladder is unnecessary.

Have a nice day.
 
I heavily disagree with Cathy's process, which to me looks like it could easily get sidetracked into another who-knows-how-many-years spiral. The whole point of this discussion is that CIM and others believe we can do this without going for years on end to accomplish it. Really, it will be just as much a fiasco as 4th Gen if this whole process is not done within half a year. Ideally less than that-- more like 4 months, as I have previously posted.

In any case, what ever plan we use should be specific with the dates, and aim to have a specific end period-- I don't like us talking about "continuous cycles."

Obviously there needs to be a process built into the system that deals with something like Yache-Chomp (a relatively acceptable OU pokemon suddenly discovers a set that creates universal dominance) and calls for re-examination. However, we want to have a relatively concrete list within a reasonable time frame.

Instead of aiming for some unknown and unknowable "ideal" (since all tiering is opinion based anyway), it's better to enjoy the real benefits of a stable rule system.


In any case, GF has released new threats and mechanics to the BW game so powerful that there is definitive need to have the capacity to ban any number of threats if necessary, especially if we start with no bans. The system should be "let everything in, but can easily give it the boot."
 
I agree with a stable ruleset being important, and I explicitly mentioned this with part (5). It's also implied by part (2) and part (4).

As for timeframe, this process, with the constants I gave, could recover the arbitrary ban list (containing a whopping 25 pokemon) proposed in that other thread in only six months (including the month that isn't a part of the process). However, I contend that it's highly likely that the ban list we arrive at for the first mode will be much shorter, so we're already looking at less than six months, even with the Q=5 that I suggested. Q could be higher; however, there is a worry that too much stuff would be banned at once, which is to say that, after removing the most broken pokemon, pokemon that had been nominated in the same round may not seem so broken anymore. The finite Q is designed to allow for that to be discovered.
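
The six-month figure follows from simple arithmetic on those constants. Here is a small Python sketch of it; the helper `weeks_to_ban` is hypothetical, and it assumes the best case in which every cycle bans the full Q nominations.

```python
import math


def weeks_to_ban(list_size, free_weeks=4, cycle_weeks=4, per_cycle=5):
    """Best-case weeks needed to ban `list_size` pokemon.

    free_weeks  = N, the initial period outside the process
    cycle_weeks = M, the length of one cycle
    per_cycle   = Q, the most bans a single cycle can produce
    """
    cycles = math.ceil(list_size / per_cycle)
    return free_weeks + cycles * cycle_weeks


print(weeks_to_ban(25))  # 4 + 5 * 4 = 24 weeks, roughly six months
```

A shorter ban list needs fewer cycles, which is where the "less than six months" estimate comes from.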

Too expedited of a process is practically the same as an arbitrary initial ban list, as was pointed out by some posters in the Uncharted thread. This process is actually pretty fast and is designed to come to an end (with the relatively high ratio required to ban, and that ratio increasing eventually).

One other thing that hasn't been specified here, but I think should be the case, and may in fact be what you were intending, is that whatever testing is done should be done on the standard ladder. A separate suspect ladder is unnecessary.

I agree with this. Using only a single ladder has many advantages which I don't think I need to enumerate.
 
I like Cathy's process, but I think it could be expedited a bit further. Mainly, I'm concerned that something will go wrong and we'll have to backpedal and 6 months turns to 8 or 9...
 
Let's make it not "could be" but "will be." 6 months is still reasonable to have it practically finished (finished except for Yache-Chomp instances). I still disagree with limiting the number of pokemon bannable (or chop-blockable) at any given step.
 
I wouldn't be fooling anyone by saying testing will "finish". While after a time it will basically stop changing, I doubt it will "finish" any time.

The main thing I want out of a process is a faster "initial" period and a slower "normal" period. That's why I thought a 3 week first test, 4 week second test, 5 week third test, and then 6 weeks after that would be the most sensible, a "ramp up" if you will.
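
As a rough illustration of that ramp-up, this Python sketch lists the cycle lengths and the week in which each new ban list would land. The helper names are hypothetical, and it assumes each figure covers the whole cycle (play plus the voting week) and that the length stays at 6 weeks once it gets there.

```python
from itertools import accumulate


def cycle_lengths(num_cycles, first=3, cap=6):
    """Length in weeks of each cycle: 3, 4, 5, then 6 from there on."""
    return [min(first + i, cap) for i in range(num_cycles)]


def ban_list_weeks(num_cycles):
    """Week number at the end of which each cycle's ban list comes out."""
    return list(accumulate(cycle_lengths(num_cycles)))


print(cycle_lengths(5))   # [3, 4, 5, 6, 6]
print(ban_list_weeks(5))  # [3, 7, 12, 18, 24]
```

The first two values line up with the "End Week 3" and "End Week 7" votes in the original timeline.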
 
I don't really get these warnings about a "who knows how many years spiral". The main problem with the Gen IV "suspect test" process that brought about the delays is that its function fundamentally depended on judgment calls made by two people, and the requirement for their input alone caused massive delays like the ones seen during Stage 3. By implementing a more mechanical process, this is completely removed. This means that, to preserve the mechanical nature of the process, the most important thing is to forge a process that we are as certain as possible will be agreeable regardless of what Gen V throws at us. This is much more important than pretending that we know how long a proposed process is going to take. Hell, by trying to guess at how long this will take, we're already contradicting the whole point of this thread by making assumptions on what the game will look like.

The Gen IV UU process simply did not have the major delay problems that Gen IV OU had. That is, we have a working precedent for this. This is mainly why I really like Cathy's process. The one main point of contention seems to be the rating metrics, and on that note I would like to reiterate my opinion from the other thread that we should focus on weeding out the "bad", not just inviting in the "good". (As for "bad", I'm talking about noobs who love to lose all the time with bad Pokémon for no reason (i.e. not playing the real game), and similar people - not "weak" players but "not serious" players.) So we should definitely have a reasonable, not too obnoxious "lower requirement", not unlike the 1600/55 that's been used in Gen IV UU. (I'm aware that Poke Lab, or whatever the Gen V simulator we use ends up being called, will probably use a different rating system.) As Synre used to say all the time in past UU threads, it's not that difficult for a fairly active, realistic player to meet a requirement like that.
 
In attempting to determine any "fair" rating metric by which to judge players, I'd like to point out what I feel was one of the major lessons learned by the Gen IV suspect testing process, at least during those rounds in which I actually participated. Reaching a certain rating on a dedicated suspect ladder is much harder than reaching the same rating on the standard ladder because of the stronger caliber of players who typically play there.

Especially for someone who may not be "the best," but who has hit the required rating and still needs to put in a few battles during the last couple days of the test to get their deviation down, this can be a rather daunting obstacle to overcome. You either get a completely dead ladder full of people who refuse to battle for fear of wrecking their records and losing voting privileges, or you get repeated matches against the same one or two people, where winning and losing ceases to even be a proper measure of how well you know the suspect, but rather how well you and your opponent can counterteam each other.

If it has already been determined that we are going to only use the standard ladder for all tests and rating determinations, then I suppose that this potential issue has already been dealt with. Otherwise, I strongly suggest that the rating metric focus more on deviation (or some other measure of battles played, but not SEXP or any other hidden criteria which for all the rest of us know means pulling names out of a hat) than on rating or any other measure of battles won.
 
Well, we need to get this thread rolling, so I'll start with a doubt I - and possibly many non-PRs - have. As I can see in Cathy's and chris's processes, we will have to wait X weeks to nominate a certain pokémon for Uber or a certain clause to be implemented. Now let's say Arceus is, well, considered "ridiculous" on day one. We may say it's not enough time for a nomination/ban, but what if it only gets worse by the end of the first week? Are we going to have to wait till Week 2/4/6 to even have a say on it? Or will it be possible for us to hold "urgent votes" in situations such as this one? I believe this is one of the problems the pro-banlist side has with ours, as it may seem the process may take "too long" in the beginning.


I'm not advocating all of Arceus's formes will be banned on day one but seriously guys, this was just an example.
 
Well, we need to get this thread rolling, so I'll start with a doubt I - and possibly many non-PRs - have. As I can see in Cathy's and chris's processes, we will have to wait X weeks to nominate a certain pokémon for Uber or a certain clause to be implemented. Now let's say Arceus is, well, considered "ridiculous" on day one. We may say it's not enough time for a nomination/ban, but what if it only gets worse by the end of the first week? Are we going to have to wait till Week 2/4/6 to even have a say on it? Or will it be possible for us to hold "urgent votes" in situations such as this one? I believe this is one of the problems the pro-banlist side has with ours, as it may seem the process may take "too long" in the beginning.


I'm not advocating all of Arceus's formes will be banned on day one but seriously guys, this was just an example.
I think that waiting is good; Arceus may seem ridiculous on day one only because people haven't learned how to play against it. Waiting a few weeks gives more accurate results.
 
It's far, far better to wait a longer time initially (probably 6 months maximum going with Cathy's proposed process) before we arrive at a truly stable metagame that has 'staying power' than it is to wait a short time between each almost-stable metagame that is created by a constant cycle of banning and unbanning. In Gen IV, we had the latter situation because of the testing process we chose to use, and look how long it took to get to the stable OU tier we have now. Not only would a 'no bans' testing process reach the desired result - an effective and fair banlist - faster, but it would also be 'clean' in that it would eliminate Pokémon from the top down, practically eliminating the possibility that something previously deemed too strong would be reconsidered as a suspect after 'not-as-broken' stuff was banned. That is why I unequivocally support 'no bans' testing for all Pokémon in Gen V. I'm sure we can agree that clauses are another matter, but we have Firestorm's thread in which to discuss that issue.
 
Well, we need to get this thread rolling, so I'll start with a doubt I - and possibly many non-PRs - have. As I can see in Cathy's and chris's processes, we will have to wait X weeks to nominate a certain pokémon for Uber or a certain clause to be implemented. Now let's say Arceus is, well, considered "ridiculous" on day one. We may say it's not enough time for a nomination/ban, but what if it only gets worse by the end of the first week? Are we going to have to wait till Week 2/4/6 to even have a say on it? Or will it be possible for us to hold "urgent votes" in situations such as this one? I believe this is one of the problems the pro-banlist side has with ours, as it may seem the process may take "too long" in the beginning.


I'm not advocating all of Arceus's formes will be banned on day one but seriously guys, this was just an example.
Absolutely not. Giving preferential treatment to anything makes the process pointless.
 
The problem I have (in a similar vein to what others have mentioned in this thread) with a fixed date system is the possibility of some pokemon X overcentralizing the metagame. If one pokemon is proving overwhelmingly broken during testing period Y, wouldn't people focus only on that pokemon, and hence only nominate it during the following evaluation/voting period? Even if there are other mons brought up, if one is centralizing the metagame far more than the rest the others might not get the support needed to get on the voting ballot.

Where I'm going with this is that we don't absolutely need a "quick ban" option, but it might help prevent us from going through more testing cycles than would be necessary.
 
I'd like everyone to chime in on this post if possible, as it's kinda fallen by the wayside and is fairly relevant. Mainly the bottom part, though the top part is kinda important-ish too I guess.
 