In a quandary about how I am supposed to calculate the OU list next October

X-Act · Sep 14, 2008

Yeah, you read the title right. This is because of many reasons:

1) No weighted statistics
2) Statistics now count every Pokemon in the team, not every Pokemon used.
3) Garchomp being banned in September
4) Can't put any weightings on July, August and September stats since they're the first stats being released in this new way.

Solutions:

For 1), there's nothing I can do except use the unweighted stats (duh!).

For 3) and 4), I can either use September's stats only, or use July, August and September stats and ignore Garchomp's stats for July and August.

For 2), this means that the 75% limit of usages calculation is gone. The stats no longer measure how often a Pokemon was actually used, but how often it was put in a team during battles. This is why the number of Pokemon in the first 75% was so low for July and August... the stats weren't measuring the same thing as before!

To this end, I suggest that those Pokemon that are not rarely put in teams be called OU now. Note that I didn't say 'frequently' but 'not rarely'. The truth is that few Pokemon appear frequently in OU. Last month, the only Pokemon that had a chance of appearing in one out of every 4 teams more often than not were Garchomp, Gengar, Gyarados, Metagross, Lucario, Deoxys-S, Blissey and Heatran - only 8 Pokemon. Compare this to Ubers, where 6 Pokemon appear in one out of every TWO teams, which means that any random Uber battle will contain the top 6 Pokemon more often than not. This fully illustrates the point that OU is really not the Pokemon that show up frequently in teams, but the Pokemon that don't appear rarely in teams. We thus need to find a cut-off point for 'rare', not for 'frequent'.

Now this cut-off point can be as 'rare' as we want. Maybe a Pokemon is considered 'not rare' if, when looking at 15 random teams, it appears among them at least once more often than not. Or maybe the number of teams should be 20, or 10, or 16, or whatever. If I use '15', the number of Pokemon satisfying this criterion would be 40 for the Standard Ladder and 39 in the Suspect Ladder. Here is a chart noting the number of Pokemon satisfying this criterion for 10 to 25 teams in Standard and in Suspect of last August's stats:

Code:

Number of Pokemon that appear among T random teams at least once more often than not

Teams (T)    Standard    Suspect
   10           28          27
   11           33          30
   12           34          32
   13           37          35
   14           38          38
   15           40          39
   16           42          41
   17           44          41
   18           45          42
   19           47          45
   20           49          46
   21           51          47
   22           51          48
   23           52          49
   24           54          50
   25           55          52
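The criterion behind this chart can be sketched in code. This is only an illustration, assuming each random team contains a given Pokemon independently with probability p (its share of all teams). The function names and the sample shares below are made up for the example and are not taken from the real stats.

```python
def appears_more_often_than_not(p: float, t: int) -> bool:
    """True if a Pokemon present in a fraction p of teams shows up at least
    once among t random teams with probability of at least 50%."""
    # Chance of missing all t teams is (1 - p) ** t.
    return 1.0 - (1.0 - p) ** t >= 0.5


def count_not_rare(team_shares: dict[str, float], t: int) -> int:
    """Count the Pokemon that satisfy the criterion for a given t."""
    return sum(appears_more_often_than_not(p, t) for p in team_shares.values())


# Hypothetical team shares, purely for illustration.
sample_shares = {"A": 0.30, "B": 0.05, "C": 0.04, "D": 0.01}

for t in (10, 15, 20):
    print(t, count_not_rare(sample_shares, t))
```

A larger T lowers the share a Pokemon needs, so the count can only grow as T grows, which matches the shape of the chart.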

Our choice of T depends on how many OU Pokemon we want, really. So, how many do we want?

Caelum · Sep 14, 2008

I actually wondered how you would get around this but figured you'd bring it up if you had a problem with something.

For the Garchomp issue I think the best solution would be to ignore Garchomp's usage in July & August. Yes this is ignoring a major part of the metagame in those months (Garchomp) but using just September wouldn't seem accurate to me. Luckily this issue won't be around next time (hopefully :p).

Now, for the selection of T. I'm uncomfortable making a call but I can't think of another way to do it myself so I guess I'll just have to settle for it. I would find T=16 to be appropriate myself, though I could go as low as T=14 or as high as T=19 and still be comfortable with the results.

Also, I'm just taking the time right now to thank you X-Act since you really do so much for this community and your post just shows that.

Hipmonlee · Sep 14, 2008

Hmm.. I guess the question is, what do we expect from our OU list? To me it is the Pokemon that are used by the most people, not the Pokemon that are used in the most battles on ladder.

I mean, if I was intuitively making an OU tier, like we used to do, I wouldn't consider the fact that I fought one person 5 times and another person once to mean that the Pokemon in the first person's teams are 5 times as likely to be used.

I am not sure whether this is what the OU tier ought to represent, but it is what I expect it to represent.

This of course isn't an answer to your question, but I think that perhaps our whole way of looking at OU is being distorted here.

It'd be interesting to see how much of an impact people who battle a lot as opposed to people who don't battle much have.

But, if that were the case, then I wouldn't think the number would need to be bigger than 10. Otherwise I am unsure.

Have a nice day.

X-Act · Sep 14, 2008

Hipmonlee said:
Hmm.. I guess the question is, what do we expect from our OU list? To me it is the Pokemon that are used by the most people, not the Pokemon that are used in the most battles on ladder.

I mean, if I was intuitively making an OU tier, like we used to do, I wouldn't consider the fact that I fought one person 5 times and another person once to mean that the Pokemon in the first person's teams are 5 times as likely to be used.

I am not sure whether this is what the OU tier ought to represent, but it is what I expect it to represent.

This of course isn't an answer to your question, but I think that perhaps our whole way of looking at OU is being distorted here.

It'd be interesting to see how much of an impact people who battle a lot as opposed to people who don't battle much have.

But, if that were the case, then I wouldn't think the number would need to be bigger than 10. Otherwise I am unsure.

Have a nice day.

This is an interesting point. What Hipmonlee is saying is that only the differently used teams used by everyone should be counted for OU. So, for example, if I use my team 200 times in one month and you use your team 5 times in the same time frame, the Pokemon used in both teams are to be counted once, not 200 times and 5 times respectively.

However, there's no way that we can extract this information from Doug's stats, or Colin's for that matter.

Although, now that I think about it, let's consider the following scenario. Suppose I have a team of 6 Pokemon and play 20 games with it, and realise that one of the 6 Pokemon is not doing well for me and decide to replace it. Then I play 50 games with this new team. I think it would be fair to say that 5 Pokemon were used 70 times, 1 Pokemon was used 50 times and 1 Pokemon was used 20 times, rather than say that 5 Pokemon were in 2 teams and 2 Pokemon were in 1 team.
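The difference between the two ways of counting in this scenario can be made concrete with a small sketch. The Pokemon names (P1 to P7) and the function names are hypothetical; only the game counts (20 and 50) come from the scenario above.

```python
from collections import Counter

# Original roster played for 20 games, then P6 swapped for P7 and
# the new roster played for 50 games.
roster_before = ["P1", "P2", "P3", "P4", "P5", "P6"]
roster_after = ["P1", "P2", "P3", "P4", "P5", "P7"]
history = [(roster_before, 20), (roster_after, 50)]


def count_per_battle(history):
    """Each Pokemon is counted once for every game its team played."""
    counts = Counter()
    for roster, games in history:
        for pokemon in roster:
            counts[pokemon] += games
    return counts


def count_per_team(history):
    """Each Pokemon is counted once per distinct team, however often it played."""
    counts = Counter()
    for roster, _games in history:
        counts.update(roster)
    return counts


print(count_per_battle(history))
print(count_per_team(history))
```

Counting per battle gives 70 for the five unchanged Pokemon, 50 for the replacement and 20 for the one dropped, while counting per team gives 2, 1 and 1.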

X-Act · Sep 16, 2008

Any more input?

I've been briefly on mIRC lately and people there suggested T=20. Seemingly, the people there wanted around 50 OU Pokemon, which personally I find to be too high a number. But I've got no qualms about it if the majority want that number.

To be honest, the number of Pokemon in OU is only important so that we know from where to start BL/UU. There's nothing important about OU other than that. The important tiers are Uber and BL (and possibly a ban tier for NU).

Also, I need more suggestions about what I am to do with the July and August stats from the Standard ladder, since they include Garchomp. And does it make sense to use August's Suspect Ladder stats when they don't include Deoxys-S?

As you can see, I'm still very confused about what to do.

Hipmonlee · Sep 16, 2008

I think 20 is too big a number. I mean, you are looking at Pokemon used in about 1 in 40 teams then? That isn't really overuse.

[edit] - or rather, you'll face one in every forty battles.

Have a nice day.

X-Act · Sep 17, 2008

No, the list of Pokemon with T=20 would signify that each Pokemon in the list will be in at least one in 20 teams. So roughly, each Pokemon will be in at least one in 10 battles more often than not, because each battle would contain 2 of these teams.

If we go by this, the algorithm would be quite simple:

1) Sum up all the usages. Call this S.
2) Calculate the cut-off point C = S x (1 - (0.5)^(1 / T)) / 6.
3) Those Pokemon whose usage is at least C are OU.
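The three steps above can be sketched as follows. This is a minimal illustration, assuming the stats arrive as a mapping from Pokemon name to usage count; the function names and the sample numbers are invented for the example.

```python
def ou_cutoff(total_usage: float, t: int) -> float:
    """Step 2: C = S x (1 - 0.5^(1/T)) / 6.

    S / 6 is the number of teams, since every team contributes 6 usages."""
    return total_usage * (1.0 - 0.5 ** (1.0 / t)) / 6.0


def ou_list(usage: dict[str, float], t: int) -> list[str]:
    """Steps 1 and 3: sum the usages, then keep every Pokemon with usage >= C."""
    s = sum(usage.values())
    c = ou_cutoff(s, t)
    kept = [name for name, count in usage.items() if count >= c]
    return sorted(kept, key=lambda name: usage[name], reverse=True)


# Hypothetical usage counts, not real ladder data.
sample_usage = {"A": 3000, "B": 500, "C": 60, "D": 10}
print(ou_list(sample_usage, t=20))
```

With these made-up numbers S = 3570, so the T=20 cut-off is a little over 20 usages and only D falls below it.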

Of course, once we finalise the value of T, C can be calculated much more easily. For example, if T=20 is chosen, then C = S / 176.1407. The 'magic numbers' by which S needs to be divided, depending on the value of T, are:

Code:

T=10:  89.5964
T=11:  98.2494
T=12: 106.9029
T=13: 115.5569
T=14: 124.2111
T=15: 132.8657
T=16: 141.5204
T=17: 150.1753
T=18: 158.8303
T=19: 167.4855
T=20: 176.1407
T=21: 184.7961
T=22: 193.4515
T=23: 202.1070
T=24: 210.7625
T=25: 219.4181
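These divisors follow directly from the cut-off formula: dividing S by 6 / (1 - 0.5^(1/T)) is the same as multiplying it by (1 - 0.5^(1/T)) / 6. A short sketch to regenerate the list, with `magic_divisor` being a name chosen here for illustration:

```python
def magic_divisor(t: int) -> float:
    """Number to divide S by so that C = S / magic_divisor(t)."""
    return 6.0 / (1.0 - 0.5 ** (1.0 / t))


for t in range(10, 26):
    print(f"T={t}: {magic_divisor(t):8.4f}")
```

The same function covers any other T, so a different choice only needs a different argument.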

Also, I have requested Doug to give me statistics from the date Garchomp was banned on the Standard ladder onwards. I have decided that these will be the only statistics that are going to be used for October's OU list.

I'm aware that there's an endeavor to create weighted statistics. If Doug manages to issue them in the September stats, I'll use those instead of the unweighted ones.

Just as an aside, those Pokemon that have more than (S / 37.7113) usages are those that are used frequently. My definition of 'frequently' is T=4, that is, a Pokemon is used frequently if it is in one out of every 4 teams, or roughly, in one out of every 2 battles. To test this formula, here are the Pokemon that were used frequently in the Standard, Suspect, Underused and Uber ladders last August:

Standard: Garchomp, Gengar, Gyarados, Metagross, Lucario, Deoxys-S, Blissey, Heatran (8 Pokemon)

Suspect: Gengar, Heatran, Salamence, Lucario, Metagross, Gyarados, Tyranitar, Celebi (8 Pokemon)

Underused: Claydol, Steelix, Clefable, Rotom, Hitmontop, Ninetales, Aerodactyl (7 Pokemon)

Uber: Kyogre, Rayquaza, Groudon, Dialga, Palkia, Darkrai, Deoxys-A, Latias, Mewtwo, Blissey, Lugia, Giratina, Metagross (13 Pokemon)

I think that this can also be used as a crude measurement of centralisation. The more frequently used Pokemon there are, the more centralised the metagame is. But I don't know if this is true all the time.

X-Act · Sep 18, 2008

Yesterday I was briefly on mIRC again, and Stellar suggested this to me.

Why not look at the amount of OU Pokemon in RBY, GSC and ADV in relation with the total number of Pokemon there to decide how many Pokemon there should be in DP?

And so I did.

RBY has 151 Pokemon, of which 14 are OU: 9.3%.
GSC has 251 Pokemon, of which 27 are OU: 10.8%
ADV has 389 Pokemon, of which 35 are OU: 9%

The average number of OU Pokemon is 9.7% of all Pokemon.

Hence, DP should have about 9.7% of 498 Pokemon to be OU, which is 48.

This suggests that T = 20 is indeed what should be used.
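The arithmetic behind the 48 can be checked with a few lines. The OU counts and totals are the ones quoted above; the function names are just labels for this sketch.

```python
# (OU Pokemon, total Pokemon) per generation, as quoted in this post.
generations = {"RBY": (14, 151), "GSC": (27, 251), "ADV": (35, 389)}


def average_ou_share(gens: dict[str, tuple[int, int]]) -> float:
    """Mean of the per-generation OU fractions."""
    shares = [ou / total for ou, total in gens.values()]
    return sum(shares) / len(shares)


def projected_ou_size(gens: dict[str, tuple[int, int]], pool: int) -> int:
    """Apply the average share to a pool of the given size."""
    return round(average_ou_share(gens) * pool)


print(f"{average_ou_share(generations):.1%}")
print(projected_ou_size(generations, 498))
```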

So, to summarise:

1) DougJustDoug should provide me with stats for the Standard ladder from September 12th onwards, i.e. a day after Garchomp got banned. This list will be used to determine the new OU list.
2) The usages (weighted or not, depending on what Doug gives me) are summed up. Call this total S.
3) The cutoff point for "Used Rarely" is calculated by the formula R = S / 176.1407. The cutoff point for "Used Frequently" is calculated by the formula F = S / 37.7113.
4) The Pokemon having usage at least R make the OU list.
5) As a bonus, the number of Pokemon having usage at least F can serve as a crude indicator of the amount of centralisation in the metagame.

If nobody has anything to add, this is what I'll do next October.

obi · Sep 18, 2008

I don't know that % of total Pokemon in OU is really that important. If next generation GameFreak decides to add 200 pre-evos that learn no new moves, and they do not change any mechanics at all, that will have virtually no effect on OU battling, but if we decide to shoot for keeping the % of total Pokemon as OU (roughly), then we will need to make OU larger despite there being no real differences in the landscape.

Great Sage · Sep 18, 2008

We could modify it to a percentage of competitive Pokemon.

RBY: 16.87%
GSC: 19.29%
ADV: 15.77%
Average: 17.31%

It still translates to approximately 48 Pokemon, and avoids the problem Obi described.

We might also want to exclude Ubers from the pool of competitive Pokemon; here are the calculations for that.

RBY: 17.28%
GSC: 20%
ADV: 16.91%
Average: 18.06%

This translates to 46 Pokemon.

DougJustDoug · Sep 18, 2008

I plan to generate statistics from September 12th onwards. I also might generate stats for September 1st-11th, but I'm not sure. It depends on whether I get some reporting automation coded. If I don't automate the reports, and I have to compile the reports by hand, then I will only make the post-Garchomp reports. Reporting is just too time-consuming to do it twice.

As for weighted stats, I don't see it happening for September. There are a couple of problems -- for one, it would require some tricky changes to the gathering code. If I sit down to tackle it, I'm sure I can do it. But, I'm not very motivated to do it. Mainly because I am personally dubious of the value of weighted statistics. I'm not convinced that weighted statistics are inherently better than unweighted stats. I don't think supposed "good player" usages should be given additional emphasis for overall tiering. We have debated this issue on IRC, and I've heard solid arguments from all sides. Ultimately all the weighting schemes I have heard are too arbitrary to be indicative. Hipmonlee's suggestion is an interesting twist that I have not considered. But, until I'm given a convincing argument for a clearly superior weighting system, I'm not jumping with excitement about changing the stat gathering programs.

If I am truly in the minority in my opinion, then I'll set aside my distaste for weightings and I'll change the programs. I won't let my personal opinions stand in the way of the community getting statistics that are the consensus favorite. Since there is not a clear consensus to make a change, I am tempted to stick with the status quo.

obi · Sep 18, 2008

Great Sage said:
We could modify it to a percentage of competitive Pokemon.

RBY: 16.87%
GSC: 19.29%
ADV: 15.77%
Average: 17.31%

It still translates to approximately 48 Pokemon, and avoids the problem Obi described.

That doesn't avoid the problem. I just used NFE because they're generally not OU Pokemon. They could add 200 Dustox / Luvdisc quality Pokemon and it would be the same thing. Pokemon like that shouldn't change the size of OU.

X-Act · Sep 19, 2008

Okay, Obi makes a good point. I hope Gamefreak do not invent 200 Unown-like Pokemon in the next generation. They never did anyway, so I hope they don't.

I'll still go for T=20 anyway, since T=20 will in no way guarantee us having 45, or 46, or 48 Pokemon. It might have a number that's near to this, but not exactly this. The new metagame might have 60 non-rare Pokemon, or 30.

To Doug, that's perfectly okay. I don't expect you to implement weighted stats for this month. I can use the unweighted stats perfectly well. Maybe for October it would be dandy... although I think we need to agree on exactly what weighted stats to release first.

X-Act · Sep 20, 2008

Ah, so, to make this even more complicated, it seems like Shoddy has released the Platinum moves and stuff, and our server will soon follow (or has already followed) suit.

Expect the new OU list of October to be a bit strange.

Maybe it would be a better idea to wait for the October stats and release a new OU list in November?

Or maybe we should release the new OU list for October, followed immediately by a more stable list in November. What do you guys think?

Great Sage · Sep 20, 2008

I'd choose to release an OU list for October and a stable one in November. Though the October one will be kinda screwy, I don't think it will be too ridiculous other than undercounting the new forms, and it's best to have something to work from.

X-Act · Sep 22, 2008

Okay, I'll release an OU list in October, followed by another OU list in November.

The next OU list would then be released normally in January 2009.

X-Act · Oct 6, 2008

While I'm waiting for Doug to release the September stats, I need to clarify something.

Usually, we get stats for the previous three months, and they are combined together. This month is going to be an exception, since, as mentioned previously, only the stats from September 12th onwards are going to be used. I thought a bit about how the stats are to be combined.

As I said in the previous OU thread, for three months, they were weighted as follows, after seeing which percentages fit best to predict what Pokemon would be used in subsequent months:

Last month: 83.5%
Month before that: 12.6%
Month before that: 3.9%

This is roughly on a ratio of 20:3:1, so that's the ratio I'm going to use when we get three month statistics.

If, for some reason, we only get statistics for the previous two months, the ratio 20:4, or 5:1, will be used.
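One way to picture the combination is the sketch below. It assumes the ratio is applied directly to each month's raw usage counts, most recent month first; the post does not say whether counts are normalised beforehand, so treat that as an assumption. The function name and sample figures are invented.

```python
def combine_months(monthly_usage, weights=(20, 3, 1)):
    """Weighted average of per-month usage dicts, most recent month first.

    Assumption: weights are applied to raw usage counts."""
    if len(monthly_usage) != len(weights):
        raise ValueError("need exactly one weight per month")
    total_weight = sum(weights)
    combined = {}
    for usage, weight in zip(monthly_usage, weights):
        for name, count in usage.items():
            combined[name] = combined.get(name, 0.0) + weight * count / total_weight
    return combined


# Hypothetical usage counts for three months, most recent first.
months = [
    {"A": 100, "B": 50},
    {"A": 40, "B": 80},
    {"B": 24, "C": 24},
]
print(combine_months(months))                    # three months, 20:3:1
print(combine_months(months[:2], weights=(5, 1)))  # two months, 5:1
```

A Pokemon that only shows up in the oldest month, like C here, contributes very little to the combined figure.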

EDIT: As an aside, the reason Doug hasn't revealed the statistics yet is that he's coding the statistics generation, instead of doing it by hand like he usually does.
