To get started on the right foot, I'm going to work with a nice, small subset of the 174k battles that vbaddict has supplied as a seed: the battles fought on Province.
Why Province? In all honesty, because it is tier limited and available only in Standard mode, it is a very attractive way to keep the data set small(ish). Being limited to tanks of tier 3 or lower (tier 2 for artillery) means there are only 53 different tanks that can possibly appear in a battle. That means I can set up my spreadsheets before my morning coffee runs out.
Of the 174k battles that vbaddict supplied, 2,950 took place on Province. To simplify things, I am going to assume that vbaddict captures only a small fraction of the battles played each day, and so treat the total battle population as infinite relative to our sample.
The straightforward numbers:
Side 1 Win Rate: 54.7% with a standard error of 0.9%
Side 2 Win Rate: 41.8% with a standard error of 0.9%
Draw Rate: 3.5% with a standard error of 0.3%
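The quoted standard errors are consistent with the usual binomial formula, SE = sqrt(p(1 - p) / n), with n = 2,950 and no finite-population correction, which is what treating the battle population as infinite amounts to. A minimal Python sketch, assuming that is how they were computed:

```python
import math


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n battles.

    Assumes an effectively infinite population, so no finite-population
    correction is applied.
    """
    return math.sqrt(p * (1.0 - p) / n)


N_PROVINCE = 2950  # Province battles in the vbaddict seed

rates = {"side 1 win": 0.547, "side 2 win": 0.418, "draw": 0.035}
for label, p in rates.items():
    print(f"{label}: {p:.1%} +/- {proportion_se(p, N_PROVINCE):.1%}")
```

With a finite population of N battles you would multiply by sqrt((N - n) / (N - 1)); when N is much larger than 2,950 that factor is essentially 1.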
Really, you could have found that on any WoT statistics site. It's widely known that Province is unbalanced in favor of side 1; anyone who has spent significant time at tier 3 or lower is aware of this. But we're not here to look at things as simple as map biases; we're looking for answers.
One possible explanation for the map bias is an issue with how the matchmaker handles the tier limitation. Unfortunately, if you look at the distribution of tank tiers and types across the two teams, this theory falls rather flat on its face.
Next, since we have data on the types and tiers of tanks, let's look at how they affect your chances of winning. This is a logistic regression on all tiers and types of tanks, to see whether the proportion of any of them is a significant predictor of success on the battlefield. Note that because certain variables can be determined from the others (i.e., the number of Tier 1 tanks from the numbers of Tier 2 and Tier 3 tanks), they cannot all go into the same model.
The first thing to realize is that we are looking at the data from the point of view of side 1. I set up our predictors as net values, i.e., the number of artillery on side 1 minus the number of artillery on side 2.
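Here is a small Python sketch of those net predictors. The tank representation, a (tier, class) pair with hypothetical class labels such as "SPG" and "TD", is an assumption for illustration, not the actual vbaddict format. It also shows the over-determination: with 15 tanks per side, the net tier counts always sum to zero, so net Tier 1 is just minus the sum of net Tier 2 and net Tier 3.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation: each tank is (tier, class), e.g. (2, "SPG").
Tank = Tuple[int, str]


def net_counts(side1: Iterable[Tank], side2: Iterable[Tank]) -> Dict[str, int]:
    """Side 1 count minus side 2 count, per tank class and per tier."""
    c1: Counter = Counter()
    c2: Counter = Counter()
    for counter, team in ((c1, side1), (c2, side2)):
        for tier, cls in team:
            counter[cls] += 1
            counter[f"tier{tier}"] += 1
    # Counter returns 0 for missing keys, so absent classes count as zero.
    return {k: c1[k] - c2[k] for k in sorted(set(c1) | set(c2))}


# Two made-up 15-tank teams.
side1 = [(3, "MT")] * 9 + [(1, "LT")] + [(2, "SPG")] * 2 + [(3, "TD")] * 3
side2 = [(3, "MT")] * 8 + [(2, "LT")] * 2 + [(2, "SPG")] * 3 + [(2, "TD")] * 2
net = net_counts(side1, side2)
print(net)
print("sum of net tiers:", net["tier1"] + net["tier2"] + net["tier3"])
```

Because the team sizes match, the same zero-sum constraint holds across the classes too, which is why at least one class has to stay out of the model.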
If we look at artillery and tank destroyers, leaving other tanks out to avoid over-determination, there is a slight positive effect from tank destroyers and a bigger negative effect from artillery. Each artillery piece you have beyond the other team's count decreases your team's chances by about 10%. Tank destroyers, by contrast, have a small positive effect on win chance, about 1.1% each.
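Logistic regression coefficients live on the log-odds scale, so a fixed coefficient does not move the win chance by a fixed number of points: the effect depends on the baseline. The Python sketch below converts a log-odds coefficient into a probability change. The coefficient in it is hypothetical, backed out by reading "about 10%" as ten percentage points off the 56.7% baseline; the raw coefficient is not given here.

```python
import math


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    """Log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_change(baseline: float, coef: float, units: float = 1.0) -> float:
    """Change in win probability when a predictor moves by `units` from a baseline."""
    return expit(logit(baseline) + coef * units) - baseline


# Hypothetical coefficient: the log-odds shift that takes 56.7% down to 46.7%.
coef_arty = logit(0.467) - logit(0.567)

for base in (0.30, 0.567, 0.80):
    print(f"baseline {base:.1%}: {prob_change(base, coef_arty):+.1%} per extra artillery")
```

The same coefficient costs fewer percentage points when the baseline is far from 50%, and two extra artillery pieces do not cost exactly twice as much as one.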
The 'Intercept' value tells us what we already know: the base chance of side 1 winning is about 56.7%. Because the outcome in the regression can only be an outright win or loss, this is actually side 1's chance of winning with draws excluded. Comparing this to the earlier values, it looks as though introducing draws takes most of them out of side 2's win chance. This may lead one to theorize that most of the draws on Province come from side 2 teams that can't fully capitalize on a weak side 1.
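As a check on that reading of the intercept, dropping the draws and renormalizing the headline rates gives side 1's share of decisive battles. Strictly, the intercept is the predicted chance when both net counts are zero, so the two line up only approximately. A quick Python sketch using only the rates quoted above:

```python
side1_win, side2_win, draw = 0.547, 0.418, 0.035

decisive = side1_win + side2_win       # share of battles that had a winner
side1_decisive = side1_win / decisive  # side 1 win rate with draws excluded
side2_decisive = side2_win / decisive  # side 2 win rate with draws excluded

print(f"side 1, draws excluded: {side1_decisive:.1%}")
print(f"side 2, draws excluded: {side2_decisive:.1%}")
print(f"points taken by draws: side 1 {side1_decisive - side1_win:.1%}, "
      f"side 2 {side2_decisive - side2_win:.1%}")
```

This reproduces the 56.7% for side 1 (and 43.3% for side 2). The same arithmetic puts the 3.5 points of draws at roughly 2.0 points off side 1 and 1.5 off side 2, close to proportional to each side's decisive win share, which is worth weighing alongside the theory above.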
Now all this sounds good, but there is one more column to take note of: P(>|Z|), also known as the P-value. In general, for a result to be statistically significant, you want a P-value of