
Statistical estimation of EE entry, what score @ when

carl321

Hero Member
Feb 23, 2015

shasha1111 said:
I am trying to post a Google Docs link, but I'm not allowed to...
docs.google.com/spreadsheets/d/1o2LU_lsYEykl1UMOkiZF78ds0Q1XTJS1TdfUh8qaA8M/edit?usp=sharing

The above is the Google Docs link.
Please leave comments so that I can refine the model and bring it closer to reality.
The data source is that Google spreadsheet online (you all know the one :)).
I have not built a good enough model yet: in my model, the number of people in the pool drops to 0 after a certain time period. :) But I can always adjust it based on your suggestions!
Thanks
Have fun...

Good effort!

But how can you say it's a normal distribution? Or any distribution, for that matter, since we don't have enough data to analyze.

You may be able to use a Weibull distribution, but you will need at least 30 samples. So maybe by 2017 you can make a good assumption?
 

atmtaatmta

Star Member
Jan 31, 2015

I have a better proposition. I once promised to create modelling software here: http://www.canadavisa.com/canada-immigration-discussion-board/gathering-statistics-t269968.0.html
Today I had some spare time to develop it. You can download it here: http://www.mediafire.com/download/jdq7b2h17r0dfl3/EEModel.exe
It doesn't contain any viruses; you can scan it with VirusTotal or anything else you like.

I still haven't found the right constants, so the predictions are not very good.
Maybe the distributions should be changed.

1. For now, all of the distributions are assumed to be constant over time. This is likely to change in the future.
2. You can change the visual options during a simulation, but if you want to change the statistical ones, you will have to restart the simulation.
3. Draws take place randomly every 1-3 weeks; you can't change that.
4. The target number of ITAs for every draw (except predetermined ones) is 2500.
5. You need .NET Framework 4.5 installed to run the program.
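The draw rules listed above (a draw every 1 to 3 weeks at random, 2500 ITAs per draw, highest scores invited first) can be sketched in a few lines of Python. This is not the EEModel.exe code, whose source was not posted; the weekly inflow of 3000 profiles and the N(400, 60) score distribution are hypothetical placeholders, and predetermined draws are left out.

```python
import random


def simulate_draws(weeks, arrivals_per_week=3000, ita_per_draw=2500,
                   score_mu=400.0, score_sigma=60.0, seed=None):
    """Simulate EE draws; returns (week, invited, cutoff, pool_left) per draw.

    arrivals_per_week, score_mu and score_sigma are hypothetical placeholders.
    """
    rng = random.Random(seed)
    pool = []  # CRS scores of candidates currently in the pool
    draws = []
    next_draw = rng.randint(1, 3)  # first draw 1-3 weeks in
    for week in range(1, weeks + 1):
        # New profiles enter the pool at a constant weekly rate.
        pool.extend(rng.gauss(score_mu, score_sigma)
                    for _ in range(arrivals_per_week))
        if week == next_draw:
            # Invite the highest scores; the lowest invited score is the cutoff.
            pool.sort(reverse=True)
            invited, pool = pool[:ita_per_draw], pool[ita_per_draw:]
            cutoff = invited[-1] if invited else None
            draws.append((week, len(invited), cutoff, len(pool)))
            next_draw = week + rng.randint(1, 3)  # next draw 1-3 weeks later
    return draws


for week, invited, cutoff, left in simulate_draws(12, seed=42):
    print(f"week {week:2d}: {invited} ITAs, cutoff {cutoff:.0f}, {left} left")
```

Because the placeholder inflow exceeds 2500 per week, the pool only grows here; swapping in a lower inflow is the quickest way to see the cutoff fall draw after draw.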
 

shasha1111

Full Member
Mar 2, 2015

Interesting!

  • Quebec does its own thing, so EE won't select any provincial nominees for it.
  • Manitoba will be issuing 5,000 PNP nominations, but "Manitoba has been allocated a maximum of 500 additional nominations to directly select applicants for Manitoba from Express Entry."
  • SINP (Saskatchewan) plans to take at least 9,550 in its program, but only 1,000 through EE.
  • Ontario NP plans to take 5,200 for 2015, but we don't know how many through EE because the program details haven't been announced.
  • Alberta NP can't be used for Express Entry.
  • PEI: numbers couldn't be found.
  • Newfoundland & Labrador: NP numbers haven't been announced, but they took 579 in 2013.
  • New Brunswick: numbers couldn't be found.
  • BC has 5,500 places, but how many will go through EE is unclear.
  • Nova Scotia: 350 through EE.
  • Yukon: couldn't find figures on its NP.
  • NW Territories NP: 100 through EE.


Great to know!
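Of the figures in the list above, only four provinces/territories give an explicit Express Entry number. Adding them up is simple arithmetic; note that Manitoba's 500 is stated as a maximum, and the provinces without an EE-specific figure are left out rather than guessed.

```python
# Express Entry nominations explicitly stated in the list above (2015).
# Provinces with no EE-specific figure are deliberately omitted, not guessed.
stated_ee_nominations = {
    "Manitoba": 500,  # stated as a maximum
    "Saskatchewan (SINP)": 1000,
    "Nova Scotia": 350,
    "Northwest Territories": 100,
}

total_stated = sum(stated_ee_nominations.values())
print(f"Stated EE provincial nominations: {total_stated}")  # 1950
```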
 

shasha1111

Full Member
Mar 2, 2015

Usually, for a population, we assume a normal distribution.

carl321 said:
Good effort!

But how can you say it's a normal distribution? Or any distribution, for that matter, since we don't have enough data to analyze.

You may be able to use a Weibull distribution, but you will need at least 30 samples. So maybe by 2017 you can make a good assumption?
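Fitting a normal distribution to the spreadsheet scores takes one call to Python's standard `statistics.NormalDist.from_samples`. The sketch below also enforces carl321's rule of thumb of at least 30 samples before trusting a fit; the helper names `fit_normal` and `share_at_or_above` and the `min_samples` parameter are hypothetical.

```python
from statistics import NormalDist


def fit_normal(scores, min_samples=30):
    """Fit N(mu, sigma) to CRS scores; refuse to fit on too few samples."""
    if len(scores) < min_samples:
        raise ValueError(f"need at least {min_samples} scores, got {len(scores)}")
    return NormalDist.from_samples(scores)  # sample mean and sample stdev


def share_at_or_above(dist, cutoff):
    """Estimated fraction of the pool scoring at or above a draw cutoff."""
    return 1.0 - dist.cdf(cutoff)
```

With `scores` being the list of self-reported scores from the spreadsheet, `share_at_or_above(fit_normal(scores), 450)` gives the fitted fraction of candidates at 450 or higher. The 30-sample check does not make the normal assumption right; it only avoids fitting on a handful of points.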
 

tymix

Star Member
Feb 23, 2015

Instead of developing a statistical model that makes no sense, why don't you guys find a way to hack into the CIC's database and get all the figures out? LOL
 

shasha1111

Full Member
Mar 2, 2015

tymix said:
Instead of developing a statistical model that makes no sense, why don't you guys find a way to hack into the CIC's database and get all the figures out? LOL
First of all, building a statistical tool is interesting;
second, hacking is illegal;
last, it is my way of laughing at the CIC EE system. I have never seen such a stupid system. If Canada only needs IT people, they can keep using this system.
 

the_lion

Hero Member
May 24, 2012

Lolzzz. They are already taking an eternity to make draws. If they found out their database had been compromised, they would take a hell of a lot longer. Lolzzz...
 

shasha1111

Full Member
Mar 2, 2015

atmtaatmta said:
I have a better proposition. I once promised to create modelling software here:
Today I had some spare time to develop it. You can download it here:
It doesn't contain any viruses; you can scan it with VirusTotal or anything else you like.

I still haven't found the right constants, so the predictions are not very good.
Maybe the distributions should be changed.

1. For now, all of the distributions are assumed to be constant over time. This is likely to change in the future.
2. You can change the visual options during a simulation, but if you want to change the statistical ones, you will have to restart the simulation.
3. Draws take place randomly every 1-3 weeks; you can't change that.
4. The target number of ITAs for every draw (except predetermined ones) is 2500.
5. You need .NET Framework 4.5 installed to run the program.

1. I am not sure about your first assumption. I think all newcomers follow the same distribution, which is an overall global distribution.
2. What do you mean by visual options?
3. I am not trying to change that. I did not specify dates; I used the number of draws instead.
4. Should that be around 2000-2100?
 

atmtaatmta

Star Member
Jan 31, 2015

shasha1111 said:
1. I am not sure about your first assumption. I think all newcomers follow the same distribution, which is an overall global distribution.
2. What do you mean by visual options?
3. I am not trying to change that. I did not specify dates; I used the number of draws instead.
4. Should that be around 2000-2100?
It is all about the program I wrote and posted a link to.
Here it is again: http://www.mediafire.com/download/jdq7b2h17r0dfl3/EEModel.exe
1. I, on the contrary, am sure that the score distribution changes over time, although I am not sure the changes are big enough to break the model.
What matters more is that the number of applicants who submitted their profiles in January differs from the February number. And we really have no way to determine how many applicants are in the pool and how fast the pool grows.
When I first submitted my profile at the beginning of January, I thought I would be able to see how many applicants were in the pool and how many of them had a higher score than me.
Surprisingly, CIC doesn't share this information with us. I don't know why, since it cannot harm anyone in any way.
Back to the point: since we don't know the growth rate of the pool, I made it an adjustable parameter that stays constant during a simulation. Looking at the results of my modelling, I suspect the real growth rate is not constant.
2. You should download it, run it and see for yourself.
3. You clearly misunderstood me.
4. I don't know.

I haven't actually stated the proposition I mentioned in my previous post. Here it is.
I think we should join forces instead of building separate models.
I wrote this program after you created your model because I believe it has more potential for improvement. It has a nice visual interface, and anyone can play with the distribution parameters, bringing the model closer to reality. We need the community's help; that will be more efficient than trying to model the whole system alone.
So I think we should work on my program instead: make it more flexible, turn constants into variables and give it to the community, so people can change the numbers and see which values reproduce the actual draws and scores best. That way we will be able to predict results better and everyone will be happy.

I can upload the source code to GitHub, if you want.
 

atmtaatmta

Star Member
Jan 31, 2015

I'd like to sum up what we have.
There are several variables, and we must estimate them somehow.
What we don't know so far:

1. The number of people added to the pool each day (or week, whatever).
We can assume that it is a random variable with a Poisson distribution.
We don't know λ, and in fact it is surely λ(t), where t is the date/timestamp.
A huge number of people were waiting for EE to open and submitted their profiles in the first days of January.
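A Poisson arrival process with a time-varying rate λ(t) can be simulated by thinning (the Lewis and Shedler method): generate candidate arrivals at a constant upper-bound rate λ_max and keep each one with probability λ(t)/λ_max. The rate function below, a baseline plus an exponentially decaying opening-day surge, and its numbers (100 profiles/day, 900 extra at t = 0, 7-day decay) are purely hypothetical placeholders for the unknown λ(t).

```python
import math
import random


def rate(t, base=100.0, surge=900.0, tau=7.0):
    """Hypothetical lambda(t) in profiles/day: baseline plus a decaying surge."""
    return base + surge * math.exp(-t / tau)


def arrival_times(horizon_days, rate_fn, rate_max, seed=None):
    """Arrival times on [0, horizon_days) of a Poisson process with rate rate_fn(t).

    Requires rate_fn(t) <= rate_max for every t in the horizon.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # next candidate of a rate_max process
        if t >= horizon_days:
            return times
        if rng.random() < rate_fn(t) / rate_max:  # keep with prob lambda(t)/lambda_max
            times.append(t)


# rate() is decreasing, so its maximum on [0, 60) is rate(0).
arrivals = arrival_times(60, rate, rate_max=rate(0), seed=7)
first_week = sum(1 for t in arrivals if t < 7)
print(len(arrivals), first_week)
```

The expected count over the horizon is the integral of λ(t); for these placeholder numbers that is 100·60 + 900·7·(1 - e^(-60/7)), roughly 12,300 profiles, most of the surge landing in the first two weeks.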

2. Score distribution.
We have a clue: the spreadsheet.
We can also assume that it is a mixture of normals: 0-600 candidates are normally distributed, and 600-1200 candidates are normally distributed.
Using statistics software, I have calculated that 0-600 = N(386, 53.7) and 600-1200 = N(921, 184).
But after some simulations I decided that 600-1200 is actually N(825, 135) or close to it.
That means either the distribution is not normal, or the spreadsheet sample is too small and doesn't reflect the real distribution.
Nothing can be said yet about the 0-600 fit, because there have been no draws below 600 so far.

Also, there is a chance that the January score distribution differs from the February one. It would be really hard to estimate the parameters, so I think we should assume that it is constant.
If it changes severely over time, we won't be able to predict anything.
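Drawing scores from the two-component mixture described above is straightforward: pick a component, then draw from that normal. The post gives the component parameters (N(386, 53.7) for 0-600 candidates and the adjusted N(825, 135) for 600-1200) but not the mixing weight, so `p_high`, the share of 600-1200 candidates, is a hypothetical input here. The sketch assumes the second parameter is a standard deviation and clips draws to the 0-1200 CRS range.

```python
import random


def sample_mixture(n, p_high, low=(386.0, 53.7), high=(825.0, 135.0), seed=None):
    """Draw n CRS scores from a two-normal mixture.

    p_high: hypothetical share of 600-1200 candidates (not given in the thread).
    low/high: (mean, standard deviation) of each component.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        mu, sigma = high if rng.random() < p_high else low
        x = rng.gauss(mu, sigma)
        scores.append(min(max(x, 0.0), 1200.0))  # clip to the CRS range
    return scores


scores = sample_mixture(10_000, p_high=0.05, seed=3)
print(sum(s >= 600 for s in scores))
```

Note that the upper component as given puts a noticeable share of its mass below 600, so some "600-1200" candidates will be drawn under 600; truncating that component at 600 instead of clipping is an alternative design choice.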

The thought just struck me: while the education/age/language components are normally distributed, there are also bonus points for having both good language and good education, and for having both good language and substantial work experience.
That means people with high scores receive more of these bonuses, boosting their points even further.
So the distribution is really not normal. Maybe a log-normal would suffice. I need to investigate further.
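One way to investigate further is to fit both a normal and a log-normal distribution by maximum likelihood and compare their log-likelihoods on the same scores; both have two parameters, so the higher log-likelihood is also the better AIC. This is a generic sketch, not what atmtaatmta ran; the helper names are hypothetical, and the demo data is synthetic, not real CRS scores.

```python
import math
import random
from statistics import fmean, pstdev

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def normal_loglik(xs):
    """Log-likelihood of xs under the maximum-likelihood normal fit."""
    mu, sigma = fmean(xs), pstdev(xs)  # MLE uses the population stdev
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI
               for x in xs)


def lognormal_loglik(xs):
    """Log-likelihood of xs (all > 0) under the maximum-likelihood log-normal fit."""
    logs = [math.log(x) for x in xs]
    mu, sigma = fmean(logs), pstdev(logs)
    # Normal log-density of ln(x), plus log of the Jacobian 1/x, i.e. minus ln(x).
    return sum(-0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI - y
               for y in logs)


def better_fit(xs):
    """Name of the two-parameter model with the higher log-likelihood."""
    return "log-normal" if lognormal_loglik(xs) > normal_loglik(xs) else "normal"


# Synthetic, clearly right-skewed demo data (not real CRS scores).
rng = random.Random(0)
demo = [rng.lognormvariate(6.0, 0.5) for _ in range(2000)]
print(better_fit(demo))
```

Applied to the 0-600 scores from the spreadsheet, `better_fit(scores)` gives a quick yes/no; with a small, self-reported sample, the margin between the two log-likelihoods matters more than which one wins.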
 

atmtaatmta

Star Member
Jan 31, 2015

Ok, I'm starting to feel ronery.

0-600 candidates fit a log-normal distribution much better than a normal one.
I tried to analyze the score parts (main factors, spouse points, transferability points), but they fit the various distributions much worse.
So we should stick to a log-normal distribution for the overall 0-600 score.
That is lnN(5.9472538, 0.021085201646996).
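The post does not say whether 0.021085 in lnN(5.9472538, 0.021085201646996) is σ or σ² of the log score. Converting both readings to the score scale with the standard log-normal moments, mean = exp(μ + σ²/2) and SD = mean·sqrt(exp(σ²) - 1), shows the difference: read as σ², the fit has a mean of about 387 and an SD of about 56, close to the N(386, 53.7) normal fit quoted earlier in the thread; read as σ, the SD would be only about 8 points.

```python
import math


def lognormal_moments(mu, var_log):
    """Mean, median and SD on the score scale for ln(score) ~ N(mu, var_log)."""
    mean = math.exp(mu + var_log / 2)
    median = math.exp(mu)
    sd = mean * math.sqrt(math.exp(var_log) - 1)
    return mean, median, sd


mu, p2 = 5.9472538, 0.021085201646996
as_variance = lognormal_moments(mu, p2)     # second parameter read as sigma^2
as_sigma = lognormal_moments(mu, p2 ** 2)   # second parameter read as sigma
print("as sigma^2: mean %.1f, median %.1f, sd %.1f" % as_variance)
print("as sigma:   mean %.1f, median %.1f, sd %.1f" % as_sigma)
```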
 

shasha1111

Full Member
Mar 2, 2015
atmtaatmta said:
Ok, I'm starting to feel ronery.

0-600 candidates fit a log-normal distribution much better than a normal one.
I tried to analyze the score parts (main factors, spouse points, transferability points), but they fit the various distributions much worse.
So we should stick to a log-normal distribution for the overall 0-600 score.
That is lnN(5.9472538, 0.021085201646996).
Log-normal sounds very exciting!
I don't think splitting the score into its parts is a good idea at this point. That would only increase the complexity and wouldn't help us reach the initial goal.
Poisson is a good choice. However, for simplicity, I made the arrival rate steady. Also, if you stand at some future point and look back, there should be a steady increase in the population with a steady number of people selected, so the pool won't dry up. Therefore, I think the arrival rate of new profiles should be very close to the selection rate, and also similar to the generational replacement rate in Canada, assuming they don't change this system and use it forever.
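The steady-state argument above can be checked with a simple deterministic balance: each week the pool gains the new profiles and loses the ITAs issued. If weekly inflow is below weekly selection, the pool empties after finitely many weeks, which is the "pool drops to 0" behaviour described for the first model; if inflow matches selection, the pool size stays flat. The function name and the numbers in the example are hypothetical.

```python
def weeks_until_empty(pool, inflow_per_week, ita_per_week, max_weeks=1000):
    """Weeks until the pool empties, or None if it doesn't within max_weeks.

    Deterministic balance: pool[k+1] = pool[k] + inflow_per_week - ita_per_week.
    """
    for week in range(1, max_weeks + 1):
        pool = pool + inflow_per_week - ita_per_week
        if pool <= 0:
            return week
    return None


print(weeks_until_empty(20_000, 1_000, 1_500))  # inflow below selection: 40
print(weeks_until_empty(20_000, 1_500, 1_500))  # balanced: None
```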
 

munnakha05

Star Member
Aug 18, 2014

shasha1111 said:
I updated the modelling method. Now I simulate this problem with Monte Carlo. The link is still the same.
Comments on the variables are welcome!


Current prediction: once the draw cutoff is below 600, it takes about 10 months to go under 400. Good luck, guys~

Nice attempt. Could you please simplify it for us, with the number of draws and the number of ITAs, plus a score breakdown? I know it is just an assumption. Still, it will make some sense. Thanks.