Apologies for the delayed update. Been having stonking good summer weather here, so been enjoying a bit of R&R...cheers!
It was supposed to get cold and turn miserable, but alas it hasn't yet, so working at my desk is still very slippery....the swimmable water calls. Regardless, I poked around briefly tonight...
@WurstEver yeah, there are some tricky spots. Don't see an easy way around these (but then I haven't given it any thought yet). Hindsight is easy when the answer is there (i.e. the test values with known dates), but when one doesn't know the answer (i.e. when we have thrown every known sample into a model, and new queries are unknown), how to choose the correct method?
Regarding the test you ran using polynomials: did you remove the duplicate data points from your model before you tested these? I ask as
- I am sure you have a lot of these points in your model (i.e. I am sure the same kind soul who sent them to me also sent them to you)
- If they are in the model, the exact dates should be returned by it. I have explicitly not incorporated these datapoints in the model yet...
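To illustrate the point about duplicates, here is a minimal sketch of holding the test points out of the model before testing. The data layout (tuples of serial number and date) and the function name `remove_test_points` are assumptions for illustration only, not how either of our models is actually stored, and the overlap between the two lists is made up.

```python
# Hypothetical layout: (serial_number, (year, month, day)) tuples.
# The overlap between the two lists is invented for illustration.
model_samples = [
    (24538069, (1967, 5, 26)),
    (26077000, (1968, 6, 7)),
    (29636000, (1970, 6, 8)),
]
test_samples = [
    (26077000, (1968, 6, 7)),
    (29112000, (1970, 3, 12)),
]


def remove_test_points(model, test):
    """Return the model samples whose serial numbers are not in the test set."""
    held_out = {serial for serial, _ in test}
    return [(serial, date) for serial, date in model if serial not in held_out]


training = remove_test_points(model_samples, test_samples)
print(training)
```

If the test points stay in the model, any method that interpolates through its samples will return their exact dates, which makes the error look better than it would be for a genuinely unknown serial.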
That brings me to the bothersome SN : 24530000 I mentioned in my last post. I figured out the problem. It's a tricky spot to do with incomplete data on two fronts.
Firstly, in my model I have the following two samples :
A) 24538069, 1967/5/26 (this corresponds to an image I found where everything is visible)
B) 2453XXXX, 1967/8/7 (this corresponds to an image I found where the last 4 digits are not visible)
And then in the test data I have been given (thanks once again to the donor 👍)
C) 24530XXX, 1966/12/27 (this corresponds to an image I was PM'd, where the last 3 digits are not visible).
Clearly B and C, when replacing 'X's with '0's, become the same SN, but at two different points in time. This becomes a non-linear problem with two correct answers. And we can't have that as most of us don't have time machines yet (I still believe some of the honourable Speedmaster hoarders on this site do 😗).
The answer given was : >>> 24530000 = 1967 / 5.0, which is acceptable as one of the answers. But clearly B should be removed, as the 4th unknown significant digit can make all the difference...and I suspect that is where the problem lies...
Complete data is king!
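A small sketch of why B and C collide, and one possible way to flag such samples: treat a masked serial as a range rather than zero-filling it. The helper names (`zero_filled`, `serial_range`, `range_width`) are hypothetical and not part of my actual script.

```python
def zero_filled(masked):
    """Replace every unknown digit 'X' with '0' (the naive approach)."""
    return int(masked.upper().replace("X", "0"))


def serial_range(masked):
    """Return the (lowest, highest) serial a masked string could stand for."""
    text = masked.upper()
    return int(text.replace("X", "0")), int(text.replace("X", "9"))


def range_width(masked):
    """Number of distinct serials the masked string could stand for."""
    low, high = serial_range(masked)
    return high - low + 1


b = "2453XXXX"  # sample B, last 4 digits not visible
c = "24530XXX"  # sample C, last 3 digits not visible

print(zero_filled(b), zero_filled(c))  # both collapse to the same number
print(serial_range(b), range_width(b))
print(serial_range(c), range_width(c))
```

B could be any of 10,000 serials while C is pinned down to 1,000, so a width threshold would be one way to decide which partial samples are too vague to keep in the model.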
Anyhoo. Then I also had a few more ideas which I quickly implemented.
1) How to improve the Hybrid method some (i.e. bring it closer to the stats method) -> my previous results were not expected.
2) Use Xth order polyfit in the local neighbourhood as opposed to over the entire model (3rd order gave best results, I tried a bunch).
Then I thought, seeing as there are a bunch of methods all running concurrently, let's add the original Xth order polyfit over the entire model just for comparison (again, 3rd order seems to give best results, I tried a bunch).
Here are my findings :
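For anyone wanting to try the two polyfit variants, a rough sketch follows. It assumes NumPy's `polyfit`/`polyval`, dates expressed as decimal years, and a window defined as the `k` nearest samples; the window size, the function names, and the centring/scaling of the serials are all assumptions of this sketch rather than a description of my script. The data is synthetic (an exact cubic), not real serial/date pairs.

```python
import numpy as np


def polyfit_predict(serials, dates, query, order=3):
    """Fit one polynomial to all given samples and evaluate it at `query`."""
    x = np.asarray(serials, dtype=float)
    y = np.asarray(dates, dtype=float)
    # Centre and scale the serials so the fit stays well conditioned
    # (raw 8-digit serials cubed are numerically awkward).
    mu, sigma = x.mean(), x.std()
    coeffs = np.polyfit((x - mu) / sigma, y, order)
    return float(np.polyval(coeffs, (query - mu) / sigma))


def polyfit_window_predict(serials, dates, query, order=3, k=8):
    """Fit a polynomial only to the k samples nearest to `query`."""
    x = np.asarray(serials, dtype=float)
    y = np.asarray(dates, dtype=float)
    nearest = np.argsort(np.abs(x - query))[:k]  # k must exceed `order`
    return polyfit_predict(x[nearest], y[nearest], query, order)


# Synthetic stand-in data: serial -> decimal year following an exact cubic.
serials = np.linspace(20e6, 40e6, 21)
t = (serials - 30e6) / 10e6
years = 1970 + 7 * t + 0.5 * t**3

query = 26500000
print(polyfit_predict(serials, years, query))
print(polyfit_window_predict(serials, years, query))
```

On clean synthetic data both variants agree; on real, noisy and unevenly spaced samples they can differ a lot, which is what the comparison in this post is about.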
**** Linear ****
>>> 26551800 = 1968 / 10.0 ( 1968/10/30 ) Error : 0.06620261728 months
>>> 25443000 = 1967 / 11.0 ( 1967/11/8 ) Error : -0.763252391541 months
>>> 32193764 = 1971 / 9.0 ( 1971/10/8 ) Error : -0.114064046471 months
>>> 32830000 = 1973 / 5.0 ( 1973/12/1 ) Error : 5.7660295521 months
>>> 39925152 = 1977 / 2.0 ( 1977/9/1 ) Error : 6.21523581682 months
>>> 31622000 = 1971 / 7.0 ( 1971/8/16 ) Error : 0.680136240947 months
>>> 30598000 = 1970 / 10.0 ( 1970/12/16 ) Error : 1.83898878765 months
>>> 22824000 = 1965 / 1.0 ( 1965/12/20 ) Error : 10.9126634154 months
>>> 24530000 = 1967 / 5.0 ( 1966/12/27 ) Error : -4.82805783247 months
>>> 26079000 = 1968 / 5.0 ( 1968/7/2 ) Error : 0.841275708722 months
>>> 19832000 = 1963 / 7.0 ( 1963/6/5 ) Error : -2.3274449314 months
>>> 26077000 = 1968 / 5.0 ( 1968/6/7 ) Error : 0.107990143636 months
>>> 29636000 = 1970 / 5.0 ( 1970/6/8 ) Error : 0.223321579461 months
>>> 29112000 = 1970 / 4.0 ( 1970/3/12 ) Error : -1.29475342794 months
Accumulated Error : 35.9794164919 months ( Av. 2.56995832085 month per lookup)
**** Approach 1 (Stats) ****
>>> 26551800 = 1968 / 10.0 ( 1968/10/30 ) Error : [-0.08553981] months
>>> 25443000 = 1967 / 10.0 ( 1967/11/8 ) Error : [ 0.09247382] months
>>> 32193764 = 1971 / 10.0 ( 1971/10/8 ) Error : [-0.5194193] months
>>> 32830000 = 1973 / 7.0 ( 1973/12/1 ) Error : [ 3.64766273] months
>>> 39925152 = 1977 / 6.0 ( 1977/9/1 ) Error : [ 1.7125164] months
>>> 31622000 = 1971 / 7.0 ( 1971/8/16 ) Error : [ 0.70175474] months
>>> 30598000 = 1970 / 10.0 ( 1970/12/16 ) Error : [ 1.94408995] months
>>> 22824000 = 1965 / 7.0 ( 1965/12/20 ) Error : [ 4.56719176] months
>>> 24530000 = 1967 / 5.0 ( 1966/12/27 ) Error : [-5.17238503] months
>>> 26079000 = 1968 / 5.0 ( 1968/7/2 ) Error : [ 0.92155251] months
>>> 19832000 = 1963 / 10.0 ( 1963/6/5 ) Error : [-5.10445993] months
>>> 26077000 = 1968 / 5.0 ( 1968/6/7 ) Error : [ 0.10585481] months
>>> 29636000 = 1970 / 5.0 ( 1970/6/8 ) Error : [ 0.2761651] months
>>> 29112000 = 1970 / 4.0 ( 1970/3/12 ) Error : [-1.28906345] months
Accumulated Error : [ 26.14012935] months ( Av. [ 1.8671521] month per lookup)
**** Hybrid Approach 2 ****
>>> 26551800 = 1968 / 10.0 ( 1968/10/30 ) Error : [-0.08553939] months
>>> 25443000 = 1967 / 10.0 ( 1967/11/8 ) Error : [ 0.09247282] months
>>> 32193764 = 1971 / 10.0 ( 1971/10/8 ) Error : [-0.5194193] months
>>> 32830000 = 1973 / 5.0 ( 1973/12/1 ) Error : 5.7660295521 months
>>> 39925152 = 1977 / 2.0 ( 1977/9/1 ) Error : 6.21523581682 months
>>> 31622000 = 1971 / 7.0 ( 1971/8/16 ) Error : [ 0.70175474] months
>>> 30598000 = 1970 / 10.0 ( 1970/12/16 ) Error : [ 1.94408995] months
>>> 22824000 = 1965 / 7.0 ( 1965/12/20 ) Error : [ 4.56739464] months
>>> 24530000 = 1967 / 5.0 ( 1966/12/27 ) Error : [-5.17238503] months
>>> 26079000 = 1968 / 5.0 ( 1968/7/2 ) Error : [ 0.92155251] months
>>> 19832000 = 1963 / 7.0 ( 1963/6/5 ) Error : -2.3274449314 months
>>> 26077000 = 1968 / 5.0 ( 1968/6/7 ) Error : [ 0.10585481] months
>>> 29636000 = 1970 / 5.0 ( 1970/6/8 ) Error : [ 0.2761651] months
>>> 29112000 = 1970 / 4.0 ( 1970/3/12 ) Error : [-1.28906345] months
Accumulated Error : [ 29.98440204] months ( Av. [ 2.141743] month per lookup)
**** PolyfitWindow ****
>>> 26551800 = 1969 / 2.0 ( 1968/10/30 ) Error : -4.0782567508 months
>>> 25443000 = 1968 / 3.0 ( 1967/11/8 ) Error : -4.22725554581 months
>>> 32193764 = 1972 / 0.0 ( 1971/10/8 ) Error : -3.17961077267 months
>>> 32830000 = 1973 / 6.0 ( 1973/12/1 ) Error : 4.69083193818 months
>>> 39925152 = 1976 / 10.0 ( 1977/9/1 ) Error : 10.4778567528 months
>>> 31622000 = 1971 / 6.0 ( 1971/8/16 ) Error : 1.78600928536 months
>>> 30598000 = 1970 / 9.0 ( 1970/12/16 ) Error : 2.62364222994 months
>>> 22824000 = 1965 / 7.0 ( 1965/12/20 ) Error : 4.22379842438 months
>>> 24530000 = 1967 / 5.0 ( 1966/12/27 ) Error : -5.40543572407 months
>>> 26079000 = 1968 / 4.0 ( 1968/7/2 ) Error : 2.07641244964 months
>>> 19832000 = 1963 / 8.0 ( 1963/6/5 ) Error : -2.68052909146 months
>>> 26077000 = 1968 / 5.0 ( 1968/6/7 ) Error : 0.379679403857 months
>>> 29636000 = 1970 / 5.0 ( 1970/6/8 ) Error : 0.288988346695 months
>>> 29112000 = 1970 / 4.0 ( 1970/3/12 ) Error : -1.30260171252 months
Accumulated Error : 47.4209084282 months ( Av. 3.38720774487 month per lookup)
**** Polyfit ****
>>> 26551800 = 1968 / 9.0 ( 1968/10/30 ) Error : 0.782834304973 months
>>> 25443000 = 1968 / 1.0 ( 1967/11/8 ) Error : -2.66447946796 months
>>> 32193764 = 1971 / 10.0 ( 1971/10/8 ) Error : -0.638409514793 months
>>> 32830000 = 1972 / 2.0 ( 1973/12/1 ) Error : 21.0952853108 months
>>> 39925152 = 1976 / 11.0 ( 1977/9/1 ) Error : 8.86793282974 months
>>> 31622000 = 1971 / 6.0 ( 1971/8/16 ) Error : 1.23112233592 months
>>> 30598000 = 1970 / 12.0 ( 1970/12/16 ) Error : -0.343219010597 months
>>> 22824000 = 1966 / 2.0 ( 1965/12/20 ) Error : -2.7133030685 months
>>> 24530000 = 1967 / 6.0 ( 1966/12/27 ) Error : -5.71692287597 months
>>> 26079000 = 1968 / 6.0 ( 1968/7/2 ) Error : 0.297263851107 months
>>> 19832000 = 1963 / 6.0 ( 1963/6/5 ) Error : -0.375203935805 months
>>> 26077000 = 1968 / 6.0 ( 1968/6/7 ) Error : -0.509791556429 months
>>> 29636000 = 1970 / 6.0 ( 1970/6/8 ) Error : -0.575092675453 months
>>> 29112000 = 1970 / 3.0 ( 1970/3/12 ) Error : -0.131612033496 months
Accumulated Error : 45.9424727715 months ( Av. 3.28160519797 month per lookup)
As can be seen, the Statistical and Hybrid Approaches both improved, and converge...
...IS NICE!
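For reference, the "Accumulated Error" lines above are consistent with summing the absolute per-lookup errors, and the average is that sum divided by the 14 lookups. A quick check using the Linear figures copied from the table (the variable names are just for this sketch):

```python
# Signed errors in months, copied from the Linear table above.
linear_errors = [
    0.06620261728, -0.763252391541, -0.114064046471, 5.7660295521,
    6.21523581682, 0.680136240947, 1.83898878765, 10.9126634154,
    -4.82805783247, 0.841275708722, -2.3274449314, 0.107990143636,
    0.223321579461, -1.29475342794,
]

# Sum of absolute errors, so over- and under-estimates do not cancel out.
accumulated = sum(abs(e) for e in linear_errors)
average = accumulated / len(linear_errors)

print("Accumulated Error : %.10f months" % accumulated)
print("Av. %.11f months per lookup" % average)
```

Judging by the rows, a positive error means the predicted date falls before the actual date, and a negative one means it falls after.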
Now time to try some data from the 80's (I presently have very few samples here)...no doubt some more careful ponderings will be required. More soon...
PS:
@WurstEver thanks for making your data available. I haven't grabbed it yet ... I will look at using the differences between our datasets (yours should have 1.5x better resolution than mine) to further test after 80's analysis...