A low sample size makes it more likely that randomness creates the appearance of a trend where there isn't one, so I thought I'd run some basic statistics on it.

The first thing I ran was basic 95% confidence intervals on the hit rates for each picture you posted. With only 10 drops in your first unamped "run," you had a 40% hit rate, but the low sample size means the confidence interval spans roughly 10% to 70%. Basically, if another run's hit rate falls within that interval, you can't call it different from the first. All of the other runs fall within that range, so you can't say any of the amped runs were actually different yet; those differences are likely just due to chance.
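If anyone wants to check the math, here's a quick sketch of that interval using the normal approximation (an exact method like Clopper-Pearson would give slightly different bounds, but the conclusion is the same). The 4 hits in 10 drops comes from the 40% rate above.

```python
import math

def hit_rate_ci(hits, drops, z=1.96):
    """95% confidence interval for a hit rate (normal approximation)."""
    p = hits / drops
    half_width = z * math.sqrt(p * (1 - p) / drops)
    # Clamp to [0, 1] since a rate can't go outside that range
    return max(0.0, p - half_width), min(1.0, p + half_width)

# First unamped run: 4 hits in 10 drops
low, high = hit_rate_ci(4, 10)
print(f"{low:.0%} to {high:.0%}")  # prints "10% to 70%"
```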

For a look at how future analysis might go with bigger sample sizes, I also plotted finds per drop for each run (two replications for each amp level plus unamped). It looks like the hit rates were actually pretty similar throughout, aside from that unlucky second level 1 run. There's basically no real trend when you plot amp decay vs. hit rate, but there's also not enough of a sample size to detect one if it existed.

Just for the nuts and bolts, here's the statistical output for that trend (binomial regression).

| Coefficients | Estimate | Pr(>\|z\|) |
| --- | --- | --- |
| Intercept | -0.7968 | 0.506 |
| AmpDecay | -0.3557 | 0.870 |

The negative AmpDecay term means that if you draw a line through the points, you see a decrease of around 6% from unamped to level 3, but you can't call it a true decrease until that probability value (which ranges from 0 to 1) drops below roughly 0.1, or ideally 0.05, to rule out pure chance. A value of 0.87 instead means there is basically no evidence supporting any kind of change across the amps, or in simpler terms, you can't say the rate of change over amp level is different from zero yet.
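For anyone curious how those coefficients map back to actual hit rates: the model works on the log-odds scale, so you convert with the inverse logit. A small sketch, assuming AmpDecay is coded 0 for unamped (so the intercept alone gives the predicted unamped hit rate; the actual coding in my run may differ):

```python
import math

def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

intercept, slope = -0.7968, -0.3557  # from the regression output above

# With AmpDecay = 0 (assumed to mean unamped), the intercept alone
# gives the model's predicted unamped hit rate:
p_unamped = inv_logit(intercept)
print(f"{p_unamped:.1%}")  # about 31%
```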

Leeloo, if you get more updated numbers you'd want me to run like this, just let me know and I can run some updated stats to see if there's any actual trend. The way you did it works okay, but all I really need is the number of drops for each run (i.e., before switching to another amp) and the hit rate at the end of each run. If you record the hit rate for each individual run instead of the cumulative rate across all test runs to date, that would make the number checking a little easier; with cumulative numbers, forgetting to record a particular run throws off the calculations for every run after it.
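For what it's worth, per-run numbers are recoverable from cumulative ones as long as no run is missed, which is exactly why a skipped entry breaks the chain. A quick sketch (the run data here is made up for illustration):

```python
# Hypothetical cumulative records after each run: (total drops, total hits)
cumulative = [(10, 4), (20, 7), (30, 9)]

# Difference each entry from the previous one to get per-run counts
per_run = []
prev_drops, prev_hits = 0, 0
for drops, hits in cumulative:
    per_run.append((drops - prev_drops, hits - prev_hits))
    prev_drops, prev_hits = drops, hits

print(per_run)  # [(10, 4), (10, 3), (10, 2)]
```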

We've talked about alternating between "treatments" every drop in other tests to take out confounding effects of time (e.g., someone ATF's right before you switch to level 3 and it theoretically affects loot), but doing 10 or even up to 30 drops for each treatment actually isn't too bad in that regard. Basically, you don't want to be doing an unamped treatment one day and then wait for a level 1 run the next day. Ideally, you should have a similar amount of drops for each in a similar period of time in a similar location. In experimental design, we call this a blocking design where we'd make sure all the treatments are grouped together in similar conditions, and then go to another location to set up another "block". Once each treatment has 100 total drops, that might be a good point to check in again to see if there's any trend.
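As a rough sense of why 100 drops per treatment is a good checkpoint: with the same normal approximation as the intervals above, the 95% confidence interval shrinks from about ±30% at 10 drops to under ±10% at 100 (using a 40% hit rate as a placeholder; the exact width depends on the true rate):

```python
import math

def ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a hit rate."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (10, 30, 100):
    print(n, f"+/- {ci_half_width(0.4, n):.1%}")
# 10 drops:  +/- ~30%
# 30 drops:  +/- ~18%
# 100 drops: +/- ~10%
```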