Date of Slack thread: 6/17/24
Anonymous: Hello, team! Last time your help was extremely handy and helped us fix the issues with our events pipeline. Today I have another problem. While analyzing our latest experiments (for example, this one), we found a big difference in potential "bot" activity between groups. By bots we mean a case where one metric (at the Statsig stable ID level) differs substantially between groups depending on whether it is grouped by events or by users. For example, if the number of users is the same but group A has significantly more events, that might be interpreted as a higher percentage of bots in group A. The specific case I'm trying to describe: a step1 metric grouped by events shows no difference between the groups, but the same metric grouped by users shows a statistically significant difference, which suggests that bots might be distributed unevenly across the control and test groups. I'm checking more experiments, and it seems to happen from time to time. Importantly, every time it shows a larger number of bots in the test group. Could this mean that, despite random splitting, bot traffic differs between groups? Maybe bots can bypass Statsig redirects and target a specific chosen page? I'm ready to provide more details; it seems I can't fully resolve this on my own.
Lin Jia (Statsig): Hi <@U071K3M1MEV>, the symptom you mentioned can be an indicator of bot activity, though it's hard to fully diagnose unless we are fairly certain which users are bots and which are power users, which is very hard to determine. Also, if the "discrepancy" between event count and event user exists for many metrics, it's a stronger sign that it is related to bots. We are actively scoping the work (cc <@U0727PC0VM0>, who has more context here). Would love to know your use case!
Anonymous: Hello, <@U06U7NEA7HC>, thanks for your quick response! The automatic answer I got above mentions the possibility that bots are able to bypass certain mechanisms and directly access the test page, or that there is some other form of systematic bias in how users are allocated to groups. Could you touch on this topic a little? Are you aware of any scenarios in which bots bypass Statsig's splitting mechanisms, or of common cases in platform usage where customers report bots reaching a specific URL and invalidating the A/B procedure?
Lin Jia (Statsig): To clarify, we are not doing any bot filtering today. It's possible that a bot passes a gate and our system treats it as a user. Usually bots are distributed similarly across treatment and control, but there is a chance that one group ends up with more bots. We have planned work to do more about bots, e.g. filtering known bots from the traffic. Makris will be able to speak more to this.
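
The symptom discussed in this thread (similar user counts but diverging event counts between groups) can be checked offline on exported event data. The following is a minimal sketch, not a Statsig feature or API: it assumes you have exported one (stable_id, group) pair per logged event, and the function name `group_summary` and the `heavy_threshold` cutoff are hypothetical choices for illustration. A high `heavy_event_share` in one group only suggests that a few IDs dominate its events; it cannot tell bots from power users.

```python
from collections import Counter, defaultdict


def group_summary(events, heavy_threshold=50):
    """Summarize event volume per experiment group.

    events: iterable of (stable_id, group) pairs, one pair per logged event.
    heavy_threshold: hypothetical cutoff; IDs with at least this many events
        are counted as "heavy" (possible bots or power users).
    """
    # group -> Counter mapping stable_id -> number of events
    per_user = defaultdict(Counter)
    for stable_id, group in events:
        per_user[group][stable_id] += 1

    summary = {}
    for group, counts in per_user.items():
        total_events = sum(counts.values())
        heavy = {sid: n for sid, n in counts.items() if n >= heavy_threshold}
        summary[group] = {
            "users": len(counts),
            "events": total_events,
            "events_per_user": total_events / len(counts),
            "heavy_users": len(heavy),
            # Fraction of the group's events produced by heavy IDs
            "heavy_event_share": sum(heavy.values()) / total_events,
        }
    return summary


if __name__ == "__main__":
    # Toy data: both groups have 2 users, but one test ID fires 60 events.
    sample = (
        [("a", "control")] * 2
        + [("b", "control")]
        + [("c", "test")] * 60
        + [("d", "test")]
    )
    for name, stats in group_summary(sample).items():
        print(name, stats)
```

Comparing `events_per_user` and `heavy_event_share` across several experiments would show whether the imbalance consistently lands in the test group, which is the pattern raised above.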