Statsd supports anomaly detection for max duration, which is
conceptually like asking for an anomaly on the current duration.
There was a bug previously in the logic for max duration since we
recorded a duration every time a dimension went into pause. Now, we
only record a max duration when the dimension goes into stop.
When a dimension inside the MaxDurationTracker leaves the started
state, we need to check if we still need to keep an anomaly alarm
set. It's possible that we just immediately set an alarm with the
same timestamp as what was just deleted.
Test: Added some unit-tests and tested on marlin-eng.
Change-Id: I09c82cd266b4cc6e1c893ad166e602bf08baca0e
In ag/3580546 we now round up when setting anomaly alarms. This means
the test needs to be updated by 1s in one place.
Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Ibd3548833dcc2c1b9ca3a1da2c183bb771799775
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.
+ We use a integer to encode the field of a log element. And also encode the FieldMatcher into an
integer and a bit mask. The log matching becomes 2 integer operations.
+ Dimension is stored as encoded field and value pair. Checking if 2 dimensions are equal is then
becoming checking if the underlying integers are equal. The integers are stored contiguously
in memory, so it's much faster than previous tree structure.
Start review from FieldValue.h
Test: statsd_test + new unit tests
Bug: 72659059
Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
Statsd will create partial buckets in all metrics producers when an
app is upgraded so that we can separate metrics between different
versions of an app. By looking at the uid map changes, we can tell
which app versions belong to a bucket; for metrics that are not
affected by an app version, we can instead join the buckets together.
To simplify the logic, the ends of the full buckets are always
aligned to when the metric producers were created. These boundaries
are computed on the fly by using the bucket number and the metric
producers' start times.
We keep the anomaly trackers to only be given full buckets; we buffer
the partial buckets within each metric producer.
Duration metric's MAX_SPARSE is fixed to be implemented as such. In
addition, after further discussion, we find anomaly detection on
MAX_SPARSE to be unnecessary, so this functionality is removed.
Test: Unit-tests added and modified, passed on marlin-eng.
Change-Id: I5ff7a9c7f05c406e9faf400c6a39162970ded102
Test: added e2e tests for count/duration metrics sliced by fields in condition and with/without links.
Change-Id: Ie34deba68e6780abdde458be3f0ce5284e76a1a2
1/ Avoid unnecessary field/dimension proto construction.
2/ use unordered_map for slicing.
3/ Use dimension fields to compare dimension keys.
Test: all statsd tests passed.
Change-Id: I2f74f78589b7f6ecd0803a2ead822b8d0399f334
Anomalies are now detected and declared per dimension. This means that
declareAnomaly now gets access to the key that is responsible for the
anomaly. Moreover, the refractory period is per dimension, not overall.
So a second anomaly for the same metric but a different dimension can
fire within the first dimension's refractory period. Thus, if app A
misbehaves and app B misbehaves shortly thereafter, they will both be
detected.
Eventually this key will be passed to the subscribers, although this cl
doesn't do anything with it.
Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Id76856dc44fe9ecf91ac81a423e84f97c81d30ab
+ Don't need to keep the condition key if the condition is not sliced.
+ Whiten the hash before returning.
Test: statsd_test
Change-Id: Ie0bcb14b3adc4a15b79ba3be050431a672a68c28
2/ rename number_of_buckets as num_buckets
3/ use double for the Alert's threshold
Test: statsd unit tests passed.
Change-Id: Id1f55f14d3712eddee561681e3cd77343f086c7a
2/ Handle Subscription for alert.
3/ Support no_report_metric
Bug: 69522276
Test: all statsd unit tests passed.
Change-Id: I851b235f2d149b8602b0cad632d5bf541962f40a
Splits out DurationAnomalyTracker-specific functions into their own
subclass.
Test: the unit tests and CTS tests
Change-Id: Id6eb74d232b4a9c3a932d805d1ba3f0ba43a88b1
+ Add a real HashableDimensionKey as a wrapper of the dimension.
So we can get rid of the maps that we kept.
Pay down technical debt and reduce memory usage.
Test: statsd_test & manual
Change-Id: I233280cf1e2ce93da6a8cd4e8514abb066f4016d
bug fix: GaugeMetricProducer now works better with pulled events.
unit test also includes GaugeMetricProducer_test
Test: unit test
Change-Id: Ic60f09342d14cfb107be2130d445b323a56909e0
Add anomly detection to Value metric in statsd.
Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases -t android.cts.statsd.HostAtomTests#testValueAnomalyDetection
Fixes: 70240042
Change-Id: I05cf36495cdfd0ac7aa1a922f0e253a60fda1787
StatsdStats now tracks the number of times an anomaly is detected (per
config, per alert name).
Also adds a configKey to AnomalyTracker, which is needed, not only for
statsdstats, but also (in the future) for reporting the header
information to incidentd.
Bug: 67978682
Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Ib254db7e1edb4f0f193f4772d17f14934cdf7e30
Fixes a few items in AnomalyTracker, especially to do with what happens
when an anomaly alarm fires.
Test: unit tests still pass
Change-Id: Ia89bd617442e952e587336b890c3ca67430b5e21
+ StatsdStats is the global class that tracks the stats about statsd.
+ Added guardrail for classes that have a map which could potentially grow
unboundedly with the number of logs.
TODO: add unit tests & CTS for StatsdStats, and guardrail
add stats for pulled atoms.
Test: statsd_test
Change-Id: I0ea562de4dd3f6162f7923a9c193420b482c1d51
There are some bugs in the previous big anomaly detection cl, which I didn't catch.
Test: statsd_test
Change-Id: I2ca22e0f112e5cff8fb204c0c94141188659fdba
2/ Init anomaly from config based on the public language.
3/ Unit tests for anomaly detection in count/gauge producer.
4/ Revisit the duration tracker logic.
Test: unit test passed.
Change-Id: I2423c0e0f05b1e37626954de9e749303423963f2
+ Changed how we construct LogEvent, now it's based on the context from log_msg
without making a copy of the list.
+ All stats logs now have the same event tag, the atom id is the first elem in the log.
Test: statsd_test
Change-Id: I4419380f2ee1c2b2155d427b9f2adb78883b337f
+ Added counters to count the starts.
Test: added some unit test in statsd_test.
TODO: ADD MORE UNIT TESTS
Change-Id: I3575e921a4abab27c4ea50fffde19cc1d4564030
1/ use string id/name for all metrics/configs/matchers.
2/ move alerts out of metric proto.
Test: unit test passed.
Change-Id: I08269f47be30f83f6dbadf873e7d62213449932b
StatsPullerManager is refactored so that we can mock it.
It may need more refactor pass to make is safer for longer runs.
Test: unit test
Change-Id: Ief0c99710e4d06e1454678f8b749c9599467d114
And other miscellaneous fixes.
+ clang-format
+ 2 bug fixes, one in dump-report command, one in ResourcePowerManagerPuller
Test: statsd_test
Change-Id: Ibd164d948ad62adcc529d813df1210781e38be47