43 Commits

Author SHA1 Message Date
Yangster-mac
932ececa16 Alarm: wakes up statsd and notifies the subscribers.
Test: manually tested it.
Change-Id: Id796a68976aeb1611183023ba4e9c6a8b8c44bb8
2018-02-27 13:30:48 -08:00
TreeHugger Robot
3d7f2d40d4 Merge "Adds anomaly detection to max duration metrics." 2018-02-20 23:38:00 +00:00
David Chen
2e414b99f4 Adds anomaly detection to max duration metrics.
Statsd supports anomaly detection for max duration, which is
conceptually like asking for an anomaly on the current duration.
There was a bug previously in the logic for max duration since we
recorded a duration every time a dimension went into pause. Now, we
only record a max duration when the dimension goes into stop.

When a dimension inside the MaxDurationTracker leaves the started
state, we need to check if we still need to keep an anomaly alarm
set. It's possible that we just immediately set an alarm with the
same timestamp as what was just deleted.

Test: Added some unit-tests and tested on marlin-eng.
Change-Id: I09c82cd266b4cc6e1c893ad166e602bf08baca0e
2018-02-20 13:45:25 -08:00
Howard Ro
db15052ae2 Merge "Add a guardrail to limit minimum bucket duration to be 5 minutes except when configured through adb command" 2018-02-15 00:18:22 +00:00
Yangster-mac
330af58f2b Use elapsed realtime instead of times based on wall clock, which can jump around and go backwards.
Test: statsd unit test passed

Change-Id: Ib541df99231e171b3be2a24f75632693e36da90e
2018-02-13 23:30:39 -08:00
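The motivation behind 330af58f2b can be illustrated with a minimal sketch. It uses `std::chrono::steady_clock` as a stand-in for Android's elapsed realtime; this is an analogy only, since the two clocks may treat device suspend differently. The helper names `monotonicNowNs` and `elapsedNs` are hypothetical and are not taken from statsd.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: a monotonic timestamp in nanoseconds.
// steady_clock never goes backwards, unlike system_clock (wall clock),
// which can jump when the user or the network adjusts the time.
inline int64_t monotonicNowNs() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
                   steady_clock::now().time_since_epoch())
            .count();
}

// Hypothetical helper: duration between two monotonic readings.
// With a monotonic source the result can never be negative.
inline int64_t elapsedNs(int64_t startNs, int64_t endNs) {
    assert(endNs >= startNs);
    return endNs - startNs;
}
```

Durations computed from two wall-clock readings carry no such guarantee, which is why bucket and alarm arithmetic is safer on a monotonic time base.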
yro
59cc24dbfd Add a guardrail to limit minimum bucket duration to be 5 minutes except
when configured through adb command

Bug: 73287251
Test: statsd_test
Change-Id: Iee51fedbaabb2c1f534a6edef3c564da88ef658b
2018-02-13 20:26:31 -08:00
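One possible reading of the guardrail in 59cc24dbfd is sketched below. The commit message does not say whether a too-small bucket is clamped or rejected; this sketch assumes clamping. The names `kMinBucketMs`, `effectiveBucketMs`, and the `skipGuardrail` flag (standing in for "configured through adb command") are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// 5 minutes, the minimum named in the commit message.
constexpr int64_t kMinBucketMs = 5 * 60 * 1000;

// Hypothetical helper: returns the bucket duration that would actually be
// used. Assumption: short buckets are clamped up to the minimum unless the
// guardrail is explicitly bypassed.
inline int64_t effectiveBucketMs(int64_t requestedMs, bool skipGuardrail) {
    assert(requestedMs > 0);
    if (skipGuardrail) {
        return requestedMs;
    }
    return std::max(requestedMs, kMinBucketMs);
}
```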
Bookatz
14e0d854d7 Statsd_test update for rounding up alarms
In ag/3580546 we now round up when setting anomaly alarms. This means
the test needs to be updated by 1s in one place.

Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Ibd3548833dcc2c1b9ca3a1da2c183bb771799775
2018-02-13 15:55:14 -08:00
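The 1s shift mentioned in 14e0d854d7 is what one would expect from ceiling division. A minimal sketch follows, assuming alarms are computed in nanoseconds and set in whole seconds; the units and the name `roundUpNsToSec` are assumptions, not taken from the change itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNsPerSec = 1000000000LL;

// Hypothetical helper: converts a nanosecond timestamp to whole seconds,
// rounding up so that an alarm never fires before the requested time.
inline int64_t roundUpNsToSec(int64_t ns) {
    assert(ns >= 0);
    return (ns + kNsPerSec - 1) / kNsPerSec;
}
```

Truncating division would map any timestamp that is not a whole number of seconds to the second before it, so a test written against truncation has to move by one second once rounding up is introduced.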
Yao Chen
8a8d16ceea Statsd CPU optimization.
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.

+ We use an integer to encode the field of a log element, and also encode the FieldMatcher into an
integer and a bit mask. Log matching then becomes 2 integer operations.

+ A dimension is stored as encoded field and value pairs. Checking whether 2 dimensions are equal
  then becomes checking whether the underlying integers are equal. The integers are stored
  contiguously in memory, so this is much faster than the previous tree structure.

Start review from FieldValue.h

Test: statsd_test + new unit tests

Bug: 72659059

Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
2018-02-12 10:38:45 -08:00
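The "integer and a bit mask" idea from 8a8d16ceea can be sketched as follows. The bit layout here (three 8-bit positions, one per nesting depth) is purely hypothetical and is not the layout in FieldValue.h; `encodeField`, `Matcher`, and `matches` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: one 8-bit position per nesting depth.
inline uint32_t encodeField(uint8_t d0, uint8_t d1, uint8_t d2) {
    return (static_cast<uint32_t>(d0) << 16) |
           (static_cast<uint32_t>(d1) << 8) |
           static_cast<uint32_t>(d2);
}

// A matcher is an encoded field plus a mask. Bits cleared in the mask are
// wildcards, for example "any index" in a repeated field.
struct Matcher {
    uint32_t field;
    uint32_t mask;
};

// Matching is one AND and one comparison.
inline bool matches(uint32_t field, const Matcher& m) {
    assert((m.field & ~m.mask) == 0);  // wildcard bits must be zero
    return (field & m.mask) == m.field;
}
```

For example, a matcher with `field = encodeField(1, 0, 2)` and `mask = 0xFF00FF` accepts sub-field 2 of every element of repeated field 1, whatever its index.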
Chenjie Yu
80f9112aec Pullers merge isolated process with host
Test: unit test
Change-Id: Idcb4d19e1f5182554b503c5a35fbde4da146835c
Fix: 72230210
2018-02-07 22:16:50 +00:00
David Chen
27785a8a4a Partial buckets on app upgrade and fix duration.
Statsd will create partial buckets in all metrics producers when an
app is upgraded so that we can separate metrics between different
versions of an app. By looking at the uid map changes, we can tell
which app versions belong to a bucket; for metrics that are not
affected by an app version, we can instead join the buckets together.

To simplify the logic, the ends of the full buckets are always
aligned to when the metric producers were created. These boundaries
are computed on the fly by using the bucket number and the metric
producers' start times.

We keep the anomaly trackers to only be given full buckets; we buffer
the partial buckets within each metric producer.

Duration metric's MAX_SPARSE is fixed to be implemented as such. In
addition, after further discussion, we find anomaly detection on
MAX_SPARSE to be unnecessary, so this functionality is removed.

Test: Unit-tests added and modified, passed on marlin-eng.
Change-Id: I5ff7a9c7f05c406e9faf400c6a39162970ded102
2018-02-02 13:30:10 -08:00
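The on-the-fly boundary computation described in 27785a8a4a could look roughly like this. It is a sketch under the assumption that times are nanoseconds and bucket numbers start at 0; `bucketNumFor` and `fullBucketEndNs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: index of the full bucket that contains eventNs,
// counted from the metric producer's start time.
inline int64_t bucketNumFor(int64_t startNs, int64_t bucketSizeNs,
                            int64_t eventNs) {
    assert(bucketSizeNs > 0 && eventNs >= startNs);
    return (eventNs - startNs) / bucketSizeNs;
}

// Hypothetical helper: end of full bucket bucketNum. Because it is derived
// from the start time, it stays aligned even after a partial bucket is cut
// short by an app upgrade.
inline int64_t fullBucketEndNs(int64_t startNs, int64_t bucketSizeNs,
                               int64_t bucketNum) {
    assert(bucketSizeNs > 0 && bucketNum >= 0);
    return startNs + (bucketNum + 1) * bucketSizeNs;
}
```

With this scheme a partial bucket only needs its own start and end times; the next full-bucket boundary is recomputed from the bucket number rather than stored.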
Yangster-mac
9369446f0b Support dimension in condition in metric producers.
Test: added e2e tests for count/duration metrics sliced by fields in condition and with/without links.

Change-Id: Ie34deba68e6780abdde458be3f0ce5284e76a1a2
2018-01-30 13:33:57 -08:00
TreeHugger Robot
641e29df46 Merge "Extend gauge metric to support memory metric." 2018-01-30 19:08:41 +00:00
Yangster-mac
34ea1103a0 Extend gauge metric to support memory metric.
Test: statsd unit test passed
Change-Id: I2e3f26563678ae77d44afe168454b6d1ea449f3a
2018-01-30 09:14:54 -08:00
Yangster-mac
7ba8fc357e Cpu usage optimization:
1/ Avoid unnecessary field/dimension proto construction.
2/ use unordered_map for slicing.
3/ Use dimension fields to compare dimension keys.

Test: all statsd tests passed.
Change-Id: I2f74f78589b7f6ecd0803a2ead822b8d0399f334
2018-01-26 23:17:02 +00:00
Yangster-mac
2c6dc474ac Rename the dimension_in_what/condition as fields_in_what/condition in MetricConditionLink proto.
Test: statsd cts tests passed.
Change-Id: I4c21b0da3d153c8d0c368a78647c44699a3161a5
2018-01-18 16:17:43 -08:00
Bookatz
1bf94382d0 Anomaly detection is per dimension
Anomalies are now detected and declared per dimension. This means that
declareAnomaly now gets access to the key that is responsible for the
anomaly. Moreover, the refractory period is per dimension, not overall.
So a second anomaly for the same metric but a different dimension can
fire within the first dimension's refractory period. Thus, if app A
misbehaves and app B misbehaves shortly thereafter, they will both be
detected.

Eventually this key will be passed to the subscribers, although this cl
doesn't do anything with it.

Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Id76856dc44fe9ecf91ac81a423e84f97c81d30ab
2018-01-09 11:29:41 -08:00
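The per-dimension refractory period from 1bf94382d0 can be pictured with a small sketch. It assumes dimension keys can be represented as strings and that times are in seconds; `RefractoryTracker` and `tryDeclare` are hypothetical names, and the exact boundary behavior in statsd may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical tracker: remembers, per dimension key, when that key's
// refractory period ends.
class RefractoryTracker {
public:
    explicit RefractoryTracker(int64_t refractorySec)
        : mRefractorySec(refractorySec) {
        assert(refractorySec >= 0);
    }

    // Returns true and starts a new refractory period if an anomaly for
    // `key` may be declared at nowSec; returns false if that key is still
    // within its own refractory period. Other keys are unaffected.
    bool tryDeclare(const std::string& key, int64_t nowSec) {
        auto it = mEndsAtSec.find(key);
        if (it != mEndsAtSec.end() && nowSec < it->second) {
            return false;
        }
        mEndsAtSec[key] = nowSec + mRefractorySec;
        return true;
    }

private:
    int64_t mRefractorySec;
    std::unordered_map<std::string, int64_t> mEndsAtSec;
};
```

This mirrors the app A / app B case in the message: a declaration for one key does not suppress a declaration for another key shortly afterwards.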
Yao Chen
d59a6589fa Some fixes in duration metrics
+ Don't need to keep the condition key if the condition is not sliced.
+ Whiten the hash before returning.

Test: statsd_test
Change-Id: Ie0bcb14b3adc4a15b79ba3be050431a672a68c28
2018-01-08 11:17:11 -08:00
Yangster-mac
a070b6a40c Bug fix: gauge metric producer does not need the atom id parameter.
Test: statsd unit test passed
Change-Id: Ifd1e2f7a05a03886eb227ac1ae0e30b884e49f0c
2018-01-05 04:34:49 +00:00
Yangster-mac
b814481ad1 Use TimeUnit enum to specify the bucket size.
Test: all statsd unit tests passed

Change-Id: I4f6b80ba2f8c984b06e46e6de6df3e546e99a968
2018-01-04 13:01:14 -08:00
Yangster-mac
a7fb12d2d2 1/ Use FieldMatcher to specify the value fields in value metric.
2/ rename number_of_buckets as num_buckets
3/ use double for the Alert's threshold

Test: statsd unit tests passed.
Change-Id: Id1f55f14d3712eddee561681e3cd77343f086c7a
2018-01-03 23:49:36 -08:00
Yangster-mac
94e197cceb 1/ Change all "name" to id in statsD.
2/ Handle Subscription for alert.
3/ Support no_report_metric

Bug: 69522276
Test: all statsd unit tests passed.
Change-Id: I851b235f2d149b8602b0cad632d5bf541962f40a
2018-01-03 15:34:00 -08:00
Yangster-mac
2087716f2b 1/ Support nested message and repeated fields in statsd.
2/ Filter gauge fields by FieldMatcher.
3/ Wire up wakelock attribution chain.
4/ e2e test: wakelock duration metric with aggregated predicate dimensions.
5/ e2e test: count metric with multiple metric condition links for 2 predicates and 1 non-sliced predicate.

Test: statsd unit test passed.

Change-Id: I89db31cb068184a54e0a892fad710966d3127bc9
2018-01-01 10:01:36 -08:00
Bookatz
857aaa5208 Splits AnomalyTracker into two files
Splits out DurationAnomalyTracker-specific functions into their own
subclass.

Test: the unit tests and CTS tests
Change-Id: Id6eb74d232b4a9c3a932d805d1ba3f0ba43a88b1
2017-12-22 15:05:31 -08:00
Yao Chen
d5aa01b371 Remove the hacky HashableDimensionKey.
+ Add a real HashableDimensionKey as a wrapper of the dimension.
So we can get rid of the maps that we kept.

Pay down technical debt and reduce memory usage.

Test: statsd_test & manual
Change-Id: I233280cf1e2ce93da6a8cd4e8514abb066f4016d
2017-12-20 18:45:43 -08:00
Chenjie Yu
d9dfda76e0 add feature: GaugeMetricProducer now takes repeated list of fields
bug fix: GaugeMetricProducer now works better with pulled events.
unit test also includes GaugeMetricProducer_test

Test: unit test
Change-Id: Ic60f09342d14cfb107be2130d445b323a56909e0
2017-12-16 17:12:45 -08:00
Bookatz
de1b5562c8 Statsd value metric anomaly detection
Add anomaly detection to Value metric in statsd.

Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases -t android.cts.statsd.HostAtomTests#testValueAnomalyDetection
Fixes: 70240042
Change-Id: I05cf36495cdfd0ac7aa1a922f0e253a60fda1787
2017-12-15 18:10:10 +00:00
Chenjie Yu
a7259abde4 use running sum for ValueMetricProducer bucket
simplify ValueMetricProducer logic for pulled data

Test: unit test
Change-Id: Ic0a21a543166cc5c34c1fa505dba08d1fc2f510a
2017-12-13 22:32:23 -08:00
TreeHugger Robot
c51ec6767f Merge changes Ib254db7e,I4aa0539a
* changes:
  Anomaly Alert declarations in StatsdStats
  Anomaly detection statsdstats
2017-12-08 03:50:57 +00:00
Bookatz
8f2f3d8205 Anomaly Alert declarations in StatsdStats
StatsdStats now tracks the number of times an anomaly is detected (per
config, per alert name).

Also adds a configKey to AnomalyTracker, which is needed, not only for
statsdstats, but also (in the future) for reporting the header
information to incidentd.

Bug: 67978682
Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Ib254db7e1edb4f0f193f4772d17f14934cdf7e30
2017-12-07 18:00:44 -08:00
Stefan Lafon
a5b5191351 Rename EventConditionLink as MetricConditionLink in statsd_config.proto.
Test: Ran statsd_test and cts tests.
Change-Id: Ia9a38892854443aa4f17d800a075860fcf874652
2017-12-05 21:51:46 -08:00
Yao Chen
f60e0bad59 Fix how DurationTracker passes results to DurationMetricProducer
This allows us to safely delete old buckets after onDumpReport is called.

Test: statsd_test

Change-Id: If8e28a184dcaf491bcd9a62d0d61beabca5a91a5
2017-12-01 09:14:13 -08:00
Yangster
f2bee6fec9 1/ Only expose thread-safe interfaces in metric producer.
2/ Simplify lock logic.
3/ Add test for duration metric producer.

Test: all unit tests passed.
Change-Id: If6ee2e69a17f12406f4b3ea3553b14642cd636d6
2017-11-30 09:32:17 -08:00
Bookatz
cc5adef2d0 Statsd anomaly detection - fixes
Fixes a few items in AnomalyTracker, especially to do with what happens
when an anomaly alarm fires.

Test: unit tests still pass
Change-Id: Ia89bd617442e952e587336b890c3ca67430b5e21
2017-11-27 15:35:40 -08:00
Yao Chen
b356151e63 Add StatsdStats and guardrail.
+ StatsdStats is the global class that tracks the stats about statsd.

+ Added guardrail for classes that have a map which could potentially grow
  unboundedly with the number of logs.

TODO: add unit tests & CTS for StatsdStats, and guardrail
      add stats for pulled atoms.

Test: statsd_test

Change-Id: I0ea562de4dd3f6162f7923a9c193420b482c1d51
2017-11-27 10:52:54 -08:00
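The kind of guardrail described in b356151e63 might be sketched as a size cap on a per-key map. This is an illustration only: `BoundedCounter`, `noteEvent`, and the drop-new-keys policy are assumptions, and the real limits and behavior in statsd are not stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical counter whose key set cannot grow past a fixed limit, so the
// map stays bounded no matter how many distinct keys appear in the logs.
class BoundedCounter {
public:
    explicit BoundedCounter(std::size_t maxKeys) : mMaxKeys(maxKeys) {
        assert(maxKeys > 0);
    }

    // Counts one event for `key`. Returns false, and drops the event, if
    // `key` is new and the map is already at its limit.
    bool noteEvent(const std::string& key) {
        auto it = mCounts.find(key);
        if (it == mCounts.end()) {
            if (mCounts.size() >= mMaxKeys) {
                return false;
            }
            it = mCounts.emplace(key, 0).first;
        }
        ++it->second;
        return true;
    }

    std::size_t size() const { return mCounts.size(); }

private:
    std::size_t mMaxKeys;
    std::unordered_map<std::string, int> mCounts;
};
```

A dropped event is a natural thing to record in a global stats class, so that hitting the limit is visible rather than silent.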
Yao Chen
09294efe88 Fix some bugs in duration tracker.
There are some bugs in the previous big anomaly detection cl, which I didn't catch.

Test: statsd_test
Change-Id: I2ca22e0f112e5cff8fb204c0c94141188659fdba
2017-11-25 20:09:22 -08:00
Yangster-mac
e2cd6d509b 1/ Duration anomaly tracker with alarm.
2/ Init anomaly from config based on the public language.
3/ Unit tests for anomaly detection in count/gauge producer.
4/ Revisit the duration tracker logic.

Test: unit test passed.
Change-Id: I2423c0e0f05b1e37626954de9e749303423963f2
2017-11-20 15:37:24 -08:00
TreeHugger Robot
4aa167736b Merge "One more feature in DurationMetric -- nesting on top of Or and Max" 2017-11-18 01:42:24 +00:00
Yao Chen
80235403d2 statsd: parse the new format of stats log
+ Changed how we construct LogEvent, now it's based on the context from log_msg
  without making a copy of the list.

+ All stats logs now have the same event tag, the atom id is the first elem in the log.

Test: statsd_test
Change-Id: I4419380f2ee1c2b2155d427b9f2adb78883b337f
2017-11-17 11:53:15 -08:00
Yao Chen
0ea19901ef One more feature in DurationMetric -- nesting on top of Or and Max
+ Added counters to count the starts.

Test: added some unit tests in statsd_test.

TODO: ADD MORE UNIT TESTS

Change-Id: I3575e921a4abab27c4ea50fffde19cc1d4564030
2017-11-15 16:45:59 -08:00
Yangster-mac
d1815dc7b8 Sync statsd to the public proto changes:
1/ use string id/name for all metrics/configs/matchers.
2/ move alerts out of metric proto.

Test: unit test passed.
Change-Id: I08269f47be30f83f6dbadf873e7d62213449932b
2017-11-13 22:09:08 -08:00
Chenjie Yu
6736c893a7 Unit tests for ValueMetricProducer
StatsPullerManager is refactored so that we can mock it.
It may need another refactoring pass to make it safer for longer runs.

Test: unit test
Change-Id: Ief0c99710e4d06e1454678f8b749c9599467d114
2017-11-13 13:42:17 -08:00
yro
2b0f88678b Migrate all remaining MetricProducers to use ProtoOutputStream
Test: statsd, statsd_test
Change-Id: I1087e1c1ffb372ca288dfc575cb7a372b11ce8c5
2017-11-07 17:33:37 -08:00
Yao Chen
93fe3a34a0 Add unit tests for CountMetricProducer, EventMetricProducer
And other miscellaneous fixes.
+ clang-format
+ 2 bug fixes, one in dump-report command, one in ResourcePowerManagerPuller

Test: statsd_test

Change-Id: Ibd164d948ad62adcc529d813df1210781e38be47
2017-11-06 17:20:10 -08:00