116 Commits

Author SHA1 Message Date
Yao Chen
8a8d16ceea Statsd CPU optimization.
The key change is to revamp how we parse/store/match a log event, especially how we match repeated
field and attribution nodes, and how we construct dimensions and compare them.

+ We use an integer to encode the field of a log element, and we encode the FieldMatcher into an
integer and a bit mask. Log matching becomes 2 integer operations.

+ Dimension is stored as encoded field and value pairs. Checking whether 2 dimensions are equal
  reduces to checking whether the underlying integers are equal. The integers are stored
  contiguously in memory, so it's much faster than the previous tree structure.
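
A minimal sketch of the idea described in the two points above, under stated assumptions: the bit layout, `encodeField`, `EncodedMatcher`, `matches`, and `FieldValuePair` are hypothetical names chosen for illustration and are not taken from FieldValue.h. It shows a match done as one AND plus one compare, and dimension equality done as a comparison over a contiguous array of integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bit layout (an assumption, NOT the layout used in FieldValue.h):
//   bits 24-31: nesting depth
//   bits 16-23: position at depth 0 (top-level field number)
//   bits  8-15: position at depth 1 (e.g. index inside a repeated field)
//   bits  0-7 : position at depth 2 (field inside the repeated node)
inline uint32_t encodeField(uint32_t depth, uint32_t pos0, uint32_t pos1, uint32_t pos2) {
    assert(depth <= 2 && pos0 < 256 && pos1 < 256 && pos2 < 256);
    return (depth << 24) | (pos0 << 16) | (pos1 << 8) | pos2;
}

// A matcher is an encoded field plus a mask. Bits cleared in the mask are
// "don't care", which is how a matcher can accept any index of a repeated node.
struct EncodedMatcher {
    uint32_t field;
    uint32_t mask;
};

// Two integer operations: one AND, one compare.
inline bool matches(const EncodedMatcher& matcher, uint32_t logField) {
    return (logField & matcher.mask) == matcher.field;
}

// A dimension as a flat list of (encoded field, value) pairs.
struct FieldValuePair {
    uint32_t field;
    int64_t value;
    bool operator==(const FieldValuePair& other) const {
        return field == other.field && value == other.value;
    }
};

using Dimension = std::vector<FieldValuePair>;

// Equality walks the contiguous pairs and compares integers only.
inline bool sameDimension(const Dimension& a, const Dimension& b) {
    return a == b;
}
```

With this layout, a matcher whose mask clears bits 8-15 accepts the same inner field at every index of a repeated node, without walking a tree.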

Start review from FieldValue.h

Test: statsd_test + new unit tests

Bug: 72659059

Change-Id: Iec8daeacdd3f39ab297c10ab9cd7b710a9c42e86
2018-02-12 10:38:45 -08:00
Chenjie Yu
80f9112aec Pullers merge isolated process with host
Test: unit test
Change-Id: Idcb4d19e1f5182554b503c5a35fbde4da146835c
Fix: 72230210
2018-02-07 22:16:50 +00:00
David Chen
27785a8a4a Partial buckets on app upgrade and fix duration.
Statsd will create partial buckets in all metrics producers when an
app is upgraded so that we can separate metrics between different
versions of an app. By looking at the uid map changes, we can tell
which app versions belong to a bucket; for metrics that are not
affected by an app version, we can instead join the buckets together.

To simplify the logic, the ends of the full buckets are always
aligned to when the metric producers were created. These boundaries
are computed on the fly by using the bucket number and the metric
producers' start times.
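
The boundary computation described above can be sketched as follows. This is an illustration under assumptions, not the code in this change: the function names and the nanosecond units are hypothetical. Full-bucket ends depend only on the producer's start time and the bucket number, so a partial bucket closed at an upgrade does not shift later boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Index of the full bucket that contains eventNs (hypothetical helper).
inline int64_t bucketNumFor(int64_t startNs, int64_t bucketSizeNs, int64_t eventNs) {
    assert(bucketSizeNs > 0 && eventNs >= startNs);
    return (eventNs - startNs) / bucketSizeNs;
}

// End of full bucket number bucketNum, computed on the fly from the start time.
inline int64_t fullBucketEndNs(int64_t startNs, int64_t bucketSizeNs, int64_t bucketNum) {
    return startNs + (bucketNum + 1) * bucketSizeNs;
}

// When an upgrade arrives at upgradeNs, the current bucket is closed there and
// the next partial bucket runs from upgradeNs to this aligned boundary.
inline int64_t nextAlignedBoundaryNs(int64_t startNs, int64_t bucketSizeNs, int64_t upgradeNs) {
    return fullBucketEndNs(startNs, bucketSizeNs,
                           bucketNumFor(startNs, bucketSizeNs, upgradeNs));
}
```

For example, with a start time of 100 and a bucket size of 60, an upgrade at 250 splits the bucket [220, 280) into the partial buckets [220, 250) and [250, 280).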

We keep the anomaly trackers to only be given full buckets; we buffer
the partial buckets within each metric producer.

Duration metric's MAX_SPARSE is fixed to be implemented as such. In
addition, after further discussion, we find anomaly detection on
MAX_SPARSE to be unnecessary, so this functionality is removed.

Test: Unit-tests added and modified, passed on marlin-eng.
Change-Id: I5ff7a9c7f05c406e9faf400c6a39162970ded102
2018-02-02 13:30:10 -08:00
Yangster-mac
9369446f0b Support dimension in condition in metric producers.
Test: added e2e tests for count/duration metrics sliced by fields in condition and with/without links.

Change-Id: Ie34deba68e6780abdde458be3f0ce5284e76a1a2
2018-01-30 13:33:57 -08:00
TreeHugger Robot
641e29df46 Merge "Extend gauge metric to support memory metric." 2018-01-30 19:08:41 +00:00
Yangster-mac
34ea1103a0 Extend gauge metric to support memory metric.
Test: statsd unit test passed.
Change-Id: I2e3f26563678ae77d44afe168454b6d1ea449f3a
2018-01-30 09:14:54 -08:00
Yangster-mac
7ba8fc357e Cpu usage optimization:
1/ Avoid unnecessary field/dimension proto construction.
2/ use unordered_map for slicing.
3/ Use dimension fields to compare dimension keys.

Test: all statsd tests passed.
Change-Id: I2f74f78589b7f6ecd0803a2ead822b8d0399f334
2018-01-26 23:17:02 +00:00
Bookatz
1a1b0464cb Statsd and frameworks reference proto enums
This cl introduces proto enums representing some constants found in
the Android codebase, and connects the two. It covers frameworks
constants that have no intrinsic meaning (i.e. their actual value and
order don't matter), so their values are unlikely to change.

Using the proto enum as the source of truth means that Java and
proto can be kept in sync. Otherwise, when the Java frameworks code
changes, it silently breaks the protos, since the enums
are wrong. Having the Java code reference the proto enums ensures
that everything is in sync. The values of the constants are unchanged.
But future changes to these constants will need to be done in the proto
file, which the Java file merely references.

The protos are necessary for incidentd and statsd and, in the future,
possibly dumpsys. In this way, the logging mechanism is much less likely
to break when new constants are added, and we can be assured that
the logging accurately reflects the underlying codebase.

Bug: 69478930
Test: cts-tradefed run cts-dev -m CtsStatsdHostTestCases
Test: cts-tradefed run cts-dev -m CtsIncidentHostTestCases
Change-Id: If79032c34b2799db1e3e70cb47b1312fd72092b9
2018-01-25 15:47:57 -08:00
David Chen
cfc311d2f0 Statsd always includes snapshot of uid map.
Statsd will contain at least one snapshot of the uid map. The
previous design was not very robust in case a snapshot was missing.

Also fixes a subtle bug with updating the isolated uid mapping since
this should always be kept up to date even if there are no metrics
being used (since metrics may be added later after the isolated uid
was created).

Test: Checked that unit-tests pass on marlin-eng.
Change-Id: I99754ed9016d980564e409b0946a46b398fd12b7
2018-01-23 18:04:03 -08:00
TreeHugger Robot
5e1f854620 Merge "Rename the dimension_in_what/condition as fields_in_what/condition in MetricConditionLink proto." 2018-01-19 17:39:57 +00:00
Yangster-mac
2c6dc474ac Rename the dimension_in_what/condition as fields_in_what/condition in MetricConditionLink proto.
Test: statsd cts tests passed.
Change-Id: I4c21b0da3d153c8d0c368a78647c44699a3161a5
2018-01-18 16:17:43 -08:00
Yangster-mac
5503f5caae condition dimension query.
Test: Modified the metric link test to query a subset of the predicate/condition dimensions.

Change-Id: I693f14e0b11693fc8ee0bf4fc550977c20d31f71
2018-01-18 14:51:51 -08:00
Howard Ro
2c41460bc8 Merge "Reinstate configs from disk across boots by fixing bugs" 2018-01-18 19:38:38 +00:00
Yangster-mac
468ff04fd4 Proto change for dimension from condition.
Test: all statsd unit tests and cts tests passed.

Change-Id: I955143aca7c76e2e22a10d7ba42eb797bd32e147
2018-01-17 21:19:45 -08:00
yro
6e304ec690 Reinstate configs from disk across boots by fixing bugs
- Update configName to configID in StorageManager reflecting ag/3410772
- Fix a bug of using strtok destructively, and correctly parse the file name
- Fix a unit test to call StartupForTest of config manager
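
For context on the strtok item above: `strtok` writes NUL bytes into the buffer it tokenizes, so the original string is no longer usable afterwards. A minimal sketch of a non-destructive alternative, assuming a hypothetical underscore-separated file name (the real format and the actual fix in StorageManager are not shown in this log):

```cpp
#include <string>
#include <vector>

// Splits input on delim without modifying input (hypothetical helper).
inline std::vector<std::string> splitNonDestructive(const std::string& input, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = input.find(delim, start);
        if (pos == std::string::npos) {
            // Last piece: everything after the final delimiter.
            parts.push_back(input.substr(start));
            break;
        }
        parts.push_back(input.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

The caller keeps the full name intact, so it can still be used to open or delete the file after parsing.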

Bug: 70667694
Test: statsd, statsd_test
Change-Id: If32c02982d1f0c509bdd4e6f445924257bbc074b
2018-01-17 13:44:34 -08:00
Yangster-mac
87718e283a Gauge metric e2e test
Test: new test passed
Change-Id: I6a8bceb43fbb6e7b82f47951e71b5620779f2ceb
2018-01-15 20:41:08 -08:00
Yangster-mac
d40053eb8b Map isolated uid to host uid when processing log event in statsD.
Test: added test case for isolated uid in Attribution e2e test.
Change-Id: I63d16ebee3e611b1ef0c910e5154cf27766cb330
2018-01-09 21:45:46 -08:00
TreeHugger Robot
d6808dc0c0 Merge "E2e test: attribution matching and slicing by attribution." 2018-01-10 00:22:45 +00:00
Bookatz
1bf94382d0 Anomaly detection is per dimension
Anomalies are now detected and declared per dimension. This means that
declareAnomaly now gets access to the key that is responsible for the
anomaly. Moreover, the refractory period is per dimension, not overall.
So a second anomaly for the same metric but a different dimension can
fire within the first dimension's refractory period. Thus, if app A
misbehaves and app B misbehaves shortly thereafter, they will both be
detected.
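
A minimal sketch of a per-dimension refractory period, assuming a string dimension key and second-granularity timestamps. The class name `PerDimensionRefractory` and its `tryDeclare` method are hypothetical and do not reflect the actual AnomalyTracker interface.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

class PerDimensionRefractory {
public:
    explicit PerDimensionRefractory(int64_t refractoryPeriodSec)
        : mRefractoryPeriodSec(refractoryPeriodSec) {}

    // Returns true if an anomaly for this key may be declared now, and starts
    // that key's refractory period. Other keys are unaffected.
    bool tryDeclare(const std::string& dimensionKey, int64_t timestampSec) {
        auto it = mRefractoryEndSec.find(dimensionKey);
        if (it != mRefractoryEndSec.end() && timestampSec < it->second) {
            return false;  // still inside this dimension's refractory period
        }
        mRefractoryEndSec[dimensionKey] = timestampSec + mRefractoryPeriodSec;
        return true;
    }

private:
    int64_t mRefractoryPeriodSec;
    // One refractory deadline per dimension key instead of one overall.
    std::unordered_map<std::string, int64_t> mRefractoryEndSec;
};
```

With a 60 second period, an anomaly for app A at t=100 does not suppress one for app B at t=110, while a second anomaly for app A at t=120 is suppressed.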

Eventually this key will be passed to the subscribers, although this cl
doesn't do anything with it.

Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Id76856dc44fe9ecf91ac81a423e84f97c81d30ab
2018-01-09 11:29:41 -08:00
TreeHugger Robot
c131a9297b Merge changes Ibe8c8d3c,I4a475d6f
* changes:
  Thread-safety at log processor level.
  Handle null string in jni and c++ stats-log-api interfaces.
2018-01-09 10:03:01 +00:00
Yangster-mac
b0d0628a29 Thread-safety at log processor level.
Test: statsd unit test passed.

Change-Id: Ibe8c8d3cc8297875b16ee385c077b71c87353147
2018-01-08 14:59:42 -08:00
Yao Chen
d59a6589fa Some fixes in duration metrics
+ Don't need to keep the condition key if the condition is not sliced.
+ Whiten the hash before returning.

Test: statsd_test
Change-Id: Ie0bcb14b3adc4a15b79ba3be050431a672a68c28
2018-01-08 11:17:11 -08:00
Yangster-mac
b5bc74122f E2e test: attribution matching and slicing by attribution.
Test: statsd unit tests passed.
Change-Id: I05875b80a9a1ab5f87a478d1f63543f9798254a5
2018-01-06 23:23:19 -08:00
Howard Ro
7266e82cf8 Merge "Use the configs from disk after reboot and separate out ConfigManager startup for unit test to avoid reading configs for tests" 2018-01-05 06:44:38 +00:00
Yangster-mac
a070b6a40c Bug fix: gauge metric producer does not need the atom id parameter.
Test: statsd unit test passed
Change-Id: Ifd1e2f7a05a03886eb227ac1ae0e30b884e49f0c
2018-01-05 04:34:49 +00:00
yro
469cd80840 Use the configs from disk after reboot and separate out ConfigManager
startup for unit test to avoid reading configs for tests

Bug: 70667694
Test: manual testing conducted, statsd, statsd_test
Change-Id: I49cb82d2b1d7c178b7a4c30c089d8d93e7902faa
2018-01-04 15:00:48 -08:00
Yangster-mac
b814481ad1 Use TimeUnit enum to specify the bucket size.
Test: all statsd unit test passed

Change-Id: I4f6b80ba2f8c984b06e46e6de6df3e546e99a968
2018-01-04 13:01:14 -08:00
Yangster-mac
a7fb12d2d2 1/ Use FieldMatcher to specify the value fields in value metric.
2/ rename number_of_buckets as num_buckets
3/ use double for the Alert's threshold

Test: statsd unit tests passed.
Change-Id: Id1f55f14d3712eddee561681e3cd77343f086c7a
2018-01-03 23:49:36 -08:00
Yangster-mac
94e197cceb 1/ Change all "name" to id in statsD.
2/ Handle Subscription for alert.
3/ Support no_report_metric

Bug: 69522276
Test: all statsd unit tests passed.
Change-Id: I851b235f2d149b8602b0cad632d5bf541962f40a
2018-01-03 15:34:00 -08:00
Yangster-mac
2087716f2b 1/ Support nested message and repeated fields in statsd.
2/ Filter gauge fields by FieldMatcher.
3/ Wire up wakelock attribution chain.
4/ e2e test: wakelock duration metric with aggregated predicate dimensions.
5/ e2e test: count metric with multiple metric condition links for 2 predicates and 1 non-sliced predicate.

Test: statsd unit test passed.

Change-Id: I89db31cb068184a54e0a892fad710966d3127bc9
2018-01-01 10:01:36 -08:00
Bookatz
857aaa5208 Splits AnomalyTracker into two files
Splits out DurationAnomalyTracker-specific functions into their own
subclass.

Test: the unit tests and CTS tests
Change-Id: Id6eb74d232b4a9c3a932d805d1ba3f0ba43a88b1
2017-12-22 15:05:31 -08:00
Yao Chen
d5aa01b371 Remove the hacky HashableDimensionKey.
+ Add a real HashableDimensionKey as a wrapper of the dimension.
So we can get rid of the maps that we kept.

Pay down technical debt and reduce memory usage.

Test: statsd_test & manual
Change-Id: I233280cf1e2ce93da6a8cd4e8514abb066f4016d
2017-12-20 18:45:43 -08:00
Yao Chen
d10f7b1c7b Add log source filtering in statsd to filter out spams.
+ Add log source whitelist in StatsdConfig
+ Some changes in UidMap API. Listener needs to be wp instead of sp.
+ Update dogfood app config to have log source
+ Increase the stats service thread pool size to 10 (9+1).

TODO: add unit tests(b/70805664). This unit test takes some time to write.

Test: statsd_test & manual

Change-Id: I129b1cc13db5114db7417580962bd7cc4438519d
2017-12-20 18:45:43 -08:00
Yangster
9df9a7f78d 1/ index app name.
2/ fix a bug in app update.

Test: statsd unit tests passed.
Change-Id: I0a1c82411d89705ceaa0a2badfdd4bbb9859df3a
2017-12-18 14:48:35 -08:00
TreeHugger Robot
b5c1fe39c0 Merge "rename pulled atoms to shorter names" 2017-12-18 16:41:15 +00:00
TreeHugger Robot
e6abcbed99 Merge "Turn off verbose logging, remove the fake config." 2017-12-17 17:01:36 +00:00
TreeHugger Robot
dc4e426a17 Merge "add feature: GaugeMetricProducer now takes repeated list of fields bug fix: GaugeMetricProducer now works better with pulled events. unit test also includes GaugeMetricProducer_test" 2017-12-17 16:58:46 +00:00
Chenjie Yu
31d14d72dc rename pulled atoms to shorter names
Test: unit test
Change-Id: I337341dc69ea658f5972cd9962c536a1f11d77f9
2017-12-16 20:33:09 -08:00
TreeHugger Robot
d6815d74dd Merge "align metrics to 5min boundary We use one alarm clock for all pulled atoms. If metrics from different configs are not aligned, the clock will be set to repeat at higher and higher frequency, and consume a lot of battery. Current implementation assumes a 5min minimum bucket size. New metric start time is set to be aligned to the start time of statsd in the next 5min. So it will ignore events up to 5min." 2017-12-17 03:14:27 +00:00
Chenjie Yu
d9dfda76e0 add feature: GaugeMetricProducer now takes repeated list of fields
bug fix: GaugeMetricProducer now works better with pulled events.
unit test also includes GaugeMetricProducer_test

Test: unit test
Change-Id: Ic60f09342d14cfb107be2130d445b323a56909e0
2017-12-16 17:12:45 -08:00
Chenjie Yu
85ed838713 align metrics to 5min boundary
We use one alarm clock for all pulled atoms.
If metrics from different configs are not aligned,
the clock will be set to repeat at higher and higher
frequency, and consume a lot of battery.
Current implementation assumes a 5min minimum bucket
size. New metric start time is set to be aligned to
the start time of statsd in the next 5min.
So it will ignore events up to 5min.
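
The alignment rule above can be sketched as rounding the metric start time up to the next multiple of 5 minutes measured from the statsd start time. This is an illustration under assumptions: the function name, the second-based units, and the 300 second default are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rounds nowSec up to the next boundary of the form
// statsdStartSec + k * periodSec. Events before the returned time are ignored
// by the new metric, which is at most one period of data.
inline int64_t alignedMetricStartSec(int64_t statsdStartSec, int64_t nowSec,
                                     int64_t periodSec = 300) {
    assert(periodSec > 0 && nowSec >= statsdStartSec);
    int64_t elapsed = nowSec - statsdStartSec;
    int64_t periods = (elapsed + periodSec - 1) / periodSec;  // round up
    return statsdStartSec + periods * periodSec;
}
```

Because every metric starts on the same grid, a single repeating alarm can serve all pulled atoms without its frequency creeping upward.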

align puller alarm to minute boundary

Test: unit test
Change-Id: I77ffa3c13de363c780b1000181b9a9b780dd0846
2017-12-16 17:08:46 -08:00
Yao Chen
3c0b95ceb5 Turn off verbose logging, remove the fake config.
Test: statsd_test & manual
Change-Id: I43068c8db006cbbeccacaa9d8426ab4d0394313a
2017-12-16 14:34:20 -08:00
Bookatz
de1b5562c8 Statsd value metric anomaly detection
Add anomaly detection to Value metric in statsd.

Test: make statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases -t android.cts.statsd.HostAtomTests#testValueAnomalyDetection
Fixes: 70240042
Change-Id: I05cf36495cdfd0ac7aa1a922f0e253a60fda1787
2017-12-15 18:10:10 +00:00
Chenjie Yu
a7259abde4 use running sum for ValueMetricProducer bucket
simplify ValueMetricProducer logic for pulled data

Test: unit test
Change-Id: Ic0a21a543166cc5c34c1fa505dba08d1fc2f510a
2017-12-13 22:32:23 -08:00
Bookatz
e1d143a058 StatsdStats alerts - rename alerted_times
Renaming declared_times to alerted_times.

Test: still compiles and passes tests
Change-Id: Ib7c890a21bb91a7185c6d3a098b1bec8bd868ad5
2017-12-13 16:02:59 -08:00
Yao Chen
288c600013 Only create ProtoOutputStream when onGetData() is called.
The exception is EventMetricProducer. Each EventMetricProducer will still have a ProtoOutputStream.
Because LogEvent comes as a fixed 4K, it's more memory efficient to have an 8k ProtoOutputStream for
storing the events.

Also removed the finish() api in MetricProducer, which was intended for use with Dropbox.

Test: statsd_test & dogfood app
Bug: 70393808
Change-Id: I2efe4ecc76a88060a9aa5eb49d1fa6ea60bc5da8
2017-12-12 14:20:32 -08:00
TreeHugger Robot
62c1fb0f49 Merge "Adds rate limit to checking byte size." 2017-12-08 04:46:56 +00:00
TreeHugger Robot
c51ec6767f Merge changes Ib254db7e,I4aa0539a
* changes:
  Anomaly Alert declarations in StatsdStats
  Anomaly detection statsdstats
2017-12-08 03:50:57 +00:00
David Chen
d9269e2ee7 Adds rate limit to checking byte size.
Since there is a separate guardrail for memory used by uid map, we
no longer add the memory from uid map to the memory of each
config's metrics. We also prevent the byte size check from happening
too frequently. In order to mock the MetricsManager, we refactor
some of the existing methods.
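
A minimal sketch of rate limiting an expensive check such as the byte size computation, assuming a caller-supplied monotonic timestamp. The class name `CheckRateLimiter` and the interval parameter are hypothetical; the actual interval and call site in this change are not shown here.

```cpp
#include <cstdint>

class CheckRateLimiter {
public:
    explicit CheckRateLimiter(int64_t minIntervalNs) : mMinIntervalNs(minIntervalNs) {}

    // Returns true if enough time has passed since the last allowed check,
    // and records nowNs as the time of the latest check.
    bool shouldCheck(int64_t nowNs) {
        if (mHasChecked && nowNs - mLastCheckNs < mMinIntervalNs) {
            return false;  // too soon, skip the expensive byte size computation
        }
        mHasChecked = true;
        mLastCheckNs = nowNs;
        return true;
    }

private:
    int64_t mMinIntervalNs;
    int64_t mLastCheckNs = 0;
    bool mHasChecked = false;
};
```

Passing the timestamp in, rather than reading a clock inside the class, keeps the limiter easy to drive from unit tests.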

Test: Added unit-tests and verified they all pass on marlin.
Change-Id: I15cf105f7d95f4016fdb0443b0a33eebe862cafb
2017-12-07 18:22:58 -08:00
Bookatz
8f2f3d8205 Anomaly Alert declarations in StatsdStats
StatsdStats now tracks the number of times an anomaly is detected (per
config, per alert name).

Also adds a configKey to AnomalyTracker, which is needed, not only for
statsdstats, but also (in the future) for reporting the header
information to incidentd.

Bug: 67978682
Test: adb shell data/nativetest64/statsd_test/statsd_test
Change-Id: Ib254db7e1edb4f0f193f4772d17f14934cdf7e30
2017-12-07 18:00:44 -08:00