72 Commits

Author SHA1 Message Date
Muhammad Qureshi
3a5ebf589e Cancel Metric activations
Cancel Metric activations triggered by atom matchers

Bug: 128218061
Test: statsd_test
Test: statsd_localdrive
Change-Id: I90a705d74725c2aa04025e18e1fa77ec4fefc522
2019-04-03 06:03:48 -07:00
Yao Chen
0f861867c6 Make StatsLog drop less.
+ Create a thread-safe LogEventQueue to buffer log events.

+ The socket listner thread will read from socket and write to the buffer as quickly as possible
  to minimize the data loss in socket.

+ All pushed data is fetched from the the buffer and processed in a dedicated thread. After an
  event is fetched from the queue, we no longer block the socket listener thread.

+ Report event queue stats via statsdstats, including the min and max queue event history span in
  the queue (to understand how slow statsd can be and how fast the events can be)

Bug: 119031518
Test: unit tests added in statsd_test

Change-Id: I6b65ed9a678935b2e24302ba4b36e69c157adde4
2019-03-28 14:44:37 -07:00
Olivier Gaillard
e3c4b93263 Fix end-to-end value metrics tests.
- Use no_time_contraints to pull the data in PartialBucket_e2e_test.
Fast means there is no time to complete the current bucket
- One test had to be updated since the condition was unknown so the
bucket was not valid

Test: atest statsd_test
Change-Id: Id22539b801b29ed6b384045979e7c3cc6e3fe8b6
2019-03-15 21:30:43 +00:00
Olivier Gaillard
6c75ecdcef Introduces an option to set a dump latency requirement.
We are currently dumping invalid data for pulled metrics. Pulled metrics
require a new pull when flushing a bucket. We should either do another
pull or invalidate the previous bucket.

There are cases where we cannot afford to do another pull, e.g. statsd
being killed. If we do not have enough time, we'll just invalidte the
bucket to make sure we have correct data.

Bug: 123866830
Test: atest statsd_test
Change-Id: I090127cace3b7265032ebb2c9bddae976c883771
2019-02-20 17:59:50 +00:00
Tej Singh
d823aebd2f Unit tests for active config change broadcast
Modifies statsd unit tests to also assert the active configs changed
broadcast is called at the right time with the right configs. One case
has one config with multiple activation signals, while the other has
multiple configs that are activated at the same time with one signal.

Test: this is the test
Change-Id: I6e1cd19a8a0a8dbf9745152d4ad4980854be0cbd
2019-02-13 16:43:49 -08:00
Tej Singh
d7b5118f23 Fix failing statsd unit tests
Statsd unit tests relied on temperature atom. Since this atom is now
pulled by SCS, it no longer works on unit tests. We should figure out
why SCS pullers fail in unit tests, but in the meantime this cl changes
the tests to use full battery capacity.

Test: tests now pass on blueline
Bug: 123530679
Change-Id: I17c5cd67c6bf64a4b572de0a88c7abe73e0fe87b
2019-01-31 16:03:50 -08:00
Chenjie Yu
0bd73dbaac Puller refactor
1) Refactor pullers and add tests.

2) Add timeout to a puller.
mPullTimeoutNs is intrinsic to puller. A pull taking longer than this is
deemed failed and the data discarded.
A metric or StatsPullerManager requesting a pull should monitor the pull
and have deadlineNs. A successful pull may come later than desired due
to statsd processing delays.

3) Add unit tests to puller now that the base puller is more
complicated.

Bug: 118756964
Test: unit test
Change-Id: I0e5d47e2527391f7beef4b2d06bfd5c2f82f1179
2018-12-21 09:46:09 -08:00
Yang Lu
b472291224 Always report the metric id even if the metric has no data to upload.
This is to normalize the metric when applying cross-device aggregation.

Test: statsd test
BUG: b/118203108

Change-Id: I644a6a4163e2d2ad8f70e0c4e002dd4f542248af
2018-12-07 22:18:45 +00:00
Chenjie Yu
e077fd2e9c deprecate ALL_CONDITION_CHANGE in gauge
The feature set for GaugeMetric is the following:

1. Pull/Push once per bucket ----> RANDON_ONE_SAMPLE
  Have 1 gauge per bucket, if condition is true

2. Pull when condition becomes true ----> CONDITION_CHANGE_TO_TRUE
  Have 1 gauge per condition change to true, up to n

3. Pushed events multiple copies ----> FIRST_N_SAMPLE
  Have 1 gauge per event, up to n

4. Pull on trigger ----> FIRST_N_SAMPLE
  Have 1 gauge per trigger, up to n

For P, feature 1,2,3 work and same config can be used for P&Q.
(FIRST_N_SAMPLES is new but the code works, as would otherwise be
intended)
Feature 4 is new in Q.

Bug: 119783642
Bug: 118639742
Test: unit test
Change-Id: I671f7a5d3fdf9c2db4e8b6a137e3e45211874dcc
2018-11-19 16:39:47 -08:00
TreeHugger Robot
b1c6ba026d Merge "Statsd uidmap includes vers string and installer" 2018-11-15 08:56:28 +00:00
Chenjie Yu
32717c3b6f multi-value aggregation in ValueMetric
Allow aggregation on multiple fields, instead of one at a time.
All these fields should use the same aggregation time, use_diff,
direction, etc.
The config reuses value_field but allows multiple fields to be
specified.
The order they are specified determines the "index" of a value in the
output.

Bug: 119217634
Test: unit test
Change-Id: I38b1465d13723a897b30ee0b4f868498f60ad4db
2018-11-12 12:04:16 -08:00
Chenjie Yu
c715b9e0cc ValueMetric overhaul
This is savaged from the large cl.

1. Simplify the logic in ValueMetricProducer.
1.1 for pull data on bucket boundary, we pull on bucket end, instead of
playing with timestamp twice.
1.2 for data that require diffing, we keep a rolling diff base that gets
updated.
1.3 Now we check condition in onMatchedLogEventInternalLocked for pushed atoms. For pulled atoms, check before commit time. This was very error prone in P and caused multiple bugs. It is much simpler now.
2. Treat pushed and pulled atoms the same way and share the same
aggregation types.
4. Allow decreasing values for diffing.
5. Allow diffing for pushed atoms.
6. For diff based aggregation, if the diff value is zero, we skip
output.

Bug: 117224984
Bug: 115683963
Bug: 117975256
Bug: 113268259
Test: unit test
Change-Id: I6ee306e9f6e5a166b392c443594704e7d2792ef5
2018-11-06 17:07:29 -08:00
Bookatz
ff71cadecc Statsd can dump data as proto to bugreport
* Creates an incident section for statsd data.
* Allows dump to output statsd data, in proto format.
* Hooks up two statsd outputs to bugreports:
  -statsd report data in proto format
  -statsd metadata (statsdstats) in text format

The incident section does not import stats_log.proto because that turns
out to be extremely difficult: stats_log.proto imports atoms.proto,
which imports more things and is enormous and causes all sorts of
problems. atoms.proto was purposefully never compiled in AOSP, so to
retain that feature, the incident section uses 'bytes' instead of an
actual message. Since this isn't ever read in AOSP (other than testing),
this should be fine.

Bug: 115678461
Test: take a bug report and confirm valid proto
Test: cts-tradefed run cts-dev -m CtsStatsdHostTestCases -t android.cts.statsd.atom.HostAtomTests#testDumpsysStats

Change-Id: I1c370af7678d1dc7440ce299ea5ea4da6d33832b
2018-11-05 18:52:49 -08:00
dwchen
730403e757 Statsd uidmap includes vers string and installer
Each config can choose to include version strings and installer with
each metrics report. This data may be useful in the cloud to filter
the app-specific data.

BUG: 115626330
Change-Id: I3972ff2a94e7f0347ac0cc8a443cf328c1731e13
Test: Modified unit-tests, verified on marlin-eng
2018-10-30 16:24:25 -07:00
TreeHugger Robot
7dd9e171c2 Merge "metric activation and TTL." 2018-10-18 21:32:29 +00:00
Yangster-mac
849dfdc25b metric activation and TTL.
Currently, once a metric config is pushed to statsD, it will
start to collect metrics immediately. This CL introduces the metric
activation logic. When metric needs an activation, the metric producer
will hold until the activation event is detected. Then the metric producer
starts metric generation until the TTL expires (timebomb).

This is to support Mainline where it wants to collect a few metrics for
a few hours when the binary push starts or flag flips.

Test: statsd test
BUG: b/117858835
Change-Id: I992ae98f4303d5b79932eb94eddf6c19ded3727e
2018-10-18 10:45:31 -07:00
Yangster-mac
32f07af29c Match pulled events in gauge metric.
Bug: b/117703265

Test: statsd unit tests
Change-Id: Ia4c67ebfdb4f9647d4135c6879f04faa4efdd471
2018-10-17 17:47:11 -07:00
Yangster-mac
a8a304458b Fix the typos and naming convention in atoms.proto
Test: cts,statsd
BUG: b/117708491

Change-Id: Ib381ef66ae9925938e1f70b9a8869ef008e3d335
2018-10-14 22:04:52 -07:00
TreeHugger Robot
908bd8dea2 Merge "Fix a bug in DurationMetric's dimensions from condition and make unit tests expect the right answer." 2018-09-28 23:38:29 +00:00
TreeHugger Robot
07d7baf36c Merge "statsd events/gauge: remove WallClockTime" 2018-09-12 21:43:03 +00:00
Bookatz
fe2dde8184 statsd events/gauge: remove WallClockTime
EventMetricData stores wall_clock_timestamp_nanos.
It is expensive, costing 10 bytes per event and evidently not needed.
Similar for GaugeMetricData.

Bug: 113072343
Test: make -j8 statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases
Test: Manually confirm that events/gauges don't have wallclock
Change-Id: Iae978a434354c049e1fa61d42536be981c862b4f
2018-09-05 10:53:33 -07:00
Chenjie Yu
4c31f67965 Remove dimension fields in GaugeMetric output
GaugeMetric output is atom format and can contain all fields including
those already included in dimensions.
This is not necessary and duplicate.
And, if we do set dimensions, we can use this to reduce data size
similar to repeated fields, where same dimension value only appear once.

Bug: 113061955
Test: unit test
Change-Id: I299bd1cb1b9b90ea7426ef182df78d2ffc091910
2018-08-22 14:14:02 -07:00
Chenjie Yu
a0f0224906 ValueMetric supports multiple aggregation types
1. Add support for MIN, MAX, AVG
2. ValueMetric also allow floats now, in addition to long data type.
AnomalyDetection still takes long only. I am not sure if it makes
sense to do anomaly on AVG. I will leave that for later.
3. ValueMetric supports sliced condition change for pushed events.
I don't think it makes sense for pulled events to have sliced condition
changes so leave it for now.

Test: unit test
Change-Id: I8bc510d98ea9b8a6eb16d04ff99dce6b574249cd
2018-07-13 10:24:41 -07:00
Yao Chen
0aff90329b Fix a bug in DurationMetric's dimensions from condition and make unit tests expect the right answer.
Bug: 111119889
Test: statsd_test
Change-Id: Ie8379031c93641c011bf27694b47ae21fe8f8d7a
2018-07-03 10:51:05 -07:00
Yao Chen
5bfffb54da Clean up TODOs in statsd
+ Created bugs for those TODOs that are still relevant.
+ Remove obsolete TODOs.

Test: no code change.
Change-Id: I41c2a89a882f087817ee6cbc3f095e1d80e1928e
2018-06-25 11:08:04 -07:00
Chenjie Yu
e22192071d StatsPullerManager not use singleton
This is to be consistent with other patterns such as UidMap.
This also makes unit test simpler.

Change-Id: I1558cd609e470481f269ecf2ae616277a95cfbf0
Bug: 72722120
Test: unit test
2018-06-14 15:46:54 -07:00
Yangster-mac
1c58f04cd3 Add a field in config to disable/enable the string hashing in metric report.
Statsd hashes (using its own hashing function) raw strings to reduce the
upload data size when there are duplicate strings in the report. And in cloud,
the clearcut translator would backfill the strings.

In a few droidfood users, we find the translator was unable to do that. While
debugging the root cause, we first decided to provide an option to disable
the hashing from the cloud.

Test: statsd unit test, CTS test, tested manually

BUG: b/79943763
Change-Id: If0359c8cf3f3cf83a2938db9ebf95ea7906f0b0c
2018-05-18 10:39:50 -07:00
Chenjie Yu
021e25307d ValueMetric pushed events should check condition
+ fix unit test flakiness

Bug: 79873404
Change-Id: I15b52a79b18c05603640781e4450e7b62fac24ba
Fix: 79873404
Test: unit test
2018-05-16 14:50:11 -07:00
David Chen
56ae0d9a48 Fixes statsd reports missing strings and SCS.
Reports written to disk don't contain the strings used, which will
make this report unusable if there are strings that don't show up
again. We should always include the strings, so this option is
removed entirely.

Also, we hard-coded the wrong number of fields when pulling
ModemActivityInfo. There are actually 10 fields, not 6.

Bug: 79601503
Test: Tested unit-tests pass on marlin-eng.
Change-Id: I6834b096ced77418a9cc2ddd79b08d1c9c447fae
2018-05-11 17:04:56 -07:00
TreeHugger Robot
a159842161 Merge "Fix the flaky gauge/value e2e test due to cached events." into pi-dev 2018-05-10 01:02:10 +00:00
Yangster-mac
58e609e339 Fix the flaky gauge/value e2e test due to cached events.
Test: statsd test
BUG: b/79265262
Change-Id: I4d67f1c2edb6215a3cea23f8c7b2e8d5099c4aac
2018-05-08 16:19:48 -07:00
Yao Chen
cc884dfc94 Fix partial bucket unit tests.
Bug: 79347749
Test: statsd_test
Change-Id: I69eee7172d6fe4ce895530f089193eb08653e269
2018-05-07 10:34:31 -07:00
Yangster-mac
9def8e3995 Reduce statsd log data size.
1. Hash the strings in metric dimensions.
2. Optimize the timestamp encoding in bucket.
   Use bucket num for full bucket and millis for
   partial bucket.
3. Encode the dimension path per metric and avoid
   deduping it across dimensons.

Test: statsd test
Change-Id: I18f69654de85edb21a9c835c73edead756295e05
BUG: b/77813755
2018-04-26 04:30:18 -07:00
Chenjie Yu
e36018b272 add dump report reason to reports
+ also change uidmapping version numbers to int64_t

Bug: 78132855
Change-Id: Iac7ea93e4bf651bd65bd03383e7ab4971af4fc29
Fix: 78132855
Test: gts test
2018-04-18 20:19:21 +00:00
David Chen
81245fd53a Adds option to drop small buckets for statsd.
We notice that some of the pulled metrics have a ton of data, and
during app upgrades, we're forming partial buckets that represent
small periods of time but require many bytes of data. We now have an
option to drop these buckets that are too short. Note that we still
have to pull the data to keep the metrics for the next bucket
correct. We include a new field in the value and gauge metric outputs
so that it's easy to tell when a bucket was dropped.

We drop the partial buckets also from anomaly detection since we
should be computing anomalies from the same data that is reported.

Test: Added unit-tests for value and gauge metrics.
Bug: 77925710
Change-Id: Ic370496377c6afd380e02278a6c1ed8b521a2731
2018-04-16 18:42:14 -07:00
Jeff Sharkey
6b64925737 Protect usage data with OP_GET_USAGE_STATS.
APIs that return package usage data (such as the new StatsManager)
must ensure that callers hold both the PACKAGE_USAGE_STATS permission
and the OP_GET_USAGE_STATS app-op.

Add noteOp() method that can be called from native code.

Also add missing security checks on command interface.

Bug: 77662908, 78121728
Test: builds, boots
Change-Id: Ie0d51e4baaacd9d7d36ba0c587ec91a870b9df17
2018-04-16 12:44:32 -06:00
TreeHugger Robot
6b317915e8 Merge "StatsManager throws exceptions" into pi-dev 2018-04-11 17:02:06 +00:00
Yangster-mac
15f6bbc24f Flush the bucket when creating the metric producer.
Use int64 for value field.
E2e test for gauge/value metric.

BUG: b/74445671

Test: statsd test.
Change-Id: I823a0bade8f89834bdfb9cf48864852a47d7b63b
2018-04-10 20:25:13 -07:00
Bookatz
4f71629002 StatsManager throws exceptions
When StatsManager fails to connect to statsd, it now throws an exception
for the caller to catch. It also throws an exception of the config being
added is of an unreadable format.

Due to backwards compatibility issues, the old APIs could not be
changed, so new ones were made to replace the old ones. The old ones are
now temporary and will be removed when the compatibility issue is
resolved.

Bug: 77648233
Test: gts-tradefed run gts-dev --module GtsStatsdHostTestCases
Change-Id: Ibea05883a29b9b3ef9927d2f8fe295eb99832ab7
2018-04-10 19:07:32 -07:00
Yangster-mac
e68f3a5811 Flush the partial bucket when startd shuts down or config updated.
Test: statsd test

BUG: b/77556036
Change-Id: Ie4a04ace55e07c4529cdff5906ba874f8815f620
2018-04-05 18:05:57 -07:00
David Chen
bd12527c90 Fix uid map to be simpler and fix partial bucket.
The previous scheme captured periodic snapshots for each config with
complex logic that's unnecessary and wasted memory. We actually don't
need to store any snapshots since we just convert the current state
into a snapshot and also include the deltas (change events) since the
previous report until now.

To make the system more robust, we also include up to 100 of the
deleted apps in the uid map.

Also, fix the wiring of the partial buckets so the metric producers
form partial buckets on both app upgrade and removal, but not on
installation of a new app.

Also, we update StatsCompanionService to also include disabled apps.

Bug: 77607583
Test: Verified unit-tests pass and added new e2e tests.
Change-Id: I98e1f544d6e6571545ae1581c4cebab807596f51
2018-04-05 16:15:01 -07:00
Yangster-mac
b142cc8add Statsd config TTL
Roughly check the config every hour to see whether the ttl expired.
If so, read the config from disk and recreate the metric manager.

Test: statsd test

BUG: b/77274363

Change-Id: I16838afe5bbe966c3a0f638869751f9b59a5a259
2018-04-04 15:59:43 +00:00
TreeHugger Robot
46eef8d049 Merge "E2e test for periodic alarm." into pi-dev 2018-03-31 03:04:59 +00:00
Chenjie Yu
5caaa9d854 Clean up atoms.proto
changes are:
1) for pushed atoms, use attribution node in place of uid when
appropriate
2) name changes to be more consistent

Bug: 73823969
Test: manual test
Change-Id: Iacf7186dbd7a2282f7fe481f43dbbf92e1165b47
2018-03-30 10:11:03 -07:00
Yangster-mac
684d195227 E2e test for periodic alarm.
Test: new test

BUG: b/76281156
Change-Id: I60cb28baaeec6996e946a7cb3358ec8e0aca80e5
2018-03-27 13:26:20 -07:00
Yangster-mac
be10ddfe46 Flush the past buckets in anomaly tracker when time jumps forward
E2e test for count/duration anomaly trackers.

Test: new statsd tests.

BUG: b/74446029

Change-Id: Ia9be0240ba5021d44c1e1f096d67563e9138bb59
2018-03-26 18:44:01 -07:00
David Chen
8faaa01489 Deletes default allowed_log_sources in statsd.
We should fail right away if someone forgets to set this field
instead of setting default values since this may lead to hard to
explain bugs in the future (eg, why isn't systemui logging an atom
to a config).

Bug: 74608359
Test: Verified unit-tests still pass on marlin-eng.
Change-Id: Ibd8a6ccbc5cc8d2dfb8a1577c64bf9b49822a2c3
2018-03-15 17:45:58 -07:00
TreeHugger Robot
a825c8ee5d Merge "Support slicing by chain." into pi-dev 2018-03-15 02:27:15 +00:00
Yangster-mac
e06cfd777a Support slicing by chain.
BUG: b/73975181

Test: statsd test
Change-Id: I913ae0f68ff21ed0703bb5da9c60d3eaa3bf5981
2018-03-14 15:36:24 -07:00
Yangster-mac
3fa5d7fb23 Add wall clock timestamp for ConfigMetricsReport and gauge atoms.
Fix the bug when serializing multiple atoms in gauge metric

BUG: b/74159560

Test: new test for ALL_CONDITION_CHANGES sampling method.
Change-Id: I6d33c1efbac92b6e13be2d64c323e090cb1f84aa
2018-03-10 22:25:28 -08:00