+ Create a thread-safe LogEventQueue to buffer log events.
+ The socket listner thread will read from socket and write to the buffer as quickly as possible
to minimize the data loss in socket.
+ All pushed data is fetched from the the buffer and processed in a dedicated thread. After an
event is fetched from the queue, we no longer block the socket listener thread.
+ Report event queue stats via statsdstats, including the min and max queue event history span in
the queue (to understand how slow statsd can be and how fast the events can be)
Bug: 119031518
Test: unit tests added in statsd_test
Change-Id: I6b65ed9a678935b2e24302ba4b36e69c157adde4
- Use no_time_contraints to pull the data in PartialBucket_e2e_test.
Fast means there is no time to complete the current bucket
- One test had to be updated since the condition was unknown so the
bucket was not valid
Test: atest statsd_test
Change-Id: Id22539b801b29ed6b384045979e7c3cc6e3fe8b6
We are currently dumping invalid data for pulled metrics. Pulled metrics
require a new pull when flushing a bucket. We should either do another
pull or invalidate the previous bucket.
There are cases where we cannot afford to do another pull, e.g. statsd
being killed. If we do not have enough time, we'll just invalidte the
bucket to make sure we have correct data.
Bug: 123866830
Test: atest statsd_test
Change-Id: I090127cace3b7265032ebb2c9bddae976c883771
Modifies statsd unit tests to also assert the active configs changed
broadcast is called at the right time with the right configs. One case
has one config with multiple activation signals, while the other has
multiple configs that are activated at the same time with one signal.
Test: this is the test
Change-Id: I6e1cd19a8a0a8dbf9745152d4ad4980854be0cbd
Statsd unit tests relied on temperature atom. Since this atom is now
pulled by SCS, it no longer works on unit tests. We should figure out
why SCS pullers fail in unit tests, but in the meantime this cl changes
the tests to use full battery capacity.
Test: tests now pass on blueline
Bug: 123530679
Change-Id: I17c5cd67c6bf64a4b572de0a88c7abe73e0fe87b
1) Refactor pullers and add tests.
2) Add timeout to a puller.
mPullTimeoutNs is intrinsic to puller. A pull taking longer than this is
deemed failed and the data discarded.
A metric or StatsPullerManager requesting a pull should monitor the pull
and have deadlineNs. A successful pull may come later than desired due
to statsd processing delays.
3) Add unit tests to puller now that the base puller is more
complicated.
Bug: 118756964
Test: unit test
Change-Id: I0e5d47e2527391f7beef4b2d06bfd5c2f82f1179
This is to normalize the metric when applying cross-device aggregation.
Test: statsd test
BUG: b/118203108
Change-Id: I644a6a4163e2d2ad8f70e0c4e002dd4f542248af
The feature set for GaugeMetric is the following:
1. Pull/Push once per bucket ----> RANDON_ONE_SAMPLE
Have 1 gauge per bucket, if condition is true
2. Pull when condition becomes true ----> CONDITION_CHANGE_TO_TRUE
Have 1 gauge per condition change to true, up to n
3. Pushed events multiple copies ----> FIRST_N_SAMPLE
Have 1 gauge per event, up to n
4. Pull on trigger ----> FIRST_N_SAMPLE
Have 1 gauge per trigger, up to n
For P, feature 1,2,3 work and same config can be used for P&Q.
(FIRST_N_SAMPLES is new but the code works, as would otherwise be
intended)
Feature 4 is new in Q.
Bug: 119783642
Bug: 118639742
Test: unit test
Change-Id: I671f7a5d3fdf9c2db4e8b6a137e3e45211874dcc
Allow aggregation on multiple fields, instead of one at a time.
All these fields should use the same aggregation time, use_diff,
direction, etc.
The config reuses value_field but allows multiple fields to be
specified.
The order they are specified determines the "index" of a value in the
output.
Bug: 119217634
Test: unit test
Change-Id: I38b1465d13723a897b30ee0b4f868498f60ad4db
This is savaged from the large cl.
1. Simplify the logic in ValueMetricProducer.
1.1 for pull data on bucket boundary, we pull on bucket end, instead of
playing with timestamp twice.
1.2 for data that require diffing, we keep a rolling diff base that gets
updated.
1.3 Now we check condition in onMatchedLogEventInternalLocked for pushed atoms. For pulled atoms, check before commit time. This was very error prone in P and caused multiple bugs. It is much simpler now.
2. Treat pushed and pulled atoms the same way and share the same
aggregation types.
4. Allow decreasing values for diffing.
5. Allow diffing for pushed atoms.
6. For diff based aggregation, if the diff value is zero, we skip
output.
Bug: 117224984
Bug: 115683963
Bug: 117975256
Bug: 113268259
Test: unit test
Change-Id: I6ee306e9f6e5a166b392c443594704e7d2792ef5
* Creates an incident section for statsd data.
* Allows dump to output statsd data, in proto format.
* Hooks up two statsd outputs to bugreports:
-statsd report data in proto format
-statsd metadata (statsdstats) in text format
The incident section does not import stats_log.proto because that turns
out to be extremely difficult: stats_log.proto imports atoms.proto,
which imports more things and is enormous and causes all sorts of
problems. atoms.proto was purposefully never compiled in AOSP, so to
retain that feature, the incident section uses 'bytes' instead of an
actual message. Since this isn't ever read in AOSP (other than testing),
this should be fine.
Bug: 115678461
Test: take a bug report and confirm valid proto
Test: cts-tradefed run cts-dev -m CtsStatsdHostTestCases -t android.cts.statsd.atom.HostAtomTests#testDumpsysStats
Change-Id: I1c370af7678d1dc7440ce299ea5ea4da6d33832b
Each config can choose to include version strings and installer with
each metrics report. This data may be useful in the cloud to filter
the app-specific data.
BUG: 115626330
Change-Id: I3972ff2a94e7f0347ac0cc8a443cf328c1731e13
Test: Modified unit-tests, verified on marlin-eng
Currently, once a metric config is pushed to statsD, it will
start to collect metrics immediately. This CL introduces the metric
activation logic. When metric needs an activation, the metric producer
will hold until the activation event is detected. Then the metric producer
starts metric generation until the TTL expires (timebomb).
This is to support Mainline where it wants to collect a few metrics for
a few hours when the binary push starts or flag flips.
Test: statsd test
BUG: b/117858835
Change-Id: I992ae98f4303d5b79932eb94eddf6c19ded3727e
EventMetricData stores wall_clock_timestamp_nanos.
It is expensive, costing 10 bytes per event and evidently not needed.
Similar for GaugeMetricData.
Bug: 113072343
Test: make -j8 statsd_test && adb sync data && adb shell data/nativetest64/statsd_test/statsd_test
Test: run cts-dev -m CtsStatsdHostTestCases
Test: Manually confirm that events/gauges don't have wallclock
Change-Id: Iae978a434354c049e1fa61d42536be981c862b4f
GaugeMetric output is atom format and can contain all fields including
those already included in dimensions.
This is not necessary and duplicate.
And, if we do set dimensions, we can use this to reduce data size
similar to repeated fields, where same dimension value only appear once.
Bug: 113061955
Test: unit test
Change-Id: I299bd1cb1b9b90ea7426ef182df78d2ffc091910
1. Add support for MIN, MAX, AVG
2. ValueMetric also allow floats now, in addition to long data type.
AnomalyDetection still takes long only. I am not sure if it makes
sense to do anomaly on AVG. I will leave that for later.
3. ValueMetric supports sliced condition change for pushed events.
I don't think it makes sense for pulled events to have sliced condition
changes so leave it for now.
Test: unit test
Change-Id: I8bc510d98ea9b8a6eb16d04ff99dce6b574249cd
+ Created bugs for those TODOs that are still relevant.
+ Remove obsolete TODOs.
Test: no code change.
Change-Id: I41c2a89a882f087817ee6cbc3f095e1d80e1928e
This is to be consistent with other patterns such as UidMap.
This also makes unit test simpler.
Change-Id: I1558cd609e470481f269ecf2ae616277a95cfbf0
Bug: 72722120
Test: unit test
Statsd hashes (using its own hashing function) raw strings to reduce the
upload data size when there are duplicate strings in the report. And in cloud,
the clearcut translator would backfill the strings.
In a few droidfood users, we find the translator was unable to do that. While
debugging the root cause, we first decided to provide an option to disable
the hashing from the cloud.
Test: statsd unit test, CTS test, tested manually
BUG: b/79943763
Change-Id: If0359c8cf3f3cf83a2938db9ebf95ea7906f0b0c
Reports written to disk don't contain the strings used, which will
make this report unusable if there are strings that don't show up
again. We should always include the strings, so this option is
removed entirely.
Also, we hard-coded the wrong number of fields when pulling
ModemActivityInfo. There are actually 10 fields, not 6.
Bug: 79601503
Test: Tested unit-tests pass on marlin-eng.
Change-Id: I6834b096ced77418a9cc2ddd79b08d1c9c447fae
1. Hash the strings in metric dimensions.
2. Optimize the timestamp encoding in bucket.
Use bucket num for full bucket and millis for
partial bucket.
3. Encode the dimension path per metric and avoid
deduping it across dimensons.
Test: statsd test
Change-Id: I18f69654de85edb21a9c835c73edead756295e05
BUG: b/77813755
We notice that some of the pulled metrics have a ton of data, and
during app upgrades, we're forming partial buckets that represent
small periods of time but require many bytes of data. We now have an
option to drop these buckets that are too short. Note that we still
have to pull the data to keep the metrics for the next bucket
correct. We include a new field in the value and gauge metric outputs
so that it's easy to tell when a bucket was dropped.
We drop the partial buckets also from anomaly detection since we
should be computing anomalies from the same data that is reported.
Test: Added unit-tests for value and gauge metrics.
Bug: 77925710
Change-Id: Ic370496377c6afd380e02278a6c1ed8b521a2731
APIs that return package usage data (such as the new StatsManager)
must ensure that callers hold both the PACKAGE_USAGE_STATS permission
and the OP_GET_USAGE_STATS app-op.
Add noteOp() method that can be called from native code.
Also add missing security checks on command interface.
Bug: 77662908, 78121728
Test: builds, boots
Change-Id: Ie0d51e4baaacd9d7d36ba0c587ec91a870b9df17
When StatsManager fails to connect to statsd, it now throws an exception
for the caller to catch. It also throws an exception of the config being
added is of an unreadable format.
Due to backwards compatibility issues, the old APIs could not be
changed, so new ones were made to replace the old ones. The old ones are
now temporary and will be removed when the compatibility issue is
resolved.
Bug: 77648233
Test: gts-tradefed run gts-dev --module GtsStatsdHostTestCases
Change-Id: Ibea05883a29b9b3ef9927d2f8fe295eb99832ab7
The previous scheme captured periodic snapshots for each config with
complex logic that's unnecessary and wasted memory. We actually don't
need to store any snapshots since we just convert the current state
into a snapshot and also include the deltas (change events) since the
previous report until now.
To make the system more robust, we also include up to 100 of the
deleted apps in the uid map.
Also, fix the wiring of the partial buckets so the metric producers
form partial buckets on both app upgrade and removal, but not on
installation of a new app.
Also, we update StatsCompanionService to also include disabled apps.
Bug: 77607583
Test: Verified unit-tests pass and added new e2e tests.
Change-Id: I98e1f544d6e6571545ae1581c4cebab807596f51
Roughly check the config every hour to see whether the ttl expired.
If so, read the config from disk and recreate the metric manager.
Test: statsd test
BUG: b/77274363
Change-Id: I16838afe5bbe966c3a0f638869751f9b59a5a259
changes are:
1) for pushed atoms, use attribution node in place of uid when
appropriate
2) name changes to be more consistent
Bug: 73823969
Test: manual test
Change-Id: Iacf7186dbd7a2282f7fe481f43dbbf92e1165b47
We should fail right away if someone forgets to set this field
instead of setting default values since this may lead to hard to
explain bugs in the future (eg, why isn't systemui logging an atom
to a config).
Bug: 74608359
Test: Verified unit-tests still pass on marlin-eng.
Change-Id: Ibd8a6ccbc5cc8d2dfb8a1577c64bf9b49822a2c3
Fix the bug when serializing multiple atoms in gauge metric
BUG: b/74159560
Test: new test for ALL_CONDITION_CHANGES sampling method.
Change-Id: I6d33c1efbac92b6e13be2d64c323e090cb1f84aa