13 Commits

Author SHA1 Message Date
John Grossman
8100960892 am e1d6c080: Make common_time more deferential when coming out of networkless mode.
* commit 'e1d6c080f0b1769637d742e51cc22167c7af12bb':
  Make common_time more deferential when coming out of networkless mode.
2012-04-09 17:00:24 -07:00
John Grossman
e1d6c080f0 Make common_time more deferential when coming out of networkless mode.
Addresses issues seen in bug 6260139.

This is a really tough bug to repro, but there is no doubt that it is
happening occasionally on our super huge A@H subnet.  I have collected
data all weekend; the failure did not occur, but I got enough to have
a theoretical sequence of events which could trigger this behavior.
The sequence goes like this.

1) A network is running and happy with a timeline master M,
   maintaining timeline X.
2) Device B boots, but its network is taking a long time to come up.
   After 60 seconds of waiting for the network to come up, device B
   goes into networkless master mode and creates timeline Y.
3) Device B's network comes up.  It immediately sends a master
   announcement saying that it is the current low-priority master of
   timeline Y (it's low priority because it has never had any real
   clients).
4) Master M ignores B because B is low priority.
5) Device C boots and sends out a who-is-master request.  It is a race
   between M and B to see who will respond first.  In this case, B
   responds first.
6) C sends B a request which B receives.  B now has its first client
   and is now high priority.  In this scenario, B matches M in all
   aspects of the priority ranking function, including winning the tie
   breaker (larger MAC address when interpreted as a 48 bit integer)
7) M sends its master announcement; it is ignored by B since B
   now wins in the ranking function vs M.
8) Finally, B sends its next master announcement.  M sees it, realizes
   that there is a higher priority master out there (looks like a
   bridged network scenario to M).  M gives up master status along
   with timeline X.  The clients of M become clients of B and move
   from timeline X to timeline Y (something which should only be
   needed during an actual network bridging event)
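
A minimal sketch of the kind of ranking comparison steps 4 through 8
rely on.  The field names and the ordering of the criteria are
assumptions made for illustration; only the tie-breaker (larger MAC
address interpreted as a 48 bit integer) comes from the description
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterCandidate:
    """Hypothetical summary of what a master announces about itself."""
    name: str
    high_priority: bool  # set once the timeline has had real clients
    priority: int        # configured election priority (assumed: larger wins)
    mac: str             # colon-separated, e.g. "00:11:22:33:44:55"


def mac_to_int(mac: str) -> int:
    """Interpret a colon-separated MAC address as a 48 bit integer."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    return int.from_bytes(bytes(int(o, 16) for o in octets), "big")


def rank(c: MasterCandidate) -> tuple:
    # Assumed ordering: high priority bit first, then configured
    # priority, then the MAC address tie-breaker.
    return (c.high_priority, c.priority, mac_to_int(c.mac))


def outranks(a: MasterCandidate, b: MasterCandidate) -> bool:
    """True if candidate a wins the election against candidate b."""
    return rank(a) > rank(b)
```

With this ordering, B loses to M while its high priority bit is clear
and wins as soon as the bit is set, provided its MAC address is larger.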

This change has a few different things meant to severely minimize the
chance that this can happen.

First, and the most important change, is that networkless masters do
not immediately announce themselves as masters on the network they are
joining.  Instead, they transition into Ronin to discover any
pre-existing masters on the network.  If there are no masters out
there, the device will simply transition back to master and continue
to maintain the timeline it had in networkless mode.  In the scenario
above, however, B should discover M and become its client, preserving
the established timeline X.
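
The intended transition can be sketched as follows.  This is an
illustration only: the State names, leave_networkless_mode(), and the
discover_master callback are hypothetical and are not taken from the
service's code.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical subset of the time service states."""
    RONIN = auto()
    MASTER = auto()
    CLIENT = auto()


def leave_networkless_mode(own_timeline, discover_master):
    """Return (state, timeline) once the network has come up.

    discover_master is a hypothetical callback that stands in for the
    Ronin discovery phase.  It returns the timeline ID of an existing
    master, or None if nobody answered.
    """
    # Pass through Ronin first instead of announcing ourselves.
    found = discover_master()
    if found is None:
        # No master out there: keep the timeline built while networkless.
        return State.MASTER, own_timeline
    # A master already exists: join it and adopt its timeline.
    return State.CLIENT, found
```

In the scenario above, leave_networkless_mode("Y", lambda: "X") yields
(State.CLIENT, "X"), so timeline X survives.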

Second, any time a device experiences an interface reconfiguration
(including coming out of networkless mode), it clears its high
priority bit.  This is a good thing.  The bit used to get set again
any time...

1) The device is master and receives a client request.
2) The device becomes a client of another master on the network.
3) The device becomes a master.

Number 3 in this list is a mistake.  The high priority bit should only
be set for devices during master election which have been
participating in a timeline which has been used by multiple devices.
We know that this is the case when we are master and receive a
request.  We also know that this is the case when we hear from a
master and decide to become its client.  Simply becoming a master
should not make us high priority.  This behavior has been removed.
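
A small sketch of the rules after this change.  The class and method
names are hypothetical; only the set/clear conditions come from the
text above.

```python
class PriorityBit:
    """Tracks the high priority bit (illustrative names only)."""

    def __init__(self):
        self.high_priority = False

    def on_interface_reconfigured(self):
        # Includes coming out of networkless mode.
        self.high_priority = False

    def on_client_request_while_master(self):
        # Someone else is using our timeline.
        self.high_priority = True

    def on_became_client(self):
        # We joined a timeline that another device maintains.
        self.high_priority = True

    def on_became_master(self):
        # Deliberately a no-op: becoming master proves nothing about
        # whether the timeline is shared.
        pass
```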

Third, timeouts have been adjusted just for some extra "stickiness"
when it comes to master status.  Clients now stay in the Ronin state
for up to 10 seconds looking for a master, sending up to 20 discovery
requests, instead of only 3 seconds (sending 6 requests).  The
wait-for-election timeout has been adjusted up from 5 seconds to 12.5
seconds to track the longer election cycle as well.  Also, while in
steady-state, clients will now wait until 10 packets (10 seconds)
have not been answered by their master before giving up and dropping
into Ronin.
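
For reference, the arithmetic behind the Ronin and election numbers,
as a sketch.  The names are hypothetical, and the per-request interval
is derived from the figures above rather than stated anywhere in this
change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionTimeouts:
    """Illustrative container for the values quoted above."""
    ronin_window_s: float       # how long a device stays in Ronin
    ronin_requests: int         # discovery requests sent in that window
    wait_for_election_s: float  # wait-for-election timeout

    @property
    def discovery_interval_s(self) -> float:
        # Implied spacing between discovery requests.
        return self.ronin_window_s / self.ronin_requests


OLD = ElectionTimeouts(ronin_window_s=3.0, ronin_requests=6,
                       wait_for_election_s=5.0)
NEW = ElectionTimeouts(ronin_window_s=10.0, ronin_requests=20,
                       wait_for_election_s=12.5)
```

Both sets imply the same 0.5 second spacing between discovery
requests; only the number of requests, and therefore the time spent
looking for a master, has grown.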

Change-Id: I438b39f31265e34d6719d4adfa9e8b95a2afc188
Signed-off-by: John Grossman <johngro@google.com>
2012-04-09 15:33:53 -07:00
Kent Ryhorchuk
11bc45fcba New clock sync control loop.
Change clock sync control to a velocity-form PI loop.  Tuned for office LAN and
WiFi conditions, will probably perform better in clean environments.
Improve packet filtering to prevent clock sync on bad RTT.
Changed diag interface to take RTT times; P, I, and D are no longer supported.
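
For readers unfamiliar with the term, a generic velocity-form PI
controller looks roughly like the sketch below.  It computes a change
in output from the change in error instead of keeping an explicit
integral sum.  The class name, gains, and clamp limits are
placeholders, not the values tuned in this change.

```python
class VelocityPI:
    """Textbook velocity-form (incremental) PI controller."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp = kp
        self.ki = ki
        self.out_min = out_min
        self.out_max = out_max
        self.prev_error = 0.0
        self.output = 0.0

    def update(self, error, dt):
        # delta_u = Kp * (e[k] - e[k-1]) + Ki * e[k] * dt
        delta = self.kp * (error - self.prev_error) + self.ki * error * dt
        # Clamping the accumulated output bounds the correction
        # without a separate anti-windup term.
        self.output = min(max(self.output + delta, self.out_min),
                          self.out_max)
        self.prev_error = error
        return self.output
```

Here error would be the measured clock offset and the output the slew
correction handed to the local clock; that mapping is an assumption
about how such a loop would be wired up, not a description of the
service's code.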

Change-Id: Iad2b26eb44cd222ec5f219b49669e2d6baec9d1c
2012-02-17 09:46:37 -08:00
John Grossman
ba2ff9c223 Really fix the build this time.
Cannot try to include <limits> on git_master-without-vendor.  The file
just does not exist.

Change-Id: Iae383465c59d1cf59a9ba3f729f8f074971f7ce4
2012-02-16 15:25:51 -08:00
John Grossman
b8525e9a76 Fix the build.
Looks like not all flavors of the android build include support for
std::numeric_limits.  Fix the build by using a simple macro for now.
A more elegant solution can be searched for once the build is green
again.

Change-Id: I18329cd0d26ca69de6a52df9a1c6eeb3ba063b48
2012-02-16 14:56:08 -08:00
Mike J. Chen
6c92951047 Upintegrate the common_time service from ics-aah.
Move the common_time service developed in the ics-aah branch back into
master.

The common_time service is a small service built to synchronize an
arbitrary timeline amongst peers on a local subnet.  While running
and configured, the service will elect a master from the set of
available devices within the subnet, define a relationship between the
common_time timeline and the local time timeline (provided by the local
time HAL), and then attempt to maintain synchronization between common
and local time by controlling the frequency of the local time clock
via the HAL, or by disciplining local time in the digital domain if
the local time HAL implementation does not support HW slewing.
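
One way to picture the relationship between the two timelines is a
linear mapping, sketched below.  The linear form, the class name, and
the field names are assumptions made for illustration; this message
does not spell out the actual representation.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LinearTransform:
    """common = common_zero + (local - local_zero) * rate"""
    local_zero: int   # local time at the reference point
    common_zero: int  # common time at the same instant
    rate: Fraction    # common ticks per local tick

    def local_to_common(self, local: int) -> int:
        return self.common_zero + math.floor(
            (local - self.local_zero) * self.rate)

    def common_to_local(self, common: int) -> int:
        return self.local_zero + math.floor(
            (common - self.common_zero) / self.rate)
```

Using an exact rational rate avoids the rounding drift that a float
slope would accumulate over long uptimes.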

On its own, the native common time service will do nothing until it is
configured.  The CommonTimeManagementService (running out of the
system server process) is responsible for implementing policy
regarding configuration and operation of the common_time service and
will be added in a subsequent CL.

Change-Id: I71292f9b9b1797665865689c4572c9d3a0552f64
Signed-off-by: John Grossman <johngro@google.com>
2012-02-16 13:45:10 -08:00
Kent Ryhorchuk
17fe247616 New clock sync control loop.
Change clock sync control to a velocity-form PI loop.  Tuned for office LAN and
WiFi conditions, will probably perform better in clean environments.
Improve packet filtering to prevent clock sync on bad RTT.
Changed diag interface to take RTT times; P, I, and D are no longer supported.
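
As an illustration of filtering on bad RTT, the sketch below rejects a
sync sample when its round trip time is much worse than the best one
seen recently.  The window size, the ratio threshold, and the class
itself are hypothetical; the actual filter criteria are not described
in this message.

```python
from collections import deque


class RttFilter:
    """Reject samples whose RTT is far above the recent minimum."""

    def __init__(self, window=8, max_ratio=2.0):
        self.history = deque(maxlen=window)  # recent RTTs, microseconds
        self.max_ratio = max_ratio

    def accept(self, rtt_us):
        if rtt_us <= 0:
            return False  # nonsensical measurement, do not record it
        best = min(self.history) if self.history else None
        # Record every plausible sample so the window can follow
        # genuine changes in network conditions.
        self.history.append(rtt_us)
        return best is None or rtt_us <= best * self.max_ratio
```

A sample that fails the check would simply be skipped by the control
loop rather than used to steer the clock.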

Change-Id: Id7758262c5f987f07d7091aba6c0874d7c19f387
2012-02-14 16:41:26 -08:00
John Grossman
583a03ac04 Fix device ID selection in the common time service.
Fix an issue I discovered while back-porting this code to master.  The
common time service was using the MAC address of "eth0" (hardcoded) as
its device ID instead of fetching it from the interface it is
currently bound to.  On phones (or any other device with no eth0) this
causes the time service to never be able to fetch a device ID as it
should.
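
The idea of the fix, sketched with the Linux sysfs address file as a
stand-in for however the service actually queries the interface.  The
function name and the sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def read_device_id(iface, sysfs_root="/sys/class/net"):
    """Return the MAC of iface as a 48 bit integer, or None."""
    path = Path(sysfs_root) / iface / "address"
    try:
        text = path.read_text().strip()
    except OSError:
        return None  # interface does not exist or is unreadable
    octets = text.split(":")
    if len(octets) != 6:
        return None
    try:
        return int.from_bytes(bytes(int(o, 16) for o in octets), "big")
    except ValueError:
        return None  # malformed address
```

The point is that the interface name is a parameter supplied by
whatever the service is bound to, never a hardcoded "eth0".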

Change-Id: Icf8a2006924088efc86065927a648f7f53638657
2012-02-07 16:07:48 -08:00
John Grossman
354edbc80e Implement new common_time service functionality.
Major re-factor of the common_time (formerly aah_timesrv) service in
preparation for up-integration into Android master.  This work
includes bug fixes, new features, and general code cleanup.  High
points are listed below.

+ CommonClock interface has been enhanced to allow querying of many
  more low level synchronization details; mostly for debugging, but in
  theory useful to an application as well.
+ CommonTimeConfig interface has been implemented.  This allows a
  management process to configure a number of different parameters
  (many of them new) to control the behavior of the common_time
  service.  Most importantly, the time service can be bound to a
  specific network interface and should only operate on that interface
  and no others.
+ Enhance log messages to be more useful in determining what the time
  service state machine is doing and why.
+ Enhance information provided by dumpsys to provide many more details
  about the quality of time sync and the network conditions which gave
  rise to the current quality conditions.

Features, features, features....
+ Add a feature which lets the high level choose a different master
  election endpoint so that multiple time synchronization domains can
  co-exist on the same subnet (mostly to support a potential use case
  of multiple home domains in a multiple dwelling environment like a
  hotel, dormitory or apartment complex).
+ Add a feature which lets the high level assign a 64-bit group ID
  which allows partitioning of time synchronization domains even when
  the master election endpoint is shared (as it might be if broadcast
  is being used instead of multicast)
+ Add an auto-disable feature which lets the time service drop into
  network-less mode when there are no active clients of the
  common_time service in the device.  Mostly for phones, this allows
  phones to not consume network/battery resources when they don't need
  to maintain common time.
+ Add a feature which lets the high level choose the priority of the
  common_time service in the master election protocol.  This allows
  high level decisions about things like mobile vs non-mobile, wired
  ethernet vs WiFi to affect who ends up with the job of master on a
  given network.  Priority overrides at the low level also allow
  clients coming in from network-less mode to lower their effective
  priority as they join a new network so as to not disrupt any
  stable long-running timeline which may already be active on the
  network.
+ Add the ability to control some of the core parameters of the time
  sync service which affect network load (like the sync polling
  interval and the master announce interval).
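
To make the list above concrete, here is a sketch of how the new knobs
might be grouped, including the group ID check that partitions domains
sharing one election endpoint.  Every name here is hypothetical and
does not mirror the CommonTimeConfig interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeServiceConfig:
    """Illustrative bundle of the parameters described above."""
    interface: str           # the only interface the service uses
    election_endpoint: tuple  # (address, port) for master election
    group_id: int             # 64-bit sync domain identifier
    priority: int             # master election priority
    auto_disable: bool        # go networkless when there are no clients
    sync_poll_interval_ms: int
    master_announce_interval_ms: int

    def __post_init__(self):
        if not 0 <= self.group_id < 2 ** 64:
            raise ValueError("group_id must fit in 64 bits")

    def accepts_group(self, packet_group_id: int) -> bool:
        # Packets from other domains are ignored even when they
        # arrive on the same election endpoint.
        return packet_group_id == self.group_id
```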

Change-Id: I71af15a83cfa5ef0417b406928967fb9e02f55c6
2012-02-06 18:02:32 -08:00
John Grossman
9387f4f800 Add native common time config service.
Define a native service interface for configuring and controlling the
common time service.  Implement the native marshallers and stub the
implementation of the new interface.

Change-Id: Ia6a6a20ef3d221e8829c55be1dd5f98ed996c610
2012-02-03 17:46:00 -08:00
John Grossman
2627965d61 Add marshallers for the new common clock methods.
Add marshallers and stub implementations for new methods in the common
clock interface to support new functionality being added in the
process of integrating the common time service more closely with the
Java level of Android.

Change-Id: Iac2d3fb405d1b64cea1d8e13f988160afb76a06d
2012-02-03 17:45:53 -08:00
John Grossman
7f1d9e1c53 Move the definition of time server state.
Move the State enum up to the ICommonClock interface so it can be
returned for status/debugging up to clients.

Change-Id: I81fef5b96ffc69a4f2e9801b3744feea099ccd47
2012-02-03 17:45:17 -08:00
John Grossman
232f869c99 De-AAH-ify the common time service.
Bulk name change to remove references to Android@Home from the common time
service in preparation for cleanup and up-integration into the master
branch.  Basically, aah_timesrv is now common_time.

Change-Id: I3d3db212f96e8ba171aa36b9c58e27e4a336cb0a
2012-02-03 17:43:21 -08:00