Investigate tabs never arriving in iOS devices #3097

Closed
eoger opened this issue May 6, 2020 · 11 comments

@eoger
Contributor

eoger commented May 6, 2020

This bug is to figure out what's causing a tab sent from another device to never arrive on iOS with no notification at all (not even "Tap to begin").
The problem seems pretty prevalent: I can reproduce it reliably by sending a bunch of tabs at short intervals to my iPhone.

┆Issue is synchronized with this Jira Spike
┆Story Points: 5
┆Epic: FxA Ecosystem (backlog)
┆Sprint: SYNC - end 2020-05-22

@eoger
Contributor Author

eoger commented May 6, 2020

Update:

From investigation with @jbuck, requests to the autopush servers are failing with a "Received unexpected response code" error. However, we don't log the response code, because we rely on JavaScript's default toString, which doesn't include extra properties.
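To illustrate the logging gap, here is a minimal sketch. The WebPushError class below is a hypothetical stand-in that only mimics the shape of the error (an Error carrying extra statusCode and body properties), and describePushError is an illustrative helper, not code from FxA:

```javascript
// Hypothetical stand-in for the error thrown by the push library:
// an Error subclass carrying extra properties.
class WebPushError extends Error {
  constructor(message, statusCode, body) {
    super(message);
    this.name = 'WebPushError';
    this.statusCode = statusCode;
    this.body = body;
  }
}

// Illustrative helper: build a log string that includes the extra properties.
function describePushError(err) {
  return `${err.name}: ${err.message} (statusCode=${err.statusCode}, body=${err.body})`;
}

const err = new WebPushError('Received unexpected response code', 502, '{"code": 502}');

// The default Error.prototype.toString only uses name and message,
// so the status code is lost here.
console.log(String(err));

// Logging the extra fields explicitly keeps the status code.
console.log(describePushError(err));
```

Any logger that stringifies the error would drop statusCode in the same way, so the fields need to be passed explicitly.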

@eoger
Contributor Author

eoger commented May 6, 2020

Filed mozilla-services/autopush#1383 for further investigation on the Push side.

@eoger
Contributor Author

eoger commented May 6, 2020

cc @AnaMedinac as this is relevant to your product interests :)

@rfk
Contributor

rfk commented May 6, 2020

I have to imagine this is some sort of rate-limiting being enforced by the apple push servers, as a first guess :-/

Given that push is so central to how we expect send-tab to work, do you think it would be worth having the FxA server fail the invoke_command request if it's not able to send the push notification? IIRC it's currently fire-and-forget.

@eoger
Contributor Author

eoger commented May 6, 2020

> Given that push is so central to how we expect send-tab to work, do you think it would be worth having the FxA server fail the invoke_command request if it's not able to send the push notification? IIRC it's currently fire-and-forget.

It might be worth doing if we hit a 100% failure rate (N devices to notify, N failures) in push.js. However, if we throw there, we will most likely want to add a try-catch in the other methods (e.g. notifyDeviceConnected), since their callers are not expecting them to throw.
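A rough sketch of that idea, under stated assumptions: sendToDevices, sendOne, and the simplified notifyDeviceConnected signature are hypothetical names for illustration and do not reflect the actual push.js code.

```javascript
// Hypothetical sketch: reject only when every push attempt failed.
// `sendOne` is an assumed async function that delivers to a single device.
async function sendToDevices(devices, sendOne) {
  const results = await Promise.allSettled(devices.map((d) => sendOne(d)));
  const failures = results.filter((r) => r.status === 'rejected');
  if (devices.length > 0 && failures.length === devices.length) {
    const err = new Error('Push delivery failed for all devices');
    err.failures = failures.map((r) => r.reason);
    throw err;
  }
  return { sent: devices.length - failures.length, failed: failures.length };
}

// Callers written as fire-and-forget would need to swallow the new error,
// for example (simplified, hypothetical signature):
async function notifyDeviceConnected(devices, sendOne, log = console) {
  try {
    return await sendToDevices(devices, sendOne);
  } catch (err) {
    log.error('push.notifyDeviceConnected failed', err.message);
    return { sent: 0, failed: devices.length };
  }
}
```

With this shape, a partial failure still resolves, and only the path that wants to fail the request (such as invoke_command) would let the rejection propagate.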

@eoger
Contributor Author

eoger commented May 14, 2020

I'm going to close this, since upgrading autopush "seems" to have fixed the problem, and open some relevant bugs.

@rfk
Contributor

rfk commented May 15, 2020

> upgrading autopush "seems" to have fixed the problem

For completeness, by "upgrading autopush" do you mean the fix in mozilla-services/autopush#1385?

@eoger
Contributor Author

eoger commented May 15, 2020

Yes, that's correct. I still don't understand how it fixed the problem, though...

@eoger
Contributor Author

eoger commented May 15, 2020

Actually, while retrying this one last time when closing the issue, I was able to reproduce the problem with both my phone and my Node "helper app":

Error sending push notification WebPushError: Received unexpected response code
    at IncomingMessage.<anonymous> (/Users/eoger/pushiostest/node_modules/web-push/src/web-push-lib.js:341:20)
    at IncomingMessage.emit (events.js:323:22)
    at endReadableNT (_stream_readable.js:1204:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  name: 'WebPushError',
  message: 'Received unexpected response code',
  statusCode: 502,
  headers: {
    'access-control-allow-headers': 'content-encoding,encryption,crypto-key,ttl,encryption-key,content-type,authorization',
    'access-control-allow-methods': 'POST',
    'access-control-allow-origin': '*',
    'access-control-expose-headers': 'location,www-authenticate',
    'content-type': 'application/json',
    date: 'Fri, 15 May 2020 14:38:41 GMT',
    server: 'nginx',
    'strict-transport-security': 'max-age=31536000;includeSubDomains',
    'content-length': '175',
    connection: 'Close'
  },
  body: '{"code": 502, "errno": 999, "error": "", "more_info": "http://autopush.readthedocs.io/en/latest/http.html#error-codes", "message": "APNS returned an error processing request"}',
  endpoint: '<redacted, ask on Slack>'
}
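The useful detail in the output above sits inside the JSON body. As a minimal sketch, a helper like the following could pull those fields out for logging. summarizeAutopushError is a hypothetical name, not part of web-push or FxA, and the field names are simply the ones visible in the logged output:

```javascript
// Illustrative helper: summarize an error shaped like the one above.
function summarizeAutopushError(err) {
  let details = {};
  try {
    // `|| {}` guards against a body that parses to null.
    details = JSON.parse(err.body) || {};
  } catch (e) {
    // Body was missing or not valid JSON; keep only the status code.
  }
  return {
    statusCode: err.statusCode,
    errno: details.errno,
    message: details.message,
  };
}

// Sample object copied from the relevant fields of the logged error.
const sample = {
  statusCode: 502,
  body: '{"code": 502, "errno": 999, "error": "", "message": "APNS returned an error processing request"}',
};

console.log(summarizeAutopushError(sample));
```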

@eoger eoger reopened this May 15, 2020
@eoger
Contributor Author

eoger commented May 15, 2020

From Slack discussion with @jrconlin, we have identified a few things:

  • Autopush does not log APNS status codes.
  • Because the delivery issues came back ~7 days after the last deployment, we think we might be having problems with long-lived HTTP/2 connections to the APNS servers (or certs/tokens/whatever are expiring somehow). Logging the APNS errors is crucial to determine what's happening.

In the meantime, JR proposed adding timeouts to kill inactive APNS connections (mozilla-services/autopush#1393) and adding logging.
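The general idea of killing inactive connections can be sketched as a small idle tracker. This is a generic illustration only and is not autopush's implementation; IdleConnectionReaper, touch, reap, and the injected clock are all hypothetical names, and a connection is assumed to be any object with a close() method.

```javascript
// Hypothetical sketch: close connections that have been idle too long,
// so that the next request is forced to open a fresh one.
class IdleConnectionReaper {
  constructor(maxIdleMs, now = () => Date.now()) {
    this.maxIdleMs = maxIdleMs;
    this.now = now; // injectable clock, which makes the logic easy to test
    this.lastUsed = new Map(); // connection -> timestamp of last activity
  }

  // Record activity on a connection (call this on every send).
  touch(conn) {
    this.lastUsed.set(conn, this.now());
  }

  // Close and forget every connection idle for longer than maxIdleMs.
  reap() {
    const closed = [];
    for (const [conn, timestamp] of this.lastUsed) {
      if (this.now() - timestamp > this.maxIdleMs) {
        conn.close();
        this.lastUsed.delete(conn);
        closed.push(conn);
      }
    }
    return closed;
  }
}
```

In a real service, reap would run on a timer, and the threshold would be picked well below whatever lifetime the remote side is suspected of enforcing.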

@vladikoff
Contributor

QA filed a new bug for this: mozilla-mobile/firefox-ios#6637
