Recently, Linus Tech Tips fell victim to a hack. The hackers exploited vulnerabilities in YouTube’s token authentication implementation. Like any authentication mechanism, auth tokens can be vulnerable to hacking if not implemented correctly. In this blog, we will discuss seven steps to improve your token implementation, reduce the risk and impact of such hacks, and provide tools to limit the danger zone.
Level 0: JWT ☕
JSON Web Tokens (JWTs) are authentication tokens commonly used in web applications. A JWT is a string containing all the information a server needs to authenticate, authorise a user, and provide access to resources.
JWT tokens consist of three parts:
- The header contains metadata about the token, such as the signing algorithm.
- The payload contains the actual user data.
- The signature is used to verify that the token has not been tampered with or altered.
JWT tokens are widely used for their ease of use, scalability, and flexibility.
// Header contains the algorithm used to sign the token
// Payload contains the user data in the form of claims
"name": "John Doe",
// Signature is generated by signing the base64url-encoded header and payload with a secret key
Check out [jwt.io](https://jwt.io/), a great tool for inspecting and debugging tokens.
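To make the three parts concrete, here is a minimal sketch of signing and verifying a token with only the Python standard library. It assumes the HS256 (HMAC-SHA256) algorithm and a shared secret; the function names are made up for this illustration, and a production system would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims (HS256 assumed)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload if the signature checks out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_b64))
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read the claims, but only a holder of the secret can produce a valid signature.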
Letting go of sessions
JWTs are a great solution because they provide sessionless authentication. The method does not require the server to maintain any state or session information. The token contains all the information required to authenticate and authorise a user.
After a successful authentication event (like user login or magic link), the required user information is written in a token and signed by the server. Whenever the server receives a request containing the signed token (in a header or a cookie), it can provide the correct access to resources without consulting its own records.
This is in contrast to session-based authentication, where the server must maintain a session for each user, which requires additional server resources and can lead to scalability issues.
The benefit of scale
When a server doesn’t have to validate each request with an expensive database call, it can really scale. In a world of mobile, native, desktop, and web clients, handing out signed tokens to authenticated users makes the most sense. It scales in terms of compute. Additionally, it scales well architecturally when you factor in that users must authenticate themselves to multiple services in a complex ecosystem.
Stateless tokens have many benefits, but there is one big risk: tokens can be stolen. If someone gains access to your signed token, they gain access to all the resources for which the token has been signed. When a signed token is out in the wild, it’s harder to invalidate. In this blog, we will list some tactics to limit the impact of a stolen token.
Level 1: Cycling Signing key 🔑🔃
Your application signs each JWT it issues with a secret key. This signature protects the content of the token against tampering. Any other service can verify the token later if it can access the secret key.
This key is a serious liability. If a hacker gains access to this key, they can start counterfeiting tokens. The server would have no way to distinguish valid signed tokens from ‘doctored’ tokens. This is why it’s essential to cycle your signing key regularly. This addresses the risk of loitering secrets or brute-forcing keys.
However, swapping your signing key will immediately invalidate all signed tokens currently in circulation. This is why we should always use primary and secondary signing keys.
You use the primary key to sign and verify tokens. During a key cycle, you retire the primary key and replace it with a newly generated key. You retain the old value of the primary key as the secondary key. If there already was a secondary key, its value is discarded.
If the system encounters tokens that it can’t verify with the newly generated primary key, it falls back to the secondary key. All newly issued tokens are signed with the primary key.
You can retain the secondary key for a grace period before discarding it, or keep it until the next cycle.
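The primary/secondary scheme can be sketched as a small key ring. This is an illustration under assumptions: the `KeyRing` class and its method names are hypothetical, keys are random 32-byte HMAC-SHA256 secrets, and a real deployment would persist them in a secret store rather than in memory.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class KeyRing:
    """Holds a primary signing key and at most one retired (secondary) key."""

    def __init__(self) -> None:
        self.primary: bytes = secrets.token_bytes(32)
        self.secondary: Optional[bytes] = None

    def cycle(self) -> None:
        """Retire the primary key; any previous secondary key is discarded."""
        self.secondary = self.primary
        self.primary = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        """New signatures are always made with the primary key."""
        return hmac.new(self.primary, message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        """Try the primary key first, then fall back to the secondary key."""
        for key in (self.primary, self.secondary):
            if key is None:
                continue
            expected = hmac.new(key, message, hashlib.sha256).digest()
            if hmac.compare_digest(expected, signature):
                return True
        return False
```

A token signed before one cycle still verifies through the secondary key; after a second cycle it no longer does, which puts a hard upper bound on how long an old signature stays usable.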
Level 2: Short expiration times ⌛
First, you should set short expiration times for your tokens. This forces an authentication event (a user login or magic link) to generate fresh tokens. Shorter expiration times help to reduce the impact of a stolen token.
However, the real threat we are fighting is the danger of loitering tokens. These are tokens that, through error or malpractice, are left in chat histories or server logs. Data miners will find these sooner or later, and you had better hope they have expired by then. Logs and backup data are often much less secure than operational data. Indefinitely valid tokens turn every piece of data your users generate into uranium.
"name": "John Doe",
"iat": 1516239022, // time it was issued
// Add an expiration timestamp. The token is invalid after this timestamp
// altering it will, of course, invalidate the token's signature.
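As a rough sketch of how a server might stamp and check expiry, the helpers below use the registered `iat` and `exp` claims (seconds since the Unix epoch). The function names and the optional `now` parameter are assumptions made for this example; signature verification is assumed to have happened before `is_expired` is called.

```python
import time
from typing import Optional


def with_expiry(claims: dict, ttl_seconds: int, now: Optional[int] = None) -> dict:
    """Return a copy of the claims with 'iat' and 'exp' timestamps added."""
    issued_at = int(time.time()) if now is None else now
    return {**claims, "iat": issued_at, "exp": issued_at + ttl_seconds}


def is_expired(payload: dict, now: Optional[int] = None) -> bool:
    """A token is rejected on or after its 'exp' time, or if 'exp' is missing."""
    current = int(time.time()) if now is None else now
    expires_at = payload.get("exp")
    return expires_at is None or current >= expires_at
```

Treating a missing `exp` as expired is a deliberate choice here: it means a token without an expiry can never become an accidental forever token.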
Level 3: Claims and Scopes 🛂
JWT bases its authorisation syntax on claims. Token claims offer extra user information like name, role and permissions. The server can decide about the user’s resource access using these claims. It’s crucial to limit the number of claims to the necessary information. This keeps the token lightweight and easy to manage.
"sub": "8e0ce8ca-89b3-4f9b-a041-daae0661abb0", // Claim 1: Subject's user id
"name": "John Doe", // Claim2: User's full name
"role": "user-admin",// Claim3: John has a user admin role
A nice benefit is that the UI can use this token-encoded information before talking to the server. It can show the user’s full name and e-mail and even draw tabs to the sections of the application the user has access to.
The scope claim
Scopes define permissions and access levels in a JWT token. They represent a specific action or resource that the token holder can access. For instance, “read:profile” means that the holder can read a user’s profile but not modify it. The “scope” key in the JWT payload contains an array of strings for scopes. The server validates the token and uses the scope for authorisation. Scopes adhere to the principle of least privilege. This ensures tokens have minimal permissions for their intended use.
"name": "John Doe",
Scopes vs claims
Scope claims are standardised and coarse-grained, while claims are customisable and fine-grained. Scopes are pre-defined by the server and share meaning across applications. Claims are defined by the token issuer or the application that consumes them. They can have any syntax or meaning that best suits the specific use case.
You should include the required scopes in the request for a token. Claims are usually included in the response with the token.
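A server-side scope check can be as small as a set comparison. The sketch below assumes the verified payload carries a `scope` claim as an array of strings, as described above; because some issuers encode scopes as a single space-delimited string instead, it accepts both. The function names are illustrative.

```python
from typing import Iterable, Set


def granted_scopes(payload: dict) -> Set[str]:
    """Read the 'scope' claim as a set, accepting a list or a space-delimited string."""
    scope = payload.get("scope", [])
    if isinstance(scope, str):
        scope = scope.split()
    return set(scope)


def is_authorised(payload: dict, required: Iterable[str]) -> bool:
    """Allow the request only if every required scope was granted."""
    return set(required) <= granted_scopes(payload)
```

With this check, a token scoped to `read:profile` passes an endpoint that requires `read:profile` and fails one that requires `write:profile`, which is least privilege in practice.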
Level 4: Access tokens and Refresh Tokens 👨🤝👨
Access tokens are short-lived and authorise requests for protected resources. Conversely, refresh tokens live longer but can only generate new access tokens. The client can request new tokens during user activity. The client uses access tokens to request new refresh tokens and vice-versa. The goal is a continuous background process exchanging access- and refresh-tokens. The user must reauthenticate if both access and refresh tokens expire due to inactivity.
This way, users get a continuous experience without the server issuing forever tokens. I suggest 15 minutes of refresh token lifetime for each minute of access token lifetime. An access-token TTL of 3 minutes would have a refresh token lasting 45 minutes.
Using access- and refresh-tokens to limit user credential exposure improves server security. It helps protect against the risks of token interception or theft. Access tokens are only valid for a short time, while refresh tokens stay encrypted on the server. But to maintain authentication security, it’s important to rotate and revoke tokens regularly.
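The lifetimes and the exchange step can be sketched as follows. This works on unsigned payloads only, to keep the focus on the timing; the `use` claim that tells access tokens from refresh tokens, the function names, and the in-memory approach are all assumptions for this example.

```python
import secrets
import time
from typing import Optional

REFRESH_RATIO = 15  # minutes of refresh lifetime per minute of access lifetime


def issue_pair(user_id: str, access_ttl: int, now: Optional[int] = None) -> dict:
    """Issue an access payload and a refresh payload that lives 15 times longer."""
    issued_at = int(time.time()) if now is None else now

    def payload(use: str, ttl: int) -> dict:
        return {
            "sub": user_id,
            "use": use,  # hypothetical claim marking the token's purpose
            "jti": secrets.token_hex(8),
            "iat": issued_at,
            "exp": issued_at + ttl,
        }

    return {
        "access": payload("access", access_ttl),
        "refresh": payload("refresh", access_ttl * REFRESH_RATIO),
    }


def refresh(refresh_payload: dict, access_ttl: int, now: Optional[int] = None) -> dict:
    """Exchange a still-valid refresh payload for a fresh pair."""
    current = int(time.time()) if now is None else now
    if refresh_payload.get("use") != "refresh":
        raise PermissionError("only refresh tokens can be exchanged")
    if current >= refresh_payload["exp"]:
        raise PermissionError("refresh token expired, reauthentication required")
    return issue_pair(refresh_payload["sub"], access_ttl, now=current)
```

With an access TTL of 180 seconds, the refresh payload expires 2700 seconds (45 minutes) after issuance, matching the ratio suggested above.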
An Aside for mobile users
Requiring reauthentication after 45 minutes of inactivity might be suitable for web applications. However, you’d want another ratio for mobile. If your service provides both a web and a mobile application, then consider mobile users:
- They have a more secure environment to store tokens.
- They usually have very short bursts of screen time.
- Mobile users’ inactivity is counted in days, not minutes.
For this reason, you should give mobile users a special refresh token that would be alive for days, maybe weeks. This longer-lasting token can be safely stored behind biometrics or other native auth.
This long-lasting token could only be valid for requesting special mobile access tokens. Mobile applications usually require less functionality. You could limit the claims on this special access token to tighten security.
Level 5: Blacklisting 📃💀
Blacklisting tokens is an exception to the pattern of stateless tokens. It requires the server to hold the state of revoked tokens. But it is a reasonable compromise to give more power to the user and limit the threat window of tokens.
You can implement a Token Revocation List (TRL) to give users more control over their tokens. This is a list of all known revoked tokens. When a user logs out by hand, the server adds the token they used to log out to the revocation list. The next time the server receives a request with that token, it’s rejected. The server will return a 403 even though the signed token has not expired. A token can be safely removed from the TRL after its expiry date.
The TRL shouldn’t only be used for explicit logout actions. Any sensitive security change, like a password update or a change to MFA settings, should revoke the token. This could mean automatically generating a new token or requiring complete reauthentication.
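A revocation list only needs to remember each revoked token until it would have expired anyway. The sketch below keys entries on a `jti` (token ID) claim, which is an assumption of this example: the text above does not say how tokens are identified, and a hash of the whole token would work as well. The class and method names are illustrative.

```python
from typing import Dict


class RevocationList:
    """In-memory Token Revocation List keyed by token ID."""

    def __init__(self) -> None:
        self._revoked: Dict[str, int] = {}  # jti -> exp timestamp

    def revoke(self, payload: dict) -> None:
        """Remember the token until its own expiry date."""
        self._revoked[payload["jti"]] = payload["exp"]

    def is_revoked(self, payload: dict) -> bool:
        return payload["jti"] in self._revoked

    def purge(self, now: int) -> int:
        """Drop entries for tokens that have expired; return how many were removed."""
        expired = [jti for jti, exp in self._revoked.items() if exp <= now]
        for jti in expired:
            del self._revoked[jti]
        return len(expired)
```

Because entries are purged at expiry, short token lifetimes also keep the list small.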
The limit of this approach is that you can’t revoke tokens remotely. You can’t add a token to the list if you don’t have the token. So only users can revoke their own tokens. To give more power to super-users and administrators, we will need something more powerful.
While a revocation list can be helpful, we need more to provide complete security for tokens. The server can only revoke a token it has actually seen. To address this limitation, you can implement revocation warrants. These are rules for the automatic revocation of incoming tokens. Warrants can be issued by an administrator or by an automated system. These warrants are another layer of state that extends the TRL.
Warrants actively intercept suspect tokens and add them to the revocation list.
Warrants can be lifted manually or automatically. After a warrant is lifted, tokens will no longer be intercepted.
When a system or human detects suspicious behaviour, they could issue a warrant. That warrant can target a user, and the server knows to revoke any tokens from that user. This will effectively quarantine the targeted user until the risk is mitigated. You can target a warrant at a tenant or an e-mail domain to shut down a swath of tokens.
If you need to simultaneously revoke a bunch of tokens that are hard to target, you can issue a timed warrant. A timed warrant targets the ‘iat’ claim. This way, you can revoke all tokens issued before or after specific timestamps.
This is useful when logging out a user ‘From all devices’. If you have signed a bunch of tokens for several user devices, they will be invalidated.
Sometimes you need to invalidate a token not based on its payload but on the request made using that token.
Some examples of request warrant targets:
- requests made outside of office hours
- requests made from suspect IP addresses
- requests that result in a 4xx response
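The three examples above could be expressed as checks on request metadata. Everything specific in this sketch is an assumption: the `RequestInfo` shape, office hours of 09:00 to 16:59, and the suspect network (a reserved documentation range standing in for a real blocklist).

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RequestInfo:
    received_at: datetime
    client_ip: str
    status_code: int


OFFICE_HOURS = range(9, 17)  # assumed office hours: 09:00 to 16:59
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # placeholder range


def outside_office_hours(request: RequestInfo) -> bool:
    return request.received_at.hour not in OFFICE_HOURS


def from_suspect_ip(request: RequestInfo) -> bool:
    address = ipaddress.ip_address(request.client_ip)
    return any(address in network for network in SUSPECT_NETWORKS)


def client_error(request: RequestInfo) -> bool:
    return 400 <= request.status_code < 500


def should_revoke(request: RequestInfo) -> bool:
    """Revoke the token used for this request if any request warrant matches."""
    checks = (outside_office_hours, from_suspect_ip, client_error)
    return any(check(request) for check in checks)
```

Revoking on every 4xx response is aggressive; in practice you would probably count failures over a time window before pulling the trigger.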
Collect and Combine
You can mix and match all kinds of rules together. Issue comprehensive warrants with composite predicates and enact control on incoming tokens. You can cast a drag-net and shut down all tokens issued after a specific time for all users. Or, you can issue a warrant targeting a particular user set at a narrow time window.
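One way to model composite warrants is as plain predicates over the token payload that can be combined. This is a sketch, not a prescribed design: the `WarrantBook` class, the helper names, and the use of a `jti` claim to identify tokens are assumptions for illustration.

```python
from typing import Callable, Dict, Set

Predicate = Callable[[dict], bool]


def targets_user(user_id: str) -> Predicate:
    return lambda payload: payload.get("sub") == user_id


def issued_before(timestamp: int) -> Predicate:
    return lambda payload: payload.get("iat", 0) < timestamp


def issued_after(timestamp: int) -> Predicate:
    return lambda payload: payload.get("iat", 0) > timestamp


def all_of(*predicates: Predicate) -> Predicate:
    """Composite warrant: every predicate must match."""
    return lambda payload: all(predicate(payload) for predicate in predicates)


class WarrantBook:
    """Active warrants plus the revocation list they feed."""

    def __init__(self) -> None:
        self._warrants: Dict[str, Predicate] = {}
        self.revoked: Set[str] = set()

    def issue(self, name: str, predicate: Predicate) -> None:
        self._warrants[name] = predicate

    def lift(self, name: str) -> None:
        self._warrants.pop(name, None)

    def intercept(self, payload: dict) -> bool:
        """Return True if the token is revoked, revoking it if a warrant matches."""
        jti = payload["jti"]
        if jti in self.revoked:
            return True
        if any(predicate(payload) for predicate in self._warrants.values()):
            self.revoked.add(jti)
            return True
        return False
```

A "log out from all devices" action then becomes `all_of(targets_user(user_id), issued_before(now))`. Tokens caught while a warrant was active stay revoked after it is lifted; only new interceptions stop.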
Grant and Automate
With this system, you can grant the ability to revoke tokens to users other than the token’s subject. This is useful when implementing super-users or administrator roles that can manage users. Additionally, you can build automated systems to detect suspicious behaviour and issue warrants.
Level 6: Token promotion 👒>👑
The old US Army Field Manual FM 3-19.30 Physical Security has an excellent chapter on designing restricted areas:
A restricted area is any area that is subject to special restrictions or controls for security reasons
In software, we often have restricted areas where users can execute risky actions. Often, access to these restricted areas requires re-authentication or MFA. Other times, the token’s claims or scopes control access.
The same manual mentions this technique:
Multiple-Card or -Badge System 7-17. This system provides the greatest degree of security. Instead of having specific markings on the cards/badges denoting permission to enter various restricted areas, the multiple card/badge system makes an exchange at the entrance to each security area. The card/badge information is identical and allows for comparisons. Exchange cards/badges are maintained at each area only for individuals who have access to the specific area.
Let’s extend the analogy of a card/badge to the security token. We get a more secure approach: use one token to exchange it for another. This allows for more granular control of revoking tokens. You can automate this exchange. The user does not have to reauthenticate to escalate privileges. You can revoke access to a restricted area while the user still has access to the controlled area.
This is the underlying principle of access and refresh tokens. The real power of this comes when your application has multiple restricted areas that a user can hop in and out of. Now your access control systems don’t need to be altered. You identify the different restricted areas and the allowed transitions.
Additionally, if you revoke a user’s token for one area, they can still access other areas. Finally, you can start nesting restricted areas for increased security.
Level 7: The Security State Machine
It’s no coincidence that we illustrated the previous chapter with state diagrams. Once we start exchanging and promoting tokens, we can talk about state transitions. A user transitions from access to one area to another. These transitions can throw events that trigger token revocations. Revoking the origin token at each transition leaves the user with only the destination token of that transition.
Now the user can access one valid token and thus one restricted area at a time. We can express this security landscape in a diagram:
This way, we can see that the only path to the Exclusion area leads through Restricted area B.
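The same idea can be written down as a transition table. The areas and allowed transitions below are a guess at what such a landscape might look like, not a transcription of the diagram; the `area` and `jti` claims and the `SecurityStateMachine` class are likewise assumptions for this sketch.

```python
import secrets
from typing import Dict, Set

# Hypothetical landscape: the Exclusion area is only reachable from Restricted area B.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "controlled": {"restricted_a", "restricted_b"},
    "restricted_a": {"controlled"},
    "restricted_b": {"controlled", "exclusion"},
    "exclusion": {"restricted_b"},
}


class SecurityStateMachine:
    """Exchanges a token for one area against a token for an adjacent area."""

    def __init__(self) -> None:
        self.revoked: Set[str] = set()

    def promote(self, token: dict, destination: str) -> dict:
        if token["jti"] in self.revoked:
            raise PermissionError("token has been revoked")
        if destination not in ALLOWED_TRANSITIONS.get(token["area"], set()):
            raise PermissionError(
                f"no transition from {token['area']} to {destination}"
            )
        # Revoking the origin token leaves the user with one valid token at a time.
        self.revoked.add(token["jti"])
        return {"jti": secrets.token_hex(8), "sub": token["sub"], "area": destination}
```

Each call to `promote` is also a natural place to emit a security log event, since it records who moved from which area to which.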
Conclusion: No sessions != No state
Wait, but all this event logging and blacklist tracking looks awfully like state!
Correct! These token techniques are sessionless because there is no central store of sessions. But, we do hold state. The state is the large lookup table of blacklisted tokens representing session history.
This large lookup table can scale very well. You don’t need to refactor your entire auth system to adapt to the growing demand of a shifting ecosystem. Access control can evolve using the same token exchange system.
The user has the freedom to automate most of the exchange steps. Each transition provides us with a wonderful security log feed to monitor. We have tools to intervene in case of suspicious behaviour. Finally, we can ensure that we only expose the resources the user needs. We can granularly control access without locking out the user completely.