Catching Compromised Cookies – Slack Engineering

Slack uses cookies to track session state for users on slack.com and the Slack Desktop app. The ever-present cookie banners have made cookies mainstream, but as a quick refresher, a cookie is a small piece of client-side state associated with a website that is sent up to the web server on every request. Websites use this piece of data to inject state into the inherently stateless protocol of HTTP. At Slack, that means each time you sign in to a workspace, your cookie (which we call the session cookie) is updated to reflect this.

Since session cookies are frequently used to uniquely identify users in applications across the internet, they’ve become an obvious target for malicious actors looking to gain access to systems. If attackers present a cookie as their own, the website will typically grant them access as if they were the original user. Malicious actors often acquire these cookies through malware running on a user’s machine, using the malware to silently steal cookies and other sensitive data and send them to a server controlled by the attackers. This stolen data allows them to gain access to a variety of internet applications ranging from banking services to social media sites. The consequences can be severe, ranging from financial loss and identity theft to the exposure of confidential communications and personal information.

Slack workspaces contain sensitive data and can be an attractive target for attackers. Consider the situation where a threat actor phishes a user and manages to install malware on their device. The malware could then steal cookies, which are stored in the device’s browser, and replay those cookies to impersonate the user. To take a real-world example, imagine you left your house key under the mat and someone managed to find it, clone it, and put it back so you had no idea. One way to reduce the risk of a copied key is to change your locks regularly. If you do that, a thief would have only a limited window of time to use the key they copied.

In Slack, the analogue of changing your locks is the session duration feature. Admins can configure how long they want someone’s session to last before they have to log in again. This helps limit the risk of stolen cookies, but it’s not perfect. Attackers still have a window of time to use their copy of the cookie, and session duration doesn’t tell us when an attacker is active. In addition, users get frustrated when the session duration is too short, as they find themselves having to sign in when they’re just trying to get work done.

Cookies for various sites are frequently compromised by real attackers looking to gain access to company information. Malware operators steal cookies and sell them on dark web marketplaces to the highest bidder. While we can’t ensure the security of the devices our customers use to access Slack, we wanted to go further to protect our customers’ data. This blog post describes how we detect when cookies are stolen and alert workspace administrators.

Detecting cookie misuse

The core idea behind our technique is to detect session forking. That is, determining whether a cookie is being used from more than one device at the same time.

To detect session forking, we use several components to detect signals in parallel. These components can cover the gaps between one another and improve the accuracy of our system. The most important component is the last access timestamp.

Last access timestamp

The last access timestamp corresponds to when the server set the cookie on the client. We store the timestamp both in the cookie and in the database. On future requests, we compare the timestamp on the incoming cookie with the timestamp in the database. If they do not match, this indicates that the user is sending an old version of the cookie.

We regularly refresh the cookie with a more recent last access timestamp and update the database accordingly. If a malicious actor obtains a stolen cookie, they’ll likely receive an old version with an old timestamp. When they use that cookie to access Slack, we’ll compare the old timestamp in the cookie with the newer value in the database. Since they don’t match, we’ll detect that the session has been forked.

A bad actor could try to prevent this by regularly interacting with Slack via the stolen cookie. In that case, we’d update the last access timestamp for the bad actor’s cookie and the database. When the original user starts Slack again, they present their old copy of the cookie. We compare that with the newer value in the database and again determine that a session fork has occurred. Based on the last access time, we don’t know which side of a forked session is legitimate. We can only tell that there are two (or more) copies of the cookie when there should be one.
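The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Slack’s implementation: `SessionRecord`, `refresh`, and `is_forked` are hypothetical names, timestamps are plain Unix seconds, and the database is an in-memory object.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical server-side row for one session."""
    last_access: int  # Unix seconds when the server last set the cookie


def refresh(record: SessionRecord, now: int) -> int:
    """Store a newer timestamp and return it as the new cookie value."""
    record.last_access = now
    return now


def is_forked(cookie_last_access: int, record: SessionRecord) -> bool:
    """A mismatch means a stale copy of the cookie is in use somewhere."""
    return cookie_last_access != record.last_access


record = SessionRecord(last_access=1_000)
user_cookie = 1_000
stolen_cookie = user_cookie  # malware copies the cookie

# Case 1: the user keeps working, so the user's cookie is refreshed.
user_cookie = refresh(record, now=2_000)
attacker_flagged = is_forked(stolen_cookie, record)  # stale copy is flagged

# Case 2: the attacker stays active instead, and their copy is refreshed.
stolen_cookie = refresh(record, now=3_000)
user_flagged = is_forked(user_cookie, record)  # now the user's copy is stale
```

Both cases flag a fork, and in neither case does the comparison say which copy belongs to the real user.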

Testing

Once we had a basic version of the system working, the next step was to evaluate its effectiveness. Our initial results weren’t ideal. We had a true positive in the form of a coworker who was using their cookie to automate actions in Slack. But in various circumstances, our detection logic resulted in both false negatives and false positives. For the feature to be a meaningful security improvement, we need detection reliable enough to act on the alerts we generate. Our pilot customers planned on automatically invalidating sessions that might have been forked, which meant that our high number of false positives would be disruptive to their work.

False positives

From our investigation, we found that users were triggering detection events while going about their normal day. We found many different edge cases that caused this. Sometimes, we would try to set a new cookie with an updated timestamp, but the client never received the new cookie. That meant the Slack client now had a different last access time from the database, making it look just like an old, stolen cookie. This situation would result in a false detection event.

So we introduced the IP address as an additional signal. If the last access time is different, but the IP address matches the IP stored in the database alongside the old timestamp, the request is likely coming from the same computer and the cookie is therefore unlikely to be stolen. This change alone eliminated a large percentage of the false positives, but failed to address some of the key shortcomings in the design.
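As a rough sketch of how the IP check might sit on top of the timestamp comparison, consider the following. The `last_ip` field, the function name, and the returned labels are assumptions made for illustration; the post does not describe the actual schema or return values.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    last_access: int  # timestamp stored in the database
    last_ip: str      # IP stored alongside that timestamp (assumed field)


def evaluate_with_ip(cookie_last_access: int, request_ip: str,
                     record: SessionRecord) -> str:
    """Classify a request using the timestamp first, then the IP address."""
    if cookie_last_access == record.last_access:
        return "ok"
    if request_ip == record.last_ip:
        # Stale timestamp, but same network origin: probably a client
        # that never received the refreshed cookie.
        return "stale_cookie_same_ip"
    return "possible_fork"
```

A stale cookie arriving from the stored IP is treated as a missed cookie update, while a stale cookie from a different IP is still reported.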

For the last access timestamp to work, we need clients to reliably set cookies. We have various hypotheses for why clients weren’t setting cookies, such as laptops going to sleep before the server could respond.

We should only update the timestamp in the database after we know the client has stored the new cookie. To accomplish this, we use a two-phase approach, where each request is idempotent. We update the session cookie by setting a separate “session candidate” cookie. If we receive a request with a newer session candidate cookie set, we promote it to the session cookie. We update the timestamp in the database after the client presents us with a newer timestamp via the session candidate cookie.

With this approach, if the client doesn’t receive a response for any particular request, we’ll pick up where we left off in the process. If the server tries to set a session candidate cookie, but the client doesn’t present a session candidate cookie on the next request, we’ll just set it again. Likewise, if the client doesn’t receive the headers to promote the value in the session candidate cookie to the session cookie, we’ll just include those headers on the next request. When the client provides both session candidate and session cookies, we’ll consider either timestamp value when comparing with the database timestamp. On the first request in which the client sends the session candidate cookie, the session cookie still matches the database. If the client then misses the promotion headers, it is the session candidate cookie that matches the timestamp in the database on the following request.
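The two-phase update can be sketched as a small state machine. This is one illustrative reading of the description above, not Slack’s code: the cookie names, the `Db` and `Response` types, and the `refresh_due` flag are all hypothetical, and the sketch assumes cookie values are signed so a client cannot invent a newer candidate.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Db:
    last_access: int  # timestamp the server currently trusts


@dataclass
class Response:
    forked: bool = False
    # Cookie name -> new value; None means "delete this cookie".
    set_cookies: Dict[str, Optional[int]] = field(default_factory=dict)


def handle_request(db: Db, session_ts: int, candidate_ts: Optional[int],
                   now: int, refresh_due: bool) -> Response:
    resp = Response()
    # Either cookie may carry the timestamp the database knows about.
    if db.last_access not in (session_ts, candidate_ts):
        resp.forked = True
        return resp
    if candidate_ts is not None:
        if candidate_ts >= db.last_access:
            # The client has proven it stored the candidate: commit it,
            # then send (or resend) the promotion headers.
            db.last_access = candidate_ts
            resp.set_cookies = {"session": candidate_ts,
                                "session_candidate": None}
        else:
            # Leftover candidate older than the database value: drop it.
            resp.set_cookies = {"session_candidate": None}
    elif refresh_due:
        # Phase one: offer a candidate; the database is not touched yet.
        resp.set_cookies = {"session_candidate": now}
    return resp
```

Because every branch can be repeated safely, a lost response only delays the update: the next request lands in the same branch and the server sends the same headers again.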

We have also done work to mitigate the effect of race conditions where the client sends a bunch of API requests in quick succession. We want to avoid the situation where we update the database on the first request that comes in, but other requests are already in flight with the old version of the cookie. If the timestamp in the database was just updated, we don’t have a proper old value to compare with the incoming cookie timestamp. To that end, we ignore the timestamp in those requests. A request in this instant could theoretically evade detection, but it would be very hard for an attacker to predict exactly when the original user sends the first request that causes the database to be updated. An attacker can’t take multiple guesses to try to time the window, because if any one request falls outside the window, we’ll detect that the cookie has been forked. This reduces false positives from in-flight requests without compromising the value offered by the feature.
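A minimal sketch of that grace window follows. It assumes the database also records when the timestamp was last written (`db_updated_at`); that field, the function name, and the five-second value are hypothetical, since the post does not give the real window size.

```python
GRACE_SECONDS = 5  # hypothetical window size; the post does not give a value


def check_with_grace(cookie_ts: int, db_ts: int,
                     db_updated_at: int, now: int) -> str:
    """Compare timestamps, skipping the check right after a database update."""
    if cookie_ts == db_ts:
        return "match"
    if now - db_updated_at < GRACE_SECONDS:
        # The database was just updated, so this may be an in-flight
        # request that still carries the previous cookie.
        return "ignored"
    return "forked"
```

A stale cookie is only ignored inside the window; the same stale cookie presented a few seconds later is reported as a fork.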

Risk level measurement

We now have some new information in addition to the last access timestamp (i.e. information about the device and network) that we can combine. We then algorithmically generate an assessment of whether a detection is a true or false positive. With our calculated probability, we categorize the risk as low, medium, or high. For anything determined to be high risk, we send an event to the audit log. We’re continuing to improve our algorithm to further reduce false positives.
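The post does not describe the scoring algorithm, so the following is purely an illustrative sketch of the overall shape: combine signals into a score, bucket the score, and log only high-risk detections. The signal names, weights, thresholds, and the audit event fields are all invented for this example, and an integer score stands in for the calculated probability.

```python
from typing import Dict, List

# Hypothetical weights; the real signals and weighting are not published.
WEIGHTS = {"ip_changed": 2, "device_changed": 2, "network_changed": 1}


def risk_level(signals: Dict[str, bool]) -> str:
    """Bucket a detection into low, medium, or high risk."""
    score = sum(w for name, w in WEIGHTS.items() if signals.get(name))
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


def report(signals: Dict[str, bool], audit_log: List[dict]) -> str:
    """Write an audit event only for high-risk detections."""
    level = risk_level(signals)
    if level == "high":
        audit_log.append({"event": "cookie_anomaly", "risk": level})
    return level
```

Keeping low and medium detections out of the audit log is what lets the weaker signals reduce noise rather than add to it.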

Performance concerns

So far, we have focused on the logic around updating the last access timestamp in the cookie and database. That’s the most complex interaction in this system, but not the most common. For the vast majority of API requests, we simply compare the timestamp with the current value and determine if the request is an anomaly.

Due to Slack’s real-time nature, our clients can be very chatty and send many API requests during simple user interactions. As presented above, our last access timestamp needs to be read from the database on every request. Introducing a new database read on every request would be significant in terms of load. While some of this load could be taken by a cache, we can simplify further and avoid some of the database reads in the first place.

Diagram showing several requests in short succession which do not update the database before a final request sets a new timestamp and updates the cookie

If the last access time in the cookie is recent, we know the cookie is in active use, since that means the server just set it. This means that if the session had been forked, we would have already triggered a detection event. We can avoid reading from the database until some time has passed, based on the assumption that attackers don’t instantly steal and sell cookies. When the cookie ages out of that window, we set a fresh cookie. This approach allows us to avoid interacting with the database on a significant majority of API requests. It also lends itself well to the usage patterns of Slack users, who often use Slack in bursts with many API requests.
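The read-avoidance idea can be sketched as follows. The window length, the `Db` class with its read counter, and the function name are assumptions for illustration. The sketch also assumes the cookie’s timestamp cannot be forged (for example, because the cookie is signed), and it refreshes the cookie in a single step rather than through the two-phase update described earlier.

```python
from dataclasses import dataclass
from typing import Tuple

FRESH_WINDOW_SECONDS = 300  # hypothetical; the post does not give a value


@dataclass
class Db:
    last_access: int
    reads: int = 0  # counts lookups so the saving is visible

    def read_last_access(self) -> int:
        self.reads += 1
        return self.last_access


def handle_request(cookie_ts: int, now: int, db: Db) -> Tuple[int, bool]:
    """Return (cookie timestamp after this request, fork detected?)."""
    if now - cookie_ts < FRESH_WINDOW_SECONDS:
        return cookie_ts, False  # recent cookie: skip the database
    if cookie_ts != db.read_last_access():
        return cookie_ts, True   # stale copy: the session was forked
    db.last_access = now         # aged out and valid: set a fresh cookie
    return now, False
```

In a burst of requests sent within the window, none of them touch the database; only the first request after the cookie ages out performs a read and a write.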

Rollout

As with the other anomaly detections we’ve rolled out, we worked closely with pilot customers to develop their understanding of the feature. Anomalies aren’t meant as a clear indicator of malicious behavior so much as a sign of something unexpected in an environment that should be investigated as potentially malicious. In some cases this cookie anomaly could happen for normal reasons, such as a computer being restored from a backup. We worked closely with our pilot customers to validate and improve our detection capabilities.

This limited rollout gave us the opportunity to better understand the performance characteristics of our design as well as investigate sources of noise in the data. The information we collected at this stage led to several key improvements, including our two-phase cookie updating approach. After reducing the noise to an acceptable level and validating that the feature worked as expected, we gradually rolled out the detection logic to the rest of Slack.

We communicate detection events to customers via Slack’s audit log. Customers can ingest audit logs into their own Security Event Manager such as Splunk or ELK and combine them with other data streams to draw a conclusion about the security of their users’ data.

Future improvement

Today we’re delivering detections to customers via the audit log and allowing them to correlate logs in their internal tools to make appropriate security decisions. In the future, we believe we could further improve the system by automatically invalidating sessions flagged with a high-risk detection. This would automatically sign out both the legitimate users and attackers. The legitimate users would have to re-authenticate with Slack, while attackers would lose the connection and the ability to impersonate the user.

Interested in building innovative projects and making developers’ work lives easier? We’re hiring 💼
