I had the opportunity this summer to work as a Software Engineer Intern at Optimizely. I was a frontend engineer on the Web Squad. In just 12 weeks, I met a couple of new friends, removed a major pain point for Optimizely customers, hacked on an awesome internal tool, and had the greatest summer I've ever had, right in the heart of San Francisco.

My experience at Optimizely has been phenomenal. I wanted an internship that gave me good insight on the work a full-time software engineer did and allowed me to work on meaningful projects. Optimizely was able to provide me with exactly that. During my 12 weeks there, Optimizely had an operational reorganization for engineering teams (squads!) and hired a new CEO, Jay Larson.

Angus standing across the street from Optimizely See accompanying Optimizely Squad swag that came with the re-org. Me on my last day of the internship. Thanks Alan, my mentor, for taking the picture!

Now, onto my project! My main project for the summer was implementing a revamped version of a setting called Cross-Origin Tracking, found in Optimizely X Web projects. The origin (heh) of this setting stems from the implementation of Optimizely's embedded JavaScript snippet for shared storage between domains. In order for Optimizely customers to track their visitors across multiple domains, subdomains, protocols, and ports, they'll need to add an entry to the cross-origin tracking settings for each site they own and want to track.

Now, you're probably thinking, why don't you just track every event the customer's JavaScript snippet ran on? It'll only exist on the customers' respective sites because why would anyone run the script on anything else. Now they don't even need to know the concept of cross-origin tracking! Problem solved, I'm going to the Giants game.

The stereotypical SF techie intern Giants game This was supposed to be a picture of the Optimizely 2017 interns and mentors going to the Giants game, but I can't find it so :shrug:

Well, the snippet can exist on pages the customers don't want to track:

  1. The snippet code is public. Anyone can put the snippet on their page and it'll track visitor data.
  2. Staging environments. These sites usually run code identical to those that'll be deployed eventually, but since its subject ot change they most likely don't want to track data on those pages.

These are two cases where customers definitely don't want to track visitor data. For the first case, we don't want the origin entries to be overly permissive or else there could be misrepresented data. In the second case, that means the customer would want a way to manually override any currently determined origins, in the case that they're still working on anything under a specific domain or subdomain. Considering these two things, there's no easy way to solve this problem without some user guidance, so we're going to need a way to let the customer tell us which sites to track events.

The old cross-origin tracking settings took a restrictive-first approach in handling entries--customers needed to add entries manually to the list. Optimizely provided some nice ways to whitelist a group of domains with match types. This was a confusing and important setting that many users of Optimizely, with or without technical backgrounds, ran into. Organizations are deploying their sites to multiple domains and protocols (especially TLS/HTTPS) nowadays, so it is necessary to be able to track visitor data between all of those sites. This became a major pain point for Optimizely customers, and they ultimately need help from Customer Support Engineers to sort out their problem with tracking event data.

To minimize interactions with cross-origin tracking and reduce the instances where customers are inputting the same piece of information twice, we need a way to automatically determine the origin entries for customers with the information (URLs in the pages they added for experimentation) they already provided. Additionally, for power users where they have staging environments or any pages they don't want to track on, they'll need the flexibility to override the automatically determined origins.

Another problem the old cross-origin tracking had was that the settings on the dashboard was inconsistent with the behavior of the JavaScript snippet. The previous settings were located on a per-project basis, where you can only view and change the cross-origin tracking for one project at a time. This sounds useful, because it reduces the amount of information noise the customer has to deal with. Unfortunately, since there is only one JavaScript snippet per account, the cross-origin settings from every projects are combined into one list, then shared between all projects (If you are using the shiny new Custom Snippets, cross-origin tracking is shared between all of your snippets!).

As a result, there's a visual disconnect between the customer dashboard and the snippet they use on their site. In the case where there is a cross-origin tracking entry buried in one project that is affecting another project, it quickly gets out of control the more projects they have.

This means before I can make cross-origin tracking smart, I have to make cross-origin tracking right.

Now that I figured out all the problems...

  1. Move cross-origin tracking from a project-based setting to an account-based setting.
  2. Implement an intelligent way of discovering the URLs customers' experiments and determining their cross-origin tracking entries.
  3. Now that we're automatically discovering origins for customers, there needs to be a way for customers to remove entries from cross-origin tracking.

... I can implement the solution.

New Optinauts help organize food at the SF Marin Food Bank Unrelated picture: Fellow Optinauts who started on the same day, at our volunteer event at the SF Marin Food Bank

Starting off, the old cross-origin settings saved data within the projects of an account. In order to have it exist on the account level, the data will need to be saved on the project level. That means the data model needed to be modified for both projects and accounts, moving the field from projects to accounts. Since the old cross-origin tracking was an existing feature that many, if not all, customers use rely on for their experiments, the current data within the account will need to be preserved and eventually moved to the project settings. This mean that the old settings cannot be simply removed and replaced with the new version.

The solution we chose was to implement the new cross-origin tracking first in parallel with the old one, along with a boolean flag on accounts that determines if the account is ready to use the project-based setting. Once the cross-origin tracking data from each project within an account is migrated to the account model, we can safely flip the switch on the account. The boolean flag will determine which set of origins will be built into the account's JavaScript snippet and which dashboard UI gets displayed for the customer. This is an elegant solution that makes the transition seamless to customers.

To accomplish this, I needed to write a migration script that took the data from the old fields, remove the duplicates (in theory, an account can have multiple projects with the same origin entries), and insert them into the new account-based fields. Once the data migrated successfully, I can flip the boolean flag on the account to switch them over to the new settings.

Once cross-origin tracking now exists on the account-level, I can continue to implement the functionality to automatically determine the cross-origins for an account. When an experiment is created on Optimizely, the customer is assumed to want to take advantage of the features of Optimizely and enable cross-origin tracking for all subdomains and protocols on the same domain. Sounds simple enough, so let's move onto the next step.

Optimizely interns sit on the 'intern' couch The 2017 Optimizely interns Zach, Angus, Caitlin, Derek, and Flora sitting on the intern couch in a candid manner.

Like I mentioned earlier, the entries for cross-origin tracking have a property called match types, which helps customers identify and whitelist a set of URLs given a pattern. I can leverage this property match types have to help customers automatically set their origin settings. There is a match type called Substring match that, as the name implies, will match and whitelist any website with a given substring. For example:

Substring match "map"
-> http://maps.google.com
-> https://maps.apple.com
-> https://mapquest.com
-> ...

If I were to enable cross-origin tracking for all subdomains and protocols on the same domain, substring match works well. I can take an experiment's URL and add it as an entry. However, there is a case where Substring match will match with unintended URLs. As an example:

Experiment URL: http://optimizely.com/some/path/here
Origin extracted: optimizely.com
Substring match "optimizely.com"
-> http://www.optimizely.com
-> https://app.optimizely.com
-> https://help.optimizely.com
-> https://notoptimizely.com
-> https://optimizely.companyevil.com

Now, clearly notoptimizely.com and companyevil.com are unintended matches as they exists outside of the optimizely.com domain. We really only want to match the suffix of a root domain and not just the substring in this case. The solution is to introduce a new match type, Suffix match, that prevents matching any domains outside of a given URL's root domain.

The way that Substring match is implemented is basically a preset Regex match, so Suffix match should be implemented the same way. The changes are to insert an explicit start or a period (^|\\.) at the beginning of the origin and insert an explicit end $ at the end of the origin. This will make sure the root domain of any URL will always be the origin and nothing else.

By implementing Suffix match, it can significantly reduce the amount of data noise a customer needs to see on their dashboard (as opposed to seeing every entry), while maintaining useful information about their cross-origin tracking settings.

Optimizely interns make rainbow cookies The 2017 Optimizely interns make pride cookies for the office to celebrate San Francisco Pride.

Now that the Suffix match type is set up for use, the last part is to backtrack a little and actually determine the origin to extract from a given experiment URL. A naive solution is to take the root domain from all the experiments a customer is running, dedupe the list, and add them as cross-origin tracking entries with Suffix match. However, there are cases where a customer might want to run an experiment on a URL where they don't own the entire root domain, but a subdomain (e.g. GitHub pages, Heroku instances, AWS instances etc). For example:

Experiment URL: http://optimizely.github.io/some/more/paths
Naive solution yields: github.io

In this case we definitely don't want to set github.io as an origin entry. This can even extend into further subdomain "chunks" (turns out there's not really a word to describe the subdomain parts separated by periods), so we need to be a generalized solution. We want the find the set of least permissive origins for the customer's experiments that accomplishes the goal of removing the configuration step.

As to how an automatically discovered origin entry is determined, I decided that if a customer runs two or more experiments with URLs that contain the same subdomains chunks, I can assume that they control the subdomain chunk one level lower (can be the root domain too), and the algorithm will provide a cross-origin tracking entry that encompasses those URLs.

That's a lot of words and might not make sense, but the behavior is actually quite simple. Here's a few examples:

Experiment URL: http://three.two.one.happynewyear.com
Auto Origin: three.two.one.happynewyear.com

Experiment URLs:
http://three.two.one.happynewyear.com
https://m.two.one.happynewyear.com
www.two.one.happynewyear.com
Auto Origin: two.one.happynewyear.com

Experiment URLs:
https://three.two.one.happynewyear.com
https://www.happynewyear.com
Auto Origin: happynewyear.com

Thankfully this step can be completed in linear complexity, with a dictionary of root domains to _n_-ary trees, where the _n_-ary tree stored the subdomain chunks into each node. Once all of the URLs are broken down into chunks and inserted into the right nodes, it traversed the tree to find the deepest node with only one child for each domain. The result was then added as a Suffix match entry into an account's cross-origin tracking setting. This allows us to set the least permissive origins that encompasses all of a customer’s domains. Yay! We solved cross-origin tracking!

One last thing; Some customers have staging or development environments set up on the same root domain (i.e. staging.company.com or dev.company.com), and they don't want to the JavaScript snippet to be enabled on those environments. The last thing is to allow a customer to remove an automatically discovered origin entry. We don't want customers to have the ability remove URLs from the list of automatically discovered URLs, since the automatically discovered origins are should be drawn from the collection of URLs a customer has in their account. If the customer has the ability to remove from the list, we'd have to answer questions like how should we keep track of it in our database, how that get should be presented to the user on the frontend, what happens when the customer adds another experiment, what happens when they remove the experiment from the list, etc.

Instead, a more sensible approach is to introduce Blocked Origins, a set of origin entries that trump and block the automatically discovered origins and manually added origins. In the off chance that a good majority of the automatically discovered origins are not suitable, the customer can always disable the feature entirely and revert back to manual input.

With all of these parts combined, this makes up the new version of Cross-Origin Tracking for Optimizely X Web! The feature is now live on Optimizely's dashboard, but hopefully you'll never need to touch the settings!

You can find Cross-Origin Tracking settings today on the Optimizely X's account settings page

I had a lot of fun working on my project! I was given a lot of responsibility and at the end it was very rewarding. Although I was the primary engineer for this feature, it wouldn't have been possible without the help and advice from my mentor, other engineers, product managers, and my engineering manager. My Optimizely experience wasn't awesome just because of my project, but also working with my coworkers, hanging out with the other interns, hacking on an internal tool, and so much more.

This post is getting a little long, so I'll save the rest for another time. I am really grateful that Optimizely brought me out to San Francisco this summer and I will never forget this remarkable experience.