I had the opportunity this summer to work as a Software Engineer Intern at Optimizely. I was a frontend engineer on the Web Squad. In just 12 weeks, I met a couple of new friends, removed a major pain point for Optimizely customers, hacked on an awesome internal tool, and had the greatest summer I've ever had, right in the heart of San Francisco.
My experience at Optimizely has been phenomenal. I wanted an internship that gave me good insight into the work a full-time software engineer does and allowed me to work on meaningful projects. Optimizely was able to provide me with exactly that. During my 12 weeks there, Optimizely had an operational reorganization for engineering teams (squads!) and hired a new CEO, Jay Larson.
See accompanying Optimizely Squad swag that came with the re-org. Me on my last day of the internship. Thanks Alan, my mentor, for taking the picture!
This was supposed to be a picture of the Optimizely 2017 interns and mentors going to the Giants game, but I can't find it so :shrug:
Well, the snippet can exist on pages the customers don't want to track:
These are two cases where customers definitely don't want to track visitor data. For the first case, we don't want the origin entries to be overly permissive, or else the data could be misrepresented. For the second case, the customer would want a way to manually override any currently determined origins while they're still working on anything under a specific domain or subdomain. Considering these two things, there's no easy way to solve this problem without some user guidance, so we're going to need a way to let the customer tell us which sites to track events on.
The old cross-origin tracking settings took a restrictive-first approach in handling entries: customers needed to add entries manually to the list. Optimizely provided some nice ways to whitelist a group of domains with match types. This was a confusing but important setting that many users of Optimizely, with or without technical backgrounds, ran into. Organizations are deploying their sites to multiple domains and protocols (especially TLS/HTTPS) nowadays, so it is necessary to be able to track visitor data between all of those sites. This became a major pain point for Optimizely customers, who ultimately needed help from Customer Support Engineers to sort out their problems with tracking event data.
To minimize interactions with cross-origin tracking and reduce the instances where customers input the same piece of information twice, we need a way to automatically determine the origin entries for customers from the information they have already provided (the URLs in the pages they added for experimentation). Additionally, power users who have staging environments or any pages they don't want to track will need the flexibility to override the automatically determined origins.
As a result, there's a visual disconnect between the customer dashboard and the snippet they use on their site. In the case where there is a cross-origin tracking entry buried in one project that is affecting another project, it quickly gets out of control the more projects they have.
This means before I can make cross-origin tracking smart, I have to make cross-origin tracking right.
Now that I figured out all the problems...
... I can implement the solution.
Unrelated picture: Fellow Optinauts who started on the same day, at our volunteer event at the SF Marin Food Bank
Starting off, the old cross-origin settings saved data within the projects of an account. In order to have it exist on the account level, the data needs to be saved on the account instead. That means the data model needed to be modified for both projects and accounts, moving the field from projects to accounts. Since the old cross-origin tracking was an existing feature that many, if not all, customers rely on for their experiments, the current data within the projects needs to be preserved and eventually moved to the account settings. This means that the old settings cannot simply be removed and replaced with the new version.
To accomplish this, I needed to write a migration script that took the data from the old fields, removed the duplicates (in theory, an account can have multiple projects with the same origin entries), and inserted them into the new account-based fields. Once the data migrated successfully, I could flip the boolean flag on the account to switch it over to the new settings.
Now that cross-origin tracking exists on the account level, I can move on to implementing the functionality that automatically determines the cross-origins for an account. When an experiment is created on Optimizely, the customer is assumed to want to take advantage of the features of Optimizely and enable cross-origin tracking for all subdomains and protocols on the same domain. Sounds simple enough, so let's move on to the next step.
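The dedupe-and-move step can be sketched roughly as follows. This is only an illustration of the idea, not Optimizely's actual migration script: the dictionary layout and the field names `cross_origin_entries` and `use_account_level_origins` are hypothetical stand-ins for the real data model.

```python
def migrate_account(account):
    """Copy project-level origin entries onto the account, dropping duplicates.

    `account` is a plain dict here; the field names are hypothetical.
    """
    seen = set()
    merged = []
    for project in account["projects"]:
        for entry in project.get("cross_origin_entries", []):
            key = (entry["match_type"], entry["value"])
            if key not in seen:  # the same entry may exist in several projects
                seen.add(key)
                merged.append({"match_type": key[0], "value": key[1]})

    account["cross_origin_entries"] = merged
    # Flip the flag only after the new field is populated. The old
    # project-level fields are left untouched so no data is lost.
    account["use_account_level_origins"] = True
    return account


account = {
    "projects": [
        {"cross_origin_entries": [{"match_type": "substring", "value": "example.com"}]},
        {"cross_origin_entries": [{"match_type": "substring", "value": "example.com"},
                                  {"match_type": "exact", "value": "shop.example.org"}]},
    ]
}
migrated = migrate_account(account)
```

Keeping the old fields in place and switching behavior with a flag means an account can be migrated, checked, and rolled back independently of the others.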
The 2017 Optimizely interns Zach, Angus, Caitlin, Derek, and Flora sitting on the intern couch in a candid manner.
Like I mentioned earlier, the entries for cross-origin tracking have a property called match types, which helps customers identify and whitelist a set of URLs given a pattern. I can leverage match types to help customers automatically set their origin settings. There is a match type called Substring match that, as the name implies, will match and whitelist any website with a given substring. For example:

    Substring match "map"
        -> http://maps.google.com
        -> https://maps.apple.com
        -> https://mapquest.com
        -> ...

If I were to enable cross-origin tracking for all subdomains and protocols on the same domain, substring match works well. I can take an experiment's URL and add it as an entry. However, there is a case where Substring match will match with unintended URLs. As an example:

    Experiment URL: http://optimizely.com/some/path/here
    Origin extracted: optimizely.com

    Substring match "optimizely.com"
        -> http://www.optimizely.com
        -> https://app.optimizely.com
        -> https://help.optimizely.com
        -> https://notoptimizely.com
        -> https://optimizely.companyevil.com

notoptimizely.com and optimizely.companyevil.com are unintended matches, as they exist outside of the optimizely.com domain. We really only want to match the suffix of a root domain and not just the substring in this case. The solution is to introduce a new match type, Suffix match, that prevents matching any domains outside of a given URL's root domain.
The way that Substring match is implemented is basically a preset Regex match, so Suffix match should be implemented the same way. The changes are to insert an explicit start or a period, (^|\\.), at the beginning of the origin and insert an explicit end, $, at the end of the origin. This will make sure the root domain of any URL will always be the origin and nothing else.
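To make the difference concrete, here is a small Python sketch of the two match types as preset regexes. It assumes the pattern is tested against the hostname of a URL rather than the full URL, which is a guess about how matching is applied; the function names are made up for illustration. In a Python raw string the prefix is written `(^|\.)`.

```python
import re
from urllib.parse import urlsplit


def substring_pattern(origin):
    # Substring match: the origin may appear anywhere in the hostname.
    return re.compile(re.escape(origin))


def suffix_pattern(origin):
    # Suffix match: the origin must be the whole hostname, or be preceded
    # by a period, and must run to the end of the hostname.
    return re.compile(r"(^|\.)" + re.escape(origin) + r"$")


def matches(pattern, url):
    # Assumption: patterns are applied to the hostname only.
    host = urlsplit(url).hostname or ""
    return pattern.search(host) is not None


urls = [
    "http://www.optimizely.com",
    "https://app.optimizely.com",
    "https://help.optimizely.com",
    "https://notoptimizely.com",
    "https://optimizely.companyevil.com",
]
substring_hits = [u for u in urls if matches(substring_pattern("optimizely.com"), u)]
suffix_hits = [u for u in urls if matches(suffix_pattern("optimizely.com"), u)]
```

With these inputs, `substring_hits` contains all five URLs, while `suffix_hits` keeps only the three that live under optimizely.com.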
Implementing Suffix match can significantly reduce the amount of data noise a customer needs to see on their dashboard (as opposed to seeing every entry), while maintaining useful information about their cross-origin tracking settings.
The 2017 Optimizely interns make pride cookies for the office to celebrate San Francisco Pride.
Now that the Suffix match type is set up for use, the last part is to backtrack a little and actually determine the origin to extract from a given experiment URL. A naive solution is to take the root domain from all the experiments a customer is running, dedupe the list, and add them as cross-origin tracking entries with Suffix match. However, there are cases where a customer might want to run an experiment on a URL where they don't own the entire root domain, but only a subdomain (e.g. GitHub Pages, Heroku instances, AWS instances, etc.). For example:

    Experiment URL: http://optimizely.github.io/some/more/paths
    Naive solution yields: github.io

In this case we definitely don't want to set github.io as an origin entry. This can even extend into further subdomain "chunks" (turns out there's not really a word to describe the subdomain parts separated by periods), so we need a generalized solution. We want to find the set of least permissive origins for the customer's experiments that accomplishes the goal of removing the configuration step.
As to how an automatically discovered origin entry is determined, I decided that if a customer runs two or more experiments with URLs that contain the same subdomain chunks, I can assume that they control the subdomain chunk one level lower (can be the root domain too), and the algorithm will provide a cross-origin tracking entry that encompasses those URLs.
That's a lot of words and might not make sense, but the behavior is actually quite simple. Here are a few examples:

    Experiment URL:
        http://three.two.one.happynewyear.com
    Auto Origin: three.two.one.happynewyear.com

    Experiment URLs:
        http://three.two.one.happynewyear.com
        https://m.two.one.happynewyear.com
        www.two.one.happynewyear.com
    Auto Origin: two.one.happynewyear.com

    Experiment URLs:
        https://three.two.one.happynewyear.com
        https://www.happynewyear.com
    Auto Origin: happynewyear.com

Thankfully this step can be completed in linear time, with a dictionary of root domains to _n_-ary trees, where each _n_-ary tree stores the subdomain chunks in its nodes. Once all of the URLs are broken down into chunks and inserted into the right nodes, the algorithm walks each tree down from the root domain for as long as there is only one child to follow, and stops at the deepest node it reaches that way. The result is then added as a Suffix match entry to the account's cross-origin tracking settings. This allows us to set the least permissive origins that encompass all of a customer's domains. Yay! We solved cross-origin tracking!
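Here is a rough Python sketch of that tree walk, reconstructed from the examples above rather than taken from the production code. It makes one big simplifying assumption: the root domain is taken to be the last two labels of the hostname, which is not true in general (a real implementation would need something like the Public Suffix List). The names `ChunkNode` and `auto_origins` are made up for illustration.

```python
from urllib.parse import urlsplit


class ChunkNode:
    """One subdomain chunk in the n-ary tree of a root domain."""

    def __init__(self):
        self.children = {}    # chunk -> ChunkNode
        self.is_host = False  # True if an experiment hostname ends here


def hostname_of(url):
    # Accept scheme-less input such as "www.two.one.happynewyear.com".
    if "://" not in url:
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def auto_origins(urls):
    trees = {}  # root domain -> ChunkNode
    for url in urls:
        labels = hostname_of(url).split(".")
        # Assumption: the root domain is the last two labels.
        root = ".".join(labels[-2:])
        node = trees.setdefault(root, ChunkNode())
        # Insert the remaining chunks right to left: "one", "two", "three".
        for chunk in reversed(labels[:-2]):
            node = node.children.setdefault(chunk, ChunkNode())
        node.is_host = True

    origins = []
    for root, node in trees.items():
        origin = root
        # Keep descending while there is exactly one way down and no
        # experiment hostname already ends at the current node.
        while len(node.children) == 1 and not node.is_host:
            chunk, node = next(iter(node.children.items()))
            origin = chunk + "." + origin
        origins.append(origin)
    return sorted(origins)
```

Each hostname is split and inserted once and each tree is walked once, so the work is linear in the total number of chunks. Note that under the two-label assumption, two experiments on different github.io subdomains would still collapse to github.io, which is one more reason customers need a manual override.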
One last thing; Some customers have staging or development environments set up on the same root domain (i.e.
Instead, a more sensible approach is to introduce Blocked Origins, a set of origin entries that trump and block both the automatically discovered origins and the manually added origins. On the off chance that a good majority of the automatically discovered origins are not suitable, the customer can always disable the feature entirely and revert to manual input.
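The precedence rule can be sketched in a few lines of Python. This is an illustration only: it assumes every entry is a Suffix match tested against a hostname, and the function names and the example.com hostnames are hypothetical.

```python
import re


def suffix_regex(origin):
    # Same shape as the Suffix match pattern described earlier.
    return re.compile(r"(^|\.)" + re.escape(origin) + r"$")


def is_tracked(host, allowed_origins, blocked_origins):
    """Blocked origins win over both automatic and manual entries."""
    if any(suffix_regex(o).search(host) for o in blocked_origins):
        return False
    return any(suffix_regex(o).search(host) for o in allowed_origins)


allowed = ["example.com"]          # automatic and manual entries combined
blocked = ["staging.example.com"]  # entries the customer never wants tracked

tracked_www = is_tracked("www.example.com", allowed, blocked)
tracked_staging = is_tracked("dev.staging.example.com", allowed, blocked)
```

Checking the blocked list first keeps the rule easy to reason about: no matter how permissive the allowed entries become, a blocked origin is never tracked.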
With all of these parts combined, this makes up the new version of Cross-Origin Tracking for Optimizely X Web! The feature is now live on Optimizely's dashboard, but hopefully you'll never need to touch the settings!
You can find Cross-Origin Tracking settings today on Optimizely X's account settings page.
I had a lot of fun working on my project! I was given a lot of responsibility and at the end it was very rewarding. Although I was the primary engineer for this feature, it wouldn't have been possible without the help and advice from my mentor, other engineers, product managers, and my engineering manager. My Optimizely experience wasn't awesome just because of my project, but also working with my coworkers, hanging out with the other interns, hacking on an internal tool, and so much more.
This post is getting a little long, so I'll save the rest for another time. I am really grateful that Optimizely brought me out to San Francisco this summer and I will never forget this remarkable experience.