The Art of Data Aggregation: Personal Capital @ FinDevR 2014

On September 30, Personal Capital presented at the inaugural FinDevR conference in San Francisco. FinDevR, a spinoff of the successful Finovate conference, is oriented toward developers and technologists in the very robust financial technology (aka fintech) space. The conference sponsorship was dominated by well-known household names such as MasterCard, PayPal, Intuit, and TD Ameritrade, as well as lesser-known (but prominent in fintech) companies such as Yodlee and Xignite.

In addition to the big players, there were plenty of emerging entrepreneurs among the 40+ sponsors and presenters, and it was a great chance to talk tech, rather than business models and user acquisition, with some smart people who were out showcasing great ideas. Payment technology was a big theme at the conference, as were small-business and micro-lending, social/community angles on the financial domain, big data (always), and lots more. Several companies were there launching new platforms and services, such as Finicity’s account aggregation platform and Kabbage’s Karrot lending platform for individuals. It was a well-organized event, especially considering that this was the first time it had been offered, and organizers said there were over 400 attendees. (CES it is not, and thank goodness for that.)

Ehsan and I presented a talk there for developers, “The Art of Data Aggregation”, sharing some of our ideas for financial data aggregation, the foundation for data-intensive fintech services and applications. While most of the other presenters were there pitching their B2B services for fintech developers, we got to relax and just share a purely technical talk, without the added burden of needing to sell anything. (OK, *fine*, we did mention that Personal Capital helps over 600,000 people manage over $100 billion in assets. I don’t really get tired of saying that.) If you’re interested in watching a video of our 6-minute presentation, or any of the other presentations, check it out.


Automate E2E Testing with JavaScript

why

  • faster delivery of software, setting us on the path to Continuous Delivery
  • immediate discovery of regressions
  • and why JavaScript – it lets the frontend team actively contribute to tests alongside feature development

what

  • build an automation framework that simulates user interactions and runs these tests consistently on real and headless browsers alike
  • set up integration with our CI server (Jenkins)
  • import test results into our Test Management Platform (QMetry)

how

The frontend team uses Mocha/Chai/Karma for writing unit tests. The QA team uses Selenium for automation. We wanted to leverage our existing frameworks and tools as much as we could, both to keep the learning curve small and because we had already evaluated them thoroughly for our needs when we picked them. Fortunately for us, we found Selenium bindings in JavaScript. There are actually quite a few, but the most prominent are webdriver.io and Selenium’s WebDriverJS.

We chose Selenium’s WebDriverJS primarily for the following reasons.

  • it is the official implementation by the Selenium team, who have also written bindings in various other languages
  • the pattern for writing a test is very similar to a test written in the Java world, which our QA team was already used to
  • it uses promises to prevent callback hell

For a more detailed explanation with examples, please refer here.
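To give a flavor of what that looks like in practice, below is a minimal sketch of a WebDriverJS interaction (not our production code; the URL and selectors are placeholders). Each driver command returns a promise and is queued sequentially, so the steps read top to bottom instead of nesting callbacks.

// Minimal sketch of WebDriverJS promise chaining; the URL and selectors are placeholders.
var webdriver = require('selenium-webdriver');

var driver = new webdriver.Builder()
    .withCapabilities(webdriver.Capabilities.firefox())
    .build();

driver.get('https://www.example.com/login');
driver.findElement(webdriver.By.css('[name="username"]')).sendKeys('testuser');
driver.findElement(webdriver.By.css('[name="password"]')).sendKeys('not-a-real-password');
driver.findElement(webdriver.By.css('button[type="submit"]')).click();

// every command returns a promise, so follow-up logic chains instead of nesting
driver.getTitle().then(function(title) {
    console.log('landed on: ' + title);
});

driver.quit();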

Our next piece of the puzzle was to figure out whether we could use PhantomJS (a headless browser) with WebDriverJS. We needed this so that we could run the tests on our CI server, where we may not necessarily have a real browser. We did have some initial challenges running WebDriverJS in combination with PhantomJS without using Selenium’s Remote Control Server, but looking at the source code (in JavaScript) helped us debug and get this to work. The challenges could also be attributed to our incomplete understanding of Selenium’s automation world.
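It is hard to reproduce our exact configuration here, but a minimal sketch of the general approach is below, assuming the phantomjs binary is installed and on the PATH and that your selenium-webdriver version exposes Capabilities.phantomjs(); the URL is a placeholder.

// Minimal sketch: driving PhantomJS directly, without a Selenium Remote Control Server.
// Assumes the phantomjs binary is installed and on the PATH.
var webdriver = require('selenium-webdriver');

var driver = new webdriver.Builder()
    .withCapabilities(webdriver.Capabilities.phantomjs())
    .build();

driver.get('https://www.example.com');
driver.getTitle().then(function(title) {
    console.log('PhantomJS sees title: ' + title);
});
driver.quit();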

The last piece of the puzzle was integration with the CI server. With PhantomJS already in place, all we needed to figure out was a reporting format for the tests that our CI server (Jenkins) could understand. One of the reasons we had picked Mocha is its extensive reporting capabilities, and xUnit was the obvious choice because Jenkins supports it.
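As a sketch of that reporting piece (the spec file path below is a placeholder, not our actual layout), Mocha can be run programmatically with its xunit reporter so the resulting XML can be picked up by Jenkins:

// Minimal sketch: run an E2E spec with Mocha's xunit reporter for Jenkins.
// The spec file path is a placeholder.
var Mocha = require('mocha');

var mocha = new Mocha({ reporter: 'xunit' });
mocha.addFile('./test/e2e/login.spec.js');

mocha.run(function(failures) {
    // a non-zero exit code marks the Jenkins build as failed
    process.exit(failures ? 1 : 0);
});

In older versions of Mocha the xunit reporter writes to stdout, so the Jenkins job typically redirects that output to an XML file and points the xUnit/JUnit publisher at it.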

And there it is: our automation framework, all of it in a JavaScript stack.

Testing Stack for Web

In the past couple of months, we have successfully automated the E2E tests that provide regression coverage for our web platform, and we use them on a daily basis. Now that we have an established framework and have gained immense experience writing tests, we are one step closer to Continuous Delivery. Integration with our Test Management Platform is in the works, and we will post our findings soon.

reads

https://code.google.com/p/selenium/wiki/WebDriverJs
http://visionmedia.github.io/mocha/
http://chaijs.com/
http://chaijs.com/plugins/chai-webdriver
http://phantomjs.org/page-automation.html
https://speakerdeck.com/ariya/phantomjs-for-web-page-automation
http://casperjs.org/
http://engineering.wingify.com/posts/e2e-testing-with-webdriverjs-jasmine/
http://xolv.io/blog/2013/04/end-to-end-testing-for-web-apps-meteor
http://code.tutsplus.com/tutorials/headless-functional-testing-with-selenium-and-phantomjs--net-30545

Web Automation Testing v2

I grew up on the east coast, currently go to school in the midwest, and was fortunate enough to spend my summer on the west coast working with the Personal Capital engineering team. In addition to working on an amazing engineering team, I became familiar with the workings of a fast-paced tech environment and learned a great deal about web and mobile automation. JavaScript is now my strongest programming language, and I learned to appreciate its value to a commercial company (not just a coding assignment). I could not have asked for a better summer experience.

My coworker, Nick Fong, already wrote a post here describing the main points of our project this summer. So as not to be repetitive, I will be writing more about the problems and roadblocks we faced along the way and how we overcame them. I highly suggest reading his post first to get a better idea of the general framework that I will be talking about. You can find Nick’s post here.

Working with Selenium WebDriverJS, there were many concepts that were new to me, but JavaScript promises, and how they work in an asynchronous fashion, were among the most confusing. Promises were necessary for the scripts we wrote because they were the only way to access information from the driver. Below is an example of using a promise to access the PIN field while linking accounts. This piece of code verifies that a PIN field is there by checking that the information returned by the driver is not empty.

driver.findElements(webdriver.By.css('[name="PASSWORD1"]')).then(function(pin) {
    if (pin.length > 0) { // makes sure the pin field is present
        helper.enterInput('[name="PASSWORD1"]', accounts['L' + index].v); // name distinct for Firstrade
    }
});

This in itself was not that difficult to do in our scripts. We created many ‘helper’ functions, which you can see used above, that use promises to access and manipulate the driver. What took some time to grasp was that, in asynchronous scripts, anything that happens within a promise stays within that promise. This turned into a scoping issue when I would edit a global variable inside one promise and have another promise read the original value of that variable.
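A contrived sketch of the trap (not code from our suite; the selector and helper call are made up) looks something like this: the console.log runs before the first promise resolves, so it still sees the original value, and the dependent logic has to live inside the chain instead.

// Contrived sketch of the scoping/timing trap; selector and helper are hypothetical.
var accountCount = 0; // global that the promise callback mutates

driver.findElements(webdriver.By.css('.account-row')).then(function(rows) {
    accountCount = rows.length; // updated only when this promise resolves
});

console.log(accountCount); // still 0 here: the callback has not run yet

// the fix: keep the dependent logic inside the promise chain
driver.findElements(webdriver.By.css('.account-row')).then(function(rows) {
    if (rows.length > 0) {
        helper.clickLink('.account-row a'); // hypothetical helper/selector
    }
});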

For the builds to pass, the scripts must run with no errors in PhantomJS, the headless browser we ran everything on before pushing to production. However, just because something worked in PhantomJS did not mean it would work in the other browsers. We found after much trial and error that PhantomJS behaved the most like Safari, but even a script that worked in Safari was not guaranteed to work in PhantomJS. A very peculiar error I faced occurred when writing automation scripts for www.personalcapital.com in Chrome. When I was testing the links on the page, every one but the last would fail, and for the longest time I had no idea why. Eventually I figured out that the banner that followed the user down the page as he/she scrolled was blocking the link, because our code would scroll so that the link was as close as possible to x:0, y:0 before clicking. To fix this, this line:

driver.executeScript('window.scrollTo(0,' + loc.y + ')');
was changed to this:
driver.executeScript('window.scrollTo(0,' + (loc.y - 50) + ')');

This change, although extremely simple, took a long time to figure out. It also gave me a greater appreciation for this work and how much time it actually takes. Before working here, I would, like most developers, spend a lot of time debugging my code. Only after working for a company that is actually pushing a product out to a customer did I truly appreciate the time needed to get everything right.

I divided all the tests into two categories: tests that were completely internal and tests that used outside information. An internal test would be something like checking that our information-gathering survey worked or that the marketing page’s links were working. The latter type consisted of tests such as linking accounts or checking transactions. One of the tests I wrote contained a script for adding accounts to test IDs and checked to make sure everything was linked correctly. Not only did the parameters of this test change three times (thanks, guys), but I also had to deal with naming conventions that were out of our control. For the most part they were consistent for username and password, but when other fields were added, all bets were off.

Although I joked about the changing parameters, they actually were an important part of my summer because they exposed me to the compromises that automation scripts need to accommodate. The debate was over how dynamic the script should be. Obviously, in an ideal world, the script could link any account in any way. However, after a lot of work, and because we had to rely on third-party information, this was not possible. So the question remained whether we wanted a smoother, simpler script that tests the basic functionality for a few accounts, or one that tries its best to be fully dynamic. Eventually we decided on the former, setting aside five accounts of different types to aggregate.

There is so much more I could talk about, but that is for another time. I would recommend Selenium WebDriverJS, found here, to anyone interested in writing these kinds of automation scripts. I want to thank all the people at Personal Capital for making me feel at home this summer; it was a pleasure coding with you.

Web Automation Testing

This summer I have the privilege of working as an Engineering Intern at Personal Capital. Not only do I have the pleasure of working with really great people, I am also learning about how various engineering teams come together to build an awesome product. There is only so much you can learn in a classroom; this is the real world we’re talking about!

My main project is implementing an automated test suite for Personal Capital’s marketing website and web app. Automated tests make the feedback loop faster, reduce the workload on testers, and allow testers to do more exploratory and higher-value activities. Overall, we’re trying to make the release process more efficient.

Our automated testing stack consists of Selenium WebDriverJS, Mocha + Chai, Selenium Server, and PhantomJS. Tests are run with each build by our continuous integration tool Hudson, and we can mark a build as a success or fail based on its results. Our tests are written in JavaScript since our entire WebUI team is familiar with it.

In an effort to keep our test scripts clean and easily readable, Casey, one of our web developers, ingeniously thought of creating helper functions. So instead of having numerous driver.findElement()’s and chai.expect()’s throughout our scripts, these were integrated into single functions. An example of one is below.

var expectText = function(selector, text) {
	scrollToElement(selector).then(function(el) {
		chai.expect(selector).dom.to.contain.text(text);
	});
};

We were having issues when testing in Chrome (while Hudson runs PhantomJS, our tests are written to work in Firefox, Chrome, and Safari) where elements weren’t visible, so we needed to scroll to their location first. For that we have a scrollToElement() method that is chained with every other helper function.

var scrollToElement = function(selector) {
	var d = webdriver.promise.defer(),
		el;

	// Get element by CSS selector
	driver.findElement(webdriver.By.css(selector))
		// Get top and left offsets of element
		.then(function(elt) {
			el = elt;
			return elt.getLocation();
		})
		// Execute JS script to scroll to element's top offset
		.then(function(loc) {
			driver.executeScript('window.scrollTo(0,' + loc.y + ')');
		})
		// If successful, fulfill promise. Else, reject with an error message
		.then(
			function(success) {
				d.fulfill(el);
			},
			function(err) {
				d.reject('Unable to locate element using selector: ' + selector);
			});

	return d.promise;
};

Then a typical test script would look like this:

helper.clickLink();
helper.expectText();
helper.enterInput();

Super clean, simple, and awesome. Anyone can write an automation script!
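For illustration, here is a minimal sketch of how such a spec might be wired together. The selectors, text, and URL are placeholders, and the selenium-webdriver/testing wrappers (which manage WebDriverJS’s control flow inside Mocha) are one way to do it, not necessarily how our suite is organized.

// Hypothetical spec sketch using helper functions like the ones described above.
// Selectors, text, and URL are placeholders; `driver` is the WebDriverJS instance built elsewhere.
var test = require('selenium-webdriver/testing'); // Mocha wrappers that manage the control flow
var helper = require('./helper');                 // hypothetical path to the helper module

test.describe('marketing site smoke test', function() {
    this.timeout(60000); // E2E steps are slow, especially on PhantomJS

    test.it('links from the home page to the sign-up form', function() {
        driver.get('https://www.example.com');
        helper.clickLink('[data-test="signup"]');        // hypothetical selector
        helper.expectText('h1', 'Sign up');              // hypothetical expected copy
        helper.enterInput('[name="email"]', 'me@example.com');
    });
});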

One of the main challenges in automation is timing. Some browsers (I’m looking at you, Chrome) are faster than others, and the driver will attempt to execute commands before elements on the page can be interacted with. To overcome this we used a mixture of implicit and explicit waits. There are two ways to do an implicit wait. The first is setting WebDriverJS’s implicitlyWait() by adding the following line of code after defining the driver:

driver.manage().timeouts().implicitlyWait(1300);

This is global, so before throwing an error saying an element cannot be found or interacted with, WebDriverJS will wait up to 1.3 seconds. The second method is waiting for an element to be present on the page, with a timeout. This is helpful if we need more than 1.3 seconds for a certain element. We have a helper function called cssWait() that looks like this:

var cssWait = function(selector, timeout) {
	driver.wait(function() {
		return driver.isElementPresent(webdriver.By.css(selector));
	}, timeout);
};

On top of those we use explicit waits that are simply driver.sleep(<time>). Sometimes we need to hard-code a wait to get the timing just right.

Unfortunately that’s it for this post. If you have any questions feel free to leave a comment and I’ll get back to you. In my next blog post, or one that will be written by Aaron, I will talk more about some of the challenges we faced and how we dealt with them.

To get started with web automation, I suggest heading over to SimpleProgrammer.com, where John Sonmez put together some instructions on getting your environment set up. While his instructions are for Windows, the Mac version is pretty similar.

Evolving End-User Authentication

EV Certificate Display

The adoption of EV Certificates has rendered the login image obsolete.

This week, Personal Capital discontinued the use of the “login image” as part of an upgrade to our security and authentication processes. By “login image”, I mean the little personalized picture that was shown to you on our login page before you entered your password.

Mine was a picture of a starfish.

Several users have asked us about this decision and, beyond the simple assertion that the login image is outmoded, a little more background is offered here.

 

The founders and technology principals at Personal Capital were responsible for introducing the login image for website authentication a decade ago. In 2004, Personal Capital’s CEO Bill Harris founded, along with Louie Gasparini (now with Cyberflow Analytics), a company called PassMark Security, which invented and patented the login image concept and the associated login flow. Personal Capital’s CTO, Fritz Robbins, and our VP of Engineering, Ehsan Lavassani, led the engineering at PassMark Security and designed and built the login image technology, as well as additional security and authentication capabilities.

Server login images (or phrases, in some implementations) were a response to the spate of phishing scams that were a popular fraud scheme in the early- and mid-2000s.  When phishing, fraudsters create fake websites that impersonate financial institutions, e-commerce sites, and other secure websites.  The fraudsters send spam email containing links to the fake sites, and unsuspecting users click on the links and end up at the fake site. The user then enters their credentials (username/password), thinking they are at the real site. The hacker running the fake site then has the user’s username/password for the real site and, well, you know what happens next. It’s hard to believe that anyone actually falls for those sorts of things, but plenty of people have. (Phishing is still out there, and has gotten a lot more sophisticated (see spear-phishing for example), but that is a whole other topic).

So, the login image/phrase was a response to the very real question of:  “How can I tell that I am at the legitimate website rather than a fraudulent site?”  With login image/phrase, the user would pick/upload a personalized image or phrase at the secure website. And the login flow changed to a two-step flow: the user enters their username, then the secure site displays the personal image/phrase, and then, assured that they are at the legitimate secure site when they recognize the image/phrase, the user enters their password. The use of login image/phrase was a simple and elegant solution to a vexing problem. And when the FFIEC (U.S. banking regulatory agency) mandated stronger authentication standards for U.S. banking sites in 2005, login image quickly became ubiquitous across financial websites, including Bank of America and many others, during the mid-2000s.

From a security perspective, the login image/phrase is a kind of shared secret between the secure site and the user. Not as important a secret as the password, of course, but important nonetheless, and here’s why: if a hacker posing as the real user enters the username at the secure site, and the site displays the user’s login image/phrase, then the hacker can steal the image/phrase and use it in constructing their fake website. The fake website would then look like the real website (since it would have the image/phrase) and could fool the user into giving up the real prize (the password) at the fake phishing site. So the question of how to protect the security of the login image becomes relevant.

Device identification is the answer:  If the website is able to recognize the device that is sending a request containing the username, and if the site knows that device has been authorized by the user, then the site can safely show the login image/phrase, and the user feels secure, and enters their password. This is essentially a process of exchanging more information in each step of the authentication conversation, a process of incremental and escalating trust, culminating in the user entering their password and being granted full access to the site.

But the use of device identification to protect the login image is secondary to the real technology advance of this approach: the use of device identification and device forensics as a second factor in authentication. Combining the device identity with the password creates a lightweight form of two-factor authentication, widely recognized as being far superior to single-factor (password only) authentication.

The simplest form of device identification involves placing a web cookie in the user’s browser. Anyone out there not heard of cookies and need an explanation? OK, good, I didn’t think so. Cookies work pretty well for a lot of purposes, but they have a couple of problems when being used for device identification: (1) the user can remove them from the machine; and (2) malware on the user’s machine can steal them.

The technology of device identification quickly evolved, at PassMark and other security companies, to move beyond cookies and look at inherent characteristics of the web request, the browser, and the device being used: data such as the IP address, the User-Agent header (the browser identity information), and other HTTP headers. Not just the raw data elements, but derived data as well, such as geolocation and ISP data from the IP address, along with patterns and changes in the data across multiple requests, including request velocity, characteristic time-of-day login patterns, and changes in data elements such as the User-Agent string. Some providers started using opt-in plugins or browser extensions to extract deeper intrinsic device characteristics, such as the hardware network (MAC) address, operating system information, and other identifiers.

“Device forensics” evolved as the practice of assembling large numbers of data points about the device and using sophisticated statistical techniques to create device “fingerprints” with a high degree of accuracy. The whole arena of device identification and device forensics is now leveraged in a variety of authentication and fraud-detection services, including at Personal Capital. This is the real value that grew out of the “login image” effort.

But, while the use of device identification and device forensics was flourishing and becoming a more central tool in the realm of website authentication, the need for the login image itself was becoming less compelling.

Starting in the late 2000s, the major SSL Certificate Authorities (such as Verisign) and the major browsers (IE, Firefox, Chrome, Safari) began adopting Extended Validation (EV) certificates. These certificates require a higher level of validation of the certificate owner (i.e. the website operator, such as Personal Capital), so they are more trusted. And, just as important, the browsers adopted a common user interface idiom for EV certificates, which includes displaying the name of the company that owns the certificate (e.g. “Personal Capital Corporation”) in a distinctive color (generally green) in the browser address bar (see picture). The adoption of EV certificates has essentially answered the original question that led to the use of the login image (i.e. “how does the user know they are at the real website?”).

Which brings us to today. Personal Capital has removed the login image from our authentication flow. The result is a simpler and more streamlined flow for our users, with the added benefit of reduced complexity in the login process. It is a security truism that, all else being equal, simpler implementations are more secure implementations: fewer attack vectors, fewer states, fewer opportunities for errors. Personal Capital continues to use device identification and device forensics, allowing users to “remember” authorized devices and to de-authorize devices. We also augment device identification with “out-of-band” authentication, using one-time codes and even voice-response technology to verify user identity when someone wants to log in from a non-authorized or new device.

I’ll admit that I will miss my little starfish picture when I log in to Personal Capital. But this small loss is offset by my knowledge that we are utilizing best, and current, security practices.

PassMark Security, circa 2005

“Ugly Shirt Fridays” at PassMark Security, circa 2005