At Personal Capital, we are creating better financial lives through technology and people.
I have just joined the front end team and I’m loving it. The startup pace, the agile way of working, and the collaboration every cycle, all surrounded by amazing people, make it a perfect place for professional development.
We have a great tech/skills stack, and I wanted to share it with everybody, especially those who are thinking of joining our team.
We believe in the power of technology to change the financial industry, making it more accessible, affordable, and honest. And we believe in the power of people to change the nature of investment advice, making it more transparent, objective, and personal.
Living in a world that is constantly talking about saving money and figuring out complex investment decisions, our front end engineers face a huge challenge: to make things as simple as one click.
To make that possible, we process lots of data. We utilize user information to generate what we call a progressive profile. Every user has a progressive profile which is generated from their individual inputs.
By doing this we can recommend plans, proposals and functionalities so our users can get the most out of our platform. This is how our award-winning platform allows 1.2 million users to track and invest their assets.
Ensuring that our technology operates on multiple devices is a must: not only as a web platform, but also as easy-to-use apps. Our users want access to their finances at home and on the go, and that is what we deliver.
The Front-End Stack
This article will cover our web platform tech stack, the one that we use day to day to make things as easy as one click/tap.
At development and build time we use Node.js, which we mix with some browser-sync flavor to proxy all non-static requests to a development server in the Amazon cloud. Node and browser-sync serve the static resources, which keeps things simple. The team specifically chose not to use a build system like Gulp or Grunt, to avoid having extra external dependencies.
Since the front-end layer is simple, NPM manages all the tasks.
Single Page Application
Our back-end services are written in Java, and we consume them using Backbone and Angular. This allows us to reuse tons of pre-implemented open source plug-ins and create reusable components that can be used across the whole Personal Capital application domain.
Most of our platform is written using Handlebars templates. This lets us use lots of pre-built helpers and accelerates the prototyping and development process.
We are migrating from Bootstrap to Inuit, refactoring code and creating our own styling framework based on OOCSS/BEM. By doing this we can ensure coding style consistency across our style sheets, generating maintainable, beautiful code that is reusable and easy to understand.
This is especially important since the company is growing rapidly, and by doing this we help our new engineers jump on the train faster and deliver value sooner.
If you use our platform, you have seen a lot of beautiful visual components in our design. We use graphics whenever possible to make complex data easier to understand.
Our graphics manage complex datasets. They are dynamic so you can interact with them and get from the most generic information to the more specific transactional details.
To achieve this, we used Raphael. But as it is no longer being maintained, we are transitioning to the D3 library, which is much more flexible and powerful, giving us the tools to generate almost any kind of graphics that we can think of.
Mocha, along with Chai (expect/should, BDD style), worked out well for us, giving us effective yet readable tests. I cannot emphasize the importance of readable tests enough. Now the feature set can be seen across the organization, comparing the expected outcome with the actual outcome. We’ll have to do a write-up sharing more of our excitement.
The use of Mocha and Karma helped us on the path to continuous delivery and immediate regression reports. Tests are executed periodically by our continuous integration server, which monitors the health of the codebase, watches for consistency, and gives us code coverage information. This is really important to ensure that we’re delivering a first-class service/product.
Automation and CI
We use Jenkins as the orchestrator to build, test and deploy all of our environments. Our DevOps team wrote the tasks that run automatically or on-demand with one click.
Jenkins triggers Karma to run our Mocha tests and Selenium to run regression tests. Those tests are run in different browsers, both the ones with a head and the ones without it (AKA headless browsers).
BTW, Karma watches our files and runs tests at development time too, eliminating the need to run them by hand before pushing code and giving us immediate feedback on what we are coding.
We also have integration with Husky that prevents our dev team from pushing code up to the origin/remote if there are failing unit tests.
We use Git and we love it. The workflow is pretty similar to git-flow. We just extend it, adding a few extra branches like a sandbox one, where we make all kinds of experiments.
Once we run “git push”, the linters and code style
At Personal Capital, everyone is a team player. We volunteer to get tasks assigned based on our expertise. We use scrum across all of the engineering teams and our sprints last one
Personal Capital is proud to sponsor Women Who Code, a non-profit organization dedicated to inspiring women to excel in the technology industry.
Most articles about the ratio of women to men in tech focus on the negative, citing differences in population, salary, and leadership roles. I would like to take time to put a spotlight on a couple of positive anecdotes that show how people can make a huge difference by providing a little support.
About a year ago, a female friend of mine pulled me aside at dinner and told me she had made a career change to become a web developer. She said the idea of entering the tech industry first took hold after a discussion we had years back, in which I (as a female developer myself) had encouraged her toward this new direction. The news was such a pleasant surprise because it made me realize how a simple conversation can change the course of a person’s life in a positive way, either through the introduction of an idea or by instilling confidence in one that had been lying dormant.
It reminded me of a conversation that changed the course of my own life. It was my first year at UC Berkeley, and I was taking CS3: Introduction to Symbolic Programming, on a whim. I loved the class and considered departing the hazy pre-med path my parents laid out for a computer science focus. I was concerned, though, because so many people in the class had previous programming experience and I was starting from scratch. I asked my sister if I was crazy for pursuing this foreign path, and she told me that if I was good at it and liked it, go for it.
Fast forward to now: I’m the Director of Mobile Technology at Personal Capital, a company that I believe in, where I’m doing what I love. It just goes to show how a little inspiration and support can go a long way, and I’m proud that Personal Capital is sponsoring Women Who Code, an organization that is dedicated to providing both.
We align our engineering efforts with our company’s mission: Better financial lives, through technology and people.
Just like our financial advisors abide by the fiduciary standard (meaning financial advice they give must be in the best interest of our clients), we develop products and services that serve the best interests of our users and clients.
We create new business opportunities by tackling complex financial problems facing millions of American families.
Every day we create a more connected and data-driven financial ecosystem for our users.
We take pride in our intuitive user experiences and a high quality of service that delights our users.
We filter out the noise to surface the most important information via data visualizations that show how financial actions create a storyline our users can follow to understand their current and future net worth.
We empower our users with personalized financial planning tools.
We believe the best solutions come from empowered cross-functional teams of engineering, product, marketing and advisory, and we believe in the spirit of collaboration.
We believe an important way to deliver financial advice is through data visualization and intuitive user interfaces.
We believe everything we do should be measurable, transparent and accessible to all.
We value open and streamlined communication from documentation in code, automated tests and captured product discussion; we speak our minds openly and freely.
We believe quality is everyone’s concern, and together we work hard to create a product that will improve people’s lives.
We embed best security practices in everything we do, at every level of our organization.
We value accountability, with the goal of fostering leadership and focus.
We work smarter, not harder: we use data to create and refine our business rules to get better results and use automation to scale them.
We promote our processes and best practices by automating them.
We let our code speak for itself through documentation, unit tests and test cases.
At Personal Capital, we’re all about changing the face of financial technology, and we are always looking for great talent to join our team. You will join a team that works together to motivate, inspire, and change an industry.
At Personal Capital we have been leveraging machine-learning techniques to solve business problems since the early days of our company. As we’ve grown and faced new challenges that require intelligent and scalable solutions we’ve turned to machine-learning based solutions over and over again. We firmly believe that a continued investment in systems that help garner insights and solve complex problems for our customers will be critically important.
When Amazon announced their new machine-learning platform, Amazon Machine Learning, we were very excited to evaluate it. With the promise of scalability, speed, and ease of use, we felt that it could serve as a potential platform upgrade for our machine-learning challenges.
In the spirit of exploration, we spent a few days getting familiar with the AML product.
AML – Current Capabilities:
AML provides you the ability to consume CSV data from an S3 bucket. Additionally, it will allow you to define a query to run on your Redshift servers in order to create a CSV file usable for machine learning.
After defining an input source, it will then process your file and attempt to assign types to each of your features (categorical, numeric, binary, and text). The UI presents a sampling of the feature values so you can see if an error was made. It will allow you to re-select the type for any column if it was chosen incorrectly. You must also choose a column to be trained on (dependent variable).
The interface for pointing to datasets and processing them is clean and intuitive. In fact, it seems that the entire AML UI has been carefully composed to make AML usable by almost anyone.
Once you’ve created a dataset you can then build a model on that dataset. Logistic regression is currently the only supported model type for binary classification.
By default AML will split your file into training and testing parts. It randomly samples 70% of the dataset for training and 30% for testing. This is par for the course when it comes to machine learning. AML will also allow you to specify the training and testing sets separately. This is good because sometimes it’s desirable to split in a different way. You may want to split by a timestamp or ensure that a given user is only in the training or testing set.
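To make the "a given user is only in the training or testing set" case concrete, here is a minimal sketch of a split you could run yourself before handing two separate files to AML. It is not how AML splits data internally; the column names (user_id, logins, converted), the hashing scheme and the sample rows are all assumptions made for illustration.

```python
import hashlib


def assign_split(user_id: str, test_fraction: float = 0.3) -> str:
    """Deterministically place a user in 'train' or 'test' from a hash of the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash onto a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


def split_by_user(rows, test_fraction=0.3):
    """Split rows so that every row of a given user lands in the same set."""
    train, test = [], []
    for row in rows:
        if assign_split(row["user_id"], test_fraction) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


# Hypothetical observations: several rows per user.
rows = [
    {"user_id": "u1", "logins": 3, "converted": 0},
    {"user_id": "u1", "logins": 4, "converted": 1},
    {"user_id": "u2", "logins": 1, "converted": 0},
    {"user_id": "u3", "logins": 9, "converted": 1},
    {"user_id": "u3", "logins": 12, "converted": 1},
    {"user_id": "u4", "logins": 0, "converted": 0},
]

train_rows, test_rows = split_by_user(rows)
train_users = {row["user_id"] for row in train_rows}
test_users = {row["user_id"] for row in test_rows}
```

Because the assignment depends only on the user ID, the same user always lands on the same side, even when new rows arrive later. The 70/30 proportion only holds approximately, and only once there are enough distinct users.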
AML will evaluate the model for you as part of the model-building step. It has a nice interface that allows you to see how accuracy, false positive rate, precision, and recall vary with different thresholds. You can choose a new threshold for the model if you wish, which will then be used for scoring purposes.
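If you want to see what that threshold slider is doing, the sketch below recomputes the same four metrics by hand for a handful of scored observations. The scores and labels are made up for illustration, and the function names are ours, not part of any AML API.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives/negatives when scoring at a given threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at(scores, labels, threshold):
    """Accuracy, false positive rate, precision and recall at one threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical model scores and true outcomes.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    print(threshold, metrics_at(scores, labels, threshold))
```

Raising the threshold on this toy data trades recall for precision, which is exactly the trade-off the evaluation screen lets you explore.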
You can generate predictions using your new model in batches (upload a file of observations to score) or in real-time using their prediction API.
Thoughts On AML:
Ok so the system seems to be well thought out and intuitive. How does it work for us though? Here are our thoughts on each of the various pieces:
Logistic Regression (LR) is the only model type available with AML. LR generally requires more time and care in handcrafting your features than other model types. For example, if you have two features, one called state and the other called age, a logistic regression model will not be able to figure out that older people who live in Florida prefer a specific type of product. By default it will have one weight for state code and one weight for age. You need to create a feature that is a combination of state and age in order to capture the combined signal. AML does provide the ability to create combination variables by combining all possible text values of two variables. This is good, but still requires you to have a suspicion that two variables interact before performing the combination. It’s not feasible to combine all features with each other, especially when three way or four way combinations may contain most of the predictive power. Additionally, adding features to a dataset increases the dimensionality and therefore the amount of data required to train a robust model. Other model types like decision trees, neural networks, etc. can figure out these feature combinations for you and drastically reduce the time to build a good model.
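Here is a rough sketch of the state-and-age example, done by hand outside of AML. The age bucket boundaries, the state_x_age column name and the 20-feature count are assumptions chosen for illustration; this is not AML's own transformation syntax.

```python
import math


def age_bucket(age: int) -> str:
    """Coarse age bands (hypothetical boundaries) so the cross stays small."""
    if age < 35:
        return "under_35"
    if age < 65:
        return "35_to_64"
    return "65_plus"


def add_state_age_cross(row: dict) -> dict:
    """Return a copy of the row with one combined state-and-age feature."""
    crossed = dict(row)
    crossed["state_x_age"] = f'{row["state"]}|{age_bucket(row["age"])}'
    return crossed


# Hypothetical users.
rows = [
    {"state": "FL", "age": 71},
    {"state": "FL", "age": 29},
    {"state": "CA", "age": 70},
]
crossed_rows = [add_state_age_cross(row) for row in rows]

# Why crossing everything with everything does not scale: the number of
# candidate combinations for a dataset with 20 raw features.
n_features = 20
pairs = math.comb(n_features, 2)
triples = math.comb(n_features, 3)
quads = math.comb(n_features, 4)
print(pairs, triples, quads)
```

With the crossed column in place, a logistic regression can learn a separate weight for "FL|65_plus" that neither the state weight nor the age weight could express on its own. The combination counts at the end cover only which features to cross; each cross also multiplies out into one value per combination of categories.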
Logistic Regression cannot handle numeric features which are non-monotonic in relation to the outcome. An example of where this is important would be a feature tracking the number of site logins a user has and assigning a probability of an outcome to that user based entirely on that feature. The likelihood of the outcome could be higher when you’ve not seen the user before, lower when you’ve seen them a normal number of times, and higher again when you’ve seen them a very large number of times. You could solve this problem by binning your numeric variables using a supervised technique. It would be nice if AML supported this use case as one of its feature transformations. Something like the MDLP function within the discretization package in R would be great.
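To show what binning buys you, here is a minimal sketch that turns a login count into one-hot bin indicators. Note the simplification: the bin edges below are hand-picked and hypothetical, whereas a supervised method such as MDLP would choose the cut points from the labels.

```python
from bisect import bisect_right

# Hypothetical cut points; a supervised method would learn these from data.
LOGIN_BIN_EDGES = [1, 5, 50]
BIN_NAMES = ["logins_0", "logins_1_to_4", "logins_5_to_49", "logins_50_plus"]


def one_hot_login_bin(login_count: int) -> dict:
    """Map a raw login count to exactly one active indicator feature."""
    index = bisect_right(LOGIN_BIN_EDGES, login_count)
    return {name: int(i == index) for i, name in enumerate(BIN_NAMES)}


for count in (0, 3, 20, 400):
    print(count, one_hot_login_bin(count))
```

Each indicator gets its own weight in a logistic regression, so the model is free to score the first and last bins high and the middle bins low, which a single weight on the raw count cannot do.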
Amazon will not expose the model that it built to you. You will never be able to see which features received which weights. Depending on your application and goals, this can be a real deal breaker. You can tout the great AUC and PR curve that your model has, but when the consumers of your model notice something amiss and you state that you “don’t know what’s going on” … well, I certainly would not want to be put in that position. Keep in mind that these metrics for measuring models do not usually take into account the subtleties of errors. It’s great that your model has a recall rate of 95%, but perhaps the missing 5% contains one of the largest and most important cases. You could argue that the impact of each observation should have been modeled, but to be honest these things tend to be iterative. Not being able to see what your model is doing is a HUGE drawback and, in my opinion, means AML cannot form the basis of a system for any serious type of machine learning, but your mileage may vary.
In total it cost me $0.61 to build the sample model that comes built into AML … that represents over an hour of machine time to build a single logistic regression model. If we gloss over the issue of whether this sample model is representative of real-world models, this amount can be either very low or very high, depending on what you’re doing. If you are building a model across your entire dataset which can be leveraged for a few months without retraining, then this cost is extremely low. If, on the other hand, you are building models for subsets of data (each user, each item, each offer, etc.) and you are doing this on an ongoing basis (every few days/hours/minutes), then the costs can really begin to add up.
The ability to create real time scores on your models is what will sell many people on AML immediately. You can bypass the creation of software required to score models in real time. Typically this code is not complex but you have to manage models in memory, determine which model to apply to which event, and be very rigorous about testing the accuracy of the scoring system. All of this can be bypassed by using AML.
They’ll charge you a penny for 100 predictions. Whether this is expensive again depends on the scale of your classification/scoring problem. If you’re scoring new user registrations, and there are a few thousand of those a month, then this would likely be low cost. If however you’re scoring something that happens hundreds of millions of times per month your cost would be large and ongoing.
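The arithmetic behind those two scenarios is easy to sketch. The only number taken from above is the price of a penny per 100 predictions; the monthly volumes (5,000 and 300 million) are hypothetical stand-ins for "a few thousand" and "hundreds of millions".

```python
from decimal import Decimal

# One cent per 100 predictions, i.e. $0.0001 per prediction.
PRICE_PER_PREDICTION = Decimal("0.01") / Decimal("100")


def monthly_prediction_cost(predictions_per_month: int) -> Decimal:
    """Dollar cost of real-time scoring for a given monthly volume."""
    return PRICE_PER_PREDICTION * predictions_per_month


# Hypothetical volumes for the two scenarios.
new_registrations = 5_000
high_volume_events = 300_000_000

print(monthly_prediction_cost(new_registrations))
print(monthly_prediction_cost(high_volume_events))
```

At these assumed volumes the registration case comes to about fifty cents a month, while the high-volume case comes to about $30,000 a month, every month.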
When it comes to latency, here’s what they say: “The Amazon ML system is designed to respond to most online prediction requests within 100 milliseconds.” Depending on your use case this may not be particularly comforting. If you want your site to appear seamless while scoring events, AML is likely not for you. You can of course write a wrapper around your scoring requests and serve some default when AML does not respond in time, but that’s a bit of a gamble. Generally, scoring a logistic regression model should be lightning-quick … if you code the scoring system yourself.
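A wrapper of the kind described above could look roughly like this. It is a sketch only: score_fn stands in for whatever function actually calls the prediction service, the stubs simulate a fast and a slow response, and the 0.5 default score and the timeout values are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for outbound scoring calls (size chosen arbitrarily).
_executor = ThreadPoolExecutor(max_workers=4)


def score_with_fallback(score_fn, record, timeout_seconds=0.1, default_score=0.5):
    """Return score_fn(record), or default_score if it takes too long."""
    future = _executor.submit(score_fn, record)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # Best effort only: a call that is already running is not interrupted.
        future.cancel()
        return default_score


def fast_stub(record):
    """Hypothetical scorer that answers immediately."""
    return 0.9


def slow_stub(record):
    """Hypothetical scorer that misses the deadline."""
    time.sleep(0.5)
    return 0.9


print(score_with_fallback(fast_stub, {"user_id": "u1"}, timeout_seconds=1.0))
print(score_with_fallback(slow_stub, {"user_id": "u1"}, timeout_seconds=0.05))
```

The gamble is visible in the second call: the caller gets an answer on time, but it is the default rather than a real score, and the slow request still occupies a worker until it finishes. Errors raised by score_fn itself are not caught here and would need their own handling.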
It would be nice if they made the trained models available through a download in some format like PMML. If they’ve structured their profits around scoring though, this is not likely to ever happen. Still, it’s a major flaw in the system design from a usability perspective.
The concept of transformation expressed in a flexible way is a great idea. No doubt many machine learning teams have thought of this and hoped for it but didn’t have the time to build it for their specific application. It makes a lot of sense that a large-scale service provider like Amazon would build something like this. That being said, the set of transformations is light. As Amazon themselves state on their AML website, feature pre-processing is usually the most impactful part of building models. I’d personally take a bad model and a rich feature set over the opposite any day. That is why it’s surprising that their transformation set is so limited. Perhaps they are planning to build it out over time. In any case, this can be overcome by using EMR to transform and create features prior to model building. That raises the question, though, of why you’re using AML at all when EMR comes with Mahout built in. You can do your own feature transformations and build a random forest model with minimal effort. Creating code to score a random forest in production is not too difficult. It’s true that you should be very rigorous around testing it, but once you have it your system will be much more flexible in general and there is no cost per transaction (on top of server costs).
AML is a wonderful proof-of-concept tool. It’s great for showing management that this nebulous thing they’ve heard of called machine learning can be wrestled down and made functional in a few hours. For very simple tasks that do not require much oversight (i.e., where anything is better than nothing), AML would work quite well. In general though, I would imagine that any team solving serious machine learning problems would have to evolve past AML at a very early point. Either that or wait for the evolution of AML, which will no doubt occur.