Diversify Portfolio

Stock market tools for actively engaged, self-directed investors and traders.

For every project I add to Grok My Code, I provide extra information on design decisions, code snippets, technologies used and anything else of interest related to its development.

Tech used on this project:

Python, Django, Redis, JavaScript, PostgreSQL

1. Overview

Diversify Portfolio is a customer-facing SaaS product for stock portfolio analysis, providing tools focused on diversification, asset allocation and factor analysis.

I was responsible for the entire development of the site, including the back end, front end, server setup and maintenance, and marketing and SEO.

A few key features include:

  • 5 years of data on over 7 500 stocks, updated daily.
  • Portfolio level backtesting and analysis.
  • Interactive charts and visualisation tools.
  • Stock screens and opportunity finders.
  • PayPal integration.
  • MailChimp integration.

Live site:


If you wish to test the site's functionality, you can sign up for the free 10-day trial, which grants access to all the analysis tools.

2. Nightly Data Update

Every night, a multi-stage process (which I developed in Python) runs to obtain, clean and incorporate the latest stock market data.
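The ordering of the nightly run can be sketched as a simple sequential runner. The stage functions below are hypothetical placeholders for sections 2.1 to 2.4, and stopping at the first failed stage is an assumed design choice rather than a documented one:

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("nightly_update")


# Hypothetical placeholders, one per stage described in sections 2.1 to 2.4.
def download_data() -> None: ...
def scrub_and_merge() -> None: ...
def precalculate_correlations() -> None: ...
def clear_cache() -> None: ...


STAGES: Sequence[Callable[[], None]] = (
    download_data,
    scrub_and_merge,
    precalculate_correlations,
    clear_cache,
)


def run_nightly_update(stages: Sequence[Callable[[], None]] = STAGES) -> list[str]:
    """Run each stage in order and return the names of the stages that completed.

    Stops at the first failure, because each stage consumes the output of the previous one.
    """
    completed = []
    for stage in stages:
        logger.info("Starting stage %s", stage.__name__)
        try:
            stage()
        except Exception:
            logger.exception("Stage %s failed; aborting nightly run", stage.__name__)
            break
        completed.append(stage.__name__)
    return completed
```

Clearing the cache last means customers keep seeing the previous day's cached results until the new data is fully in place.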

2.1. Data Download

The most recent day's stock market activity for over 7 500 stocks is downloaded via FTP from a third-party vendor. This includes raw stock data such as closing price, volume and company fundamentals.

Next, three files (CSV and TXT) containing Sector and Industry data are downloaded from the Nasdaq stock exchange.
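A minimal sketch of the FTP step using Python's standard `ftplib`. The host, credentials and file names in the usage comment are placeholders, not the vendor's real ones:

```python
import ftplib
from pathlib import Path


def download_via_ftp(ftp: ftplib.FTP, remote_names: list[str], dest_dir: Path) -> list[Path]:
    """Download each named file from an already logged-in FTP session into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in remote_names:
        local_path = dest_dir / name
        with open(local_path, "wb") as fh:
            # RETR in binary mode; ftplib calls fh.write once per received block
            ftp.retrbinary(f"RETR {name}", fh.write)
        saved.append(local_path)
    return saved


# Usage (placeholder host, credentials and file name):
# with ftplib.FTP("ftp.vendor.example") as ftp:
#     ftp.login("username", "password")
#     download_via_ftp(ftp, ["eod_prices.csv"], Path("data/incoming"))
```

Taking the FTP session as a parameter keeps connection handling in one place and makes the function easy to test with a fake session.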

2.2. Data Scrubbing

The files downloaded in the prior step are then scrubbed to address any anomalies and to format the data correctly.

These files are then merged to create a single master representation of all stock, Sector and Industry activity that occurred in the market over the prior day. Lastly, the master file is used to update the relevant database tables.
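The scrubbing and merging steps might look like the stdlib-only sketch below. The column names (`symbol`, `close`, `volume` for the vendor file; `Symbol`, `Sector`, `Industry` for the Nasdaq file) and the cleaning rules are assumptions for illustration, and writing the merged rows to the database is omitted:

```python
import csv
import io
import math


def read_rows(text: str) -> list[dict]:
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub_prices(rows) -> dict[str, dict]:
    """Normalise ticker symbols and drop rows with missing or invalid values."""
    clean = {}
    for row in rows:
        symbol = (row.get("symbol") or "").strip().upper()
        try:
            close = float(row["close"])
            volume = int(float(row["volume"]))
        except (KeyError, TypeError, ValueError, OverflowError):
            continue  # unparseable close or volume: skip the row
        if not symbol or not math.isfinite(close) or close <= 0 or volume < 0:
            continue
        clean[symbol] = {"symbol": symbol, "close": close, "volume": volume}
    return clean


def merge_with_sectors(prices: dict[str, dict], sector_rows) -> list[dict]:
    """Attach Sector and Industry to each stock, defaulting to 'Unclassified'."""
    lookup = {}
    for r in sector_rows:
        symbol = (r.get("Symbol") or "").strip().upper()
        if symbol:
            lookup[symbol] = (
                (r.get("Sector") or "").strip() or "Unclassified",
                (r.get("Industry") or "").strip() or "Unclassified",
            )
    merged = []
    for symbol, record in sorted(prices.items()):
        sector, industry = lookup.get(symbol, ("Unclassified", "Unclassified"))
        merged.append({**record, "sector": sector, "industry": industry})
    return merged
```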

2.3. Long Running Calculations

Many of the calculations performed on the site involve several thousand data points spanning multiple years of historical stock market activity.

Because of the amount of data involved and the time certain calculations take, dedicated database tables are used to pre-calculate and store the analysis for 1 500 of the most popular stocks.

Correlation is one example of this technique:

Stock correlation plays a central role in Diversify Portfolio. However, calculating the correlation between stocks takes time, especially when done on the fly for customer portfolios containing unique stock combinations.

Therefore, the database contains a correlation table which is updated at this stage of the nightly run. The table contains correlation data over 4 different time frames (1, 3, 6 and 12 months) for 1 500 of the most popular stocks.

When users need correlation-related information, Django first checks whether it is already stored in this table, and only calculates it on the fly (for less popular stocks) if it is not.
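The lookup-then-fallback logic can be sketched as follows. Here `precomputed` stands in for the correlation table (keyed by an alphabetically ordered stock pair plus the window in months) and `load_returns` for a hypothetical function that fetches a stock's return series; both are assumptions, as is computing Pearson correlation on returns:

```python
import math
from typing import Callable, Mapping, Sequence

# (stock_a, stock_b, window_months) with stock_a < stock_b, so each pair is stored once
PairKey = tuple[str, str, int]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def get_correlation(
    a: str,
    b: str,
    window_months: int,
    precomputed: Mapping[PairKey, float],
    load_returns: Callable[[str, int], Sequence[float]],
) -> float:
    key = (min(a, b), max(a, b), window_months)
    if key in precomputed:
        return precomputed[key]  # popular pair: use the nightly pre-calculated value
    # Less popular pair: calculate on the fly from each stock's return series
    return pearson(load_returns(a, window_months), load_returns(b, window_months))
```

The fallback assumes both series are aligned on the same dates and non-constant; real data would need date alignment first.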

Popular stocks are defined as those which have traded the most (i.e. the highest volume) over the last 3 months.
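Selecting the popular stocks amounts to ranking by traded volume over a trailing window. A minimal sketch, assuming the input is an iterable of `(symbol, trade_date, volume)` rows and approximating 3 months as 91 calendar days; in Django this would more naturally be a single aggregate query using `values()`, `annotate(Sum(...))` and `order_by()`:

```python
import heapq
from collections import defaultdict
from datetime import date, timedelta


def most_popular(daily_volumes, as_of: date, top_n: int = 1500, days: int = 91) -> list[str]:
    """Return the top_n symbols by total volume traded in the `days` up to and including as_of.

    daily_volumes: iterable of (symbol, trade_date, volume) tuples.
    """
    start = as_of - timedelta(days=days)
    totals: dict[str, int] = defaultdict(int)
    for symbol, trade_date, volume in daily_volumes:
        if start < trade_date <= as_of:  # only rows inside the trailing window
            totals[symbol] += volume
    top = heapq.nlargest(top_n, totals.items(), key=lambda item: item[1])
    return [symbol for symbol, _ in top]
```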

Once the historical data for the most popular stocks has been selected, it is loaded into a Pandas DataFrame, which is used to calculate the 1, 3, 6 and 12 month correlations between every pair of stocks.
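A sketch of the Pandas step, assuming a DataFrame of daily closing prices with one column per ticker, correlations computed on daily returns rather than raw prices, and month windows approximated as 21, 63, 126 and 252 trading days (all assumptions):

```python
import pandas as pd

# Approximate trading days per look-back period (assumed; the real windows may differ)
WINDOWS = {1: 21, 3: 63, 6: 126, 12: 252}


def correlation_matrices(prices: pd.DataFrame) -> dict[int, pd.DataFrame]:
    """Return {months: correlation matrix} for each look-back window.

    prices: daily closing prices, one column per ticker, rows sorted by date ascending.
    """
    # Daily simple returns; the first row has no prior price, so drop it
    returns = (prices / prices.shift(1) - 1).iloc[1:]
    # .corr() computes the Pearson correlation between every pair of columns
    return {months: returns.tail(days).corr() for months, days in WINDOWS.items()}
```

Correlating returns instead of price levels avoids the spuriously high correlations that two unrelated upward-trending price series tend to show.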

The resulting correlation matrix is then pivoted using Pandas to create the correct data shape before writing it to the correlation table.
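One way to reshape a square correlation matrix into table rows, sketched with `melt`; the column names (`stock_a`, `stock_b`, `correlation`, `window_months`) are hypothetical. Only pairs with `stock_a < stock_b` are kept because the matrix is symmetric and its diagonal is always 1:

```python
import pandas as pd


def to_long_rows(corr: pd.DataFrame, window_months: int) -> pd.DataFrame:
    """Flatten a square correlation matrix into one row per unique stock pair."""
    long = (
        corr.rename_axis("stock_a")  # name the row index so it becomes a column
        .reset_index()
        .melt(id_vars="stock_a", var_name="stock_b", value_name="correlation")
    )
    # Keep each unordered pair once and drop the diagonal (self-correlation)
    long = long[long["stock_a"] < long["stock_b"]].reset_index(drop=True)
    return long.assign(window_months=window_months)
```

The resulting frame could then be written with `DataFrame.to_sql` or converted into model instances for Django's `bulk_create`.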

2.4. Clearing the Cache

The final step of the nightly run is to clear the Redis cache.

Redis is used to cache several items, such as customer portfolio calculations and performance metrics. The cache must be cleared at this stage so that the next time a customer accesses their portfolio, all relevant calculations are redone with the latest data before the results are cached again.

Redis is discussed further in the following section.

3. Improving Response Times Using Redis as a Cache

As already discussed, many calculations used throughout the site involve several thousand data points. I therefore use Redis as a cache wherever possible.

One such area where Redis is used is the Portfolio Analyser Tool:

  • When a user accesses their portfolio, Django first checks whether the analysis already exists in the cache. If it does, Django simply adds the user's cached portfolio analysis results to the response context.
  • If a cache miss occurs, the correlation table from the prior section is checked. If the required stocks are found, the user's portfolio analysis is calculated using the pre-calculated correlation data. If they are not found, everything is calculated from scratch.
  • The portfolio analysis results are then cached in Redis under a per-user namespaced key such as username:portfolio_id:portfolio. Lastly, the results are added to the response context and sent to the user.
  • Whenever a user makes a change to their portfolio such as adding / removing stocks or changing position sizes, the resulting portfolio analysis is recalculated and cached (replacing the original).
  • There are several other areas throughout the site which use caching. Namespaces are used to identify the relevant users and data being cached.

The following code illustrates how the portfolio analysis results are cached and retrieved:
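A minimal cache-aside sketch of the flow above, assuming `cache` is Django's `django.core.cache.cache` backed by Redis (for example Django 4.0+'s built-in `RedisCache` backend or django-redis), both of which provide `get` and `set`. The helper names and the `analyse` callable, which stands in for the correlation-table lookup or full recalculation, are hypothetical:

```python
from typing import Callable


def portfolio_cache_key(username: str, portfolio_id: int) -> str:
    # Follows the username:portfolio_id:portfolio namespace described above
    return f"{username}:{portfolio_id}:portfolio"


def get_portfolio_analysis(cache, username: str, portfolio_id: int,
                           analyse: Callable[[], dict]) -> dict:
    """Return cached results, or compute and cache them on a miss."""
    key = portfolio_cache_key(username, portfolio_id)
    results = cache.get(key)
    if results is None:  # cache miss
        results = analyse()
        cache.set(key, results, None)  # timeout=None: keep until replaced or cleared
    return results


def refresh_portfolio_analysis(cache, username: str, portfolio_id: int,
                               analyse: Callable[[], dict]) -> dict:
    """After a portfolio edit, recompute and overwrite the cached entry."""
    results = analyse()
    cache.set(portfolio_cache_key(username, portfolio_id), results, None)
    return results
```

A view would add the returned dict to its template context. With no expiry set, entries live until they are replaced by an edit or removed by the nightly cache clear in section 2.4.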

4. Third-Party Integration

Diversify Portfolio integrates with:

  • PayPal for payment of once off and subscription memberships.
  • MailChimp for programmatically moving users between different mailing lists depending on their status (i.e. Newsletter, Trial User, Member).
  • The data vendor discussed previously.
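A sketch of moving a user between lists via the Mailchimp Marketing API (v3), which identifies a list member by the MD5 hash of their lowercased email address. The list IDs and status names are hypothetical, and the function only builds the calls rather than sending them:

```python
import hashlib

# Hypothetical list IDs, one per user status
LIST_IDS = {"newsletter": "list_news", "trial": "list_trial", "member": "list_member"}


def subscriber_hash(email: str) -> str:
    """Mailchimp's member ID: MD5 hex digest of the lowercased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def move_requests(email: str, old_status, new_status: str) -> list[tuple]:
    """Build the (method, path, json_body) calls that move a user to a new list."""
    member = subscriber_hash(email)
    calls = []
    if old_status in LIST_IDS and old_status != new_status:
        # DELETE on a list member archives them in the old list
        calls.append(("DELETE", f"/lists/{LIST_IDS[old_status]}/members/{member}", None))
    # PUT is an upsert: it adds the member if new, or updates them if already present
    calls.append((
        "PUT",
        f"/lists/{LIST_IDS[new_status]}/members/{member}",
        {"email_address": email, "status_if_new": "subscribed"},
    ))
    return calls
```

Each call would be sent to `https://<dc>.api.mailchimp.com/3.0` followed by the path, using HTTP basic auth with the API key as the password.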

4.1. PayPal Integration

PayPal provides several integration options, one of which is IPN (Instant Payment Notification). Briefly, the process is as follows:

  1. Two Django forms (once off and subscription) are created for membership sign-ups and payment. These forms are submitted to a specific PayPal URL along with identification and payment information.
  2. As soon as PayPal has accepted a payment, it sends a message back to a configured endpoint within Django in a separate HTTP request, retrying multiple times if there are connectivity issues.
  3. If a valid IPN response is received from PayPal, Django calls a payment_notification() method. This is possible thanks to Django signals, which allow decoupled applications to be notified when specific actions occur.
  4. By analysing the response data received from PayPal, the user's details are updated accordingly (e.g. new member, new payment, membership cancellation).

    • Validation is performed (not shown here) to ensure the correct amounts have been paid within the correct payment cycles (monthly / yearly / once off).

  5. Users are also moved into different MailChimp groups when needed (sign-up and cancellation).
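Steps 2 to 4 rely on PayPal's IPN verification handshake: the listener posts the received message back to PayPal, unchanged and prefixed with `cmd=_notify-validate`, and PayPal replies `VERIFIED` or `INVALID`. A minimal stdlib sketch of that handshake plus an amount check of the kind described in step 4; the plan names, prices, currency and receiver email are hypothetical, and the sandbox URL would be swapped for the live one in production:

```python
import urllib.request
from decimal import Decimal, InvalidOperation

LIVE_VERIFY_URL = "https://ipnpb.paypal.com/cgi-bin/webscr"
SANDBOX_VERIFY_URL = "https://ipnpb.sandbox.paypal.com/cgi-bin/webscr"

# Hypothetical plans and prices
EXPECTED_AMOUNTS = {"monthly": Decimal("9.99"), "yearly": Decimal("99.00"), "once_off": Decimal("149.00")}


def build_verification_body(raw_body: str) -> str:
    # PayPal expects the original message echoed back unchanged, preceded by cmd=_notify-validate
    return "cmd=_notify-validate&" + raw_body


def verify_with_paypal(raw_body: str, url: str = SANDBOX_VERIFY_URL, timeout: float = 10) -> bool:
    """Post the IPN message back to PayPal; True only if PayPal answers VERIFIED."""
    request = urllib.request.Request(
        url,
        data=build_verification_body(raw_body).encode("utf-8"),  # assumes a UTF-8 IPN charset
        headers={"Content-Type": "application/x-www-form-urlencoded", "User-Agent": "ipn-listener"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip() == "VERIFIED"


def payment_matches_plan(params: dict, plan: str, receiver_email: str, currency: str = "USD") -> bool:
    """Check a verified payment: completed, paid to us, right currency and amount."""
    expected = EXPECTED_AMOUNTS.get(plan)
    try:
        gross = Decimal(params.get("mc_gross") or "")
    except InvalidOperation:
        return False
    return (
        expected is not None
        and params.get("payment_status") == "Completed"
        and (params.get("receiver_email") or "").lower() == receiver_email.lower()
        and params.get("mc_currency") == currency
        and gross == expected
    )
```

In a Django view, `raw_body` would come from `request.body`, and the view should return an empty HTTP 200 response so that PayPal stops retrying.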

The snippet below illustrates how Django signals are used to call paypal_notification() when a valid IPN is received. Note the last line of the snippet, which registers the handler: