Basic Data Analysis using Python: Pt IV


     In this series, I outline a basic data analysis exercise using a real-world data set. The data come from a relatively simple physics experiment I was part of, and we needed this analysis to determine several parameters before we could move forward with the rest of the experiment. This is not a contrived example: the data were captured in a real lab, and the results were required to proceed. We start from the data set and use only Python to turn that data into presentation-quality graphics. This series does not demonstrate using Python to clean large data sets, since that capability is already well established. In this final part, we explore how to generate presentation-quality graphics from our data!


The difference

     I'll jump straight in with a demonstration of how different a graph looks once you take some time to change the style parameters.

     The size is larger, the font is changed, and we're using LaTeX to write the formula itself. We've also added a title and labels, and we've changed the look of the curve itself. The whole purpose of spending time on this is to make the graph more visually appealing. Presentation quality does not stop at the graphics, either; it is also influenced by the card stock used, the print quality, and other variables. A good marketer would tell you that every aspect of a presentation can turn someone off your product. In our case, we don't want poor visualization clouding the message we're trying to get across in a report or presentation of our findings. People may even assume that a poor presentation reflects the quality of the work behind it, or that the rest of the work was done just as lazily, even if that is not true. Research journals also have strict guidelines on how reports are written and presented, and a paper that ignores them will never see the light of day. Therefore, it's important to make sure we have good graphics.


The How

     The main way of controlling how our graphics look is through style sheets. Matplotlib ships with quite a number of built-in style sheets, which you can view here. However, I have found that all of them lack something. Seaborn came close, but I still wanted to alter the look a bit, and in particular it did not produce a graph large enough for printing and presentation. Therefore, I wrote my own style sheet, which is used in the second graph above. You can view the details of my personal style sheet in another blog post here or on the github wiki page for the style sheet (where you can download it as well) here. Unfortunately, the settings available in a style sheet are extensive and would be unwieldy to explain here; refer to matplotlib's documentation for the details. For most people, finding a custom style sheet that someone has already written should suffice. In my sheet, I changed the font, font size, grid display, color, graph size, and quite a bit more. To invoke a style sheet, use one of the two commands below.

import matplotlib.pyplot as plt

# For style sheets shipped with matplotlib, use this line of code.
plt.style.use('style-sheet-name')

# For custom style sheets NOT shipped, use this line of code.
plt.style.use('/path/to/custom/style/sheet/style-sheet-name')
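
     To see which style sheets ship with your installation, and to get a feel for the kind of settings a custom sheet changes, here is a minimal sketch. The values in custom_settings are illustrative assumptions only and are not the contents of my style sheet; the keys are standard matplotlib rcParams names. It uses rc_context so that the changes apply only inside the with block.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Names of the style sheets bundled with the installed matplotlib version.
bundled_styles = sorted(plt.style.available)
print(bundled_styles[:5])

# Hypothetical overrides, similar in spirit to what a custom style sheet sets.
custom_settings = {
    "figure.figsize": (11.0, 8.5),  # width and height in inches
    "font.family": "serif",
    "font.size": 16,
    "axes.grid": True,
    "lines.linewidth": 2.0,
}

# rc_context applies the settings temporarily; the previous values are
# restored when the with block exits.
with mpl.rc_context(custom_settings):
    fig, ax = plt.subplots()
    figure_size = tuple(fig.get_size_inches())

plt.close(fig)
```

     A style sheet file stores the same kind of key and value pairs, one per line, so a dictionary like this is a convenient way to experiment before committing settings to a file.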

     However, we don't stop there. We also need titles, labels, and legends, all of which are simple commands in matplotlib. We can also use another feature of matplotlib: it supports LaTeX code for displaying mathematical formulas, as I've demonstrated above. The LaTeX wrappers are the dollar signs in the label strings. Because I wanted the text display to be uniform, I put the wrappers around the standard text labels as well. The code used to display the labels, title, and legend is shown below, and you can also see how LaTeX is used to display the math formula in the legend. Of course, you don't have to make the line dotted; it can be solid, dashed, or a combination. It's controlled by the format string "k:", which tells matplotlib to make the line black (k) and dotted (:). You can change the color or appearance by altering that string. I elected to keep the line dotted, but if you want a solid line, just remove the : from the string. The format string isn't even required; if you leave it out, the style sheet's defaults are used.

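# Plot the fitted model as a black dotted line; the label feeds the legend.
plt.plot(xpts2, xfunc(xpts, a, b, c), 'k:', label=r"$a\/cos(x-b)^2\/+\/c$")

plt.xlabel(r'$Angle \/(degrees)$')
plt.ylabel(r'$Intensity \/ (mV)$')
plt.title(r"$Malus's \/ Law \/ Model \/ Fit$")
plt.legend()  # draws the legend from the label passed to plt.plot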
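
     If you don't have the earlier scripts at hand, the following self-contained sketch shows the same labeling calls on a curve you can run directly. Note the assumptions: malus_model and make_fit_figure are hypothetical stand-ins and not the functions from my scripts, the form a*cos(x-b)^2 + c with angles in degrees is assumed from the legend text, and the parameter values are placeholders rather than fitted results.

```python
import numpy as np
import matplotlib.pyplot as plt


def malus_model(angle_deg, a, b, c):
    """Hypothetical model: a * cos(angle - b)**2 + c, angles in degrees."""
    return a * np.cos(np.radians(angle_deg - b)) ** 2 + c


def make_fit_figure(a, b, c, fmt="k:"):
    """Plot the model over 0 to 360 degrees with LaTeX-style labels."""
    angles = np.linspace(0.0, 360.0, 361)
    fig, ax = plt.subplots()
    # fmt combines color and line style, e.g. "k:" is black and dotted.
    ax.plot(angles, malus_model(angles, a, b, c), fmt,
            label=r"$a\/cos(x-b)^2\/+\/c$")
    ax.set_xlabel(r"$Angle \/(degrees)$")
    ax.set_ylabel(r"$Intensity \/ (mV)$")
    ax.set_title(r"$Malus's \/ Law \/ Model \/ Fit$")
    ax.legend()
    return fig, ax


# Placeholder parameters for illustration only, not values from the experiment.
fig, ax = make_fit_figure(a=1.0, b=0.0, c=0.0)

# The same curve as a red dashed line, to show the effect of the format string.
fig_dashed, ax_dashed = make_fit_figure(a=1.0, b=0.0, c=0.0, fmt="r--")
```

     From here, fig.savefig("malus_fit.png", dpi=300) writes the figure to disk (the file name is only an example), and plt.show() displays it in an interactive session.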

     It is difficult to detail every option that can be used to alter the appearance of the default graphics. It takes time to learn, and anyone who is going to code in Python will either need to learn this or use a different software package to produce results. However, why spend time learning another program when you can invest that time in learning how to properly alter the appearance and write your own style sheet? Once you accomplish this, producing quality graphics takes just a few lines of code. As an aid, I've included the Python scripts I've written for this blog series. You can download them using the links below.

Script to Fit Models to Data

Script for Function Comparison

Unformatted Final Graph Script

Print Quality Graph Script

© 2016 Zachary G Wolfe -- Remember to turn your brain off for a reboot sometimes...