Thursday, October 20, 2016

Stemgraphic, a new visualization tool

PyData Carolinas 2016

At PyData Carolinas 2016 I presented the talk Stemgraphic: A Stem-and-Leaf Plot for the Age of Big Data.


The stem-and-leaf plot is one of the most powerful tools not found in a data scientist or statistician’s toolbox. If we go back in time thirty some years we find the exact opposite. What happened to the stem-and-leaf plot? Finding the answer led me to design and implement an improved graphical version of the stem-and-leaf plot, as a python package. As a companion to the talk, a printed research paper was provided to the audience (a PDF is now available through

The talk

Thanks to the organizers of PyData Carolinas, videos of all the talks and tutorials have been posted on youtube. In just 30 minutes, this is a great way to learn more about stemgraphic and the history of the stem-and-leaf plot for EDA work. This updated version does include the animated intro sequence, but unfortunately the sound was recorded from the microphone, and not the mixer. You can see the intro sequence in higher audio and video quality on the main page of the website below.

I've created a web site for stemgraphic, as I'll be posting some tutorials and demo some of the more advanced features, particularly as to how stemgraphic can be used in a data science pipeline, as a data wrangling tool, as an intermediary to big data on HDFS, as a visual validation for building models and as a superior distribution plot, particularly when faced with non uniform distributions or distributions showing a high degree of skewness (long tails).

Github Repo

Francois Dion

Tuesday, October 11, 2016

PyData Carolinas 2016 Tutorial: Datascience on the web

PyData Carolinas 2016

Don Jennings and I presented a tutorial at PyData Carolinas 2016: Datascience on the web.

The plan was as follow:


Learn to deploy your research as a web application. You have been using Jupyter and Python to do some interesting research, build models, visualize results. In this tutorial, you’ll learn how to easily go from a notebook to a Flask web application which you can share.


Jupyter is a great notebook environment for Python based data science and exploratory data analysis. You can share the notebooks via a github repository, as html or even on the web using something like JupyterHub. How can we turn the work we have done in the notebook into a real web application?
In this tutorial, you will learn to structure your notebook for web deployment, how to create a skeleton Flask application, add a model and add a visualization. While you have some experience with Jupyter and Python, you do not have any previous web application experience.
Bring your laptop and you will be able to do all of these hands-on things:
  1. get to the virtual environment
  2. review the Jupyter notebook
  3. refactor for reuse
  4. create a basic Flask application
  5. bring in the model
  6. add the visualization
  7. profit!
Now that is has been presented, the artifacts are a github repo and a youtube video.

Github Repo

After the fact

The unrefactored notebook is here while the refactored one is here.
Once you run through the whole refactored notebook, you will have train and test sets saved in data/ and a trained model in trained_models/. To make these available in the tutorial directory, you will have to run the script. On a unix like environment (mac, linux etc):
chmod a+x


The whole session is now on youtube: Francois Dion & Don Jennings Datascience on the web

Francois Dion

Thursday, October 6, 2016

Improving your communications: Professional Audio-Video Production on Linux

Pro AV on Linux

I'll be presenting on the subject of Professional Audio-Video Production on Linux, next week at TriLug.

From concept to finished product, it has never been easier to obtain professional results when it comes to audio-video production on Linux.

We will cover some of the hardware that should be part of your production suite, from microphones to jog wheels and highlight some of the top tools for animation, audio, broadcasting, effects, modeling, music, transcoding and video. We will also go beyond the usual suspects and introduce some tools that might not be typically used for AV production.
By the end of the presentation, you will have all the tools you need to improve the quality of your communications, for your personal enjoyment, your career, or your business.


Thursday, 13 October 2016 - 7:00pm to 9:00pm
The Frontier, 800 Park Offices Drive, Durham, NC
Francois Dion

Wednesday, October 5, 2016

Something For Your Mind, Polymath Podcast episode 2

A is for Anomaly

In this episode, "A is for Anomaly", our first of the alphabetical episodes, we cover financial fraud, the Roman quaestores, outliers, PDFs and EKGs. Bleep... Bleep... Bleep...
"so perhaps this is not the ideal way of keeping track of 15 individuals..."

Something for your mind is available on



Francois Dion
P.S. There is a bit more detail on this podcast as a whole, on linkedin.