The Most Popular Python Scientific Libraries

Table of Contents
  • 30+ essential Python libraries for data science, machine learning, and more
    • 1. Astropy
    • 2. Biopython
    • 3. Bokeh
    • 4. Cubes
    • 5. Dask
    • 6. DEAP
    • 7. DMelt
    • 8. graph-tool
    • 9. matplotlib
    • 10. Mlpy
    • 11. NetworkX
    • 12. Nilearn
    • 13. NumPy
    • 14. Pandas
    • 15. Pipenv
    • 16. PsychoPy
    • 17. PySpark
    • 18. python-weka-wrapper
    • 19. PyTorch
    • 20. SQLAlchemy
    • 21. SageMath
    • 22. ScientificPython
    • 23. scikit-image
    • 24. scikit-learn
    • 25. SciPy
    • 26. SCOOP
    • 27. SunPy
    • 28. SymPy
    • 29. TensorFlow
    • 30. Theano
    • 31. TomoPy
    • 32. Veusz
  • Did we miss anything?

Python is many things.

Cross-platform. General-purpose. High-level.

As such, the programming language has numerous applications and has been widely adopted by all sorts of communities, from data science to business. These communities value Python for its precise and efficient syntax, relatively gentle learning curve, and good integration with other languages (e.g. C/C++).

The language’s popularity has resulted in a plethora of Python packages being produced for data visualization, machine learning, natural language processing, complex data analysis, and more.

Here is our list of the most popular Python libraries.

30+ essential Python libraries for data science, machine learning, and more

1. Astropy

Astropy is a collection of packages designed for use in astronomy.

The core Astropy package contains functionality aimed at professional astronomers and astrophysicists, but may be useful to anyone developing software for astronomy.

2. Biopython

Biopython is a collection of non-commercial Python tools for computational biology and bioinformatics.

It contains classes to represent biological sequences and sequence annotations. The library can also read and write to a variety of file formats.

3. Bokeh

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation.

It can help anyone who wishes to quickly and easily create interactive plots, dashboards, and data applications.

The purpose of Bokeh is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets.

4. Cubes

Cubes is a light-weight Python framework and set of tools for the development of reporting and analytical applications, Online Analytical Processing (OLAP), multidimensional analysis, and browsing of aggregated data.

5. Dask

Dask is a flexible parallel computing library for analytic computing, composed of two components:

  1. dynamic task scheduling optimized for computation and interactive computational workloads;
  2. Big Data collections like parallel arrays, dataframes, and lists that extend common interfaces such as NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments.
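
As a rough illustration of the second component, the sketch below builds a chunked array with dask.array and defers the work until compute() is called. The array shape and chunk size are arbitrary values chosen for the example, not recommendations.

```python
import dask.array as da

# A 1000 x 1000 array of ones, split into 250 x 250 chunks.
# Nothing is computed yet; Dask only records a task graph.
x = da.ones((1000, 1000), chunks=(250, 250))

# Familiar NumPy-style expressions extend the graph lazily.
lazy_total = (x + x.T).sum()

# compute() hands the graph to the scheduler and returns a concrete value.
total = lazy_total.compute()
```

The same pattern applies to Dask dataframes, where the interface mirrors Pandas instead of NumPy.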

6. DEAP

DEAP is an evolutionary computation framework for rapid prototyping and testing of ideas.

It incorporates the data structures and tools required to implement the most common evolutionary computation techniques, such as genetic algorithms, genetic programming, evolution strategies, particle swarm optimization, differential evolution, and estimation of distribution algorithms.

7. DMelt

DataMelt, or DMelt, is software for numeric computation, statistics, analysis of large data volumes (Big Data), and scientific visualization.

It can be used with several scripting languages, including Python/Jython, BeanShell, Groovy, Ruby, and Java.

The library has numerous applications, such as natural sciences, engineering, modeling, and analysis of financial markets.

8. graph-tool

Graph-tool is a module for the manipulation and statistical analysis of graphs.

9. matplotlib

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hard-copy formats and interactive cross-platform environments.

It allows you to generate plots, histograms, power spectra, bar charts, error charts, scatter plots, and more.

10. Mlpy

Mlpy is a machine learning library built on top of NumPy/SciPy and the GNU Scientific Library (GSL).

It provides a wide range of machine learning methods for supervised and unsupervised problems, and is aimed at finding a reasonable compromise between modularity, maintainability, reproducibility, usability, and efficiency.

11. NetworkX

NetworkX is a library for studying graphs which helps you create, manipulate, and study the structure, dynamics, and functions of complex networks.
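
A minimal sketch of creating and querying a graph might look like this. The node names and edges are made up for the example.

```python
import networkx as nx

# Build a small undirected graph from a list of edges.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

# Study its structure: shortest path between two nodes and node degrees.
path = nx.shortest_path(G, "a", "d")
degrees = dict(G.degree())
```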

12. Nilearn

Nilearn is a Python module for fast and easy statistical learning on neuroimaging data.

This library makes it easy to use many advanced machine learning, pattern recognition, and multivariate statistical techniques on neuroimaging data for applications such as MVPA (Multi-Voxel Pattern Analysis), decoding, predictive modelling, functional connectivity, brain parcellations, or connectomes.

13. NumPy

NumPy is the fundamental package for scientific computing with Python, adding support for large, multidimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
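
For instance, a small two-dimensional array can be created and operated on as follows. The values are arbitrary and serve only to show the array interface.

```python
import numpy as np

# A 2 x 2 matrix built from nested lists.
a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Reduce along an axis: the mean of each column.
column_means = a.mean(axis=0)

# Broadcasting applies a scalar operation to every element.
scaled = a * 10

# Matrix product of the array with its transpose.
gram = a @ a.T
```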

14. Pandas

Pandas is a library for data manipulation and analysis, providing data structures and operations for manipulating numerical tables and time series.
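
The sketch below shows both use cases on one small table: grouping rows by a column and resampling a time series. The city names and temperature values are invented for the example.

```python
import pandas as pd

# Hypothetical daily temperature readings indexed by date.
readings = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
        "temp_c": [2.0, 4.0, 14.0, 16.0],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Tabular operation: average temperature per city.
mean_by_city = readings.groupby("city")["temp_c"].mean()

# Time-series operation: average temperature over two-day windows.
two_day_mean = readings["temp_c"].resample("2D").mean()
```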

15. Pipenv

Pipenv is a tool designed to bring the best of all packaging worlds to the Python world.

It automatically creates and manages a virtualenv for your projects, along with adding or removing packages from your Pipfile as you install or uninstall packages.

Pipenv is primarily meant to provide users and developers of applications with an easy method to set up a working environment.

16. PsychoPy

PsychoPy is a package for the generation of experiments for neuroscience and experimental psychology.

It is designed to allow the presentation of stimuli and collection of data for a wide range of neuroscience, psychology, and psychophysical experiments.

17. PySpark

PySpark is the Python API for Apache Spark.

Spark is a distributed computing framework for big data processing. It serves as a unified analytics engine, built with speed, ease of use, and generality in mind.

Spark offers modules for streaming, machine learning, and graph processing. It’s also completely open-source.

18. python-weka-wrapper

Weka is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.

It contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions.

The python-weka-wrapper package makes it easy to run Weka algorithms and filters from within Python.

19. PyTorch

PyTorch is a deep learning framework for fast, flexible experimentation.

This package provides two high-level features: tensor computation with strong GPU acceleration, and deep neural networks built on a tape-based autodiff system.

It can be used either as a replacement for NumPy that harnesses the power of GPUs, or as a deep learning research platform that provides maximum flexibility and speed.


20. SQLAlchemy

SQLAlchemy is an open-source SQL toolkit and Object-Relational Mapper that gives application developers the full power and flexibility of SQL.

It provides a full suite of well-known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.

The main goal of the library is to change the way we approach databases and SQL.

21. SageMath

SageMath is a mathematical software system with features covering multiple aspects of mathematics, including algebra, combinatorics, numerical mathematics, number theory, and calculus.

It uses Python to support procedural, functional, and object-oriented constructs.

22. ScientificPython

ScientificPython is a collection of modules for scientific computing.

It contains support for geometry, mathematical functions, statistics, physical units, IO, visualization, and parallelization.

23. scikit-image

Scikit-image is an image processing library.

It includes algorithms for segmentation, geometric transformations, color space manipulation, analysis, filtering, morphology, feature detection, and more.

24. scikit-learn

Scikit-learn is a machine learning library.

It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN.

The library is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
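
As one possible illustration, the following sketch trains a random forest classifier on the iris dataset bundled with scikit-learn. The split ratio, number of trees, and random seed are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Features and labels come back as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier and measure accuracy on the held-out samples.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Other estimators follow the same fit and score pattern, so swapping in a different algorithm usually means changing only the line that constructs the model.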

25. SciPy

SciPy is a library used by scientists, analysts, and engineers for scientific and technical computing.

It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers, and other tasks common in science and engineering.
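
Two of these modules, integration and optimization, can be tried out with a short sketch like the one below. The integrand and the objective function are simple examples picked because their answers are known in advance.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Find the minimum of a one-dimensional function; it lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
minimum_x = result.x
```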


26. SCOOP

SCOOP is a Python module for distributing concurrent parallel tasks on various environments, from heterogeneous grids of workstations to supercomputers.

27. SunPy

SunPy is a data-analysis environment specializing in providing the software necessary to analyze solar and heliospheric data in Python.

28. SymPy

SymPy is a library for symbolic computation, offering features ranging from basic symbolic arithmetic to calculus, algebra, discrete mathematics, and quantum physics.

It provides computer algebra capabilities either as a standalone application, a library to other applications, or live on the web.
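
Used as a library, a few lines are enough to differentiate, integrate, and solve equations symbolically. The expressions below are arbitrary examples.

```python
import sympy as sp

x = sp.symbols("x")

# Calculus: differentiate a product symbolically.
derivative = sp.diff(sp.sin(x) * sp.exp(x), x)

# Calculus: a definite integral, returned as an exact rational number.
integral = sp.integrate(x**2, (x, 0, 1))

# Algebra: solve a quadratic equation for x.
roots = sp.solve(x**2 - 5 * x + 6, x)
```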

29. TensorFlow

TensorFlow is an open-source software library for machine learning across a range of tasks. Google developed it to meet its need for systems capable of building and training neural networks to detect and decipher patterns and correlations, analogous to the learning and reasoning employed by humans.

It is used for both research and production at Google, often replacing its closed-source predecessor, DistBelief.

30. Theano

Theano is a numerical computation Python library, allowing you to define, optimize, and evaluate mathematical expressions involving multidimensional arrays efficiently.

31. TomoPy

TomoPy is an open-source Python toolbox for performing tomographic data processing and image reconstruction tasks.

It offers a collaborative framework for the analysis of synchrotron tomographic data, with the goal to unify the efforts of different facilities and beamlines performing similar tasks.

32. Veusz

Veusz is a scientific plotting and graphing package designed to produce publication-quality plots in popular vector formats, including PDF, PostScript, and SVG.

Did we miss anything?

With so many great Python packages and tools out there to explore, there’s a good chance you know some exciting Python libraries that belong on this list, but didn't make the cut.

Do you think we’ve missed something we shouldn't have?

Feel free to suggest any additional Python software you find relevant in the comment section below. We’ll reply to you and consider expanding our selection.

And since you’ve gotten through our whole list of Python libraries, maybe we could interest you in a beginner's introduction to Python web frameworks?
