bokeh.sampledata#

The sampledata module can be used to download data sets used in Bokeh examples.

The simplest way to download the data is to execute the command-line program:

bokeh sampledata

Alternatively, the download function described below may be called programmatically.

>>> import bokeh.sampledata
>>> bokeh.sampledata.download()

By default, data is downloaded to and stored in the directory $HOME/.bokeh/data. This directory is created if it does not already exist.

Bokeh also looks for a YAML configuration file at $HOME/.bokeh/config. The YAML key sampledata_dir can be set to the absolute path of a directory where the data should be stored. For example, add the following line to the config file:

sampledata_dir: /tmp/bokeh_data

This will cause the sample data to be stored in /tmp/bokeh_data.
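
The lookup described above can be sketched with the standard library alone. This is an illustrative sketch, not Bokeh's implementation: the helper name resolve_sampledata_dir is hypothetical, and the config file is scanned naively for a single "sampledata_dir: <path>" line rather than parsed as full YAML (quoted values, for instance, are not handled).

```python
from pathlib import Path


def resolve_sampledata_dir(home: Path) -> Path:
    """Return the directory where sample data would be stored (hypothetical helper)."""
    config = home / ".bokeh" / "config"
    if config.is_file():
        for line in config.read_text().splitlines():
            key, sep, value = line.partition(":")
            # Naive match for a "sampledata_dir: <path>" line; not a YAML parser.
            if sep and key.strip() == "sampledata_dir" and value.strip():
                return Path(value.strip())
    # Fall back to the default location described above.
    return home / ".bokeh" / "data"
```

Calling resolve_sampledata_dir(Path.home()) would return $HOME/.bokeh/data when no config file is present, and the configured path otherwise.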

download(progress: bool = True) -> None

Download larger data sets for various Bokeh examples.


anscombe#

The four data series that comprise Anscombe’s Quartet.

License: CC BY-SA 3.0

Sourced from: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

This module contains one pandas Dataframe: data.

data

Ix Iy IIx IIy IIIx IIIy IVx IVy
0 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
1 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
2 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
3 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
4 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47

antibiotics#

A table of Will Burtin’s historical data regarding antibiotic efficacies.

License: MIT license

Sourced from: https://bl.ocks.org/borgar/cd32f1d804951034b224

This module contains one pandas Dataframe: data.

data

bacteria penicillin streptomycin neomycin gram start end colors
0 Mycobacterium tuberculosis 800.0 5.0 2.00 negative 1.016398 1.385997 #e69584
1 Salmonella schottmuelleri 10.0 0.8 0.09 negative 0.646798 1.016398 #e69584
2 Proteus vulgaris 3.0 0.1 0.10 negative 0.277199 0.646798 #e69584
3 Klebsiella pneumoniae 850.0 1.2 1.00 negative -0.092400 0.277199 #e69584
4 Brucella abortus 1.0 2.0 0.02 negative -0.461999 -0.092400 #e69584

Examples

burtin

airport_routes#

Airport routes data from OpenFlights.org.

License: ODbL 1.0

Sourced from https://openflights.org/data.html on September 07, 2017.

This module contains two pandas Dataframes: airports and routes.

airports

Name City Country IATA ICAO Latitude Longitude Altitude Timezone DST TZ Type source
index
3411 Barter Island LRRS Airport Barter Island United States BTI PABA 70.134003 -143.582001 2 -9 A America/Anchorage airport OurAirports
3413 Cape Lisburne LRRS Airport Cape Lisburne United States LUR PALU 68.875099 -166.110001 16 -9 A America/Anchorage airport OurAirports
3414 Point Lay LRRS Airport Point Lay United States PIZ PPIZ 69.732903 -163.005005 22 -9 A America/Anchorage airport OurAirports
3415 Hilo International Airport Hilo United States ITO PHTO 19.721399 -155.048004 38 -10 N Pacific/Honolulu airport OurAirports
3416 Orlando Executive Airport Orlando United States ORL KORL 28.545500 -81.332901 113 -5 A America/New_York airport OurAirports

routes

Airline AirlineID Source start Destination end Codeshare Stops Equipment
0 2O 146 ADQ 3531 KLN 7162 NaN 0 BNI
1 2O 146 KLN 7162 KYK 7161 NaN 0 BNI
2 3E 10739 BRL 5726 ORD 3830 NaN 0 CNC
3 3E 10739 BRL 5726 STL 3678 NaN 0 CNC
4 3E 10739 DEC 4042 ORD 3830 NaN 0 CNC

airports#

US airports with field elevations > 1500 meters.

License: Public Domain

Sourced from USGS service http://services.nationalmap.gov on October 15, 2015.

This module contains one pandas Dataframe: data.

data

name elevation x y
0 CHINLE MUNICIPAL AIRPORT 1691 -1.219788e+07 4.315889e+06
1 ELY AIRPORT /YELLAND FIELD/ AIRPORT 1908 -1.278414e+07 4.764692e+06
2 TRUCKEE-TAHOE AIRPORT 1798 -1.337387e+07 4.767619e+06
3 GARFIELD COUNTY REGIONAL AIRPORT 1691 -1.199211e+07 4.797343e+06
4 SANTA FE MUNICIPAL AIRPORT 1935 -1.180982e+07 4.248063e+06

autompg#

A version of the Auto MPG data set.

License: CC0

Sourced from https://archive.ics.uci.edu/ml/datasets/auto+mpg

This module contains two pandas Dataframes: autompg and autompg_clean. The “clean” version has cleaned up the "mfr" and "origin" fields.

autompg

mpg cyl displ hp weight accel yr origin name
0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
3 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
4 17.0 8 302.0 140 3449 10.5 70 1 ford torino

autompg_clean

mpg cyl displ hp weight accel yr origin name mfr
0 18.0 8 307.0 130 3504 12.0 70 North America chevrolet chevelle malibu chevrolet
1 15.0 8 350.0 165 3693 11.5 70 North America buick skylark 320 buick
2 18.0 8 318.0 150 3436 11.0 70 North America plymouth satellite plymouth
3 16.0 8 304.0 150 3433 12.0 70 North America amc rebel sst amc
4 17.0 8 302.0 140 3449 10.5 70 North America ford torino ford

Examples

kde2d

autompg2#

A version of the Auto MPG data set.

License: CC0

Sourced from https://archive.ics.uci.edu/ml/datasets/auto+mpg

This module contains one pandas Dataframe: autompg2.

autompg2

Unnamed: 0 manufacturer model displ year cyl trans drv cty hwy fl class
0 1 Audi A4 1.8 1999 4 auto(l5) front 18 29 p compact
1 2 Audi A4 1.8 1999 4 manual(m5) front 21 29 p compact
2 3 Audi A4 2.0 2008 4 manual(m6) front 20 31 p compact
3 4 Audi A4 2.0 2008 4 auto(av) front 21 30 p compact
4 5 Audi A4 2.8 1999 6 auto(l5) front 16 26 p compact

Examples

boxplot
whisker

browsers#

Browser market share by version from November 2013.

License: CC BY-SA 3.0

Sourced from http://gs.statcounter.com/#browser_version-ww-monthly-201311-201311-bar

Icon images sourced from alrra/browser-logos

This module contains one pandas Dataframe: browsers_nov_2013.

browsers_nov_2013

Version Share Browser VersionNumber
0 Chrome 30.0 18.51 Chrome 30.0
1 Chrome 31.0 17.31 Chrome 31.0
2 Firefox 25.0 11.21 Firefox 25.0
3 IE 10.0 11.10 IE 10.0
4 IE 8.0 8.65 IE 8.0

The module also contains a dictionary icons with base64-encoded PNGs of the logos for Chrome, Firefox, Safari, Opera, and IE.

Examples

donut

commits#

Time series of commits for a GitHub user between 2012 and 2016.

License: Public Domain

This module contains one pandas Dataframe: data.

data

day time
datetime
2017-04-22 15:11:58-05:00 Sat 15:11:58
2017-04-21 14:20:57-05:00 Fri 14:20:57
2017-04-20 14:35:08-05:00 Thu 14:35:08
2017-04-20 10:34:29-05:00 Thu 10:34:29
2017-04-20 09:17:23-05:00 Thu 09:17:23

cows#

Butterfat percentage in the milk of five cattle breeds.

License: Public Domain

This module contains one pandas Dataframe: data.

data

butterfat age breed
0 3.74 Mature Ayrshire
1 4.01 Mature Ayrshire
2 3.77 Mature Ayrshire
3 3.78 Mature Ayrshire
4 4.10 Mature Ayrshire

Examples

density

daylight#

Provide 2013 Warsaw daylight hours.

License: free to use and redistribute (see this FAQ for details).

Sourced from http://www.sunrisesunset.com

This module contains one pandas Dataframe: daylight_warsaw_2013.

daylight_warsaw_2013

Date Sunrise Sunset Summer
0 2013-01-01 07:45:00 15:34:00 0
1 2013-01-02 07:45:00 15:35:00 0
2 2013-01-03 07:45:00 15:36:00 0
3 2013-01-04 07:45:00 15:37:00 0
4 2013-01-05 07:44:00 15:38:00 0

Examples

span

degrees#

Provide a table of data regarding bachelor’s degrees earned by women.

The data is broken down by field for any given year.

License: CC0

Sourced from: https://www.kaggle.com/datasets/sureshsrinivas/bachelorsdegreewomenusa

This module contains one pandas Dataframe: data.

data

Year Agriculture Architecture Art and Performance Biology Business Communications and Journalism Computer Science Education Engineering English Foreign Languages Health Professions Math and Statistics Physical Sciences Psychology Public Administration Social Sciences and History
0 1970 4.229798 11.921005 59.7 29.088363 9.064439 35.3 13.6 74.535328 0.8 65.570923 73.8 77.1 38.0 13.8 44.4 68.4 36.8
1 1971 5.452797 12.003106 59.9 29.394403 9.503187 35.5 13.6 74.149204 1.0 64.556485 73.9 75.5 39.0 14.9 46.2 65.5 36.2
2 1972 7.420710 13.214594 60.4 29.810221 10.558962 36.6 14.9 73.554520 1.2 63.664263 74.6 76.9 40.2 14.8 47.6 62.6 36.1
3 1973 9.653602 14.791613 60.2 31.147915 12.804602 38.4 16.4 73.501814 1.6 62.941502 74.9 77.4 40.9 16.5 50.4 64.3 36.4
4 1974 14.074623 17.444688 61.9 32.996183 16.204850 40.5 18.9 73.336811 2.2 62.413412 75.3 77.9 41.8 18.2 52.6 66.1 37.3

emissions#

CO2 emissions of selected countries in the years from 1950 to 2012. Note that not all countries have values for the whole time range.

License: Public Domain

This module contains one pandas Dataframe: data.

data

country year emissions
0 Afghanistan 1950.0 0.010346
1 Albania 1950.0 0.244444
2 Algeria 1950.0 0.432728
3 Angola 1950.0 0.045087
4 Argentina 1950.0 1.746283

forensic_glass#

Correlations in mineral content for forensic glass samples.

License: Public Domain

This module contains one pandas Dataframe: data.

data

RI Na Mg Al Si K Ca Ba Fe type
0 3.01 13.64 4.49 1.10 71.78 0.06 8.75 0.0 0.0 WinF
1 -0.39 13.89 3.60 1.36 72.73 0.48 7.83 0.0 0.0 WinF
2 -1.82 13.53 3.55 1.54 72.99 0.39 7.78 0.0 0.0 WinF
3 -0.34 13.21 3.69 1.29 72.61 0.57 8.22 0.0 0.0 WinF
4 -0.58 13.27 3.62 1.24 73.08 0.55 8.07 0.0 0.0 WinF

gapminder#

Four of the datasets from Gapminder.

License: CC BY 2.0

Sourced from https://www.gapminder.org/data/

This module contains four pandas Dataframes: fertility, life_expectancy, population, and regions.

fertility

1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
Country
Afghanistan 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.671 7.670 7.670 7.670 7.669 7.669 7.670 7.671 7.673 7.676 7.679 7.681 7.682 7.682 7.682 7.687 7.700 7.725 7.758 7.796 7.832 7.859 7.869 7.854 7.809 7.733 7.623 7.484 7.321 7.136 6.930 6.702 6.456 6.196 5.928 5.659 5.395 5.141 4.900
Albania 5.711 5.594 5.483 5.376 5.268 5.160 5.050 4.933 4.809 4.677 4.538 4.393 4.244 4.094 3.947 3.807 3.678 3.562 3.460 3.372 3.297 3.233 3.177 3.126 3.075 3.023 2.970 2.917 2.867 2.819 2.772 2.723 2.670 2.611 2.543 2.467 2.383 2.291 2.195 2.097 2.004 1.919 1.849 1.796 1.761 1.744 1.741 1.748 1.760 1.771
Algeria 7.653 7.655 7.657 7.658 7.657 7.652 7.641 7.622 7.591 7.548 7.492 7.422 7.339 7.244 7.138 7.021 6.889 6.741 6.576 6.392 6.192 5.976 5.747 5.508 5.263 5.014 4.761 4.503 4.238 3.971 3.705 3.449 3.207 2.987 2.794 2.634 2.514 2.439 2.407 2.412 2.448 2.507 2.580 2.656 2.725 2.781 2.817 2.829 2.820 2.795
American Samoa NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Andorra NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

life_expectancy

1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
Country
Afghanistan 33.639 34.152 34.662 35.170 35.674 36.172 36.663 37.143 37.614 38.075 38.529 38.977 39.417 39.855 40.298 40.756 41.242 41.770 42.347 42.977 43.661 44.400 45.192 46.024 46.880 47.744 48.601 49.439 50.247 51.017 51.738 52.400 52.995 53.527 54.009 54.449 54.863 55.271 55.687 56.122 56.583 57.071 57.582 58.102 58.618 59.124 59.612 60.079 60.524 60.947
Albania 65.475 65.863 66.122 66.316 66.500 66.702 66.948 67.251 67.595 67.966 68.356 68.748 69.121 69.459 69.753 70.001 70.218 70.426 70.646 70.886 71.144 71.398 71.615 71.770 71.853 71.870 71.842 71.799 71.779 71.813 71.920 72.117 72.415 72.796 73.235 73.713 74.200 74.664 75.081 75.437 75.725 75.949 76.124 76.278 76.433 76.598 76.780 76.979 77.185 77.392
Algeria 47.953 48.389 48.806 49.205 49.592 49.976 50.366 50.767 51.195 51.670 52.213 52.861 53.656 54.605 55.697 56.907 58.198 59.524 60.826 62.051 63.160 64.120 64.911 65.554 66.072 66.479 66.796 67.049 67.265 67.468 67.674 67.893 68.123 68.350 68.565 68.769 68.963 69.149 69.330 69.508 69.682 69.854 70.020 70.180 70.332 70.477 70.615 70.747 70.874 71.000
American Samoa NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Andorra NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

population

1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
Country
Afghanistan 10474903.0 10697983.0 10927724.0 11163656.0 11411022.0 11676990.0 11964906.0 12273101.0 12593688.0 12915499.0 13223928.0 13505544.0 13766792.0 14003408.0 14179656.0 14249493.0 14185729.0 13984092.0 13672870.0 13300056.0 12931791.0 12625292.0 12372113.0 12183387.0 12156685.0 12414686.0 13032161.0 14069854.0 15472076.0 17053213.0 18553819.0 19789880.0 20684982.0 21299350.0 21752257.0 22227543.0 22856302.0 23677385.0 24639841.0 25678639.0 26693486.0 27614718.0 28420974.0 29145841.0 29839994.0 30577756.0 31411743.0 32358260.0 33397058.0 34499915.0
Albania 1817098.0 1869942.0 1922993.0 1976140.0 2029314.0 2082474.0 2135599.0 2188650.0 2241623.0 2294578.0 2347607.0 2400801.0 2454255.0 2508026.0 2562121.0 2616530.0 2671300.0 2725029.0 2777592.0 2831682.0 2891004.0 2957390.0 3033393.0 3116009.0 3194854.0 3255859.0 3289483.0 3291695.0 3266983.0 3224901.0 3179442.0 3141102.0 3112597.0 3091902.0 3079037.0 3072725.0 3071856.0 3077378.0 3089778.0 3106701.0 3124861.0 3141800.0 3156607.0 3169665.0 3181397.0 3192723.0 3204284.0 3215988.0 3227373.0 3238316.0
Algeria 11654905.0 11923002.0 12229853.0 12572629.0 12945462.0 13338918.0 13746185.0 14165889.0 14600659.0 15052371.0 15524137.0 16018195.0 16533323.0 17068212.0 17624756.0 18205468.0 18811199.0 19442423.0 20095648.0 20762767.0 21433070.0 22098298.0 22753511.0 23398470.0 24035237.0 24668100.0 25299182.0 25930560.0 26557969.0 27169903.0 27751086.0 28291591.0 28786855.0 29242917.0 29673694.0 30099010.0 30533827.0 30982214.0 31441848.0 31913462.0 32396048.0 32888449.0 33391954.0 33906605.0 34428028.0 34950168.0 35468208.0 35980193.0 36485828.0 36983924.0
American Samoa 22672.0 23480.0 24283.0 25087.0 25869.0 26608.0 27288.0 27907.0 28470.0 28983.0 29453.0 29897.0 30305.0 30696.0 31139.0 31727.0 32526.0 33557.0 34797.0 36203.0 37706.0 39253.0 40834.0 42446.0 44048.0 45595.0 47052.0 48402.0 49648.0 50801.0 51885.0 52919.0 53901.0 54834.0 55745.0 56667.0 57625.0 58633.0 59687.0 60774.0 61871.0 62962.0 64045.0 65130.0 66217.0 67312.0 68420.0 69543.0 70680.0 71834.0
Andorra 17438.0 18529.0 19640.0 20772.0 21931.0 23127.0 24364.0 25656.0 26997.0 28357.0 29688.0 30967.0 32156.0 33279.0 34432.0 35753.0 37328.0 39226.0 41390.0 43636.0 45702.0 47414.0 48653.0 49504.0 50236.0 51241.0 52773.0 54996.0 57767.0 60670.0 63111.0 64699.0 65227.0 64905.0 64246.0 63985.0 64634.0 66390.0 69043.0 72203.0 75292.0 77888.0 79874.0 81390.0 82577.0 83677.0 84864.0 86165.0 87518.0 88909.0

regions

Group ID
Country
Angola Sub-Saharan Africa AO
Benin Sub-Saharan Africa BJ
Botswana Sub-Saharan Africa BW
Burkina Faso Sub-Saharan Africa BF
Burundi Sub-Saharan Africa BI

glucose#

A CSV timeseries of blood glucose measurements.

This module contains one pandas Dataframe: data.

data

isig glucose
datetime
2010-03-24 09:51:00 22.59 258
2010-03-24 09:56:00 22.52 260
2010-03-24 10:01:00 22.23 258
2010-03-24 10:06:00 21.56 254
2010-03-24 10:11:00 20.79 246

haar_cascade#

Provide a Haar cascade file for face recognition.

License: MIT license

Sourced from the OpenCV project.

This module contains an attribute frontalface_default_path. Use this attribute to obtain the path to a Haar cascade file for frontal face recognition that can be used by OpenCV.

iris#

Provide Fisher’s Iris dataset.

License: CC0

Sourced from: https://www.kaggle.com/datasets/arshid/iris-flower-dataset

This module contains one pandas Dataframe: flowers.

Note

This sampledata is maintained for historical compatibility. Please consider alternatives to Iris such as penguins.

flowers

sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Examples

iris

lincoln#

Mean daily temperatures in Lincoln, Nebraska, 2016.

License: Public Domain

This module contains one pandas Dataframe: data.

data

STATION NAME DATE TMAX TMIN TAVG MONTH
0 USW00094996 LINCOLN 11 SW, NE US 2016-01-01 36.0 15.0 25.5 Jan
1 USW00094996 LINCOLN 11 SW, NE US 2016-01-02 39.0 18.0 28.5 Jan
2 USW00094996 LINCOLN 11 SW, NE US 2016-01-03 32.0 15.0 23.5 Jan
3 USW00094996 LINCOLN 11 SW, NE US 2016-01-04 27.0 15.0 21.0 Jan
4 USW00094996 LINCOLN 11 SW, NE US 2016-01-05 40.0 21.0 30.5 Jan

les_mis#

Provide JSON data for co-occurrence of characters in Les Miserables.

License: CC BY-ND 4.0

Sourced from http://ftp.cs.stanford.edu/pub/sgb/jean.dat

This module contains one dictionary: data.

data

{
    'nodes': [
        {'name': 'Myriel', 'group': 1},
        ...
        {'name': 'Mme.Hucheloup', 'group': 8}
    ],
    'links': [
        {'source': 1, 'target': 0, 'value': 1},
        ...
        {'source': 76, 'target': 58, 'value': 1}
    ]
}
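
The integer 'source' and 'target' values above appear to refer to positions in the 'nodes' list. Under that assumption, a minimal sketch of resolving each link to character names could look like this; the helper name named_links is hypothetical and not part of the module.

```python
def named_links(graph: dict) -> list[tuple[str, str, int]]:
    """Resolve 'source'/'target' node indices to character names."""
    # Assumption: link indices are positions in graph["nodes"].
    names = [node["name"] for node in graph["nodes"]]
    return [
        (names[link["source"]], names[link["target"]], link["value"])
        for link in graph["links"]
    ]
```

With the real data, this would be called as named_links(data) after importing data from bokeh.sampledata.les_mis.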

Examples

les_mis

movies_data#

A small subset of data from the Open Movie Database.

License: CC BY-NC 4.0

Sourced from http://www.omdbapi.com

This module has an attribute movie_path. This attribute contains the path to a SQLite database with the data.
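
Because the table names inside the database are not listed here, a reasonable first step is to inspect the file with Python's built-in sqlite3 module. The sketch below only lists table names; the helper name list_tables is hypothetical.

```python
import sqlite3


def list_tables(db_path) -> list[str]:
    """Return the names of the tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        conn.close()
    return [name for (name,) in rows]
```

With the sample data installed, list_tables(movie_path) would show which tables are available to query.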

mtb#

Route data (including altitude) for a bike race in Eastern Europe.

Sourced from https://bikemaraton.com.pl

This module contains one pandas Dataframe: obiszow_mtb_xcm.

obiszow_mtb_xcm

lon lat alt
0 16.116775 51.578265 118.0
1 16.116741 51.578265 118.0
2 16.116776 51.578253 118.0
3 16.116792 51.578223 119.0
4 16.116584 51.578058 119.0

Examples

trail

olympics2014#

Provide medal counts by country for the 2014 Olympics.

Sourced from public news sources in 2014.

This module contains a single dict: data.

The dictionary has a key "data" that lists sub-dictionaries, one for each country:

{
    'abbr': 'DEU',
    'medals': {'total': 15, 'bronze': 4, 'gold': 8, 'silver': 3},
    'name': 'Germany'
}
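
As a sketch of working with this nested structure, the following flattens each country's entry into a row and orders the rows by medal count. The helper name medal_table is hypothetical, and the stand-in input contains only the Germany entry shown above; it assumes every entry carries integer gold, silver, and bronze counts.

```python
def medal_table(payload: dict) -> list[tuple[str, int, int, int]]:
    """Return (name, gold, silver, bronze) rows, most golds first."""
    rows = [
        (
            country["name"],
            country["medals"]["gold"],
            country["medals"]["silver"],
            country["medals"]["bronze"],
        )
        for country in payload["data"]
    ]
    # Sort by gold, then silver, then bronze, all descending.
    return sorted(rows, key=lambda row: row[1:], reverse=True)


# Stand-in containing only the Germany entry shown above.
sample = {
    "data": [
        {
            "abbr": "DEU",
            "medals": {"total": 15, "bronze": 4, "gold": 8, "silver": 3},
            "name": "Germany",
        }
    ]
}
print(medal_table(sample))
# [('Germany', 8, 3, 4)]
```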

penguins#

Provide data from the Palmer Archipelago (Antarctica) penguin dataset.

License: CC0

Sourced from mwaskom/seaborn-data

This module contains one pandas Dataframe: data.

data

species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex colors
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE red
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE red
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE red
3 Adelie Torgersen NaN NaN NaN NaN NaN red
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE red

Examples

splom

perceptions#

Provides access to probly.csv and numberly.csv.

License: MIT license

Sourced from: zonination/perceptions

This module contains two pandas Dataframes: probly and numberly.

probly

Almost Certainly Highly Likely Very Good Chance Probable Likely Probably We Believe Better Than Even About Even We Doubt Improbable Unlikely Probably Not Little Chance Almost No Chance Highly Unlikely Chances Are Slight
0 95.0 80 85 75 66 75 66 55.0 50 40 20.0 30 15.0 20 5.0 25 25
1 95.0 75 75 51 75 51 51 51.0 50 20 49.0 25 49.0 5 5.0 10 5
2 95.0 85 85 70 75 70 80 60.0 50 30 10.0 25 25.0 20 1.0 5 15
3 95.0 85 85 70 75 70 80 60.0 50 30 10.0 25 25.0 20 1.0 5 15
4 98.0 95 80 70 70 75 65 60.0 50 10 50.0 5 20.0 5 1.0 2 10

numberly

A couple A few Dozens A lot Some Several Many Fractions of Scores of Hundreds of
0 2 3 30 20 4 7 12 0.15 80 250
1 2 3 24 12 6 10 50 0.50 40 200
2 2 5 30 15 5 4 25 0.25 500 500
3 2 5 30 15 5 4 25 0.25 500 500
4 2 3 48 50 3 5 5 0.01 100000 599

periodic_table#

Provide a periodic table data set.

License: Public Domain

This module contains one pandas Dataframe: elements.

elements

atomic number symbol name atomic mass CPK electronic configuration electronegativity atomic radius ion radius van der Waals radius IE-1 EA standard state bonding type melting point boiling point density metal year discovered group period
0 1 H Hydrogen 1.00794 #FFFFFF 1s1 2.20 37.0 NaN 120.0 1312.0 -73.0 gas diatomic 14.0 20.0 0.00009 nonmetal 1766 1 1
1 2 He Helium 4.002602 #D9FFFF 1s2 NaN 32.0 NaN 140.0 2372.0 0.0 gas atomic NaN 4.0 0.00000 noble gas 1868 18 1
2 3 Li Lithium 6.941 #CC80FF [He] 2s1 0.98 134.0 76 (+1) 182.0 520.0 -60.0 solid metallic 454.0 1615.0 0.54000 alkali metal 1817 1 2
3 4 Be Beryllium 9.012182 #C2FF00 [He] 2s2 1.57 90.0 45 (+2) NaN 900.0 0.0 solid metallic 1560.0 2743.0 1.85000 alkaline earth metal 1798 2 2
4 5 B Boron 10.811 #FFB5B5 [He] 2s2 2p1 2.04 82.0 27 (+3) NaN 801.0 -27.0 solid covalent network 2348.0 4273.0 2.46000 metalloid 1807 13 2

population#

Historical and projected population data by age, gender, and country.

License: CC BY 3.0 IGO

Sourced from: https://population.un.org/wpp/Download/Standard/Population/

This module contains one pandas Dataframe: data.

data

LocID Location Year Sex AgeGrp AgeGrpStart Value
0 4 Afghanistan 1950 Male 0-4 0 662064.0
1 4 Afghanistan 1950 Male 5-9 5 508166.0
2 4 Afghanistan 1950 Male 10-14 10 444396.0
3 4 Afghanistan 1950 Male 15-19 15 390480.0
4 4 Afghanistan 1950 Male 20-24 20 337318.0

sample_geojson#

Provide geojson data for the UK NHS England area teams.

License: Open Government Licence

Sourced from JeniT/nhs-choices

A snapshot of data available from NHS Choices on November 14th, 2015.

sample_superstore#

Provide the Sample Superstore data set.

License: CC0

Sourced from: https://www.kaggle.com/datasets/arshid/iris-flower-dataset

This module contains one pandas Dataframe: data.

data

Ship Mode Segment Country City State Postal Code Region Category Sub-Category Sales Quantity Discount Profit
0 Second Class Consumer United States Henderson Kentucky 42420 South Furniture Bookcases 261.9600 2 0.00 41.9136
1 Second Class Consumer United States Henderson Kentucky 42420 South Furniture Chairs 731.9400 3 0.00 219.5820
2 Second Class Corporate United States Los Angeles California 90036 West Office Supplies Labels 14.6200 2 0.00 6.8714
3 Standard Class Consumer United States Fort Lauderdale Florida 33311 South Furniture Tables 957.5775 5 0.45 -383.0310
4 Standard Class Consumer United States Fort Lauderdale Florida 33311 South Office Supplies Storage 22.3680 2 0.20 2.5164

Examples

treemap

sea_surface_temperature#

Time series of historical average sea surface temperatures.

License: free to use and redistribute (see this table for details).

Sourced from http://www.neracoos.org/erddap/tabledap/index.html (table B01_sbe37_all)

This module contains one pandas Dataframe: sea_surface_temperature.

sea_surface_temperature

temperature
time
2016-02-15 00:00:00+00:00 4.929
2016-02-15 00:30:00+00:00 4.887
2016-02-15 01:00:00+00:00 4.821
2016-02-15 01:30:00+00:00 4.837
2016-02-15 02:00:00+00:00 4.830

sprint#

Historical results for Olympic sprints by year.

Sourced from public news sources.

This module contains one pandas Dataframe: sprint.

sprint

Name Country Medal Time Year Abbrev Speed MetersBack MedalFill MedalLine SelectedName
0 Usain Bolt JAM gold 9.63 2012 JAM 10.384216 0.000000 #efcf6d #c8a850
1 Yohan Blake JAM silver 9.75 2012 JAM 10.256410 1.230769 #cccccc #b0b0b1
2 Justin Gatlin USA bronze 9.79 2012 USA 10.214505 1.634321 #c59e8a #98715d
3 Usain Bolt JAM gold 9.69 2008 JAM 10.319917 0.619195 #efcf6d #c8a850
4 Richard Thompson TRI silver 9.89 2008 TRI 10.111223 2.628918 #cccccc #b0b0b1

Examples

sprint

titanic#

Demographic details of the passengers on board the Titanic.

License: Public Domain

This module contains one pandas Dataframe: data.

data

name class age sex survived
0 Allen, Miss Elisabeth Walton 1st 29.00 female 1
1 Allison, Miss Helen Loraine 1st 2.00 female 0
2 Allison, Mr Hudson Joshua Creighton 1st 30.00 male 0
3 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 1st 25.00 female 0
4 Allison, Master Hudson Trevor 1st 0.92 male 1

Examples

pyramid

stocks#

Provide historical ticker data for selected stocks.

Sourced from public news sources.

This module contains five dicts: AAPL, FB, GOOG, IBM, and MSFT.

Each dictionary has the structure:

AAPL['date']       # list of date strings
AAPL['open']       # list of float
AAPL['high']       # list of float
AAPL['low']        # list of float
AAPL['close']      # list of float
AAPL['volume']     # list of int
AAPL['adj_close']  # list of float
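
The structure above suggests parallel lists, where the same index in each list refers to the same trading day. Under that assumption, a minimal sketch of two common operations follows; the helper names closing_prices and largest_daily_range are hypothetical.

```python
def closing_prices(ticker: dict) -> list[tuple[str, float]]:
    """Pair each date string with that day's closing price."""
    return list(zip(ticker["date"], ticker["close"]))


def largest_daily_range(ticker: dict) -> tuple[str, float]:
    """Return the date with the widest high-low spread, and that spread."""
    spreads = [high - low for high, low in zip(ticker["high"], ticker["low"])]
    # Index of the largest spread; ties resolve to the earliest entry.
    best = max(range(len(spreads)), key=spreads.__getitem__)
    return ticker["date"][best], spreads[best]
```

Either function could be applied to any of the five dictionaries, for example closing_prices(AAPL).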

Examples

bounds
stocks

unemployment#

Per-county unemployment data for the United States in 2009.

License: Public Domain

Sourced from: https://www.bls.gov

This module contains one dict: data.

The dict is indexed by two-tuples of (state_id, county_id) and has the unemployment rate (2009) as the value.

{
    (1, 1): 9.7,
    (1, 3): 9.1,
    ...
}
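
Because the keys are (state_id, county_id) tuples, grouping by state amounts to unpacking each key. The sketch below averages county rates within each state, using only the two entries shown above as input. The helper name mean_rate_by_state is hypothetical, and the result is an unweighted mean of county rates, not an official state-level unemployment rate.

```python
from collections import defaultdict


def mean_rate_by_state(rates: dict) -> dict:
    """Average the county unemployment rates within each state_id."""
    by_state = defaultdict(list)
    for (state_id, _county_id), rate in rates.items():
        by_state[state_id].append(rate)
    return {state_id: sum(vals) / len(vals) for state_id, vals in by_state.items()}


# The two entries shown above.
sample = {(1, 1): 9.7, (1, 3): 9.1}
print(mean_rate_by_state(sample))
```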

unemployment1948#

US Unemployment rate data by month and year, from 1948 to 2013.

License: Public Domain

Sourced from: https://www.bls.gov

This module contains one pandas Dataframe: data.

data

Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Annual
0 1948 4.0 4.7 4.5 4.0 3.4 3.9 3.9 3.6 3.4 2.9 3.3 3.6 3.8
1 1949 5.0 5.8 5.6 5.4 5.7 6.4 7.0 6.3 5.9 6.1 5.7 6.0 5.9
2 1950 7.6 7.9 7.1 6.0 5.3 5.6 5.3 4.1 4.0 3.3 3.8 3.9 5.3
3 1951 4.4 4.2 3.8 3.2 2.9 3.4 3.3 2.9 3.0 2.8 3.2 2.9 3.3
4 1952 3.7 3.8 3.3 3.0 2.9 3.2 3.3 3.1 2.7 2.4 2.5 2.5 3.0

us_cities#

Locations of US cities with more than 5000 residents.

License: CC BY 2.0

Sourced from http://www.geonames.org/export/ (subset of cities5000.zip)

This module contains one dict: data.

data['lat']  # list of float
data['lon']  # list of float
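
Assuming the two lists are parallel (the same index refers to the same city), coordinates can be paired with zip. The sketch below selects the cities inside a latitude/longitude box; the helper name cities_in_box and its parameters are hypothetical.

```python
def cities_in_box(
    data: dict, south: float, north: float, west: float, east: float
) -> list[tuple[float, float]]:
    """Return (lat, lon) pairs that fall inside a latitude/longitude box."""
    return [
        (lat, lon)
        for lat, lon in zip(data["lat"], data["lon"])
        if south <= lat <= north and west <= lon <= east
    ]
```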

us_counties#

This module exposes geometry data for the United States.

This module contains one dict: data.

The data is indexed by two-tuples of (state_id, county_id) that have the following dictionaries as values:

In [25]: data[(1,1)]
Out[25]:
{
    'name': 'Autauga',
    'detailed name': 'Autauga County, Alabama',
    'state': 'al',
    'lats': [32.4757, ..., 32.48112],
    'lons': [-86.41182, ..., -86.41187]
}

Entries for 'name' can have duplicates for certain states (e.g. Virginia). The combination of 'detailed name' and 'state' will always be unique.
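
Since 'name' alone may be ambiguous, a lookup keyed on the unique combination described above can be built in one pass. This is a minimal sketch; the helper names index_by_detailed_name and counties_in_state are hypothetical.

```python
def index_by_detailed_name(counties: dict) -> dict:
    """Map ('detailed name', 'state') pairs to their (state_id, county_id) keys."""
    return {
        (county["detailed name"], county["state"]): key
        for key, county in counties.items()
    }


def counties_in_state(counties: dict, state: str) -> list[str]:
    """Return the sorted 'name' values for one state code, as stored (e.g. 'al')."""
    return sorted(c["name"] for c in counties.values() if c["state"] == state)
```

For the entry shown above, index_by_detailed_name(data)[('Autauga County, Alabama', 'al')] would give back the key (1, 1).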

us_holidays#

Calendar file of US Holidays from Mozilla provided by icalendar.

License: CC BY-SA 3.0

Sourced from: https://www.mozilla.org/en-US/projects/calendar/holidays/

This module contains one list: us_holidays.

us_holidays

[
    (datetime.date(1966, 12, 26), 'Kwanzaa'),
    (datetime.date(2000, 1, 1), "New Year's Day"),
    ...
    (datetime.date(2020, 12, 25), 'Christmas Day (US-OPM)')
]
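
Each entry is a (date, name) tuple, so filtering by year only needs the date's year attribute. The sketch below uses the three entries shown above as a stand-in for the full list; the helper name holidays_in_year is hypothetical.

```python
import datetime


def holidays_in_year(holidays: list, year: int) -> list:
    """Return the (date, name) tuples that fall in the given year, in date order."""
    return sorted(item for item in holidays if item[0].year == year)


# The three entries shown above.
sample = [
    (datetime.date(1966, 12, 26), "Kwanzaa"),
    (datetime.date(2000, 1, 1), "New Year's Day"),
    (datetime.date(2020, 12, 25), "Christmas Day (US-OPM)"),
]
print(holidays_in_year(sample, 2000))
# [(datetime.date(2000, 1, 1), "New Year's Day")]
```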

us_marriages_divorces#

Provide U.S. marriage and divorce statistics between 1867 and 2014.

License: Public Domain

Sourced from http://www.cdc.gov/nchs/

This module contains one pandas Dataframe: data.

data

Year Marriages Divorces Population Marriages_per_1000 Divorces_per_1000
0 1867 357000.0 10000.0 36970000 9.7 0.3
1 1868 345000.0 10000.0 37885000 9.1 0.3
2 1869 348000.0 11000.0 38870000 9.0 0.3
3 1870 352000.0 11000.0 39905000 8.8 0.3
4 1871 359000.0 12000.0 41010000 8.8 0.3

us_states#

Geometry data for US States.

This module contains one dict: data.

The data is indexed by the two letter state code (e.g., ‘CA’, ‘TX’) and has the following structure:

In [4]: data["OR"]
Out[4]:
{
    'name': 'Oregon',
    'region': 'Northwest',
    'lats': [46.29443, ..., 46.26068],
    'lons': [-124.03622, ..., -124.15935]
}
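
A common use of this structure is to gather the per-state coordinate lists into two lists of lists, the shape expected by multi-polygon glyphs such as Bokeh's patches. The sketch below does only the gathering step; the helper name state_outlines and its exclude parameter are hypothetical.

```python
def state_outlines(states: dict, exclude: tuple = ()) -> tuple[list, list]:
    """Collect per-state longitude and latitude lists, skipping excluded codes."""
    codes = [code for code in sorted(states) if code not in exclude]
    xs = [states[code]["lons"] for code in codes]  # one list of longitudes per state
    ys = [states[code]["lats"] for code in codes]  # one list of latitudes per state
    return xs, ys
```

For example, state_outlines(data, exclude=("AK", "HI")) would return outlines for every entry except those two codes, ordered by state code.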

Examples

eclipse

world_cities#

Names and locations of world cities with at least 5000 inhabitants.

License: CC BY 2.0

Sourced from http://www.geonames.org/export/ (cities5000.zip)

This module contains one pandas Dataframe: data.

data

name lat lng
0 Ordino 42.55623 1.53319
1 les Escaldes 42.50729 1.53414
2 la Massana 42.54499 1.51483
3 Encamp 42.53474 1.58014
4 Canillo 42.56760 1.59756
