# Introduction to pandas dataframes

[Pandas](https://pandas.pydata.org/) is a library providing high-performance, easy-to-use data structures and data analysis tools for Python.

In particular, it lets the user work with tabular data structures. The two main objects in pandas are tables, known as **DataFrames**, and one-dimensional labeled arrays, known as **Series**.
Pandas has a large API covering file parsing, data manipulation, visualization, grouping and more. We will look at only some of the key features through practical examples.

Some useful references:

* Pandas 10 minutes intro: http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#min
* Pandas cookbook: http://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook
* Pandas cheat sheet: http://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
* Pandas general user guide: http://pandas.pydata.org/pandas-docs/stable/user_guide/index.html 

In [1]:
# the usual import statement is this one
import pandas as pd
# just in case
import numpy as np

# Dataframes and series

Dataframes are two-dimensional data structures that can hold heterogeneous data. The easiest way to think about them is as a table. Each dataframe has a number of rows and columns: the rows are indexed and the columns are labeled.

In [2]:
# example of dataframe
df = pd.DataFrame({'fruit': ['apple', 'pear', 'banana'],
                   'price': [1.5, 0.9, 1.1]})

df

Unnamed: 0,fruit,price
0,apple,1.5
1,pear,0.9
2,banana,1.1


The columns of a pandas dataframe are known as **series**. A series is like a numpy array with additional pandas API on top of it.

In [3]:
s = df.fruit
print(s)
print(type(s))

0     apple
1      pear
2    banana
Name: fruit, dtype: object
<class 'pandas.core.series.Series'>
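
To make the comparison with numpy arrays concrete, here is a minimal sketch of a series built by hand. The names `prices`, `discounted` and `pear_price` are illustrative and do not come from the notebook above.

```python
import numpy as np
import pandas as pd

# A Series wraps a one-dimensional array and adds an index and a name
prices = pd.Series([1.5, 0.9, 1.1],
                   index=['apple', 'pear', 'banana'],
                   name='price')

# Arithmetic is vectorized, as with a numpy array
discounted = prices * 0.5

# Label-based lookup is what the index adds on top of the array
pear_price = prices['pear']

print(discounted)
print(pear_price)
```

The index and the name travel with the values through the arithmetic, which is the main practical difference from a bare numpy array.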


Pandas dataframes and series are built on top of numpy arrays, and the underlying data can always be accessed via the `.values` attribute.

In [4]:
df.values

array([['apple', 1.5],
       ['pear', 0.9],
       ['banana', 1.1]], dtype=object)
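
The `dtype=object` in the output above appears because the dataframe mixes strings and floats. The sketch below, on an illustrative two-row dataframe named `df_mixed`, shows that a single numeric column keeps its numeric dtype. It uses `to_numpy()`, the method equivalent of `.values`.

```python
import pandas as pd

df_mixed = pd.DataFrame({'fruit': ['apple', 'pear'],
                         'price': [1.5, 0.9]})

# Mixed column types: the whole table can only be held as generic objects
whole = df_mixed.to_numpy()

# A single numeric column keeps its numeric dtype
numeric = df_mixed['price'].to_numpy()

print(whole.dtype)
print(numeric.dtype)
```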

Let's look at some examples of file I/O.

# From files to pandas
Pandas has several functions to import data from different formats into a dataframe. Let's look at the most commonly used ones.

## From csv to dataframe
`pd.read_csv()` reads csv data into a dataframe. Many arguments can be passed to configure how the file is parsed.

In [6]:
# let's read a csv from a url:
# https://vincentarelbundock.github.io/Rdatasets/doc/DAAG/frogs.html
csv_url = 'https://vincentarelbundock.github.io/Rdatasets/csv/DAAG/frogs.csv'
df = pd.read_csv(csv_url)
df.head()  # prints only the first rows

Unnamed: 0.1,Unnamed: 0,pres.abs,northing,easting,altitude,distance,NoOfPools,NoOfSites,avrain,meanmin,meanmax
0,2,1,115,1047,1500,500,232,3,155.0,3.566667,14.0
1,3,1,110,1042,1520,250,66,5,157.666667,3.466667,13.8
2,4,1,112,1040,1540,250,32,5,159.666667,3.4,13.6
3,5,1,109,1033,1590,250,9,5,165.0,3.2,13.166667
4,6,1,109,1032,1590,250,67,5,165.0,3.2,13.166667


It's also possible to read csv/txt/tsv files that use a custom format, for example a different delimiter or quote character.

In [7]:
%%writefile csv_example1.csv
first_column;second_column;third_column
'a';12;3
'b1';13;5

Overwriting csv_example1.csv


In [8]:
df = pd.read_csv('csv_example1.csv', 
                 delimiter=';',  # specify a different delimiter
                 quotechar="'") # specify that strings are enclosed by ''
df

Unnamed: 0,first_column,second_column,third_column
0,a,12,3
1,b1,13,5
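
A few more `read_csv` arguments are shown in the sketch below. To stay self-contained it parses an in-memory string instead of a file; the content of `raw` is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated content with a comment line and a missing value
raw = io.StringIO(
    "# exported by a hypothetical tool\n"
    "id\tname\tscore\n"
    "1\tann\t3.5\n"
    "2\tbob\tNA\n"
    "3\tcid\t4.0\n"
)

df_tsv = pd.read_csv(raw,
                     sep='\t',        # tab-separated values
                     comment='#',     # skip lines starting with '#'
                     index_col='id')  # use the 'id' column as the index

print(df_tsv)
```

The string `NA` is one of the markers that `read_csv` treats as a missing value by default, so the `score` column is parsed as floats with one `NaN`.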


Pandas can also read data from Excel files.

In [9]:
url = 'https://github.com/bharathirajatut/sample-excel-dataset/blob/master/airline.xls?raw=true'
df = pd.read_excel(url)
df.head()

Unnamed: 0,YEAR,Y,W,R,L,K
0,1948,1.214,0.243,0.1454,1.415,0.612
1,1949,1.354,0.26,0.2181,1.384,0.559
2,1950,1.569,0.278,0.3157,1.388,0.573
3,1951,1.948,0.297,0.394,1.55,0.564
4,1952,2.265,0.31,0.3559,1.802,0.574


It also has some nice functions to parse simple HTML tables from a web page.

In [10]:
url = 'https://en.wikipedia.org/wiki/Table_(information)#Simple_table'
df_list = pd.read_html(url)
df_list

[    0                                                  1
 0 NaN  This article may require cleanup to meet Wikip...,
             0            1    2
 0  First name    Last name  Age
 1        Tinu     Elejogun   14
 2   Blaszczyk  Kostrzewski   25
 3        Lily    McGarrett   18
 4   Olatunkbo     Chijiaku   22
 5    Adrienne     Anthoula   22
 6      Axelia   Athanasios   22
 7   Jon-Kabat         Zinn   22
 8     Thabang        Mosoa   15
 9    Kgaogelo        Mosoa   11,
    0  1  2  3
 0  ×  1  2  3
 1  1  1  2  3
 2  2  2  4  6
 3  3  3  6  9,
                          0                                                  1  \
 0  Standard Representation                             Tabular Representation   
 1                    2 3 1  Risk levels of hazardous materials in this fac...   
 2              Health Risk                                       Flammability   
 3                  Level 3                                            Level 2   
 4              Health Risk     

In [16]:
df_list[1]

Unnamed: 0,0,1,2
0,First name,Last name,Age
1,Tinu,Elejogun,14
2,Blaszczyk,Kostrzewski,25
3,Lily,McGarrett,18
4,Olatunkbo,Chijiaku,22
5,Adrienne,Anthoula,22
6,Axelia,Athanasios,22
7,Jon-Kabat,Zinn,22
8,Thabang,Mosoa,15
9,Kgaogelo,Mosoa,11


Further reference on supported file formats: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

# Data indexing and selection

Let's look at the different ways to access data in a dataframe.

In [18]:
csv_url = 'https://raw.githubusercontent.com/plotly/datasets/master/timeseries.csv'
df = pd.read_csv(csv_url, 
                 index_col=0, # set the index to the first column
                 parse_dates=True) # try to parse the date in the index

df

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09
2008-03-27,24.38,163.46,115.4,27.09,19.72,28.25,59.62
2008-03-28,24.32,163.22,115.56,27.13,19.63,28.24,58.65
2008-03-31,24.19,164.02,115.54,26.74,19.55,28.43,59.2
2008-04-01,23.81,163.59,115.72,27.82,20.21,29.17,56.18
2008-04-02,24.03,163.32,115.11,28.22,20.42,29.38,56.64


In [21]:
# access one row:
# by its index label: loc
df.loc['2008-03-25']
# by its integer position, as with a numpy array: iloc
df.iloc[3]  # access the 2d array structure by position

A     24.14
B    163.92
C    114.85
D     27.41
E     19.61
F     27.84
G     59.41
Name: 2008-03-25 00:00:00, dtype: float64
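
The difference between label-based and position-based access can be checked on a small hand-made time series. The sketch below uses an illustrative dataframe `ts` and an explicit `pd.Timestamp` as the label.

```python
import pandas as pd

dates = pd.to_datetime(['2008-03-18', '2008-03-19', '2008-03-20', '2008-03-25'])
ts = pd.DataFrame({'A': [24.68, 24.18, 23.99, 24.14],
                   'B': [164.93, 164.89, 164.63, 163.92]},
                  index=dates)

# loc looks the row up by its index label
by_label = ts.loc[pd.Timestamp('2008-03-25')]

# iloc looks the row up by its integer position (0-based)
by_position = ts.iloc[3]

print(by_label.equals(by_position))
```

Both calls return the same row as a series whose index is the column labels.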

In [25]:
df['A']
df.A

Date
2008-03-18    24.68
2008-03-19    24.18
2008-03-20    23.99
2008-03-25    24.14
2008-03-26    24.44
2008-03-27    24.38
2008-03-28    24.32
2008-03-31    24.19
2008-04-01    23.81
2008-04-02    24.03
2008-04-03    24.34
Name: A, dtype: float64

In [26]:
df.values

array([[ 24.68, 164.93, 114.73,  26.27,  19.21,  28.87,  63.44],
       [ 24.18, 164.89, 114.75,  26.22,  19.07,  27.76,  59.98],
       [ 23.99, 164.63, 115.04,  25.78,  19.01,  27.04,  59.61],
       [ 24.14, 163.92, 114.85,  27.41,  19.61,  27.84,  59.41],
       [ 24.44, 163.45, 114.84,  26.86,  19.53,  28.02,  60.09],
       [ 24.38, 163.46, 115.4 ,  27.09,  19.72,  28.25,  59.62],
       [ 24.32, 163.22, 115.56,  27.13,  19.63,  28.24,  58.65],
       [ 24.19, 164.02, 115.54,  26.74,  19.55,  28.43,  59.2 ],
       [ 23.81, 163.59, 115.72,  27.82,  20.21,  29.17,  56.18],
       [ 24.03, 163.32, 115.11,  28.22,  20.42,  29.38,  56.64],
       [ 24.34, 163.34, 115.17,  28.14,  20.36,  29.51,  57.49]])

In [27]:
# combined access of rows and columns:
# access the first 5 rows and the
# last three columns
df.iloc[:5, -3:]

Unnamed: 0_level_0,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2008-03-18,19.21,28.87,63.44
2008-03-19,19.07,27.76,59.98
2008-03-20,19.01,27.04,59.61
2008-03-25,19.61,27.84,59.41
2008-03-26,19.53,28.02,60.09


In [31]:
# Access rows from 2008-03-19 to 2008-03-30
# and only B, C, D columns
df.loc['2008-03-19':'2008-03-30', ['B', 'C', 'D']]

Unnamed: 0_level_0,B,C,D
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2008-03-19,164.89,114.75,26.22
2008-03-20,164.63,115.04,25.78
2008-03-25,163.92,114.85,27.41
2008-03-26,163.45,114.84,26.86
2008-03-27,163.46,115.4,27.09
2008-03-28,163.22,115.56,27.13


In [32]:
(df.B > 163.3) & (df.index < '2008-03-31')

Date
2008-03-18     True
2008-03-19     True
2008-03-20     True
2008-03-25     True
2008-03-26     True
2008-03-27     True
2008-03-28    False
2008-03-31    False
2008-04-01    False
2008-04-02    False
2008-04-03    False
Name: B, dtype: bool

In [33]:
# query data for particular conditions (like in SQL)
# Method 1: subsetting
df[(df.B > 163.3) & (df.index < '2008-03-31')]

# Method 2: query() function
df.query('B > 163.3 & index < "2008-03-31"')

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09
2008-03-27,24.38,163.46,115.4,27.09,19.72,28.25,59.62


## Dataframe data manipulation

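Data can also be manipulated by adding and removing columns. Methods such as `drop` usually return a new dataframe rather than modifying the original, so the result must be assigned to a variable. Assigning to a column with `df['name'] = ...`, in contrast, modifies the dataframe in place.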

In [36]:
# add a column from the operation
# of two other columns
df['Z'] = df.A + df.B
df

Unnamed: 0_level_0,A,B,C,D,E,F,G,Z,W
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44,189.61,13
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98,189.07,13
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61,188.62,13
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41,188.06,13
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09,187.89,13
2008-03-27,24.38,163.46,115.4,27.09,19.72,28.25,59.62,187.84,13
2008-03-28,24.32,163.22,115.56,27.13,19.63,28.24,58.65,187.54,13
2008-03-31,24.19,164.02,115.54,26.74,19.55,28.43,59.2,188.21,13
2008-04-01,23.81,163.59,115.72,27.82,20.21,29.17,56.18,187.4,13
2008-04-02,24.03,163.32,115.11,28.22,20.42,29.38,56.64,187.35,13


In [35]:
# add a single constant value
df['W'] = 13

df.head()

Unnamed: 0_level_0,A,B,C,D,E,F,G,Z,W
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44,189.61,13
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98,189.07,13
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61,188.62,13
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41,188.06,13
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09,187.89,13


In [37]:
# Remove column Z
# Method 1: inverse subsetting
df = df.loc[:, df.columns != 'Z']

# Method 2: calling appropriate function
df = df.drop(columns=['W'])
df.head()

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09


In [38]:
# Add a row
row = {'A': 25, 'B': 133, 'C': 111, 'D': 26, 'E': 20, 'F': 29, 'G': 60}
# appending like this will make us lose the time series index
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
pd.concat([df, pd.DataFrame([row])], ignore_index=True)

Unnamed: 0,A,B,C,D,E,F,G
0,24.68,164.93,114.73,26.27,19.21,28.87,63.44
1,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2,23.99,164.63,115.04,25.78,19.01,27.04,59.61
3,24.14,163.92,114.85,27.41,19.61,27.84,59.41
4,24.44,163.45,114.84,26.86,19.53,28.02,60.09
5,24.38,163.46,115.4,27.09,19.72,28.25,59.62
6,24.32,163.22,115.56,27.13,19.63,28.24,58.65
7,24.19,164.02,115.54,26.74,19.55,28.43,59.2
8,23.81,163.59,115.72,27.82,20.21,29.17,56.18
9,24.03,163.32,115.11,28.22,20.42,29.38,56.64


In [40]:
# Add a row
df.loc[pd.to_datetime('2008-11-10')] = [25, 133, 111, 26, 20, 29, 60]
df

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09
2008-03-27,24.38,163.46,115.4,27.09,19.72,28.25,59.62
2008-03-28,24.32,163.22,115.56,27.13,19.63,28.24,58.65
2008-03-31,24.19,164.02,115.54,26.74,19.55,28.43,59.2
2008-04-01,23.81,163.59,115.72,27.82,20.21,29.17,56.18
2008-04-02,24.03,163.32,115.11,28.22,20.42,29.38,56.64


In [41]:
# remove a row based on the date
df.drop(index=pd.to_datetime('2008-04-03'))

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-18,24.68,164.93,114.73,26.27,19.21,28.87,63.44
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-25,24.14,163.92,114.85,27.41,19.61,27.84,59.41
2008-03-26,24.44,163.45,114.84,26.86,19.53,28.02,60.09
2008-03-27,24.38,163.46,115.4,27.09,19.72,28.25,59.62
2008-03-28,24.32,163.22,115.56,27.13,19.63,28.24,58.65
2008-03-31,24.19,164.02,115.54,26.74,19.55,28.43,59.2
2008-04-01,23.81,163.59,115.72,27.82,20.21,29.17,56.18
2008-04-02,24.03,163.32,115.11,28.22,20.42,29.38,56.64


In [45]:
# Select rows where the total row sum is less
# than 437 (use df[~mask] to remove them instead)
mask = df.sum(axis=1) < 437
df[mask]

Unnamed: 0_level_0,A,B,C,D,E,F,G
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2008-03-19,24.18,164.89,114.75,26.22,19.07,27.76,59.98
2008-03-20,23.99,164.63,115.04,25.78,19.01,27.04,59.61
2008-03-28,24.32,163.22,115.56,27.13,19.63,28.24,58.65
2008-04-01,23.81,163.59,115.72,27.82,20.21,29.17,56.18
2008-11-10,25.0,133.0,111.0,26.0,20.0,29.0,60.0


# Groupby operations

Pandas dataframes have a very powerful SQL-like `groupby` method that helps with data manipulation that would otherwise require a lot of work. Grouping is useful for computing aggregated statistics or for visualization.

In [49]:
url = 'https://raw.githubusercontent.com/Pierian-Data/AutoArima-Time-Series-Blog/master/Electric_Production.csv'
df = pd.read_csv(url, index_col=0, parse_dates=True)
df.head()

Unnamed: 0_level_0,IPG2211A2N
DATE,Unnamed: 1_level_1
1985-01-01,72.5052
1985-02-01,70.672
1985-03-01,62.4502
1985-04-01,57.4714
1985-05-01,55.3151


In [50]:
# I want to group by year
group = df.groupby(by=df.index.year)
group

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7fb7c8166860>

As you can see, only the object representation is printed. This is because the groupby object expects an aggregation function to be applied before it returns the results as a new dataframe. However, we can inspect the internal groups of the groupby object.

In [51]:
group.groups

{1985: DatetimeIndex(['1985-01-01', '1985-02-01', '1985-03-01', '1985-04-01',
                '1985-05-01', '1985-06-01', '1985-07-01', '1985-08-01',
                '1985-09-01', '1985-10-01', '1985-11-01', '1985-12-01'],
               dtype='datetime64[ns]', name='DATE', freq=None),
 1986: DatetimeIndex(['1986-01-01', '1986-02-01', '1986-03-01', '1986-04-01',
                '1986-05-01', '1986-06-01', '1986-07-01', '1986-08-01',
                '1986-09-01', '1986-10-01', '1986-11-01', '1986-12-01'],
               dtype='datetime64[ns]', name='DATE', freq=None),
 1987: DatetimeIndex(['1987-01-01', '1987-02-01', '1987-03-01', '1987-04-01',
                '1987-05-01', '1987-06-01', '1987-07-01', '1987-08-01',
                '1987-09-01', '1987-10-01', '1987-11-01', '1987-12-01'],
               dtype='datetime64[ns]', name='DATE', freq=None),
 1988: DatetimeIndex(['1988-01-01', '1988-02-01', '1988-03-01', '1988-04-01',
                '1988-05-01', '1988-06-01', '1988-07-01', '19
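
Besides `.groups`, a groupby object can be iterated over, or a single group can be extracted with `get_group`. The sketch below uses a small made-up dataframe `prod` rather than the dataset loaded above.

```python
import pandas as pd

idx = pd.to_datetime(['1985-01-01', '1985-02-01', '1986-01-01', '1986-02-01'])
prod = pd.DataFrame({'value': [72.0, 70.0, 74.0, 68.0]}, index=idx)

grouped = prod.groupby(by=prod.index.year)

# Iterating yields (key, sub-dataframe) pairs
sizes = {int(year): len(chunk) for year, chunk in grouped}

# A single group can be pulled out by its key
first_year = grouped.get_group(1985)

print(sizes)
print(first_year)
```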

Sometimes it helps to visualize, mentally and on screen, what the groupby is doing. In this case it groups by the year in the index. Visually, the result is equivalent to adding a new index level like this:

In [54]:
df.set_index([df.index.year, df.index])

Unnamed: 0_level_0,Unnamed: 1_level_0,IPG2211A2N
DATE,DATE,Unnamed: 2_level_1
1985,1985-01-01,72.5052
1985,1985-02-01,70.6720
1985,1985-03-01,62.4502
1985,1985-04-01,57.4714
1985,1985-05-01,55.3151
1985,1985-06-01,58.0904
1985,1985-07-01,62.6202
1985,1985-08-01,63.2485
1985,1985-09-01,60.5846
1985,1985-10-01,56.3154


We can now apply some functions. For example, we can compute the mean value of each column for each year.

In [57]:
group.mean()

Unnamed: 0_level_0,IPG2211A2N
DATE,Unnamed: 1_level_1
1985,62.165667
1986,62.709892
1987,65.740275
1988,69.716358
1989,71.895167
1990,73.313433
1991,75.11185
1992,75.120908
1993,77.678992
1994,79.255058


In [58]:
# more sophisticated: we want to apply a
# custom function. We can use a named
# function or a lambda function
def custom_function(array):
    return np.std(array)

group.apply(lambda x: x.min())
group.apply(custom_function)

Unnamed: 0_level_0,IPG2211A2N
DATE,Unnamed: 1_level_1
1985,5.501963
1986,5.091016
1987,4.651648
1988,5.735234
1989,5.796188
1990,5.278903
1991,5.331017
1992,5.633756
1993,6.302929
1994,7.061054


In [59]:
# We can also group by multiple labels:
# for example, we want to group by year
# and month
group2 = df.groupby(by=[df.index.year, df.index.month])
group2.mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,IPG2211A2N
DATE,DATE,Unnamed: 2_level_1
1985,1,72.5052
1985,2,70.6720
1985,3,62.4502
1985,4,57.4714
1985,5,55.3151
1985,6,58.0904
1985,7,62.6202
1985,8,63.2485
1985,9,60.5846
1985,10,56.3154


In [61]:
# if we want to save this dataframe in a more usable way,
# for plotting or for use in a different application,
# we can flatten the index with reset_index().
# Just be sure that, if there is a multi index
# (it usually happens when grouping by multiple columns),
# the index levels have different names
agg_mean = group2.mean()
agg_mean.index = agg_mean.index.set_names(['year', 'month'])
agg_mean.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,IPG2211A2N
year,month,Unnamed: 2_level_1
1985,1,72.5052
1985,2,70.672
1985,3,62.4502
1985,4,57.4714
1985,5,55.3151


In [62]:
# flatten multi index
agg_mean.reset_index()

Unnamed: 0,year,month,IPG2211A2N
0,1985,1,72.5052
1,1985,2,70.6720
2,1985,3,62.4502
3,1985,4,57.4714
4,1985,5,55.3151
5,1985,6,58.0904
6,1985,7,62.6202
7,1985,8,63.2485
8,1985,9,60.5846
9,1985,10,56.3154


This was a brief overview of pandas features. There are so many other features that it is difficult to cover them all, so we will look at some practical examples.