Data In Out Of Pandas
Following my last post on visualisations in pandas. In this post I will discuss getting data into and out of Pandas.
There are a large number of options for loading and saving your data. This page gives a full list https://pandas.pydata.org/docs/reference/io.html
I will look at some of the more popular ones now.
import pandas as pd
import numpy as np
import sqlite3
Input¶
Read_CSV¶
Read_csv comes with a large number of Parameters. Some of the most useful ones are:
- Skiprows (skip n rows at top)
- dtype (change the dtype of the columns)
- header (select the row which should be used)
- parse_dates (get Pandas to parse the date columns)
- dayfirst (useful when dealing with UK dates)
df = pd.read_csv('data.csv')
df
NAME | AGE | SALARY | |
---|---|---|---|
0 | John | Smith | 30000 |
1 | Jane | Smith | 40000 |
Read_excel¶
Read_excel also has a number of useful Parameters many are the same as read_csv but these are also useful:
- sheet_name (if more than one sheet)
pd.read_excel('data.xlsx')
NAME | AGE | SALARY | |
---|---|---|---|
0 | John | Smith | 30000 |
1 | Jane | Smith | 40000 |
Read_html¶
If you have a webpage with a table Pandas can quickly grab that table and add to a dataframe.
If there is more than one table on the page it will return a list of datframes.
pd.read_html('https://pandas.pydata.org/docs/getting_started/overview.html')
[ Dimensions Name Description 0 1 Series 1D labeled homogeneously-typed array 1 2 DataFrame General 2D labeled, size-mutable tabular struc...]
Read_sql¶
We will make a database and load some data.
conn = sqlite3.connect('input_ouput.db')
conn.execute('''CREATE TABLE COMPANY
(NAME TEXT,
AGE INT,
SALARY INT)''')
<sqlite3.Cursor at 0x21af296e730>
Add a record to the DB.
conn.execute("""INSERT INTO COMPANY (NAME,AGE,SALARY)
VALUES ('Matt', 32, 20000)""")
<sqlite3.Cursor at 0x21af29b1260>
Now we have a database with one row lets read into a dataframe
pd.read_sql('SELECT * FROM COMPANY', con=conn)
NAME | AGE | SALARY | |
---|---|---|---|
0 | Matt | 32 | 20000 |
Output¶
Now lets look at how we output data to the same popular sources.
to_csv¶
df.to_csv('output.csv')
to_excel¶
Again there are a number of useful perimeters one good one is index=False
df.to_excel('output.xlsx')
to_sql¶
Lets add some data to our database
df.to_sql('COMPANY', con=conn, if_exists='append', index=False)
conn.execute("SELECT * FROM COMPANY").fetchall()
[('Matt', 32, 20000), ('John', 'Smith', 30000), ('Jane', 'Smith', 40000)]
That is all for today.