Lots of people say that you should code thinking that you will forget reasons why you wrote things the way you did. You should write comments explaining to your future self.
In this post I will upload things that I want to remember how to use. Most of it I learned from pandas tagged questions in
StakOverflow.
How to define a dataframe from a dictionary:
import pandas as pd
d = {
'A' : [1, 2, 3, 4, 5, 6, 7, 7, 7, 7],
'B' : ['a','b','c','d','e','f','g','h','h','h'],
}
df = pd.DataFrame(data=d)
How to read a csv file and create a dataframe:
df = pd.read_csv("your_csv_file.csv")
How to import an Excel file and create a data-frame:
xl = pd.ExcelFile("file_name.xls")
df = xl.parse("Sheet1") # Or the sheet that is appropriate
How to select all the rows that match the condition that a specific Column takes a specific value
This returns a dataframe object, that we call new_dataframe
new_dataframe = df[df["Column"] == 'Value']
new_dataframe is a Pandas dataframe and conserves the indexes from its parent dataframe.
How to delete a Column from a dataframe
df.drop('Column', axis = 1, inplace = True)
How to locate rows that have a null value for a specific Column
np.where(pd.isnull(df['Column']))
How to update the cell at a specific row_number in a specific Column to a new value
df.iloc[row_number][Column] = 'new value'
How to update the name of a Column to New_Column
df.rename(columns={'Column':'New_Column'}, inplace=True)
Groupby and count
grouped = df.groupby(['Column_1', 'Column_2']).count()
grouped.reset_index(inplace=True)