I’m doing online courses to train myself to become a data scientist.
Here I want to post some general Python commands that I use a lot and often forget.
How to load a file:
Say that you have a data file called file.csv with comma-separated values, one record per line. For example, the file contains:
A,1
B,2
C,3
To import/load this into Python you can do:
with open("file.csv", 'r') as f:  # the with block closes the file when it ends
    data = f.read()
rows = data.rstrip("\n").split("\n")  # first separate by \n new lines (rstrip avoids an empty last row)
new_list = []
for row in rows:
    new_list.append(row.split(','))  # then separate each line by comma and append it to new_list
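To try the manual approach without having a file at hand, here is a minimal sketch. It assumes the three sample rows above and writes them to a temporary file first; the helper name load_rows and the temporary path are made up for the illustration. Note that every value comes back as a string, so numbers need converting by hand.

```python
import os
import tempfile

def load_rows(path):
    """Read a comma-separated file into a list of lists of strings."""
    with open(path, "r") as f:
        data = f.read()
    # splitlines() copes with a trailing newline, so no empty last row appears
    return [line.split(",") for line in data.splitlines() if line]

# Write the three sample rows to a temporary file so the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "file.csv")
with open(sample_path, "w") as f:
    f.write("A,1\nB,2\nC,3\n")

rows = load_rows(sample_path)
print(rows)  # [['A', '1'], ['B', '2'], ['C', '3']]

# Everything is text at this point, so convert the second column explicitly.
typed_rows = [[name, int(value)] for name, value in rows]
print(typed_rows)  # [['A', 1], ['B', 2], ['C', 3]]
```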
Another way is using the csv module. Say that you have data stored in a csv file, including a first row that has the names of the columns. You’d like to load that into a list that has only the data (not the header).
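import csv
with open("your_file.csv", 'r', newline='') as f:  # newline='' is what the csv docs recommend
    csvreader = csv.reader(f)
    data_with_header = list(csvreader)  # each row becomes a list of strings
data_no_header = data_with_header[1:]  # drop the first row (the header)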
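Two other ways of dealing with the header, as a rough sketch. The column names letter and number are invented for the example, and io.StringIO stands in for a real file so the snippet runs on its own. next() pulls the header off the reader before the rest is collected, and csv.DictReader uses the header row as keys.

```python
import csv
import io

# Hypothetical contents; io.StringIO replaces open("your_file.csv", newline="")
sample = io.StringIO("letter,number\nA,1\nB,2\nC,3\n")

reader = csv.reader(sample)
header = next(reader)          # consume the first row, i.e. the column names
data_no_header = list(reader)  # whatever is left is data

print(header)          # ['letter', 'number']
print(data_no_header)  # [['A', '1'], ['B', '2'], ['C', '3']]

# csv.DictReader reads the header itself and returns one mapping per row.
sample.seek(0)
records = list(csv.DictReader(sample))
print(records[0]["letter"], records[0]["number"])  # A 1
```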
There’s more information on Python’s documentation site.
Slicing lists in Python
Python starts counting from 0. So the first element of the list A = ['a','b','c','d','e'] is A[0] = 'a'.
Now, the last element of A is A[4] = 'e'. If I wanted to use len to get the last element, I could try A[len(A)], but len(A) = 5 and A[5] does not exist. If you call A[len(A)] you will get an IndexError: list index out of range error. Instead, the correct way to obtain 'e' would be A[len(A) - 1].
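Here is a small sketch of the indexing rules, using the same list A. The one thing it adds that is not in the text is the negative index A[-1], which is a shorter way to reach the last element.

```python
A = ['a', 'b', 'c', 'd', 'e']

first = A[0]
last = A[len(A) - 1]  # len(A) is 5, so this is A[4]
also_last = A[-1]     # negative indices count from the end of the list

# A[len(A)] is one past the end, so Python raises IndexError.
try:
    A[len(A)]
except IndexError as err:
    message = str(err)

print(first, last, also_last)  # a e e
print(message)                 # list index out of range
```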
Now, to take a run of elements from a list (that is, to slice a list), you can use list[start : end]. But there’s a trick: Python does not include the element at end, so it will start at start but it will stop one element before end.
Hence, list[start : end] will give as output the list [list[start], list[start + 1], list[start + 2], ... , list[end-1]]. For example, if you wanted to print out the first three elements of A, that is, the ones at indices 0, 1 and 2, you should call print(A[0:3]). A[0:3] means A[0], A[1], A[2] and it does not include A[3] because Python leaves out the end index of the range used in the slicing. So, if you call print(A[0:2]) you will get ['a', 'b'], which is [A[0], A[1]].
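To check the "end is left out" rule yourself, here is a short sketch with the same list A. The start and end values are picked only for the example.

```python
A = ['a', 'b', 'c', 'd', 'e']

first_three = A[0:3]  # indices 0, 1, 2
first_two = A[0:2]    # indices 0, 1
print(first_three)  # ['a', 'b', 'c']
print(first_two)    # ['a', 'b']

# A slice holds end - start elements (when both indices are inside the list).
start, end = 1, 4
middle = A[start:end]
print(middle)       # ['b', 'c', 'd']
print(len(middle))  # 3

# The same elements, collected one index at a time from start to end - 1.
same = [A[i] for i in range(start, end)]
print(same == middle)  # True
```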
For more syntax on slicing lists, check out this Stack Overflow question. There you’ll find this reference:
a[start:end] # items start through end-1
a[start:] # items start through the rest of the array
a[:end] # items from the beginning through end-1
a[:] # a copy of the whole array
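The four forms can be tried on a concrete list. This is only a sketch with made-up values. The last part shows that a[:] really is a separate (shallow) copy: changing the copy leaves the original alone.

```python
a = ['a', 'b', 'c', 'd', 'e']

print(a[1:3])  # ['b', 'c']
print(a[2:])   # ['c', 'd', 'e']
print(a[:2])   # ['a', 'b']

whole = a[:]   # a new list with the same elements
print(whole == a)  # True
print(whole is a)  # False

whole.append('f')  # only the copy grows
print(len(a), len(whole))  # 5 6
```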