Learn Python Programming in 3 hours-Part3

Ubaid Darwaish

This post continues the series “Learn Python Programming in 3 hours”. If you have not gone through the previous sections (Part-1, Part-2), please do so before you proceed.

So far we have discussed the following:-

Installing Anaconda
The basic Spyder interface.
Import data into Python
Check the number of rows, the number of columns and the structure of your data in Python
View the data that you imported in Python
Create computed columns and use basic numeric functions in Python
Sort the data in Python
Derive the value of column based on condition in Python
Apply basic character functions to your data in Python
Convert data types in Python
Deal with basic date columns in Python
Do mathematical calculations with your data in Python
Remove duplicates in your data in Python
Replace and find missing values in your data in Python
Remove or Keep columns from a dataset in Python

In this chapter, we will continue from where we left off, and by the end of this chapter you should be able to do the following in Python:-

You should be able to do joins in Python (Left, Right, Inner, Outer)
You should be able to do the basic summary and aggregation of data in Python
You should be able to subset the data in Python
You should be able to create new datasets in Python
You should be able to create basic plots with data in Python
You should be able to export a dataset into a file using Python

So let's begin…

Join the tables

Now let us join the two tables (data frames), names and salary, and create a new table “full”. We will join them on the common column “id” using an inner join. The function we use is merge, from pandas.
The syntax is as follows:

pd.merge(dataframe1, dataframe2, on='ColumnForJoin', how='WhichJoin')

import pandas as pd  # skip if pandas is already imported in your session
full = pd.merge(names, salary, on='id', how='inner')

Let's check the number of rows we got and also view the top 5 rows.

full.shape   # (number of rows, number of columns)
full.head()  # head() returns the first 5 rows by default

Similarly, we can do Left, Right or Outer joins by changing the value of the ‘how’ parameter.
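
To see how the value of ‘how’ changes the result, here is a small sketch. The two tables below are made-up stand-ins for names and salary (the values are invented for illustration); only the “id” column and pd.merge come from the text above.

```python
import pandas as pd

# Hypothetical stand-ins for the names and salary tables (made-up values).
names = pd.DataFrame({"id": [1, 2, 3], "Name": ["Asha", "Ben", "Chen"]})
salary = pd.DataFrame({"id": [2, 3, 4], "Salary": [50000, 62000, 58000]})

# Run the same merge with each join type and record which ids survive.
ids_by_join = {}
for how in ["inner", "left", "right", "outer"]:
    joined = pd.merge(names, salary, on="id", how=how)
    ids_by_join[how] = sorted(joined["id"].tolist())
    print(how, joined.shape, ids_by_join[how])
```

An inner join keeps only ids present in both tables, a left join keeps every id from the first table, a right join keeps every id from the second, and an outer join keeps all ids from either table. Columns with no match are filled with missing values.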


Aggregation and Summary

Now that we have employee info and salary info in one table, let's do some basic aggregation. Let's start with summary statistics for the numeric columns of table “full”. This can be achieved with the describe() function:

full.describe()

We can do some more aggregation, where we calculate the mean for column Salary and group it by Gender as follows:

full.groupby('Gender')['Salary'].mean()

Here I am telling Python to group the rows of table ‘full’ by ‘Gender’ and then calculate the mean of column “Salary” within each group.
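
If you need more than one statistic per group, one option is agg(). The sketch below uses made-up data; only the column names Gender and Salary come from the text, and the values "F" and "M" are assumptions for illustration.

```python
import pandas as pd

# Made-up data; the values are invented for illustration.
full = pd.DataFrame({
    "Gender": ["F", "M", "F", "M"],
    "Salary": [60000, 50000, 70000, 54000],
})

# Mean salary per gender, as in the text.
mean_by_gender = full.groupby("Gender")["Salary"].mean()

# Several statistics per group in one call.
stats_by_gender = full.groupby("Gender")["Salary"].agg(["count", "mean", "max"])

print(mean_by_gender)
print(stats_by_gender)
```

The first result is a Series indexed by Gender. The second is a data frame with one column per statistic.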

Subset the dataset and create a new dataset

Now that we have checked our summary, let's see how we can subset the data. We will create a data frame male that contains information only about male employees, by filtering the rows of full with a condition.

male = full[full.sex == 'boy']
male.head()

Here I am telling Python to create a data frame ‘male’ by subsetting data frame ‘full’ on column sex, keeping only the rows where sex == 'boy'. Then we view the top 5 rows of the newly created data frame using the head function.
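
The same kind of filter also works with more than one condition. Here is a minimal sketch on made-up data (the values and the salary threshold are invented for illustration):

```python
import pandas as pd

# Made-up data; the values are invented for illustration.
full = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "sex": ["boy", "girl", "boy", "girl"],
    "Salary": [50000, 60000, 75000, 58000],
})

# Single condition.
male = full[full["sex"] == "boy"]

# Two conditions: wrap each in parentheses and combine with & (and) or | (or).
male_high = full[(full["sex"] == "boy") & (full["Salary"] > 60000)]

# .copy() gives an independent data frame, so changing it leaves full untouched.
male_copy = male.copy()
male_copy["Salary"] = male_copy["Salary"] * 2

print(male)
print(male_high)
```

Using .copy() is a reasonable habit when you plan to modify the subset, since it makes clear that the new data frame is separate from the original.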

Scatter Plot

Let's create a simple scatter plot of salary and bonus. For that we will use the library matplotlib.pyplot.

import matplotlib.pyplot as plt
x = full['Salary']  # single brackets select one column as a Series
y = full['Bonus']
plt.scatter(x, y)

plt.title('Python Learning-Sal vs Bonus')
plt.xlabel('Sal')
plt.ylabel('Bonus')
plt.show()
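
If you run the code as a script rather than in an interactive console, you may prefer to save the plot to an image file. The sketch below is one way to do that, assuming made-up Salary and Bonus values and a hypothetical file name sal_vs_bonus.png written to a temporary folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; only the column names come from the text.
full = pd.DataFrame({
    "Salary": [50000, 60000, 75000],
    "Bonus": [5000, 6500, 9000],
})

# Build the scatter plot on an explicit figure and axes.
fig, ax = plt.subplots()
ax.scatter(full["Salary"], full["Bonus"])
ax.set_title("Sal vs Bonus")
ax.set_xlabel("Sal")
ax.set_ylabel("Bonus")

# Save the figure to a temporary folder (hypothetical file name).
out_path = os.path.join(tempfile.mkdtemp(), "sal_vs_bonus.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```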

Bar Plot

Now let's try a bar chart as well. We will plot the mean salary for each gender.

plotdata = full.copy()
plot_data = plotdata.groupby('Gender')['Salary'].mean()
plot_data.head(3)
plot_data.plot(kind='barh')

Here we create a copy of the “full” data frame called “plotdata”. Then we create “plot_data”, which holds the mean salary per gender computed from “plotdata”. Then we plot it as a horizontal bar chart.

Write a dataset to a file

Let's create a CSV file “newdata.csv” from the data frame full.

full.to_csv("/home/dell/Python/newdata.csv", index=False)

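Please note, if you are using Windows, forward slashes (/) in the path work as well. If you prefer backslashes (\), either double each one or put r in front of the opening quote, because a single backslash inside a normal Python string starts an escape sequence.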

You can open and view the file.
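
You can also check the export from Python by reading the file back. The sketch below uses made-up data and a temporary folder instead of the path above, so it runs on any machine.

```python
import os
import tempfile

import pandas as pd

# Made-up data standing in for the full data frame.
full = pd.DataFrame({"id": [1, 2], "Salary": [50000, 60000]})

# Write to a temporary folder; index=False leaves out the row index column.
out_path = os.path.join(tempfile.mkdtemp(), "newdata.csv")
full.to_csv(out_path, index=False)

# Read the file back and compare it with what we wrote.
check = pd.read_csv(out_path)
print(check.shape)
print(check.columns.tolist())
```

If index=False were left out, the file would get an extra unnamed first column holding the row index.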

This marks the end of our journey of Learning basic Python programming for Data Science.

Contributed by: Ubaid Darwaish