Introduction
Python is widely used for software engineering tasks such as website development, cloud architecture and back-end services. It is equally popular in the data science world. In advanced analytics, there have been several debates on R vs. Python. There are some areas, such as the number of libraries for statistical analysis, where R wins over Python, but Python is catching up very fast. With the popularity of big data and data science, Python has become the first programming language of many data scientists.
There are several reasons to learn Python. Some of them are as follows –
- Python works well for automating the various steps of a predictive model.
- Python has robust libraries for machine learning, natural language processing, deep learning, big data and artificial intelligence.
- Python wins over R when it comes to deploying machine learning models in production.
- It can be easily integrated with big data frameworks such as Spark and Hadoop.
- Python has great online community support.
Do you know these sites are developed in Python?
- YouTube
- Dropbox
- Disqus
Python 2 vs. 3
A Google search yields thousands of articles on this topic, some in favor of Python 2.7 and some against it. If you restrict your search to recent articles, you will find that Python 2 is no longer supported by the Python Software Foundation: it reached end of life on January 1, 2020. Hence it does not make sense to learn 2.7 if you are starting today. All the popular packages support Python 3. Python 3 is cleaner and fixes major issues of the Python 2 series. It is a language for the future. Python 3 was first released in 2008 and has received robust new releases ever since. You should go for the latest version of Python 3.
How to install Python?
There are two ways to download and install Python:
- Download Anaconda. It comes with Python along with popular libraries preinstalled.
- Download Python from its official website. You have to install libraries manually.
Recommended : Go for the first option and download Anaconda. It saves a lot of time when learning and coding Python.
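Once Python is installed, a quick check like the one below can confirm which version you are running. This is only a minimal sketch: it uses the standard sys module, and the message texts are illustrative.

```python
import sys

# Version of the interpreter running this script, e.g. (3, 11, 4).
major, minor, micro = sys.version_info[:3]
print("Running Python {}.{}.{}".format(major, minor, micro))

# Stop early with a clear message if this is a Python 2 interpreter.
if major < 3:
    raise SystemExit("This tutorial assumes Python 3.")
```

You can run it from the Spyder console or save it as a script and press F5.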
Coding Environments
Anaconda comes with two popular IDEs :
- Jupyter (IPython) Notebook
- Spyder
Spyder
It is like RStudio for Python. It provides a user-friendly environment for writing Python code. If you are a SAS user, you can think of it as SAS Enterprise Guide / SAS Studio. It comes with a syntax editor where you can write programs. It has a console where you can check each and every line of code. Under the ‘Variable explorer’, you can access the data objects and functions you have created. I highly recommend Spyder!

Jupyter (IPython) Notebook
Jupyter is equivalent to R Markdown in R. It is useful when you need to present your work to others or create a step-by-step project report, as it can combine code, output, text and graphics.
Spyder Shortcut keys
The following is a list of useful Spyder shortcut keys that make you more productive.
- Press F5 to run the entire script
- Press F9 to run selection or line
- Press Ctrl + 1 to comment / uncomment
- Place the cursor in front of a function name and press Ctrl + I to see the documentation of the function
- Run %reset -f to clean workspace
- Ctrl + Left click on object to see source code
- Ctrl+Enter executes the current cell.
- Shift+Enter executes the current cell and advances the cursor to the next cell
List of arithmetic operators with examples
Arithmetic Operators | Operation | Example |
---|---|---|
+ | Addition | 10 + 2 = 12 |
- | Subtraction | 10 - 2 = 8 |
* | Multiplication | 10 * 2 = 20 |
/ | Division | 10 / 2 = 5.0 |
% | Modulus (Remainder) | 10 % 3 = 1 |
** | Power | 10 ** 2 = 100 |
// | Floor division | 17 // 3 = 5 |
(x + (d-1)) // d | Ceiling division | (17 + (3-1)) // 3 = 6 |
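To see the operators from the table in action, here is a small sketch. The variable names are only illustrative, and the check against math.ceil assumes x is an integer and d is a positive integer.

```python
import math

x, d = 17, 3

floor_result = x // d                 # floor division
ceiling_result = (x + (d - 1)) // d   # ceiling trick from the table

# With a positive integer divisor, the trick agrees with math.ceil on true division.
assert ceiling_result == math.ceil(x / d)

print(x / d)    # true division always returns a float
print(x % d)    # remainder
print(x ** 2)   # power
print(floor_result, ceiling_result)
```

The ceiling trick is handy because it stays in integer arithmetic and avoids importing math.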
Basic programs in Python
Example 1
# Basics
x = 10
y = 3
print("10 divided by 3 is", x/y)
print("remainder after 10 divided by 3 is", x%y)
Result :
10 divided by 3 is 3.3333333333333335
remainder after 10 divided by 3 is 1
Example 2
x = 100
x > 80 and x <= 95
x > 35 or x < 60
Result :
x > 80 and x <= 95
Out[45]: False
x > 35 or x < 60
Out[46]: True
Comparison, Logical and Assignment Operators
Comparison & Logical Operators | Description | Example |
---|---|---|
> | Greater than | 5 > 3 returns True |
< | Less than | 5 < 3 returns False |
>= | Greater than or equal to | 5 >= 3 returns True |
<= | Less than or equal to | 5 <= 3 returns False |
== | Equal to | 5 == 3 returns False |
!= | Not equal to | 5 != 3 returns True |
and | True if both conditions hold | x > 18 and x <= 35 |
or | True if at least one condition holds | x > 35 or x < 60 |
not | Opposite of the condition | not(x > 7) |
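The logical operators from the table can be tried out with a short sketch like the one below. The value 27 and the variable names are arbitrary choices for illustration.

```python
x = 27

in_range = x > 18 and x <= 35   # both conditions hold
either = x > 35 or x < 60       # only the second condition holds
negated = not (x > 7)           # x > 7 is True, so not flips it

# Python also lets you chain comparisons, which reads like the math notation.
chained = 18 < x <= 35

print(in_range, either, negated, chained)
```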
Assignment Operators
They are used to assign a value to a variable. For example, x += 25 means x = x + 25.
x = 100
y = 10
x += y
print(x)
Result :
110
In this case, x += y implies x = x + y, which is x = 100 + 10.
Similarly, you can use x -= y, x *= y and x /= y.
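The other assignment operators work the same way. The sketch below applies them one after another to the same illustrative values of x and y.

```python
x = 100
y = 10

x -= y   # same as x = x - y
x *= y   # same as x = x * y
x /= y   # same as x = x / y; true division turns x into a float

print(x)
```

Note that after x /= y the variable holds a float even though it started as an integer.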
Python Data Structures
In every programming language, it is important to understand the data structures. Following are some data structures used in Python.
1. List
It is a sequence of multiple values. It allows us to store different types of data such as integers, floats and strings. See the examples of lists below. The first is an integer list containing only integers. The second is a string list containing only string values. The third is a mixed list containing integer, string and float values.
- x = [1, 2, 3, 4, 5]
- y = ['A', 'O', 'G', 'M']
- z = ['A', 4, 5.1, 'M']
Get List Item
We can extract list items using indexes. The index starts from 0 and ends with (number of elements - 1).
x = [1, 2, 3, 4, 5]
x[0]
x[1]
x[4]
x[-1]
x[-2]
Result :
x[0]
Out[68]: 1
x[1]
Out[69]: 2
x[4]
Out[70]: 5
x[-1]
Out[71]: 5
x[-2]
Out[72]: 4
x[0] picks the first element of the list. A negative sign tells Python to count list items from right to left, so x[-1] selects the last element of the list.
You can select multiple elements from a list using slicing:
x[:3] returns [1, 2, 3]
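Slicing follows the pattern list[start:stop:step], where the stop position is not included. The sketch below shows a few common variations on the same list; the variable names are just for illustration.

```python
x = [1, 2, 3, 4, 5]

first_three = x[:3]    # items at index 0, 1 and 2
middle = x[1:4]        # index 1 up to, but not including, index 4
last_two = x[-2:]      # the last two items
every_other = x[::2]   # every second item, starting from index 0

print(first_three, middle, last_two, every_other)
```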
2. Tuple
A tuple is similar to a list in the sense that it is a sequence of elements. The differences between a list and a tuple are as follows:
- A tuple cannot be changed once constructed, whereas a list can be modified.
- A tuple is created by placing comma-separated values inside parentheses ( ), whereas a list is created inside square brackets [ ].
Examples
K = (1, 2, 3)
State = ('Delhi', 'Maharashtra', 'Karnataka')
Perform for loop on Tuple
for i in State:
    print(i)
Result :
Delhi
Maharashtra
Karnataka
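The difference in mutability between a list and a tuple can be checked directly. In the sketch below, the cities list and the replacement values are made up for illustration.

```python
state = ('Delhi', 'Maharashtra', 'Karnataka')
cities = ['Delhi', 'Mumbai']

cities[1] = 'Pune'      # a list item can be replaced in place

try:
    state[0] = 'Goa'    # assigning to a tuple item raises TypeError
except TypeError as err:
    error_message = str(err)
    print("Cannot modify a tuple:", error_message)
```

Because of this, a tuple is a good fit for values that should stay fixed, such as a set of category names.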
Functions
Like print(), you can create your own custom functions. These are also called user-defined functions. They help you automate repetitive tasks and call reusable code in an easier way.
Rules to define a function
- A function starts with the def keyword followed by the function name and ( )
- The function body starts after a colon (:) and is indented
- The keyword return ends a function and gives back the value of the expression that follows it.
def sum_fun(a, b):
    result = a + b
    return result
z = sum_fun(10, 15)
Result : z = 25
Suppose you want Python to assume 0 as the default value if no value is specified for parameter b.
def sum_fun(a, b=0):
    result = a + b
    return result
z = sum_fun(10)
In the above function, b is set to 0 if no value is provided for parameter b. It does not mean that no value other than 0 can be passed. The function can also be called as z = sum_fun(10, 15).
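The different ways of calling a function with a default parameter can be compared side by side. This sketch reuses sum_fun from above; the result variable names are illustrative.

```python
def sum_fun(a, b=0):
    """Return a + b; b falls back to 0 when it is not supplied."""
    return a + b

only_a = sum_fun(10)              # b uses its default value
positional = sum_fun(10, 15)      # b passed by position
by_keyword = sum_fun(a=10, b=5)   # arguments passed by name

print(only_a, positional, by_keyword)
```

Passing arguments by name makes a call easier to read when a function has several parameters.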
Python Conditional Statements
Conditional statements are commonly used in coding. They are IF ELSE statements and can be read as: "if a condition holds true, then execute something; else execute something else".
Note : The if and else statements end with a colon :
Example
k = 27
if k%5 == 0:
    print('Multiple of 5')
else:
    print('Not a Multiple of 5')
Result : Not a Multiple of 5
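A conditional statement becomes reusable when you wrap it in a function. The sketch below combines the two ideas; the function name describe and the test values are hypothetical.

```python
def describe(k):
    """Label k depending on whether 5 divides it evenly."""
    if k % 5 == 0:
        return 'Multiple of 5'
    else:
        return 'Not a Multiple of 5'

results = []
for k in (25, 27, 30):
    results.append(describe(k))
    print(k, describe(k))
```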
List of popular packages (comparison with R)
Some of the leading packages in Python along with equivalent libraries in R are as follows:
- pandas. For data manipulation and data wrangling. A collection of functions to understand and explore data. It is the counterpart of the dplyr and reshape2 packages in R.
- NumPy. For numerical computing. It’s a package for efficient array computations. It allows us to perform an operation on an entire column or table in one line. It is roughly comparable to the Rcpp package in R, which works around the slow speed of R.
- SciPy. For mathematical and scientific functions such as integration, interpolation, signal processing, linear algebra, statistics, etc. It is built on NumPy.
- Scikit-learn. A collection of machine learning algorithms. It is built on NumPy and SciPy. It covers the techniques that can be done in R using the glm, knn, randomForest, rpart and e1071 packages.
- Matplotlib. For data visualization. It’s a leading package for graphics in Python. It is equivalent to the ggplot2 package in R.
- Statsmodels. For statistical and predictive modeling. It includes various functions to explore data and generate descriptive and predictive analytics. It allows users to run descriptive statistics, impute missing values, run statistical tests and export table output to HTML format.
- pandasql. It allows SQL users to write SQL queries in Python. It is very helpful for people who love writing SQL queries to manipulate data. It is equivalent to the sqldf package in R.
Most of the above packages come preinstalled with Anaconda, so they are ready to use in Spyder.
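As a small illustration of what "an operation on an entire column in one line" means in NumPy, consider the sketch below. The prices array and the 10% discount are made-up example data, and the code assumes NumPy is installed.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

# One line applies the operation to every element; no explicit loop is needed.
discounted = prices * 0.9

print(discounted)
print(discounted.mean())
```

With a plain Python list, the same calculation would need a loop over each element.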
Comparison of Python and R Packages by Data Mining Task
Task | Python Package | R Package |
---|---|---|
IDE | Rodeo / Spyder | RStudio |
Data Manipulation | pandas | dplyr and reshape2 |
Machine Learning | Scikit-learn | glm, knn, randomForest, rpart, e1071 |
Data Visualization | ggplot + seaborn + bokeh | ggplot2 |
Character Functions | Built-In Functions | stringr |
Reproducibility | Jupyter | knitr |
SQL Queries | pandasql | sqldf |
Working with Dates | datetime | lubridate |
Web Scraping | beautifulsoup | rvest |
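Two rows of the table need no extra installation at all: character functions are built into Python strings, and datetime is part of the standard library. The sketch below shows both; the sample text and dates are arbitrary.

```python
from datetime import date, timedelta

name = "  python for Data Science "

# Built-in string methods cover much of what stringr is used for in R.
cleaned = name.strip().title()
word_count = len(cleaned.split())

# datetime supports date arithmetic, similar to lubridate in R.
start = date(2020, 1, 1)
end = start + timedelta(days=45)

print(cleaned, word_count, end.isoformat())
```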
Popular python commands
The commands below help you install, update and remove packages. Let’s say you want to install / uninstall the pandas package.
Run these commands from the IPython console window. Don’t forget to add ! before pip, otherwise it will return a syntax error.
Install Package
!pip install pandas
Uninstall Package
!pip uninstall pandas
Show Information about Installed Package
!pip show pandas
List of Installed Packages
!pip list
Upgrade a package
!pip install --upgrade pandas --user
How to import a package
There are multiple ways to import a package in Python. It is important to understand the difference between these styles.
1. import pandas as pd
It imports the package pandas under the alias pd. The function DataFrame in the package pandas is then called with pd.DataFrame.
2. import pandas
It imports the package without an alias; here the function DataFrame is called with the full package name, pandas.DataFrame.
3. from pandas import *
It imports the whole package, and the function DataFrame is called simply by typing DataFrame. It sometimes creates confusion when the same function name exists in more than one package.
Pandas Data Structures – Series and DataFrame
In the pandas package, there are two data structures: Series and DataFrame. These structures are explained below in detail.
1. Series
It is a one-dimensional array. You can access individual elements of a series using their position. It’s similar to a vector in R. In the example below, we are generating 5 random values.
import pandas as pd
import numpy as np
s1 = pd.Series(np.random.randn(5))
s1
Result :
0   -2.412015
1   -0.451752
2    1.174207
3    0.766348
4   -0.361815
dtype: float64
Extract first and second value
You can get a particular element of a series using its index value. See the examples below:
s1[0]
-2.412015
s1[1]
-0.451752
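If you want the same random values on every run, or want to select strictly by position, a sketch like the following may help. It assumes pandas and NumPy are installed; the seed value 42 is arbitrary, and .iloc is used here as one position-based alternative to s1[0].

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values repeatable between runs.
rng = np.random.default_rng(seed=42)
s1 = pd.Series(rng.standard_normal(5))

first = s1.iloc[0]        # first value, selected by position
second = s1.iloc[1]       # second value, selected by position
first_two = s1.iloc[:2]   # slicing works as it does for lists

print(first, second)
print(first_two)
```

Selecting with .iloc keeps working even when the series has a custom index such as dates or names.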