Introduction to Python

Ok, let's start coding with the new bits of info we already have.

x = 5

y = 4

print(x < y)

Result = False

x = 5

y = 4

print(x > y)

Result = True

x = 5

y = 4

print(x >= y and y == 4)

Result = True

import random

x = random.randint (1, 5)

if x == 1:

    print ("I am number 1")

elif x == 2:

    print ("I am number 2")

else:

    print ("I am number", x)

If we get 1 -> I am number 1

If we get 2 -> I am number 2

If we get any other number -> I am number followed by that number (for example, I am number 4)


For loop

import random
y = 0
for i in range (10):
	y += random.randint(1, 5)
	print (y)
print ("y = ", y)

Result:

5
7
12
16
17
18
21
23
26
29
y =  29

While

import random
w = 0 # w must exist before the loop condition is checked for the first time
while w != 5:
	w = random.randint (1, 100)
	print  (w)
print ("\nw = ", w)
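Because the loop above depends on random numbers, its output changes on every run. A minimal sketch of a while loop with a predictable result, assuming a hypothetical counter variable called count, could look like this:

```python
# count is a hypothetical counter, initialised before the loop starts
count = 0
total = 0
while count < 5:
    count += 1       # without this line the condition would never become False
    total += count   # running sum: 1 + 2 + 3 + 4 + 5
print("count =", count, "total =", total)
```

The loop stops as soon as the condition count < 5 becomes False, so the variable used in the condition has to change inside the loop.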

Lists

Lists are like variables, BUT they can store several ordered values

This is a LIST -> [5, 4, 24, 36, 9]

But we can also have a LIST with different kinds of elements, like integers, floats, strings…

The awesome feature of LISTS is that they are an ORDERED structure. EACH element within the list has an INDEX (starting from 0), so we can refer to each element without ambiguity.

Take a look at the previous image: starting from the left, the FIRST index is 0; starting from the right, we have NEGATIVE indexes, beginning with -1. In a three-element list, the first element can therefore also be reached with -3.

To create a LIST, we write name = [], where the [] is what defines our list.

list_1 = [] #-> this is an empty list

list_2 = [2, 5, 9, 6, 56] #-> this is a FIVE-element list (always comma separated)

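So, let's say I want to access the third element of this list. How can I do that? Well, remember that a list has ORDERED elements with INDEXES starting from 0 (from the left) or -1 (from the right), so I can call it this way (the examples name the variable list to keep things short; in your own programs pick another name, because list is also the name of a Python built-in):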

list = [4, 6, 78, 43, 56]

print (list[2]) #-> it will show 78 (index0, index1, index2 - third position)

Now, let's say I want to access the last element. We can use negative indexes (remember that negative indexes start with -1), so the last element would be

list = [4, 6, 78, 43, 56]

print (list[-1]) #-> it will return 56
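To see both index directions side by side, here is a small sketch. The variable name numbers is just an assumption for the example:

```python
# numbers is a hypothetical example list
numbers = [4, 6, 78, 43, 56]

first = numbers[0]     # index 0 is the leftmost element
last = numbers[-1]     # index -1 is the rightmost element
third = numbers[2]     # the same element as numbers[-3] in a five-element list

print(first, last, third, numbers[-3])
```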

We can also ADD (append) elements to a list.

append ADDS an element at the end of the list

list = [4, 6, 78, 43, 56]

list.append (21)

print (list)

[4, 6, 78, 43, 56, 21]

We can also REMOVE an element from the list, by VALUE (bear in mind that it will remove the first occurrence from the list)

list = [4, 6, 78, 43, 56]

list.append (21)
list.remove (6) #-> it will remove the VALUE 6
print (list)
[4, 78, 43, 56, 21]

Or we can remove a value by INDEX using POP

list = [4, 6, 78, 43, 56]
list.pop (4)
#-> it will remove index 4, fifth position; number 56

print (list)
[4, 6, 78, 43]
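One detail worth knowing is that pop also hands back the value it removes. A short sketch, using a hypothetical list called values:

```python
# values is a hypothetical example list
values = [4, 6, 78, 43, 56]

removed = values.pop(4)   # removes index 4 and returns its value
tail = values.pop()       # with no argument, pop removes the last element

print(removed, tail, values)
```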

We can also EDIT or REPLACE elements within the list

list = [4, 6, 78, 43, 56]

list [0] = 456# replaces index 0, first element (4) with 456
print (list)
[456, 6, 78, 43, 56]

list = [4, 6, 78, 43, 56]


list [0] += 456# edit, adds 456 to first element
print (list)
[460, 6, 78, 43, 56]

How many elements does my list have?

list = [4, 6, 78, 43, 56]

print (len(list))
# Result = 5

Find the index of an element by its value

list = [4, 6, 78, 43, 56]

print (list.index(78))

#Result = 2 -> index number 2 has the element/value 78

To check if a value is within a list

list = [4, 6, 78, 43, 56]

print (4 in list)

#Result = True

Sorting and reversing a list

list = [4, 6, 78, 43, 56]

list.sort() #sorts the list in place, in ASCENDING order

print (list)

list.reverse()

print (list)

Result –> sort = [4, 6, 43, 56, 78]

Result-> reverse = [78, 56, 43, 6, 4]
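Both sort and reverse change the original list. If the original order must be kept, the built-in sorted function returns a new list instead. A minimal sketch, with scores as a hypothetical list name:

```python
# scores is a hypothetical example list
scores = [4, 6, 78, 43, 56]

ascending = sorted(scores)                  # new list, scores is left untouched
descending = sorted(scores, reverse=True)   # new list in descending order

print(scores)
print(ascending)
print(descending)
```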

Loop through a list

We use a FOR loop, setting a variable name to identify the elements within the list

list = [4, 6, 78, 43, 56]

for element in list:

    print (element) #-> on each pass of the loop, the element variable takes the next value in the list

Result =

4

6

78

43

56

I can also loop the list by its index numbers instead of by its value

list = [4, 6, 78, 43, 56]

for i in range (len(list)): #-> i = variable to hold the index, len gets the length of the list

    print (i)

Result =

0

1

2

3

4

Finally, I can loop through the list by its index numbers and retrieve the value at each one

list = [4, 6, 78, 43, 56]

for i in range (len(list)):

    print (list [i])

Result =

4

6

78

43

56
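When both the index and the value are needed, the built-in enumerate function gives them together. A short sketch, assuming a hypothetical list called items:

```python
# items is a hypothetical example list
items = [4, 6, 78, 43, 56]

pairs = []
for i, element in enumerate(items):
    print(i, element)            # index and value on the same line
    pairs.append((i, element))   # keep each (index, value) pair
```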

Loop summary

list = [4, 6, 78, 43, 56]

for element in list:

    print (element)# loop by value

print ("\n")#adds a newline, a separator

for i in range (len(list)):

    print (list [i])#loop by index

Both pieces of code print the same values

4

6

78

43

56

Strings

Strings can also be seen as Lists, they also have indexes

We can also count the elements of the string

string_test = "Hello World"
print (len(string_test))#we get the length of this string
Result = 11

We can also index a string like a list

string_test = "Hello World"
print (string_test[0])# I call the FIRST character of the string, index 0
Result = H

And we can loop the string

string_test = "Hello World"
for character in string_test:
    print (character)

We also have several methods to use with strings

string_test = "Hello World"
print (string_test.lower())# convert text to lowercase
print ("\n")
print (string_test.upper())# convert text to uppercase
print ("\n")
print (string_test.capitalize())# capitalize text

hello world

HELLO WORLD

Hello world

We also have methods that return True or False

string_test = "Hello World"
print (string_test.startswith("Hi"))# result-> False
print ("\n")
print (string_test.startswith("He"))# result-> True
print ("\n")
print (string_test.endswith("wo"))# result-> False
print ("\n")
print (string_test.endswith("ld"))# result-> True
print ("\n")
print (string_test.isalpha())# result-> False, because the space between o and W is not an alphabetic character
print ("\n")
print (string_test.isdigit())# result-> False

Strip, to remove spaces before and after text

string_test = " Hello World "#Note that we added a space before H and another after d

print (string_test.strip())# strip will remove the spaces both before and after the text

Result
Hello World (with NO spaces before or after the text)

Split

Split will transform a string into a list, taking as separator the one we pass the function as parameter (usually a comma)

fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
print (fruits.split(","))#split will return as a LIST, the COMMA is the separator

Result = ['apple', ' orange', ' banana', ' tangerine']
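Note the leading space in ' orange', ' banana' and ' tangerine': split cuts exactly at the separator and keeps everything else. Two possible ways to get clean items are sketched below; the variable names by_comma_space and stripped are hypothetical:

```python
# fruits mirrors the string used above
fruits = "apple, orange, banana, tangerine"

by_comma_space = fruits.split(", ")                       # the separator includes the space
stripped = [item.strip() for item in fruits.split(",")]   # split first, then trim each piece

print(by_comma_space)
print(stripped)
```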

Join-> list to string

To use join, the syntax is like this

“separator”.join(list), where “separator” is the character I want to use to join the elements

fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
list1 = fruits.split(",")#save the result to the variable list1

list2 = "-".join(list1)# the character "-" will join the list
print (list2)
Result = apple- orange- banana- tangerine

It goes from a list to a string joined by the “-” character

Note -> “handmade ” way to fill a List with characters from a string

Let´s say I have my string and want to “fill” a List with every character with a For loop. We can do it this way

fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
print (fruits)
#prints the string fruits
print ("\n")

liststring = []
for c in fruits:
    liststring.append(c)
print (liststring)#prints the List liststring

Result = 
apple, orange, banana, tangerine


['a', 'p', 'p', 'l', 'e', ',', ' ', 'o', 'r', 'a', 'n', 'g', 'e', ',', ' ', 'b', 'a', 'n', 'a', 'n', 'a', ',', ' ', 't', 'a', 'n', 'g', 'e', 'r', 'i', 'n', 'e']

Ok, of course, if you want to append the whole string as a single element (NOT the characters one-by-one), there is an easy way to do it

fruits = "apple, orange, banana, tangerine"
lista = []
lista.append(fruits)
print (lista)

Result = ['apple, orange, banana, tangerine']

or, if you want to wrap a string in a list, just put the string name within [] (the result is a one-element list, not a list of separate words; use split for that)

fruits = "apple, orange, banana, tangerine"
lista = [fruits]
print (lista)

Result = ['apple, orange, banana, tangerine']

Tuple -> an immutable List; once defined, we cannot change it.

It is defined as a List, BUT with () instead of []

tupla1 = (1, 3, 5)

print (tupla1, "\n")

Result = (1, 3, 5) 

I can access its indexes/values like a List

tupla1 = (1, 3, 5)

print (tupla1 [0], "\n")

Result = 1

We can make some operations with the tuple.

tupla1 = (1, 3, 5)

print (tupla1[0] + tupla1[1])# we´ll get 4 (1 + 3)

print (tupla1)

tupla2 = tupla1 + (1, 3, 5, 8)
# we add tupla1 plus values 1, 3, 5 and 8
print (tupla2)

Result = 
4
(1, 3, 5)
(1, 3, 5, 1, 3, 5, 8)

BUT now, if I want to edit it, I can't, because a tuple is an immutable List

var = (3, 4 , 8)
var[0]=8
print (var)
Result = 
Traceback (most recent call last):
     var[0]=8
TypeError: 'tuple' object does not support item assignment

I can also convert a List to a Tuple, using the built-in function tuple()

Take a look at the example, we also show the TYPE of the variables to check they are a List or a Tuple

var = [3, 4 , 8]
print("var = " ,type(var))
tupla = tuple(var)
print (var)

print("\n")
print("tupla = " ,type(tupla))
print(tupla)

Result = 
var =  <class 'list'>
[3, 4, 8]

tupla =  <class 'tuple'>
(3, 4, 8)

TIP: Tuples are usually slightly faster than Lists

SETS

Sets DO NOT keep an order (they have no index); they go between curly braces {}

They don't allow duplicated elements, so we can say that a SET is a collection of UNIQUE and UNORDERED elements

As with any mathematical set, we can perform operations on them like union, intersection, difference and symmetric difference.

We have two ways to create a set: by assigning a variable to a {} with comma-separated elements within

set_1 = {1, 4, 6, 8,55}

or take a List and convert it to a set with the built-in function set()

list_1 = [5, 4, 76, 22]
set_2 = set(list_1)

We can add elements, but with the ADD function (APPEND is for Lists)

set_1.add(10)

Now let´s see what happens when we want to add a duplicated element

set_1 = {1, 4, 6, 8, 22}
print (set_1)

Result = {1, 4, 6, 8, 22}
#now we add a duplicated element
set_1.add(22)
print (set_1)

Result = {1, 4, 6, 8, 22}

As we can see, we get NO error and NO change; the SET just ignores the duplicated element and stays the same

We can remove an element

set_1.remove(22)
print (set_1)

Result = {1, 4, 6, 8}

We can also loop the set

set_1 = {1, 4, 6, 8}
for elem in set_1:
    print (elem)

Result = 
1
4
6
8

Ok, let´s check if a given element is within the set with the IN keyword

set_1 = {1, 4, 6, 8,22}
print (5 in set_1)

Result = False -> there is NO 5 in our set

set_1 = {1, 4, 6, 8,22}
print (6 in set_1)

Result = True -> yes, we have a 6

Tip: searching within a Set is usually faster than searching within a List

Let´s test some operations with sets

Union

set_1 = {1, 4, 6, 8,22}
set_2 = {11, 2, 6, 8, 9}

print (set_1.union(set_2))

Result = {1, 2, 4, 6, 8, 9, 11, 22}
 -> note that we had duplicated 6 and 8 and the final set did NOT duplicate them, just kept one element

Intersection

set_1 = {1, 4, 6, 8,22}
set_2 = {11, 2, 6, 8, 9}
print (set_1.intersection(set_2))

Result = 
{8, 6}
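Difference and symmetric difference, mentioned earlier, work the same way. A short sketch reusing the same two sets (the result names are hypothetical, and sorted is used only to print the elements in a stable order):

```python
set_1 = {1, 4, 6, 8, 22}
set_2 = {11, 2, 6, 8, 9}

only_in_first = set_1.difference(set_2)              # in set_1 but not in set_2
in_exactly_one = set_1.symmetric_difference(set_2)   # in one of the sets, but not in both

print(sorted(only_in_first))
print(sorted(in_exactly_one))
```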

Dictionaries

Dictionaries DO NOT store individual values but key-value pairs

dic1 = {"name": "Freelancer", "age": 45, "profession": "developer"}
print (dic1)
Result = {'name': 'Freelancer', 'age': 45, 'profession': 'developer'}

Note the example; we have 3 key-value elements (comma-separated), the first with the key “name” and the value “Freelancer”, and so on…

We can think of dictionaries as lists without numeric indexes and with no content restriction (I can “mix” numbers and letters). Since Python 3.7 they keep the order in which the pairs were inserted.

Either way we can access any element within the dictionary because they have a key to every element.

So the keys would be our indexes -> I get the values through the keys

A good practice is to write dictionaries this way (pretty much like JSON) to make them clear

dic1 = {"name": "Freelancer",
        "age": 45,
        "profession": "developer"
        }
print (dic1["age"])#we call the value (45) by its key (age)

Result = 45

Adding values

dic1 = {"name": "Freelancer",
        "age": 45,
        "profession": "developer"
        }
dic1["country"] = "Argentina"# add a new key-value pair
print (dic1)

Result = {'name': 'Freelancer', 'age': 45, 'profession': 'developer', 'country': 'Argentina'}

Deleting and editing values

dic1 = {"name": "Freelancer",
        "age": 45,
        "profession": "developer"
        }
dic1["country"] = "Argentina"# add a new key-value pair
del dic1["age"]# delete a pair
dic1["name"] = "Web Developer"#edit a value
print (dic1)

Result = {'name': 'Web Developer', 'profession': 'developer', 'country': 'Argentina'}
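Asking for a key that does not exist with dic1["country"] raises a KeyError. A minimal sketch of two ways to avoid that, using the in keyword and the get method (the default text "unknown" is just an assumption for the example):

```python
dic1 = {"name": "Freelancer", "age": 45, "profession": "developer"}

has_country = "country" in dic1            # False: the key was never added
country = dic1.get("country", "unknown")   # get returns the default instead of raising KeyError
age = dic1.get("age")                      # an existing key returns its value as usual

print(has_country, country, age)
```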

Looping the dictionary

Ok, we'll use for as usual, BUT now we have to go through 2 values, not just one (for element in … is not enough any more), so how can we do that?

for key_variable_name, value_variable_name in list (dictionary_name.items()):

dic1 = {"name": "Freelancer",
        "age": 45,
        "profession": "developer"
        }
for k, v in list(dic1.items()):
    print (k, v)

Result = 
name Freelancer
age 45
profession developer

Nested structures

dic1 = {"name": ["This is a list within a dictionary"],
        #and now another dictionary within the dic1
        "details": {
            "age": 45,
            "profession": "developer",
            "country": "Norway"
        }
    }
for k, v in list(dic1.items()):
    print (k, v)
Result = 
name ['This is a list within a dictionary']
details {'age': 45, 'profession': 'developer', 'country': 'Norway'}

But, how do I access the age? I should “index” from the outer structure to inner one

dic1 = {"name": ["This is a list within a dictionary"],
        #and now another dictionary within the dic1
        "details": {
            "age": 45,
            "profession": "developer",
            "country": "Norway"
        }
    }
print (dic1["details"]["age"])# note the double "indexation"
#[details][age] to finally reach the value

Result = 45

Another example

dic1 = {"name": ["This is a list within a dictionary", 89, 65, 77],
        #and now another dictionary within the dic1
        "details": {
            "age": 45,
            "profession": "developer",
            "country": "Norway"
        }
    }
print (dic1["name"][1])# note the double "indexation"
#[name][1] to finally reach the value

Result = 89

Tip: accessing a value by its key within a dictionary is usually faster than searching for it in a list


Functions

A function is a block of code with a name associated to it, which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data or execute a task as a result.

Bear in mind that many of the things we have used so far are functions (print, len, int, range, input and so on). The Python documentation has a list of built-in functions.

But we can also create OUR OWN functions. Remember that we can call a function several times; that's one of its main goals: doing a repetitive task several times without coding it all again and again. We just call the function with the new parameters and we are done.

Let's say I want to get the sum of all the numbers within a list, and that I am going to use this 5 times in my program. I can write the code 5 times to do the same task, or I can create a function and call it several times without coding it all from scratch over and over again.

We start by creating our new function with the reserved word “def” followed by the “name_of_this_function” and “()“. We finish the line with “:”

Inside the () we pass the “parameters” for the function to work with. The parameters are all the information that the function needs to operate. Usually we'll work with some kind of abstraction. Let's say I want to create a function that adds up all the elements of a list; my main parameter will be a List, so I should create a function like this

def myfunct (list_1):

Where the (list_1) parameter is not yet created, BUT we need a list to work with, the function needs a parameter

Ok, once the function is created, and after the “:”, note that the next line will be indented. We created the function with “def”; now let's define what the function should do.

Ok, we wanted to sum up all the elements of the list, so it would be nice to initialize a variable (x = 0) to keep track of the running sum of the elements of the list

And then we should do a for loop to go through all the elements in the list, as usual, AND sum them.

    x = 0
    for elem in list_1:
        x += elem

Now we have the sum of the elements in the list stored within the “x” variable, so our function has worked so far. But a function not only receives parameters, it also returns a result, so how can we get that result? Of course, with the reserved keyword “return”. And what should we return? Well, in this case the “x” variable containing the sum of the elements

Note that so far it is all an abstraction, nothing happens yet, if I do run this code nothing seems to happen

def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x
Result = 
>>> %Run functions.py

To use this function, we need to “call” it. BUT to call it we have to satisfy the parameters, meaning I have to “pass the parameters” to the function, so it can execute the abstraction and return a value

How do we call a function? Well, by its name -> myfunct

BUT we must save the result returned by myfunct in a variable so we can print it, AND we must pass the list (between []) to the function.

summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)

Remember that we did this?

def myfunct (list_1):

Well, this one

myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])

“fills” (list_1) with the elements that are used by the function. It sounds a bit weird, I know, like starting from the end, but that's how it works. The body of the function (the “abstraction”) is executed only when we “call” the function with its parameters, not before. Python reads the definition first and stores it; when it reaches the call, it takes the parameters, executes the code of the function and hands back the result.

Let´s see and check the full code


def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x

summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)

Result = 45

So, the moment I call the function by its name (myfunct) is the moment the function runs, using the values that I pass as a List (note that we assign the result of the function to a variable “summation” so we can print it later)

So, the parameters are the communication between the program and the function, and the return is the communication between the function and the program.

We must pay attention to variables when defining functions; local and global ones.

In our recent example we defined the variable as “local” to the function, meaning that if we use the variable “outside” of it, we'll get an error. Let's check

def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x

print (x) #-> using a local variable outside of the function
summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)

Result = 
print (x)
NameError: name 'x' is not defined

We get this error because the “x” variable is defined WITHIN the function, so it only exists there. The opposite case is different: a variable defined outside of the function (a global one) can be read from inside it, but assigning to that name inside the function creates a new local variable unless we declare it with the “global” keyword. Relying on globals makes code harder to follow, so it is better to pass the function what it needs as parameters.

Then, we'll repeat this: “The communication between the program and the function goes through the parameters; the function uses this info (parameters) to get a result. How can I get those results? With a return”

I can also use default parameters; that is, if the calling code does not pass a parameter, the function uses its own predefined default value
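As a minimal sketch of a default parameter, here is myfunct again with an extra parameter. The parameter start and its default value 0 are assumptions added only for illustration:

```python
# start is a hypothetical default parameter added for illustration
def myfunct(list_1, start=0):
    x = start
    for elem in list_1:
        x += elem
    return x

without_default = myfunct([1, 2, 3])     # start falls back to its default, 0
with_argument = myfunct([1, 2, 3], 10)   # start is overridden with 10

print(without_default, with_argument)
```

Parameters with a default value go after the ones without it in the def line.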


Files

To operate with files we have some “modes”

r = Read

a = Append (add to the end of the file)

w = Write from scratch (start a new file)

So, to open a file to read, we need a variable to assign the file we want to open/read/modify to: variable = open (“filename to operate with”, “mode to operate”)

So if I have a file named file1.txt containing:

Peech,1 ,1
orange,3, 3
Apple,10, 5

I can READ it like this, and obtain a List from its elements

filex = open ("file1.txt", "r")
print ( filex.readlines())
Result = ['Peech,1 ,1\n', 'orange,3, 3\n', 'Apple,10, 5\n']

Note that every line has a \n after it. Why?

Because it is a special character; \n means a newline (Enter), and our file1.txt has 3 lines and 3 newlines/Enters

To fix this we must iterate over the file (for loop) instead of reading it as a whole

for variable_name_for_every_line in file_to_read:

Then I have to “clean” every line (the \n in this case) when reading it, with the strip function

filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    print (line)

Result = 
Peech,1 ,1
orange,3, 3
Apple,10, 5

We iterated over file1.txt, got every line WITHOUT the \n, and printed it

Now let's see how to get, say, the 5 from the Apple line

To get that done, we must find a pattern; this time we can see that there are commas “,” separating every fruit from its quantity and price. We also know that we have a function to convert a string to a list (split)

filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",") #-> this is the split function
    print (line)

Result = 
['Peech', '1 ', '1']
['orange', '3', ' 3']
['Apple', '10', ' 5']

Now we have a structure that can be indexed, so I can easily retrieve the element I need.

Let's see an example; let's index [0] to grab the fruit names

filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",")
    print (line[0]) #-> indexing the first position (0) to get the fruit name

Result = 
Peech
orange
Apple

Remember that I get STRINGS when operating with files, so if we get the index [2]

filex = open ("file1.txt", "r")
#print ( filex.readlines())

for line in filex:
    line = line.strip()
    line = line.split(",")
    print (line[2])

Result = 
1
 3
 5

Those “numbers” are TEXT, so to operate with them I must first convert them with int()

filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",")
    print (int (line [1]) * 5) #-> here we convert text to integer and multiply

Result = 
5
15
50
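The examples above never close the file after reading it. A common alternative is the with statement, which closes the file automatically. The sketch below is self-contained: it first writes a throwaway copy of the fruit file into a temporary folder (the path and the names quantity and price are assumptions for the example), then reads it back and converts the numbers:

```python
import os
import tempfile

# Build a throwaway copy of the fruit file so the sketch is self-contained;
# the path is hypothetical and lives in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "file1.txt")
with open(path, "w") as f:
    f.write("Peech,1 ,1\norange,3, 3\nApple,10, 5\n")

rows = []
with open(path, "r") as filex:   # the file is closed automatically after this block
    for line in filex:
        # split on commas, then strip spaces and the trailing newline from each piece
        name, quantity, price = [part.strip() for part in line.split(",")]
        rows.append((name, int(quantity), int(price)))

print(rows)
```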

Adding info to a file

Remember to open the file in “a-mode”, no more r-mode

filex = open ("file1.txt", "a")

filex.write ("Grapes ,7, 8")

Result (in our txt file) =
Peech,1, 1
orange,3, 3
Apple,10, 5
Grapes ,7, 8 -> it was added to the end

BUT if I add another line, we could get this

filex = open ("file1.txt", "a")

filex.write ("Grapes ,7, 8")
filex.write ("Berries, 2, 4")
    
Result =
Peech,1, 1
orange,3, 3
Apple,10, 5
Grapes ,7, 8Berries, 2, 4 -> what happened? There is no newline between them!

As we can see, we need to add a newline (Enter) so that every new entry is written on its own line, like this

filex = open ("file1.txt", "a")

filex.write ("Grapes ,7, 8\n")
filex.write ("Berries, 2, 4\n")
filex.close() # closing the file makes sure everything is written to disk

Result =
Peech,1, 1
orange,3, 3
Apple,10, 5
Grapes ,7, 8
Berries, 2, 4

Creating a new file (w)

Bear in mind that “w” creates a new file; if the file already exists, it will be overwritten.

filex = open ("file3.txt", "w")
filex.write("This is a New File created with W- mode")
filex.close()

Result (within the newly created file3.txt)
This is a New File created with W- mode

Error Handling

The Python documentation has a complete list of built-in exceptions.

We need to prevent our program from crashing when it finds an error, so we need to handle those errors in an elegant way (try-except).

Let's see an example that will give an error:

text1 = "Hi there"
x = int (text1)

print ("if all ok I get here")

Result = 
x = int (text1)
ValueError: invalid literal for int() with base 10: 'Hi there'

So, to prevent the program from crashing, we use try-except to handle the error and keep moving

When I have some “dangerous” piece of code that could crash my program, I should put it within a try so the error can be handled

text1 = "Hi there"
try:
    # -> the dangerous code goes here
    x = int (text1)
except:
    # here goes the code that handles the exception, in our case a print
    # -> the message to show when an error occurs
    print ("Something went wrong")

print ("\nif all ok I get here")

Result = 
Something went wrong

if all ok I get here

Now, if I want to see what kind of error the code raises, I must add “Exception as variable_to_hold_error” to except

text1 = "Hi there"
try:
    x = int (text1)
except Exception as err:
#Print our custom message "Something went wrong" + the Python error message
    print ("Something went wrong", err)
    
print ("\nif all ok I get here")

Result = 
Something went wrong invalid literal for int() with base 10: 'Hi there'

if all ok I get here
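Catching every possible error can hide bugs we did not expect, so a common refinement is to name only the error we know how to handle. A minimal sketch, assuming a hypothetical helper called to_int with a hypothetical fallback parameter:

```python
# to_int is a hypothetical helper written for this sketch
def to_int(text, fallback=0):
    try:
        return int(text)
    except ValueError as err:   # only conversion errors are handled here
        print("Could not convert:", err)
        return fallback

good = to_int("42")
bad = to_int("Hi there")
print(good, bad)
```

Any other kind of error would still stop the program, which is usually what we want while developing.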

Classes and Objects

The Python documentation covers classes in full.

To understand the difference between Classes and Objects we'll use a nice real-world example; let's suppose you are an architect building some houses.

In order to build them, you FIRST need drawings/designs regarding how the house will look, with its pipes, windows, floors, and more.

ONCE you have those plans, THEN you build the house or houses.

Then:

Plans/drawings -> CLASSES

Houses/Buildings -> OBJECTS

Classes are just concepts (abstractions) from which we create the Objects.
Furthermore, for every class (the plans), I can define multiple PROPERTIES (color, power consumption, and so on) that can differ for every object (house).
Finally, I can add some METHODS to the classes, so my Objects can perform some action (maybe ring a bell in a house)

So we end with this:

Plans/drawings -> CLASSES

Houses/Buildings -> OBJECTS

Attributes/Features -> PROPERTIES

Actions -> METHODS (functions that work WITHIN the Class, and can only be used by the Objects of the class)

Ok, enough theory, let's create a class

We have a reserved word for Classes called… you guessed it: class. Then comes the name of our abstraction/class followed by “:”

After that we need a “constructor”; “__init__” is a special method in Python classes. This method is called when an object is created from a class, and it allows the class to initialize the attributes of the object

The constructor allows us to add the properties we wish for our “house” (object)

Remember that a Method is just a function that works ONLY within the class, for the objects created from that class, so you guessed again: an __init__ method is defined with… def.

def __init__ (self, parameters): -> remember that we always get the “self” parameter

def __init__ (self, color):

Then, to define the properties of my Object (house), I have to use self for every one of them. Self works a bit like a dictionary (key-value pairs) BUT using this syntax: self.property = value (self.color = color) -> that color variable is the one from def __init__ (self, color)

So far we have this: to build a house we assign it a color, and it starts with 0 water and electricity consumption, so the code is

class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0

Then, EVERY house I build, will have 3 properties; color, electricity and water consumption.

These “self” are very important because we can use them to access these properties from within other methods

Now we can define the methods; what can I do with a house? Maybe we could paint it, so let's define a paint method

Remember that every method within a class MUST have self as its first parameter

def paint (self, color):
        self.color = color

Now we can define other methods, like… using the lights,

def lights_on (self):
        self.electricity += 10

the water

def use_water(self):
        self.water += 7

and the doorbell

def ring_doorbell(self):
        print ("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1

Our code so far would look like this

class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0
        
    def paint (self, color):
        self.color = color
        
    
    def lights_on (self):
        self.electricity += 10
        
    def use_water(self):
        self.water += 7
        
    def ring_doorbell(self):
        print ("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1

Bear in mind that if we run this code NOTHING happens (or shows) because we just DEFINED the Class; we only have the plans, we have not laid a single brick of our house yet

Let's begin to create our objects

First I choose a variable to hold the object, then the class name and the parameters to “build the house”, in this case the “color” that we declared before in the constructor (__init__). Remember that in our example we defined only one parameter: color (electricity and water are initialized to zero, not passed as parameters)

So we “call” the constructor method with the parameters defined in the method

my_house = house ("red") #-> we call the constructor method to build the house with a red color

So this piece of code will build a RED house with ZERO water and electricity consumption

class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0

Now that the object is created, how do I access it? How can I see it?

It is as easy as invoking the variable name where we host the object DOT the property name

print (my_house.color)
print (my_house.electricity)
print (my_house.water)

Result = 
red
0
0

Now I can start calling the METHODS we created for our Object, so we´ll try with the doorbell

my_house.ring_doorbell()
print (my_house.electricity)

Result =
rinnnnnnnnnggggggg!!!!!!
1 -> note that before we had 0 consumption, but now we have 1 because we called the method my_house.ring_doorbell() and it performed the action += 1

Now I can paint my house again by calling the paint method BUT passing a different color parameter

my_house.paint ("green")
print (my_house.color)

Result = green

The full code so far with comments so it is easy to understand:

class house:
    #I create the class "house"
    def __init__(self, color): #I define the constructor with only one parameter; color
        self.color = color #property color
        self.electricity = 0 #property electricity
        self.water = 0 #property water
    #so we have our class (the plans) created with the properties color, electricity and water
     
    #Now we start creating methods 
    def paint (self, color):#method paint
        self.color = color
            
    def lights_on (self):#method lights_on that will add 10 to the electricity use
        self.electricity += 10
        
    def use_water(self):#method use_water that will add 7 to the water use
        self.water += 7
        
    def ring_doorbell(self):#method ring_doorbell that will print a message and add 1 to the electricity use
        print ("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1
#so far we have only created the plans, all abstraction        

#now let's create the object house
#I create a variable (my_house) that will hold the object house with a
#color parameter (red)        
my_house = house ("red")

print (my_house.color)#shows the house color
print (my_house.electricity)#shows electricity use
print (my_house.water)#shows water use

my_house.ring_doorbell()#we call the method ring_doorbell
print (my_house.electricity)#we show the electricity use, notice it is now 1 and not 0

my_house.paint ("green")#we call the paint method and our house changes from red to green
print (my_house.color)

Result = 
red
0
0
rinnnnnnnnnggggggg!!!!!!
1
green

In brief: the class is the abstraction (the plans). The object only gets built when I call the class and pass it parameters, so the constructor method can build it.

But what if I want to use the previous class-plans to build something else, not a house but maybe a mansion?

Of course, I could write the class-plans from scratch, but it is wiser to reuse the useful things from our class. This is called inheritance: the mechanism of deriving new classes from existing ones.

The new (child) class will have the same properties and methods as the parent class.

And how do we get this done? Just by creating a new class and putting the name of the parent class in parentheses.

class mansion (house):

So our mansion class will have the same properties and methods as “house”, but the reason to do this is to change some methods when we need them to perform other actions.

Let's say that I want to change the power consumption, so my mansion will spend more than 10 of electricity

class mansion (house):
    def lights_on (self):
        self.electricity += 38

Note that we did not write a constructor (__init__) because it is inherited from the parent (house)
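To see the inherited constructor at work, here is a minimal self-contained sketch. It reuses the names house and mansion from this tutorial, but trims the parent class down to two properties, so treat it as an illustration rather than the full class.

```python
class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0

    def lights_on(self):
        self.electricity += 10


class mansion(house):
    # No __init__ here: house.__init__ is inherited and runs
    # when we call mansion("white").
    def lights_on(self):
        # This version replaces the parent's lights_on.
        self.electricity += 38


my_mansion = mansion("white")
my_mansion.lights_on()
print(my_mansion.color)               # white
print(my_mansion.electricity)         # 38
print(isinstance(my_mansion, house))  # True
```

The last line shows that a mansion object still counts as a house, which is why it can use everything the parent defines.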

We can modify the other methods too

class mansion (house):
    def lights_on (self):
        self.electricity += 38

    def use_water (self):
        self.water += 19

    def ring_doorbell (self):
        print ("Ding-Dong!!")
        self.electricity += 3

So we changed 3 methods.

Now to create the object, let's assign it to a variable as we did before: the name of the class and, in parentheses, the parameter for the constructor (inherited from the parent)

my_mansion = mansion ("white")

So the constructor “def __init__(self, color):” will use the white color to build the mansion.

my_mansion = mansion ("white")
print (my_mansion.color)

Result = white

Now let's call the other methods to check that we are using the overridden methods defined within the mansion class

class mansion (house):
    def lights_on (self):
        self.electricity += 38

    def use_water (self):
        self.water += 19

    def ring_doorbell (self):
        print ("Ding-Dong!!")
        self.electricity += 3
        
my_mansion = mansion ("white")
print (my_mansion.color)#shows mansion color
print (my_mansion.electricity)#shows mansion electricity use
print (my_mansion.water)#shows mansion water use

my_mansion.ring_doorbell()#we call the method ring_doorbell
print (my_mansion.electricity)

my_mansion.paint ("gold")#we call the paint method FROM the parent
print (my_mansion.color)

Result = 
red
0
0
rinnnnnnnnnggggggg!!!!!!
1
green


white
0
0
Ding-Dong!!
3
gold

Intro to Web Scraping

Web scraping is the process of retrieving, or “scraping”, data from a website automatically rather than manually. Web scraping uses automation to retrieve millions or even billions of data points from websites.

If there is no API to download data from a site, I can use web scraping

How does a website work?

To perform web scraping, we need to know exactly how a website works, so take a look at this awesome intro to HTML here

We need to understand some TAGS and their structure, meaning which tag is the parent and which are its children.

So let's say we have <div> </div>. This parent tag will have children... where? Just inside it, so ANYTHING within <div> </div> will be a child of that DIV tag.

<body>
    <div>
        <p>
            My message here
        </p>
    </div>
</body>

Now let´s take a look at this HTML

<body>
    <div>
        <p>
            My message here
        </p>
    </div>
    <span>
        I am here
    </span>
</body>

Here <div> and <span> are called siblings. Why? Because they are at the same level: both are children of <body>, but siblings to each other
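The parent, child and sibling relationships can be checked from Python. This is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in parser (an assumption made so the example runs without installing anything; it only works here because the snippet is well-formed).

```python
import xml.etree.ElementTree as ET

html_doc = """
<body>
    <div>
        <p>My message here</p>
    </div>
    <span>I am here</span>
</body>
"""

body = ET.fromstring(html_doc)

# Direct children of <body>: these two tags are siblings of each other.
children_of_body = [child.tag for child in body]
print(children_of_body)   # ['div', 'span']

# <p> is a child of <div>, so it is a grandchild of <body>.
children_of_div = [child.tag for child in body.find("div")]
print(children_of_div)    # ['p']
```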

To make a TAG easy to identify, we can give it something called ATTRIBUTES

These attributes can be class or id, and each is formed by a NAME and a VALUE

    <div class="main container">
        <p>
            My message here
        </p>
    </div>

Now we can easily identify this <div> tag because its class attribute has the value main container

To get a deep insight regarding HTML classes and ID attributes, please check this class tutorial and this ID one.

Client-Server Architecture 

Take a look at this info to get some intro to it

URLs

When we type a site URL, like https://www.google.com/ it is an easy-to-read-URL, but if we do a search in Google we´ll get something like this

Why all that stuff? Because the URL can be used to pass info about, in this example, our search.

Let´s analyze this

https -> protocol

www.google.com -> domain

/search -> Endpoint (identifies the action the server will perform, this time a SEARCH. We can chain several path segments, like search/users)

Then we have the parameters. They start with “?”, so everything AFTER the ? is a parameter that the server will use to answer our request. Keep in mind that each parameter is a name-value pair separated by the “=” sign

We can have several parameters separated by the “&” symbol

So in the example, the parameter “q” = web scrapping, sourceid = chrome and ie = UTF8.

All this info is received by the server and used to serve our request
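The same breakdown can be done in code with Python's standard urllib.parse module. The URL below is a hypothetical reconstruction of a Google search URL, written only for this sketch; the parameter names follow the ones discussed above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical search URL, assumed for illustration.
url = "https://www.google.com/search?q=web+scraping&sourceid=chrome&ie=UTF-8"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each value comes back as a list

print(parts.scheme)   # https           -> protocol
print(parts.netloc)   # www.google.com  -> domain
print(parts.path)     # /search         -> endpoint
print(params["q"])    # ['web scraping']
print(params["ie"])   # ['UTF-8']
```

Note that parse_qs turns the “+” back into a space, which is how spaces travel inside a query string.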

Types of Web Scraping and tools used to perform them

1 – Static scraping (one page): when ALL the info is in just one page and it does not load dynamic info.

Tools used: requests (to “ask” for data), Beautiful Soup to parse the XML and HTML we get, and Scrapy, which does both jobs (request and parse)

2 – Static scraping (several pages, same domain), also called horizontal scrolling (pagination) and vertical scrolling (product details).

Tools used: Scrapy

3 – Dynamic web scraping: we'll use some automation to fill in data, to scroll, and to wait for the page to load its contents before scraping what we need.

Tools used: Selenium

4 – API web scraping:

Steps to web scraping

1 – Define a root or seed URL: the main one from which to START the data extraction. It may not be the one to extract data from, but it is the one from which we'll start “travelling” to find the info

2 – Make a REQUEST to this URL

3 – Get the response to the previous request (it will be in HTML format)

4 – Parse the info to obtain what I am searching for

5 – Repeat from step 2 with another URL within the same domain (it may be obtained from the HTML response)
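The five steps above can be sketched as a small loop. To keep the example runnable offline, the “site” is a hypothetical in-memory dictionary (FAKE_SITE) and fetch() is a made-up stand-in for a real HTTP request; the parsing uses the standard library instead of the scraping libraries introduced later.

```python
import xml.etree.ElementTree as ET

# Hypothetical site: URL -> HTML. It replaces real HTTP responses.
FAKE_SITE = {
    "https://example.com/": (
        "<html><body><h1>Home</h1>"
        "<a href='https://example.com/page2'>next</a></body></html>"
    ),
    "https://example.com/page2": "<html><body><h1>Page 2</h1></body></html>",
}


def fetch(url):
    """Steps 2 and 3: make the request and get the HTML response (faked)."""
    return FAKE_SITE[url]


def crawl(seed_url):
    to_visit = [seed_url]              # step 1: start from the seed URL
    seen = set()
    found_titles = []
    while to_visit:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        tree = ET.fromstring(fetch(url))               # step 4: parse
        found_titles.append(tree.find(".//h1").text)
        for link in tree.findall(".//a"):              # step 5: follow links
            href = link.get("href")
            if href in FAKE_SITE:                      # stay inside the site
                to_visit.append(href)
    return found_titles


titles = crawl("https://example.com/")
print(titles)   # ['Home', 'Page 2']
```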

XPATH

To obtain the required info from the HTML response, we´ll need XPATH

XPATH is a language that allows us to build expressions to extract info from XML or HTML data. I can search for and extract exactly what I need from all the gibberish we'll get from our requests. We can search within the DOM elements in a number of ways.

Take a look at this awesome tool to learn how to use XPATH

XPATHER

Now, we must understand how XML works. It is made of a structure of LEVELS. These levels are the nodes (HTML tags), and nodes can have sub-levels, or nested levels, called “child nodes”

Take a look at this piece of code: <body> is the ROOT level, and its children are <h1>, <h1>, <div> and <span>. The <div> tag has a child of its own -> <p>

<body>
    <h1>Main title</h1>
    <h1>Another main title</h1>
    <div class="main container">
        <p>
            My message here
        </p>
    </div>
    <span>
        I am here
    </span>
</body>

Now we can define our search axes to start a search. These axes are parameters that filter the tags we are looking for.

If I use // (double slash), it will search within ALL levels of the document

If I use a single slash (/), it will only search at the root of the document

Note that it found nothing, because <p> is a child of a child, not the root. This document has only one tag as root, and it is <body>

OK, after defining the search prefix (//, / or ./) we must add the node we are searching for; this is called a “step”. I can also define attributes to narrow the search even more.

This is done by adding [@name='value'] after the node. Let's say I want to find the <h1> tag with id title
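Here is a minimal sketch of those searches. It uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath and needs relative paths (.// instead of //), so it is a stand-in for a full XPath engine. The id="title" attribute on the first <h1> is an assumption added for the example; in full XPath the last search would be written //h1[@id='title'].

```python
import xml.etree.ElementTree as ET

page = """
<body>
    <h1 id="title">Main title</h1>
    <h1>Another main title</h1>
    <div class="main container">
        <p>My message here</p>
    </div>
    <span>I am here</span>
</body>
"""

body = ET.fromstring(page)

# Single level: only direct children of <body>, so <p> is not found.
direct_p = body.findall("p")

# All levels: .// searches at any depth below <body>.
any_p = body.findall(".//p")

# Attribute filter: the <h1> whose id is "title".
title = body.find(".//h1[@id='title']")

print(len(direct_p), len(any_p), title.text)   # 0 1 Main title
```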

Here an awesome intro to Xpath

TIP

We can run a “live” XPath query by opening the web browser dev tools (usually F12 or right-click -> inspect), then going to the “console” tab and running this code

$x("path expression")

Let's see an example by requesting all <div> tags at any level of the document (//)

$x("//div")

Web Scraping

Remember that in order to get data from a website we need TWO separate procedures;

1 – REQUEST the page/server

2 – PARSE the data we received

We´ll use some Python libraries to do this.

To extract info from one-static-page we´ll use 4 different libraries:

Requests -> to obtain the HTML

LXML and beautifulsoup4 to parse the received info

Scrapy to perform the two operations; request and parse

To install a library, just open a CMD (command prompt) in Windows or a terminal in Linux and run

pip install library-name

or pip3 install library-name

or sudo pip install library-name (Linux)

or pip install library-name --user (Windows)

We´ll also install (for dynamic sites)

Selenium

Pillow (to extract images)

Pymongo (to store data in DB)

In case you need Twisted to make scrapy work with windows, use this link

Requests full doc

LXML full doc

Pip full doc


Scraping Wikipedia

Goal: Extract the names that Wikipedia shows in its main page

Tools:

  • Requests to get the HTML from the server and
  • LXML to parse the tree and to get the desired info

Just to refresh, we need TWO steps -> request the data and parse it to get the exact info.

Bear in mind that when I make a request, it also sends headers. One of the most useful is “user-agent”, which tells the server the browser and operating system the request comes from. If I DON'T define this user-agent, requests identifies itself by default as python-requests/<version>, so our attempt may be seen as automated web scraping, or even an attack, and it may be blocked.

So we need to overwrite that default “user-agent” value.

To do this, BEFORE making a request I must create a dictionary to hold the new values

new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}

Then we make the request, passing the new header, and we can print the result with the “text” property

So far we have this working code

"""
Goal: Extract the names that Wikipedia shows in its main page

Tools:
Requests to get the HTML from the server and 
LXML to parse the tree and to get the desired info
"""

import requests

"""
change the user-agent to avoid being blocked
"""

new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}

"""
define my seed URL, in this case the wikipedia URL
"""
seed_url = "https://www.wikipedia.org/"

"""
now I make the request
"""

request_result = requests.get(seed_url, headers=new_header)

"""
now we can print the request_result
-> [:200] cuts the text to only 200 characters
"""
print (request_result.text[:200])

with this result (cropped to 200 characters)

<!DOCTYPE html>
<html lang="mul" class="no-js">
<head>
<meta charset="utf-8">
<title>Wikipedia</title>
<meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by 

Now it's time to use LXML to parse this data into something more useful.

Let's import lxml -> from lxml import html

Now we'll create a variable to hold the parsed tree

parser = html.fromstring(request_result.text)

Now with this parser we have several useful methods to search within the HTML tree. BUT to extract data we need to check the HTML structure to find out where it is

Ok, let´s say we want to extract the ENGLISH text from Wikipedia main page

So we right-click on the element we want to check and then we select “inspect”

And we´ll get this info

This “brings” the HTML tree so we can easily find the element we want to parse

As we can see, the text is within an <a> tag with the ID “js-link-box-en”. Remember that an ID is UNIQUE, so we can reach this text through this tag.

Back to our script: now that we have parsed the text, we have many methods to use, so let's try get_element_by_id

parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show

Now we assign this to a variable

english = parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show

And now we print it

print (english)

so we have this last piece of code

parser = html.fromstring(request_result.text)

english = parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show

print (english)

that brings…

<Element a at 0x31ecae0>

What is that? No worries, it is just an object (an LXML element), so we need to ask for its content like this

print (english.text_content())

and now it works

English
6 203 000+ articles

OK, we did this using LXML's own methods, but we can also use XPath, remember? Let's see how to do it

parser.xpath("expression")

and... what is the expression that will lead me to the element?

Back on the inspect page, we see that we have an <a> element with an ID, and the text is within a child tag (<strong>)

So, the XPath expression would be

"//a[@id='js-link-box-en']/strong/text()"

And the piece of code to call the element …

english = parser.xpath("//a[@id='js-link-box-en']/strong/text()")
print (english)

and it works, returning

['English']

OK, now let's focus on our goal: retrieve ALL the languages from the home page, so we need to create an XPath expression to do that

We need to find a pattern, something that wraps all the languages. Remember that an ID is unique, but a CLASS is for groups, meaning maybe we can find a class that contains our languages.

As we can see, it all happens within <div> tags, and every language has a CLASS (class="central-featured-lang") followed by lang1, lang2 ... lang n. So when writing our XPath expression we must use “contains”

And within each <div> tag there are also <a> and <strong> tags

languages = parser.xpath("//div[contains(@class, 'central-featured-lang')]//strong/text()")
print (languages)

and the result

['English', 'Español', 'æ\x97¥æ\x9c¬èª\x9e', 'Deutsch', 'Ð\xa0Ñ\x83Ñ\x81Ñ\x81кий', 'Français', 'Italiano', 'ä¸\xadæ\x96\x87', 'Português', 'Polski']

It works! And we receive the result as a List, but we can easily iterate it

for language in languages:
    print (language)

result

English
Español
日本語
Deutsch
Русский
Français
Italiano
中文
Português
Polski
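A side note on the garbled entries in the printed list above, such as 'æ\x97¥æ\x9c¬èª\x9e' where the loop output shows 日本語: they look like UTF-8 bytes that were decoded as Latin-1. The sketch below reverses that by hand for one entry, only to show what happened. With requests, a common way to avoid it is to set request_result.encoding = "utf-8" before reading .text, or to pass request_result.content to the parser; that is a suggestion, not something the run above did.

```python
# One of the garbled strings from the printed list.
garbled = "æ\x97¥æ\x9c¬èª\x9e"

# Turn the characters back into the raw bytes they came from,
# then decode those bytes with the right codec.
raw_bytes = garbled.encode("latin-1")
fixed = raw_bytes.decode("utf-8")

print(fixed)   # 日本語
```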

Well, now let's try to do it with another LXML method -> find_class

languages = parser.find_class('central-featured-lang')
for language in languages:
    print(language.text_content())

result

English
6 203 000+ articles

Español
1 645 000+ artículos

日本語
1 242 000+ 記事

Deutsch
2 508 000+ Artikel

Русский
1 681 000+ статей

Français
2 275 000+ articles

Italiano
1 656 000+ voci

中文
1 161 000+ 條目

Português
1 048 000+ artigos

Polski
1 442 000+ haseł

Note:

Remember that when working with CLASSES there is one thing to watch out for

class="central-featured-lang lang1"

the space within a class value indicates that there is ANOTHER class, so in the example we have TWO classes (this allows more flexible styling)

central-featured-lang

and

lang1

That's why, for XPath, the class is the WHOLE attribute string (so we used “contains”), while LXML's find_class takes just one class name, here the first one
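The difference between matching the whole class string and matching one class name can be shown with a small sketch. It uses the standard library's xml.etree.ElementTree on a made-up snippet (the markup is an assumption modelled on the structure described above), and plain Python to split the class attribute, since that module has no contains() function.

```python
import xml.etree.ElementTree as ET

snippet = """
<div>
    <div class="central-featured-lang lang1"><strong>English</strong></div>
    <div class="central-featured-lang lang2"><strong>Deutsch</strong></div>
    <div class="other"><strong>skip me</strong></div>
</div>
"""

root = ET.fromstring(snippet)

# Exact comparison against the whole attribute string finds nothing,
# because the real value is "central-featured-lang lang1", not just the first part.
exact = root.findall(".//div[@class='central-featured-lang']")

# Splitting the attribute on spaces lets us test for ONE class name.
by_name = [
    div for div in root.iter("div")
    if "central-featured-lang" in div.get("class", "").split()
]
names = [div.find("strong").text for div in by_name]

print(len(exact))   # 0
print(names)        # ['English', 'Deutsch']
```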

Stack Overflow scraping using Beautiful Soup

As the title suggests, we no longer use LXML but another nice tool: Beautiful Soup. It works in a similar way, because it allows us to apply functions to search by ID, classes, and so on.

Goal: extract the title and description of the questions published on the Stack Overflow main page

Tools to use: Beautiful Soup

  1. First, we import the requests library as usual
  2. Then the header stuff to avoid being banned
  3. Then we add the seed URL https://stackoverflow.com/questions
  4. Now we make the request (get) to the URL to get the full tree
  5. We can print the result (it will be 200 if successful)

Nothing new so far, we have this code

"""
Goal: extract Title and description of published
questions within stack overflow site main page
Tools to use: Beautiful Soup
"""
import requests

#change header to avoid being detected as a bot
new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}

#define seed URL
url = "https://stackoverflow.com/questions"

#now we make the request (get) to the url to get the full tree
#request to server
result = requests.get(url, headers = new_header)

#show if site is reachable
print (result)

Result

<Response [200]>

Ok, time to import beautifulsoup (install it if not yet installed)

pip install beautifulsoup4 --user

from bs4 import BeautifulSoup

Remember that BeautifulSoup is another parser with a set of tools for retrieving/filtering info

So BeautifulSoup receives the result (the text, the HTML tree) as its first parameter and the name of the parser to use as the second, and we assign it to a variable as usual

soup_info = BeautifulSoup(result.text, "html.parser")

Now inspect the page to look for some clues to scrape

We find a main <div> with an ID = “questions”

And within it, many <div> tags with a class = “question-summary”

So, the path is clear enough

We can use FIND to retrieve maybe the ID?

Let´s check

main_questions_container = soup_info.find(id="questions")

Now, with this main element I can get the child ones. I am not going to search within soup_info, because that holds ALL the tree info, BUT within main_questions_container, because it only holds the child elements (question-summary)

Note:

find retrieves only ONE result

find_all brings everything, so now we'll use the latter

main_questions_container.find_all(class_="question-summary")

Note: class is a reserved word in Python, so BeautifulSoup uses class_

I can also double check that I am searching within a tag (in this case <div>) by adding it at the very beginning like this

main_questions_container.find_all('div', class_="question-summary")

now we just assign it to a variable

questions_list = main_questions_container.find_all('div', class_="question-summary")

It is a LIST so I can iterate it. But, what to iterate? Maybe the title that is within a <h3> tag

for questions in questions_list:
    question_text = questions.find('h3').text
    print(question_text)

Let´s see the result

How to create a serializer for decimal in flutter
reference to submit is ambiguous: <T>submit(Callable<T>) in ExecutorService and method submit(Runnable) in ExecutorService match
How to grab a case in switch if its in another class using random
Android paging2 library : Network(PageKeyedDataSource) + Database idiomatic/expected way to implement
Why is my CAST not casting to float with 2 decimal places?
Question regarding slice assignment, deep copy and shallow copy in Python
How to access objects in S3 bucket, without making the object's folder public
Google Maps API - Const must be initialized
CS50_ Filter more: Blur
click on map doesn't get triggered when the click is on a polygon
How to generate UK postcode using Faker or by own function in Python?
Resize image to specific filesize python
C#: " 'The given path's format is not supported.'
why my code is in continuous loop for second part even if i have given correct user input what is the difference between (not in) and (!=)
Kompići C++ COCI 2011/2012 2nd round

And now to get the description we inspect and see that there is a class=excerpt

question_description = questions.find(class_='excerpt').text
print(question_description)

Result

Power BI - using Groups vs creating a new column using switch()


            If I have a continuous numeric field and want to group it in Power BI, I can

Create a new column using SWITCH() to perform the grouping

numeric_value_grouped = SWITCH(TRUE(),
    MyTable[... 
       
Is there a way to run simple HTML code in Visual Studio?


            Remember this is Visual Studio, the IDE, not Visual Studio Code, the text editor. Anyways, if I wanted to run some simple HTML code like <p>Hello world!</p>, how would I do it? This code ... 
       
How do I sort an ArrayList of class objects?

             I'm having trouble figuring out how to sort an ArrayList of objects. The objects are of a class CityTemp that implements the interface Comparable, and I have defined a compareTo() method. It works for ...

Ok, some weird spaces, so let´s fix them

question_description = question_description.replace('\n', '').replace('\r', '').strip()

replace removes the newline characters (\n and \r) by swapping them for an empty string (''), and strip removes TABS or spaces before and after the text, so we get this result
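One side effect of replacing newlines with an empty string is that words from different lines get glued together (look at “nodes.Essentially” in the output). The sketch below compares that approach with an alternative, " ".join(text.split()), which collapses every run of whitespace into a single space. The sample text is made up from one of the descriptions above, and the alternative is offered as an option, not as what the original script does.

```python
# Made-up sample with the same kind of whitespace as the scraped text.
raw = (
    "\n\n            If I have a continuous numeric field\r\n"
    "and want to group it in Power BI, I can\n\n"
    "Create a new column ...\n       "
)

# The approach used above: delete newlines, then trim the ends.
glued = raw.replace('\n', '').replace('\r', '').strip()

# Alternative: split on any whitespace and join with single spaces.
spaced = " ".join(raw.split())

print(glued)
print(spaced)
```

The first print shows “fieldand” and “canCreate” stuck together, while the second keeps a space between them.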

How can I fix CORS error while trying to access an Angular website hosted on github pages with custom domain?

As mentioned on the title, I'm hosting my angular website on gh-pages and pointing a custom domain to it. The website was loading before I added in the custom domain. Here's an example of the error I ...

The relationship between DataTables in DataSet: can we check if the parent is so and so?

I loaded a complicated XML file with lots of data where are complex level of nested elements. The DataSet.ReadXml() load all that nicely and I can loop through all the nodes.Essentially each node is ...

Module status keeps running after it has been disabled

I rebooted my linux machine and started noticing these odd requests in my Apache access log.::1 - - [16/Dec/2020:21:28:54 -0500] "GET /server-status?auto HTTP/1.1" 404 147 "-" &...

Scrapy

Full doc

https://docs.scrapy.org/en/0.14/intro/overview.html

Scrapy is organized as a set of classes, so we must import a set of functions, modules and classes.

Scrapy is a full framework.

First class:

class Date(Item):
    text = Field()

So, every element to search for has its own properties. Let's say a product: it has a name, a price, reviews and so on

These are our Fields -> the info I want to extract from the product. I decide what to extract, so there can be many fields or just one.

Second Class:

The one that performs the extraction, our “spider”

First we define class variables (a name for our spider, which can be any name, and our seed URL)

Here we can define some rules to guide the spider where to look for the data.

Function parse, where the magic happens

    def parse(self, response):

parse receives a parameter, response, where the HTML tree will be stored (I don't need BeautifulSoup or LXML to parse the tree, Scrapy does it by itself)

Scrapy calls its parsers “selectors”. I can search using XPath, ID, classes, lists and so on.

To start loading data we use ItemLoader, which receives an instance of the class I created and the selector (the part of the HTML tree where I'll search for the elements I want). In this case, the text field will be filled with the result of the XPath expression

item.add_xpath('text', './/h3/a/text()')

OK, here is the code so far. No worries, this is just an intro; we'll go deeper in the next lectures

"""
Scrapy
"""
#import modules
from scrapy.item import Field, Item
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.loader import ItemLoader

#class type of data to extract, article, image, name, user, product
class Date(Item):
    text = Field()

class SpiderData(Spider):
    name = "MySpider"

    start_urls = ['https://site-to-crawl']
    #redirection rules here

    def parse(self, response):
        sel = Selector(response)
        page_title = sel.xpath('//h1/text()').get()

        #avoid naming the variable "list": that would hide Python's built-in list
        elements = sel.xpath('//div[@id="datos"]')
        for element in elements:
            item = ItemLoader(Date(), element)
            item.add_xpath('text', './/h3/a/text()')
            

Now the practice with Scrapy

OK, remember that the first step is to define our classes with the items to extract; it is a class abstraction. Our items to extract (again from Stack Overflow) are the questions, and the properties are the title and description from the main page.

Now we have defined our items and properties, let's start.

Let's define the abstraction of what we want to extract.

We create a class with any related name; the important thing is that this class inherits from Item (from scrapy.item import Item)

And then within the class I just define the properties I want to bring (question and description)

class Question (Item):
    question = Field()
    description = Field()

And that´s our Class definition

Now we need to define the CORE class for Scrapy

This is the one to perform our requests, parse, and more BUT it must inherit from SPIDER class

Note, we use Spider because we want to extract from ONE page, if we need to extract from multiple pages we´ll need another (we´ll see it later)

class MainCoreScrapy(Spider):

Now we can define several things within this main class:

  • Spider name
  • Header to avoid being detected as a bot (Scrapy defines it within the “custom_settings” property). This dictionary uses key-value pairs, with USER_AGENT in CAPS as the key and the value we already know
  • URL (seed/starting URL)

class MainCoreScrapy(Spider):
    name = "MainSpider"
    custom_settings = {
        'USER_AGENT' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
    }
    start_urls = ['https://stackoverflow.com/questions']
    

Done (because is just one-page-spider)

Now we need to define the function where the magic happens (the parse function)

def parse(self, response):

We need to add nothing here, because it is just one URL and Scrapy does the magic automatically: it makes the request and returns the response with the HTML tree.

So I receive the HTML tree, but where is it?

Well, in the response parameter of def parse(self, response):

So, here we have the first step done (request)

Now we need to parse it. We won't use BeautifulSoup or LXML; we'll use Scrapy's way: the Selector class

    sel = Selector(response)

now we have this sel variable with the selector response that we´ll use to ask the page for the useful info

I can use XPATH or CSS to make my way through the tree

This is the same example that we used with BeautifulSoup, so we know that we need the question-summary within the <div> tag

So we should build an expression to retrieve those <div> tags as a list to iterate over and get the data

questions_list = sel.xpath('//div[@id="questions"]//div[@class="question-summary"]')

Now we can iterate this List

    for question in questions_list:

the variable question will hold each element in turn, one per iteration, until the list is finished

so now we have the element as shown in the image and we are iterating over all of them, BUT we need to extract every item (the questions) from them.

Remember that when we started the program we imported some tools, one of them was

from scrapy.loader import ItemLoader, this class loads ITEMS,

so ItemLoader is a class that receives, as the first parameter, an instance of my class that contains an abstraction of what I need to extract (questions), and as the second parameter the HTML element (the selector) with the info that we'll use to fill these fields

class Question (Item):
    question = Field()
    description = Field()

so we have this so far

    for question in questions_list:
        item = ItemLoader(Question(), question)

Now I have to fill the fields question and description, maybe we can try this (there are several ways to do it)

        item.add_xpath('question', './/h3/a/text()')

so ‘question’ will be filled with the result of the XPath expression .//h3/a/text(), evaluated within the HTML element that I passed in here

    item = ItemLoader(Question(), question)

Now we need to fill the ‘description’, but now our XPath expression should reach ‘excerpt’

NOTE: when I use a dot and // (.//), it is because the search is RELATIVE to an element, in this case ‘question’ (item = ItemLoader(Question(), question))

    item.add_xpath('description', './/div[@class="excerpt"]/text()')

Well, now I need to use a special kind of return to close this

    yield item.load_item()

this hands the loaded item back to Scrapy, which will write it to the output file

yield vs return
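The difference matters here because parse runs inside a loop and must hand back one item per question. A minimal sketch in plain Python (the function names and sample data are made up for illustration, no Scrapy involved):

```python
def first_only(items):
    for item in items:
        # return ends the function immediately, on the first item.
        return item.upper()


def one_by_one(items):
    for item in items:
        # yield hands back one value, then continues with the next
        # item when the caller asks for more.
        yield item.upper()


questions = ["q1", "q2", "q3"]

with_return = first_only(questions)
with_yield = list(one_by_one(questions))

print(with_return)   # Q1
print(with_yield)    # ['Q1', 'Q2', 'Q3']
```

With return, only the first question would ever leave the loop; with yield, the function becomes a generator and the caller receives every item in turn.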

I can add data not only via XPATH, but also by “value”, to fill any property. Let's add another field just to see how easy it is

item.add_value('id', 1)

What is this? We just ADDED a VALUE (1) to the id field directly, instead of using XPATH

I only need to add a FIELD to our main abstraction class

id = Field()

that piece of code goes inside our class

class Question (Item):
    id = Field()
    question = Field()
    description = Field()
 

we have this working code so far

"""
Scrapy
"""
#let's import the required modules and functions
from scrapy.item import Field
from scrapy.item import Item
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.loader import ItemLoader


#ABSTRACTION OF DATA TO EXTRACT
#define the data I have to fill in
#that will go to the results file
class Question (Item):
    id = Field()
    question = Field()
    description = Field()
 
#CLASS CORE - MainCoreScrapy
class MainCoreScrapy(Spider):
    name = "MainSpider"
    
    # configure the USER AGENT in Scrapy
    custom_settings = {
        'USER_AGENT' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
    }
    
    # URL (seed URL)
    start_urls = ['https://stackoverflow.com/questions']
    

    # This function will be filled when we make a request to the seed URL
    def parse(self, response):
        # Selectors: Scrapy´s Class to extract data
        sel = Selector(response)
        questions_list = sel.xpath('//div[@id="questions"]//div[@class="question-summary"]')
    
       
        for question in questions_list:
            # instantiate my ITEM with the selector that holds the data to fill it with
            # Fill my ITEM's properties with XPATH expressions to search within the "question" selector

            item = ItemLoader(Question(), question)
            item.add_xpath('question', './/h3/a/text()')
            item.add_xpath('description', './/div[@class="excerpt"]/text()')
            item.add_value('id', 1)
        
                       
            #Yield info to write data in file
            yield item.load_item()
            
#HOW TO RUN IN TERMINAL:
# scrapy runspider myfilename.py -o filename.csv -t csv
# scrapy runspider myfilename.py -o results.csv -t csv

OK, if we just run this code from within our IDE or IDLE, it won't work... why?

Because we need to run it from the terminal with the scrapy command, which runs the spider

AND we need to send the output to a file (-o filename -t format)

If we run that, we'll get a file -> scrapy_stackoverflow.csv (it can be JSON too, I just chose CSV) after some output scrolls by in the terminal. Look how Scrapy shows the seed URL and the parse results within the terminal

Now if we open our CSV file with Notepad or another text editor, we'll have something like this

description,id,question

I tried many ways to create service process with session1, but I couldn’t find a suitable way to create a service process.
Can someone help me?thanks.
“,[1],CreateService with Session1

I wonder if I can somehow pass TInputQueryWizardPage, TInputOptionWizardPage, TInputDirWizardPage, TInputFileWizardPage, TOutputMsgWizardPage, TOutputMsgMemoWizardPage, TOutputProgressWizardPage pages …
“,[1],Can I somehow pass TInput/TOutput pages to a function as one parameter in Inno Setup?

Envs: Ubuntu 18.04, Miniconda3, python=3.7(GCC=7.3.0), GCC -v (7.4.0)
The error occurs when I run the following command:
scons build/X86/gem5.opt -j8

The error is as follow:
[ LINK] -> X86/…
“,[1],LTO compilation question when linking X86/marshal file

I have image of skin colour with repetitive pattern (Horizontal White Lines).
My Question is how to denoise the image effectively using FFT without affecting the quality of the image much, somebody …
“,[1],How to remove repititve pattern from an image using FFT

We have the result, but it is hard to read because of some tabs and spaces; it is not “clean” yet (we'll fix that in later chapters)

But take a look at the [1] -> that is the id we added with id = Field() and item.add_value('id', 1)

If we comment out those two lines and run the code again, we'll have no id [1].

I have a server that accept ssl connections on port 443. I am using the boost libraries for the server implementation. Below is the code snippet:
{
// Open the acceptor with the option to reuse the …
“,Recv-Q has data pending as per netstat command and it never gets cleared

So I have an object which is moving in a circular path and enemy in the centre of this circle. I’m trying to find out how to calculate shotingDirection for bullets. Transform.position isn’t ennough …
“,”How to shoot an object, which is moving in a circle”

df = pd.read_csv(CITY_DATA[city])

def user_stats(df,city):

""""""Displays statistics of users.""""""

print('\nCalculating User Stats...\n')

start_time = time....
    ",How can I display five rows of data based on user in Python?


I want scope viewmodel by Fragment but activity,
interface MarsRepository {
suspend fun getProperties(): List
}
@Module
@InstallIn(FragmentComponent::class)
class MarsRepositoryModule …
“,How install view model in fragmentComponent with Hilt injection?

im trying to set the placeholder for the v-select
<v-select
item-value=””id””
item-text=””name””
:placeholder=””holderValue””
v-model=””selectedDM””
label=””…

If we also comment out the lines that bring in the "description", we'll get a clean list of headlines with no stray whitespace at all.

class Question (Item):
    #id = Field()
    question = Field()
    #description = Field()

.....

for question in questions_list:
            # Instantiate my ITEM with the selector that contains the data to fill it with
            # Fill my ITEM's properties with XPATH expressions that search within the "question" selector

            item = ItemLoader(Question(), question)
            item.add_xpath('question', './/h3/a/text()')
            #item.add_xpath('description', './/div[@class="excerpt"]/text()')
            #item.add_value('id', 1)

The result is much better

question
how to avoid invalid characters ? JSON
How do I make my require statement wait for it to finish before continueing?
GetEntityMetadata returns 0 attributes
Centralizing a TField’s Size value
GLSL Fit Fragment Shader Mask Into Vertex Dimensions
How can I create rounded corners button in WPF?
Return list of strings for query buffer with StatisticDefinition
How can I quickly groupby a large sparse dataframe?
How to remove space in the input box
Need a help REGEX php preg_match [duplicate]
How to draw a .obj file in pyqtgraph?
I have problem with converting MATLAB code to Python
Get the OpenSSL::PKey::EC key size in Ruby
how should the advanced contact page mysql scheme be?
Why is ZooKeeper LeaderElection Agent not being called by Spark Master?

Finally, instead of setting a fixed value, we can generate the ID automatically. That is, instead of always having [1], we just add a counter to the code (the variable i), like this:

        i = 0  # counter used as the ID of each question

        for question in questions_list:
            # Instantiate my ITEM with the selector that contains the data to fill it with
            # Fill my ITEM's properties with XPATH expressions that search within the "question" selector

            item = ItemLoader(Question(), question)
            item.add_xpath('question', './/h3/a/text()')
            #item.add_xpath('description', './/div[@class="excerpt"]/text()')
            item.add_value('id', i)
            
            i +=1

id,question
[0],Failure to clone risc-v tools (failure with newlib-cygwin.git)
[1],How would you represent musical notes in JavaScript?
[2],Do I need to call DeleteObject() on font retrieved from SystemParametersInfo()?
[3],Automaticaly positioning rectangles estheticaly on a canvas with D3.js
[4],Paging in virtual memory
[5],How to run program that connects to another machine in C++?
[6],Rsnapshot filepermission problem with network hdd over raspberry pi
[7],Adding reaction if message contains certain content
[8],”Batch – Findstr with error level condition, quotes?”
[9],Implementing microprofile health checks with EJB application
[10],“Add [name] to fillable property to allow mass assignment on [Illuminate\Foundation\Auth\User].”
[11],Vscode platform specific shortcuts
[12],Problem filling an array of objects. The result is always null
[13],how to get the name instead of reference field in mongoengine and flask -admin
[14],extract email attachment from AWS SES mail in S3 with Python on AWS Lambda
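
Python's built-in enumerate can produce the same counter without a manual i += 1. The sketch below uses a plain list of titles (a made-up stand-in for the real Scrapy selectors) so that it runs on its own; inside the spider, the loop body would still use the ItemLoader as in the listing above.

```python
# Hypothetical stand-in for the selectors the spider loops over
questions_list = [
    "Paging in virtual memory",
    "Vscode platform specific shortcuts",
    "How to draw a .obj file in pyqtgraph?",
]

rows = []
# enumerate() yields (index, element) pairs, starting at 0 by default
for i, question in enumerate(questions_list):
    rows.append({"id": i, "question": question})

for row in rows:
    print(row["id"], row["question"])
```

Writing enumerate(questions_list, start=1) would start the numbering at 1 instead of 0.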


IMPORTANT: if you get a blank file (0 bytes) when running the code, it is usually because there is a mistake in one of the XPATH expressions.
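
One way to catch that kind of mistake early is to try an expression on a small piece of markup and check that it matches something. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree, which supports a limited subset of XPath (for example, it has no text() step), NOT Scrapy's own selectors, and the markup is a made-up, well-formed fragment rather than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment that imitates one question block
sample = """
<div>
  <div class="question-summary">
    <h3><a href="#">How to remove space in the input box</a></h3>
    <div class="excerpt">Some description text</div>
  </div>
</div>
"""

root = ET.fromstring(sample)

def count_matches(expression):
    # findall() returns an empty list when the expression matches nothing
    return len(root.findall(expression))

good = count_matches(".//h3/a")
bad = count_matches(".//h2/a")  # wrong tag name, so nothing matches

print("good expression matches:", good)
print("bad expression matches:", bad)
```

An expression that matches zero elements is the situation that leaves the output file empty.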


How to Program Your Own Password Generator With Python

Let's find out how easy it is to create a strong password generator with Python in a few lines.

We are gonna take the original code from Zsecurity and optimize it a bit to handle user input.

I highly recommend Zsecurity if you want to dive deep into ethical hacking, security, Python programming and more.

OK, back to the code. It looks something like this:

"""
strong password generator
"""
#we import RANDINT to generate a random number between 2 given values
from random import randint

#we add some variables that will store our "dictionary"
#from which we´ll create the password
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."

#we add the variables to get a very strong password
password_creation = lowerCase + upperCase + numbers + special

#we ask the user to enter the password length
password_lenght = int(input("How many characters do you wish the password? Minimum 8 maximum 1024: "))
password = ""
lenght = 0

if password_lenght < 8 or password_lenght > 1024:
    print("Minimun 8 characters - maximum 1024, try again")
    
#here the magic happens creating the final password    
else:
    while lenght < password_lenght:  # until the password reaches the length entered by the user
        # the password is built by picking a random (randint) character from password_creation
        password = password + password_creation[randint(0, len(password_creation) -1)]
        lenght +=1
    print("Generated password: ", password)#we show the result

If we run it, we'll be asked to enter the length we want the password to have (I entered 24).

How many characters do you wish the password? Minimum 8 maximum 1024: 24

Generated password: %5VcsbRU!B.k949h#w/kUZXG

OK, it looks like a secure password; maybe we can check how secure it is with

Kaspersky Password Checker

or with My1login

or with any other checker you like. It seems that it works.
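
A side note on randomness: the random module is not designed for security purposes, and the Python documentation recommends the secrets module for things like passwords. The sketch below is one possible variation, not the original Zsecurity code: it swaps randint for secrets.choice. The function name generate_password is made up for this example, and the alphabet simply mirrors the character groups used above.

```python
import secrets
import string

# Same character groups as in the script above
lower_case = string.ascii_lowercase
upper_case = string.ascii_uppercase
numbers = string.digits
special = "!#$%&/()=?¡@[]_<>,."
alphabet = lower_case + upper_case + numbers + special

def generate_password(length):
    # secrets.choice picks one element using the operating system's secure random source
    return "".join(secrets.choice(alphabet) for _ in range(length))

password = generate_password(24)
print("Generated password:", password)
```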

Now we have a problem here.

Just to have some boundaries, I asked the user to enter a number between 8 and 1024 (there is no need to limit it, but a password with fewer than 8 characters is hardly a password, and more than 1024 is maybe… a bit long; you can set your own limits by editing the code).

But the problem is that there is NO real handling of invalid user input… yet.

Let's run the code again and enter a value OUT of the requested range, let's say five characters:

>>> %Run z_strong_password_gen.py
How many characters do you wish the password? Minimum 8 maximum 1024: 5

Minimun 8 characters - maximum 1024, try again

What happened here? Well, the program stopped because we only coded a message saying that the input was not valid, BUT we did nothing after that. So let's fix it.

There are SEVERAL ways to do it, from bad practices (like "repeating yourself") to slightly more advanced ones, like creating a function to check the user input.

Let's pick a medium-level one to keep things easy.

Goal: Request User Input Until Valid

What we are gonna use: try/except statements and if/break

If you don't know what these are, you can read a nice intro to try-except (W3Schools).
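
Here is a minimal sketch of the "ask until valid" pattern on its own. The function name ask_length and its read parameter are made up for this example; read defaults to the built-in input, and passing a different function makes it easy to try the loop without typing anything.

```python
def ask_length(low, high, read=input):
    # Keep asking until the user enters a whole number between low and high
    while True:
        try:
            value = int(read(f"Password length ({low}-{high}): "))
        except ValueError:
            # int() raises ValueError for text such as "abc" or "7.5"
            print("Please enter a valid NUMBER...")
            continue
        if low <= value <= high:
            break  # valid input, leave the loop
        print(f"Minimum {low} characters - maximum {high}, try again")
    return value

# Simulated user answers instead of real keyboard input
answers = iter(["abc", "5", "24"])
result = ask_length(8, 1024, read=lambda prompt: next(answers))
print("Accepted:", result)
```

Calling ask_length(8, 1024) with no read argument asks on the keyboard as usual.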

Well, here is the code again, with some comments to understand it better:

"""
strong password generator
"""
#we import RANDINT to generate a random number between 2 given values
from random import randint

#we add some variables that will store our "dictionary"
#from which we´ll create the password
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."

#we add the variables to get a very strong password
password_creation = lowerCase + upperCase + numbers + special

password = ""
lenght = 0
while True:
    try:
        password_lenght = int(input("How many characters do you wish the password? Minimum 8 maximum 100: "))
        if password_lenght < 8 or password_lenght > 100:
            print("Minimun 8 characters - maximum 100, try again")
    except ValueError:
        print ("Please enter a valid NUMBER...")
        continue
    else:
        while lenght < password_lenght:  # until the password reaches the length entered by the user
            # The password is built by picking a random (randint) character
            # from password_creation on every pass of the loop. The loop runs
            # as many times as the number entered by the user, adding one
            # character to the string each time.
            password = password + password_creation[randint(0, len(password_creation) -1)]
            lenght +=1

        print("Generated password: ", password)#we show the result
   

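This version still has two problems: a number out of range prints the warning but a password is generated anyway, and the password and its counter are never reset between loops. So let's tighten the input validation to reject "bad" input like non-digit values or numbers out of range: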

from random import randint

lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."

password_creation = lowerCase + upperCase + numbers + special

while True:
    # pulled password and length in here to reset on each loop
    password = ""
    length = 0

    try:
        password_length = int(input("How many characters do you wish the password? Minimum 8 maximum 1024: "))
        if password_length < 8 or password_length > 1024:
            print("Minimun 8 characters - maximum 1024, try again")
            # add continue, to not try to create password, if validation fails
            continue
    except ValueError:
        # this catches non-numerical values entered by the user
        print("Please enter a valid NUMBER...")
        continue
    else:
        while length < password_length:
            password = password + password_creation[randint(0, len(password_creation) - 1)]
            length += 1
        print("Generated password: ", password)

That was OK, but what about optimizing the code to make it shorter and more readable?

Let's try it

from random import randint

lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."

mega_pass = lowerCase+ upperCase + numbers + special

lenght = 0
password_lenght = 0
password = ""


while True:
    try:
        password_lenght = int(input("How many characters long do you want the password? (min 8 - max 100; \n"))
    
    except ValueError:#if user enters a non-numerical value it will ask again
        print("\nNumbers please...")
    
    if  password_lenght not in range (8, 101):#if the user enters a numerical value OUT of range it asks again

        print("\nEnter a digit between 8 and 100...")
        
    
    else:
        while lenght < password_lenght:
            password = password + mega_pass[randint(0, len(mega_pass) -1)]
            lenght +=1
        
        print("\nYou choose" , password_lenght , "characters long\n")
        print("Here is your PASSWORD:\n ", password)
        break  # end the program after delivering one password

Let´s test it

Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
0 -> it must be 8 or more

Enter a digit between 8 and 100…-> so it asks again
How many characters long do you want the password? (min 8 – max 100;
5000 -> is greater than 100, so it asks again

Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
74747gggg -> we enter digits AND characters

Numbers please…-> it detects them and ask again for just numbers

Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
80 -> is between 8 and 100

You choose 80 characters long

Here is your PASSWORD:-> so it calculates our password and shows it below
Q$lK=TjJ8lCSx=9_mqkcEXzuVt[W#.p9DrB.FtAexuAXE)z]5_GdhI)G[tV!$YI_NZIz<fRe!T(Fvlnv

That was nice! We asked the user for a value within a given range, accepted ONLY digits (no letters or other characters), checked the input, asked again whenever the user entered something invalid, and finally generated the strong password.
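
The inner while loop can be shortened a little more. The sketch below is one possible variation, not part of the original script: random.choice picks a character directly, so there is no index arithmetic, and str.join builds the string in one step. The helper name build_password is made up for this example.

```python
from random import choice

lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
mega_pass = lowerCase + upperCase + numbers + special

def build_password(password_length):
    # choice(mega_pass) is equivalent to mega_pass[randint(0, len(mega_pass) - 1)]
    return "".join(choice(mega_pass) for _ in range(password_length))

password = build_password(80)
print("Here is your PASSWORD:\n", password)
```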

Resources to keep learning

Data Structures AWESOME Cheat Sheet here

Github Repositories to Learn Python
