OK, let's start coding with the new bits of info we have so far.
x = 5
y = 4
print(x < y)
Result = False
x = 5
y = 4
print(x > y)
Result = True
x = 5
y = 4
print(x >= y and y == 4)
Result = True
import random
x = random.randint (1, 5)
if x == 1:
    print ("I am number 1")
elif x == 2:
    print ("I am number 2")
else:
    print ("I am number", x)
if we get 1 -> I am number 1
if we get 2 -> I am number 2
if we get another number (3, 4 or 5) -> I am number 3 (or 4, or 5)
For loop
import random
y = 0
for i in range (10):
    y += random.randint(1, 5)
    print (y)
print ("y = ", y)
Result:
5
7
12
16
17
18
21
23
26
29
y = 29
While
import random
w = 0 # w must exist before the while condition checks it
while w != 5:
    w = random.randint (1, 100)
    print (w)
print ("\nw = ", w)
Lists
Lists are like variables BUT they can store several ordered values
This is a LIST -> [5, 4, 24, 36, 9]
A LIST can also mix different types of elements: integers, floats, strings…
The awesome feature of LISTS is that they are an ORDERED structure: EACH element within the list has an INDEX (starting from 0), so we can access any element precisely.

Counting from the left, the FIRST element has index 0, the second index 1, and so on. Counting from the right, we use NEGATIVE indexes, starting with -1 for the last element.
To create a LIST, we write a name = [] -> the square brackets [] are what define our list.
list_1 = [] -> this is an empty list
list_2 = [2, 5, 9, 6, 56]
# -> this is a FIVE-element list (elements are always comma-separated)
So, let's say I want to access the third element of this list. How can I do that? Remember that a list has ORDERED elements with INDEXES starting from 0 (from the left) or -1 (from the right), so I can access it this way:
my_list = [4, 6, 78, 43, 56]
print (my_list[2]) # -> it will show 78 (index 0, index 1, index 2 - third position)
Now, let's say I want to access the last element... we can use negative indexes (remember that negative indexes start with -1), so the last element would be:
my_list = [4, 6, 78, 43, 56]
print (my_list[-1]) # -> it will return 56
We can also ADD (append) elements to a list.
append() ADDS an element at the end of the list
my_list = [4, 6, 78, 43, 56]
my_list.append (21)
print (my_list)
[4, 6, 78, 43, 56, 21]
We can also REMOVE an element from the list by VALUE (bear in mind that it only removes the first occurrence of that value)
my_list = [4, 6, 78, 43, 56]
my_list.append (21)
my_list.remove (6)
# -> it will remove the VALUE 6
print (my_list)
[4, 78, 43, 56, 21]
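A quick sketch with a list that contains the same value twice, to see the "first occurrence" rule in action (the numbers are made up for illustration). Removing a value that is not in the list raises a ValueError, so the sketch checks with in first.

```python
numbers = [3, 6, 9, 6, 12]
numbers.remove(6)       # only the FIRST 6 is removed
print(numbers)          # [3, 9, 6, 12]

# remove() raises ValueError if the value is missing, so check first
if 100 in numbers:
    numbers.remove(100)
print(numbers)          # [3, 9, 6, 12] (unchanged)
```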
Or we can remove a value by INDEX using POP
my_list = [4, 6, 78, 43, 56]
my_list.pop (4)
# -> it will remove index 4, fifth position; number 56
print (my_list)
[4, 6, 78, 43]
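pop() also gives back the element it removes, and with no index it removes the last one. A small sketch with the same example list:

```python
my_list = [4, 6, 78, 43, 56]
removed = my_list.pop(1)  # removes the element at index 1 and returns it
print(removed)            # 6
last = my_list.pop()      # with no index, pop() removes the LAST element
print(last)               # 56
print(my_list)            # [4, 78, 43]
```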
We can also EDIT or REPLACE elements within the list
my_list = [4, 6, 78, 43, 56]
my_list [0] = 456 # replaces index 0, the first element (4), with 456
print (my_list)
[456, 6, 78, 43, 56]
my_list = [4, 6, 78, 43, 56]
my_list [0] += 456 # edit: adds 456 to the first element
print (my_list)
[460, 6, 78, 43, 56]
How many elements does my list have?
my_list = [4, 6, 78, 43, 56]
print (len(my_list))
# Result = 5
Find the index of a value
my_list = [4, 6, 78, 43, 56]
print (my_list.index(78))
#Result = 2 -> index number 2 has the element/value 78
To check if a value is within a list
my_list = [4, 6, 78, 43, 56]
print (4 in my_list)
#Result = True
Sorting and reversing a list
my_list = [4, 6, 78, 43, 56]
my_list.sort() # sorts the list itself in ASCENDING order
print (my_list)
my_list.reverse() # reverses the current order of the list
print (my_list)
Result -> sort = [4, 6, 43, 56, 78]
Result -> reverse = [78, 56, 43, 6, 4]
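A couple of related tools, shown as a small sketch: sorted() returns a NEW sorted list and leaves the original alone, sort(reverse=True) sorts from largest to smallest, and reverse() on its own only flips the current order (in the example above it gave descending order only because the list had just been sorted).

```python
my_list = [4, 6, 78, 43, 56]
ordered = sorted(my_list)   # builds a NEW sorted list
print(ordered)              # [4, 6, 43, 56, 78]
print(my_list)              # [4, 6, 78, 43, 56] (unchanged)

my_list.sort(reverse=True)  # sorts the list itself, largest first
print(my_list)              # [78, 56, 43, 6, 4]

unsorted = [4, 6, 78, 43, 56]
unsorted.reverse()          # only flips the order, no sorting
print(unsorted)             # [56, 43, 78, 6, 4]
```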
Loop through a list
We use a FOR loop, choosing a variable name to hold each element of the list
my_list = [4, 6, 78, 43, 56]
for element in my_list:
    print (element) # on each pass, element takes the next value in the list
Result =
4
6
78
43
56
I can also loop through the list by its index numbers instead of by its values
my_list = [4, 6, 78, 43, 56]
for i in range (len(my_list)): # i holds each index; len() gives the length of the list
    print (i)
Result =
0
1
2
3
4
Finally, I can loop through the list by its index numbers and use them to get the values
my_list = [4, 6, 78, 43, 56]
for i in range (len(my_list)):
    print (my_list [i])
Result =
4
6
78
43
56
Loop Brief
my_list = [4, 6, 78, 43, 56]
for element in my_list:
    print (element) # loop by value
print ("\n")
# prints a blank separator line
for i in range (len(my_list)):
    print (my_list [i]) # loop by index
Both loops print the same values:
4
6
78
43
56
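If we need the index AND the value at the same time, Python's built-in enumerate() gives us both in one loop. A minimal sketch:

```python
my_list = [4, 6, 78, 43, 56]
# enumerate() yields (index, value) pairs
for i, element in enumerate(my_list):
    print(i, element)
# 0 4
# 1 6
# 2 78
# 3 43
# 4 56
```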
Strings
Strings behave a lot like Lists: their characters also have indexes

We can also count the characters of the string
string_test = "Hello World"
print (len(string_test)) # we get the length of this string
Result = 11
We can also index a string like a list
string_test = "Hello World"
print (string_test[0])# I call the FIRST character of the string, index 0
Result = H
And we can loop the string
string_test = "Hello World"
for character in string_test:
    print (character)
We also have several methods to use with strings
string_test = "Hello World"
print (string_test.lower())# convert text to lowercase
print ("\n")
print (string_test.upper())# convert text to uppercase
print ("\n")
print (string_test.capitalize())# capitalize text
hello world
HELLO WORLD
Hello world
We also have methods that return True or False
string_test = "Hello World"
print (string_test.startswith("Hi"))# result-> False
print ("\n")
print (string_test.startswith("He"))# result-> True
print ("\n")
print (string_test.endswith("wo"))# result-> False
print ("\n")
print (string_test.endswith("ld"))# result-> True
print ("\n")
print (string_test.isalpha()) # result-> False, because the space between "o" and "W" is not an alphabetic character
print ("\n")
print (string_test.isdigit())# result-> False
Strip, to remove spaces before and after the text
string_test = " Hello World " # note that we added a space before H and another after d
print (string_test.strip()) # strip removes the spaces before and after the text
Result
Hello World (with NO spaces before or after the text)
Split
split() transforms a string into a list, using as separator the string we pass as an argument (here a comma). With no argument, it splits on whitespace.
fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
print (fruits.split(","))#split will return as a LIST, the COMMA is the separator
Result = ['apple', ' orange', ' banana', ' tangerine']
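Notice the leading spaces in ' orange', ' banana' and ' tangerine': they come from the space after each comma. Two simple ways to get clean words, as a sketch (the list comprehension in the second option is just a compact way of writing a for loop that builds a list):

```python
fruits = "apple, orange, banana, tangerine"

# Option 1: split on ", " (comma + space), so no leading spaces remain
print(fruits.split(", "))  # ['apple', 'orange', 'banana', 'tangerine']

# Option 2: split on "," and strip the spaces from each piece
clean = [f.strip() for f in fruits.split(",")]
print(clean)               # ['apple', 'orange', 'banana', 'tangerine']

# With no argument, split() separates on any run of whitespace
print("one two  three".split())  # ['one', 'two', 'three']
```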
Join-> list to string
To use join, the syntax is like this:
"separator".join(list_name), where "separator" is the character (or characters) I want to place between the elements
fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
list1 = fruits.split(",")#save the result to the variable list1
list2 = "-".join(list1)# the character "-" will join the list
print (list2)
Result = apple- orange- banana- tangerine
It goes from a list to a string joined by the “-” character
Note -> a “handmade” way to fill a List with the characters of a string
Let's say I have my string and want to “fill” a List with every character using a For loop. We can do it this way:
fruits = "apple, orange, banana, tangerine"# note that every fruit is comma separated
print (fruits)
#prints the string fruits
print ("\n")
liststring = []
for c in fruits:
    liststring.append(c)
print (liststring)#prints the List liststring
Result =
apple, orange, banana, tangerine
['a', 'p', 'p', 'l', 'e', ',', ' ', 'o', 'r', 'a', 'n', 'g', 'e', ',', ' ', 'b', 'a', 'n', 'a', 'n', 'a', ',', ' ', 't', 'a', 'n', 'g', 'e', 'r', 'i', 'n', 'e']
Of course, if you want to store the whole string as ONE element of a list (NOT character by character), there is an easier way to do it:
fruits = "apple, orange, banana, tangerine"
lista = []
lista.append(fruits)
print (lista)
Result = ['apple, orange, banana, tangerine']
Or, for the same result, just put the string name within []:
fruits = "apple, orange, banana, tangerine"
lista = [fruits]
print (lista)
Result = ['apple, orange, banana, tangerine']
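The handmade loop can also be written with a built-in, and if what we really want is one element per fruit, split() does it. A sketch:

```python
fruits = "apple, orange, banana, tangerine"
characters = list(fruits)   # same result as the handmade for loop
words = fruits.split(", ")  # one element per fruit
print(characters[:5])       # ['a', 'p', 'p', 'l', 'e']
print(words)                # ['apple', 'orange', 'banana', 'tangerine']
```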
Tuple -> immutable Lists; once defined, we cannot change them.
It is defined as a List, BUT with () instead of []

tupla1 = (1, 3, 5)
print (tupla1, "\n")
Result = (1, 3, 5)
I can access its indexes/values like a List
tupla1 = (1, 3, 5)
print (tupla1 [0], "\n")
Result = 1
We can make some operations with the tuple.
tupla1 = (1, 3, 5)
print (tupla1[0] + tupla1[1]) # we'll get 4 (1 + 3)
print (tupla1)
tupla2 = tupla1 + (1, 3, 5, 8)
# we add tupla1 plus values 1, 3, 5 and 8
print (tupla2)
Result =
4
(1, 3, 5)
(1, 3, 5, 1, 3, 5, 8)
BUT if I now want to edit it, I can't, because it is an immutable List
var = (3, 4 , 8)
var[0]=8
print (var)
Result =
Traceback (most recent call last):
var[0]=8
TypeError: 'tuple' object does not support item assignment
I can also convert a List to a Tuple, using the built-in function tuple()
Take a look at the example, we also show the TYPE of the variables to check they are a List or a Tuple
var = [3, 4 , 8]
print("var = " ,type(var))
tupla = tuple(var)
print (var)
print("\n")
print("tupla = " ,type(tupla))
print(tupla)
Result =
var = <class 'list'>
[3, 4, 8]
tupla = <class 'tuple'>
(3, 4, 8)
TIP: Tuples are generally a bit faster and use less memory than Lists
SETS
Sets DO NOT keep an order (they have no index), and they go between curly braces {}
They don't allow duplicated elements, so we can say that a SET is a collection of UNIQUE and UNORDERED elements
As with mathematical sets, we can perform some operations with them, like union, intersection, difference and symmetric difference.

We have two ways to create a set: by writing comma-separated elements between {} (careful: an empty {} creates a dictionary, not a set; use set() for an empty set)
set_1 = {1, 4, 6, 8,55}
or take a List and convert it to a set with the built-in function set()
list_1 = [5, 4, 76, 22]
set_2 = set(list_1)
We can add elements, but with the add() method (append() is for Lists)
set_1.add(10)
Now let's see what happens when we want to add a duplicated element
set_1 = {1, 4, 6, 8, 22}
print (set_1)
Result = {1, 4, 6, 8, 22}
#now we add a duplicated element
set_1.add(22)
print (set_1)
Result = {1, 4, 6, 8, 22}
As we can see, there is NO error but also NO change: the SET just ignores the duplicated element and stays the same
We can remove an element
set_1.remove(22)
print (set_1)
Result = {1, 4, 6, 8}
We can also loop the set
set_1 = {1, 4, 6, 8}
for elem in set_1:
    print (elem)
Result =
1
4
6
8
OK, let's check whether a given element is within the set with the IN keyword
set_1 = {1, 4, 6, 8,22}
print (5 in set_1)
Result = False -> there is NO 5 in our set
set_1 = {1, 4, 6, 8,22}
print (6 in set_1)
Result = True -> yes, we have a 6
Tip: searching within a Set is faster than doing it within a List
Let's test some operations with sets
Union
set_1 = {1, 4, 6, 8,22}
set_2 = {11, 2, 6, 8, 9}
print (set_1.union(set_2))
Result = {1, 2, 4, 6, 8, 9, 11, 22}
-> note that we had duplicated 6 and 8 and the final set did NOT duplicate them, just kept one element
Intersection
set_1 = {1, 4, 6, 8,22}
set_2 = {11, 2, 6, 8, 9}
print (set_1.intersection(set_2))
Result =
{8, 6}
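The other two operations mentioned above, difference and symmetric difference, work the same way. In this sketch the results are wrapped in sorted() only so they print in a predictable order, since sets have none.

```python
set_1 = {1, 4, 6, 8, 22}
set_2 = {11, 2, 6, 8, 9}

only_in_1 = set_1.difference(set_2)                   # same as set_1 - set_2
in_one_not_both = set_1.symmetric_difference(set_2)   # same as set_1 ^ set_2

print(sorted(only_in_1))        # [1, 4, 22]
print(sorted(in_one_not_both))  # [1, 2, 4, 9, 11, 22]
```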
Dictionaries
Dictionaries DO NOT store individual values but key-value pairs
dic1 = {"name": "Freelancer", "age": 45, "profession": "developer"}
print (dic1)
Result = {'name': 'Freelancer', 'age': 45, 'profession': 'developer'}
Note the example: we have 3 key-value pairs (comma-separated); the first key is “name” and its value is “Freelancer”, and so on…
We can think of dictionaries as lists with no numeric indexes and no content restriction (I can “mix” numbers and text). Since Python 3.7, dictionaries do keep their insertion order, but we still do not access them by position.
Even so, we can access any element within the dictionary, because every value has a key.
So the keys would be our indexes -> I get the values through the keys
A good practice is to write dictionaries this way (pretty much like JSON) to make them clear
dic1 = {"name": "Freelancer",
"age": 45,
"profession": "developer"
}
print (dic1["age"])#we call the value (45) by its key (age)
Result = 45
Adding values
dic1 = {"name": "Freelancer",
"age": 45,
"profession": "developer"
}
dic1["country"] = "Argentina"# add a new key-value pair
print (dic1)
Result = {'name': 'Freelancer', 'age': 45, 'profession': 'developer', 'country': 'Argentina'}
Deleting and editing values
dic1 = {"name": "Freelancer",
"age": 45,
"profession": "developer"
}
dic1["country"] = "Argentina"# add a new key-value pair
del dic1["age"]# delete a pair
dic1["name"] = "Web Developer"#edit a value
print (dic1)
Result = {'name': 'Web Developer', 'profession': 'developer', 'country': 'Argentina'}
Looping the dictionary
OK, we'll use for as usual, BUT now we have to go through 2 values, not just one (for element in… no longer works), so how can we do that?
for key_variable_name, value_variable_name in list (dictionary_name.items()):
dic1 = {"name": "Freelancer",
"age": 45,
"profession": "developer"
}
for k, v in list(dic1.items()):
    print (k, v)
Result =
name Freelancer
age 45
profession developer
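A sketch of a few more dictionary tools: items() can be looped directly (the list() wrapper above is optional), keys() and values() give the keys and the values, and get() returns a default value instead of raising a KeyError when the key is missing. The "country" lookup is just an example of a missing key.

```python
dic1 = {"name": "Freelancer", "age": 45, "profession": "developer"}

for k, v in dic1.items():   # no list() needed
    print(k, v)

print(list(dic1.keys()))    # ['name', 'age', 'profession']
print(list(dic1.values()))  # ['Freelancer', 45, 'developer']

# get() returns the second argument when the key does not exist
print(dic1.get("country", "unknown"))  # unknown
```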
Nested structures
dic1 = {"name": ["This is a list within a dictionary"],
#and now another dictionary within the dic1
"details": {
"age": 45,
"profession": "developer",
"country": "Norway"
}
}
for k, v in list(dic1.items()):
    print (k, v)
Result =
name ['This is a list within a dictionary']
details {'age': 45, 'profession': 'developer', 'country': 'Norway'}
But, how do I access the age? I should “index” from the outer structure to inner one
dic1 = {"name": ["This is a list within a dictionary"],
#and now another dictionary within the dic1
"details": {
"age": 45,
"profession": "developer",
"country": "Norway"
}
}
print (dic1["details"]["age"])# note the double "indexation"
#[details][age] to finally reach the value
Result = 45
Another example
dic1 = {"name": ["This is a list within a dictionary", 89, 65, 77],
#and now another dictionary within the dic1
"details": {
"age": 45,
"profession": "developer",
"country": "Norway"
}
}
print (dic1["name"][1])# note the double "indexation"
#[name][1] to finally reach the value
Result = 89
Tip: looking up a value by its key in a dictionary is faster than searching for it in a list
Functions
A function is a block of code with a name associated to it, which only runs when it is called.
You can pass data, known as parameters, into a function.
A function can return data or execute a task as a result.
Bear in mind that we have been using functions all along (print, len, int, range, input and so on): these are Python's built-in functions.
But we can also create OUR OWN functions. Remember that we can call a function as many times as we want; that's one of its main goals: doing a repetitive task several times without coding it all again and again. We just call the function with the new parameters and we are done.
Let's say I want to get the sum of all the numbers in a list, and I am going to do this 5 times in my program. I can write the same code 5 times, or I can create a function and call it 5 times without coding everything from scratch over and over again.
We start by creating our new function with the reserved word “def”, followed by the “name_of_this_function” and “()”. We finish the line with “:”
Inside the () we put the “parameters” for the function to work with. The parameters are all the information the function needs to operate. Usually we'll work with some kind of abstraction: let's say I want to create a function that adds all the elements of a list, so my main parameter will be a List. Then I create the function like this:
def myfunct (list_1):
Here the list_1 parameter does not exist yet, BUT the function needs a list to work with, so we give it a name as a parameter
OK, once the function is created, note that after the “:” the next lines are indented: that indented block defines what the function does. We created it with “def”; now let's define what the function should do.
We want to sum all the elements of the list, so it is a good idea to initialize a variable (x = 0) to keep track of the running total
And then we should do a for loop to go through all the elements in the list, as usual, AND sum them.
x = 0
for elem in list_1:
    x += elem
Now the sum of the elements in the list is stored in the “x” variable, so our function works so far. But a function not only receives parameters, it can also return a result. How can we get that result? Of course, with the reserved keyword “return”. And what should we return? In this case, the “x” variable containing the sum of the elements
Note that so far it is all an abstraction: if I run this code, nothing seems to happen
def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x
Result =
>>> %Run functions.py
(the file runs, but nothing is printed)
To use this function, we need to “call” it. BUT to call it we have to supply its parameters, meaning I have to “pass” the values to the function, so it can execute the abstraction and return a value
How do we call a function? Well, by its name -> myfunct
We usually save the result returned by myfunct in a variable so we can print it, AND we must pass the list (between []) to the function.
summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)
Remember that we did this?
def myfunct (list_1):
Well, this one
myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
“fills” list_1 with the elements the function works with. It sounds a bit backwards, I know, as if we were starting from the end, but that's how it works: when Python reads the def block, it only stores the function; the code inside runs later, each time we “call” the function with its parameters. So the program does not simply run line by line from top to bottom: it skips the function body, reaches the call, passes the parameters, and only then executes the function's code and gives back the result.
Let´s see and check the full code
def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x
summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)
Result = 45
So, the function runs at the moment I call it by its name (myfunct), using the values I pass as a List (note that we assign the result of the function to the variable “summation” so we can print it later)
So, the parameters are the communication between the program and the function, and the return is the communication between the function and the program.
We must pay attention to variables when defining functions; local and global ones.
In our recent example, the variable x is “local” to the function, meaning that if we use it “outside” the function, we'll get an error. Let's check:
def myfunct (list_1):
    x = 0
    for elem in list_1:
        x += elem
    return x
print (x) # -> using a local variable outside the function
summation = myfunct ([1, 2, 3, 4, 5, 6, 7, 8, 9])
print (summation)
Result =
print (x)
NameError: name 'x' is not defined
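We get this error because the “x” variable only exists WITHIN the function while it runs. The other direction is subtler: a function CAN read a variable defined at the top level of the program (a global), but if it assigns to that name, Python treats it as a new local variable unless we declare it with global. In our example, myfunct does not know the variable “summation” at all, because summation is only created after the function returns.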
So, let's repeat this: “The communication between the program and the function goes through the parameters: the function uses this info to get a result, and it hands that result back with return.”
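To make the local/global rule concrete, here is a small sketch with a hypothetical global variable counter: reading it inside a function works, but changing it needs the global keyword. In practice, passing values in as parameters and returning results is usually cleaner.

```python
counter = 10  # a global variable (hypothetical example)


def read_global():
    return counter  # reading a global from inside a function works


def change_global():
    global counter  # without this line, "counter += 1" raises UnboundLocalError
    counter += 1


print(read_global())  # 10
change_global()
print(counter)        # 11
```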
I can also use default parameters: if the call does not pass a value for a parameter, the function uses its own predefined default value.
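A sketch of a default parameter, based on our myfunct example. The extra parameter start is hypothetical (not part of the original function); it gets the value 0 unless the call passes something else.

```python
def myfunct(list_1, start=0):
    """Add up the elements of list_1, beginning from start (0 by default)."""
    x = start
    for elem in list_1:
        x += elem
    return x


print(myfunct([1, 2, 3]))       # 6   -> start uses its default value, 0
print(myfunct([1, 2, 3], 100))  # 106 -> start = 100 replaces the default
```

One thing to avoid: mutable defaults such as def f(items=[]), because that same list is shared between calls; use None as the default and create the list inside the function instead.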
Files
To operate with files we have some “modes”:
r = read
a = append (add to the end of the file)
w = write (start a new file from 0)
So, to work with a file, we assign it to a variable: variable_name = open(“filename to operate with”, “mode to operate”)
So if I have a file named file1.txt containing:
Peech,1 ,1
orange,3, 3
Apple,10, 5
I can READ it like this, and obtain a List from its elements
filex = open ("file1.txt", "r")
print ( filex.readlines())
Result = ['Peech,1 ,1\n', 'orange,3, 3\n', 'Apple,10, 5\n']
Note that every line has a \n at the end. Why?
Because \n is a special character that means Newline (Enter), and every one of the 3 lines in file1.txt ends with a newline/Enter
To fix this, we can iterate over the file (for loop) instead of reading it as a whole:
for variable_name in file_to_read:
Then I have to “clean” every line (remove the \n in this case) with the strip() method while reading it
filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    print (line)
Result =
Peech,1 ,1
orange,3, 3
Apple,10, 5
We iterated over file1.txt, got every line WITHOUT the \n, and printed it
Now let's see how to get, for example, the 5 from the Apple line
To get that done, we must find a pattern: here, commas “,” separate every fruit from its quantity and price. We also know a method that converts a string to a list (split)
filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",") # -> this is the split method
    print (line)
Result =
['Peech', '1 ', '1']
['orange', '3', ' 3']
['Apple', '10', ' 5']
Now, we have a file structure that can be indexed, so I can easily retrieve the element I need.
Let's see an example: let's index [0] to grab the fruit names
filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",")
    print (line[0]) # -> indexing the first position (0) to get the fruit name
Result =
Peech
orange
Apple
Remember that I get STRINGS when operating with files, so if we get the index [2]:
filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",")
    print (line[2])
Result =
1
3
5
Those “numbers” are TEXT, so to operate with them I must first convert them with int()
filex = open ("file1.txt", "r")
for line in filex:
    line = line.strip()
    line = line.split(",")
    print (int (line [1]) * 5) # -> here we convert text to integer and multiply
Result =
5
15
50
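Putting the file steps together, here is a sketch that turns every line into a tuple (name, quantity, price). Assumptions: the second column is the quantity and the third the price, as described above, and the file name fruits_demo.txt is hypothetical; the sketch creates that file first so it runs on its own (instead of touching your file1.txt). It also uses with open(...), which closes the file automatically when the block ends.

```python
# Create a small sample file so the sketch runs on its own
with open("fruits_demo.txt", "w") as f:
    f.write("Peech,1 ,1\norange,3, 3\nApple,10, 5\n")

inventory = []
with open("fruits_demo.txt", "r") as filex:  # closed automatically at the end
    for line in filex:
        name, quantity, price = line.strip().split(",")
        # int() ignores the spaces around the digits, e.g. int(" 3") == 3
        inventory.append((name, int(quantity), int(price)))

print(inventory)
# [('Peech', 1, 1), ('orange', 3, 3), ('Apple', 10, 5)]
print(inventory[2][2])  # 5 -> the price of Apple
```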
Adding info to a file
Remember to open the file in “a” mode this time, not in “r” mode
filex = open ("file1.txt", "a")
filex.write ("Grapes ,7, 8")
filex.close() # closing the file makes sure the text is saved
Result (in our txt file) =
Peech,1 ,1
orange,3, 3
Apple,10, 5
Grapes ,7, 8 -> it was added to the end
BUT if I add another line, we could get this:
filex = open ("file1.txt", "a")
filex.write ("Grapes ,7, 8")
filex.write ("Berries, 2, 4")
filex.close()
Result =
Peech,1 ,1
orange,3, 3
Apple,10, 5
Grapes ,7, 8Berries, 2, 4 -> what happened? There is no newline between them!
As we can see, we need to add a newline (\n) at the end of every line we write, so each one goes on its own line, like this:
filex = open ("file1.txt", "a")
filex.write ("Grapes ,7, 8\n")
filex.write ("Berries, 2, 4\n")
filex.close()
Result =
Peech,1 ,1
orange,3, 3
Apple,10, 5
Grapes ,7, 8
Berries, 2, 4
Creating a new file (w)
Bear in mind that with “w” we create a new file; if the file already exists, it will be overwritten.
filex = open ("file3.txt", "w")
filex.write("This is a New File created with W- mode")
filex.close()
Result (within the newly created file3.txt)
This is a New File created with W- mode
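The same write, append and read steps can use with, which closes the file for us even if something goes wrong. A sketch (file3_demo.txt is a hypothetical name, so this does not overwrite file3.txt):

```python
extra_lines = ["Grapes ,7, 8", "Berries, 2, 4"]

with open("file3_demo.txt", "w") as f:  # "w" creates (or overwrites) the file
    f.write("This is a New File created with W- mode\n")

with open("file3_demo.txt", "a") as f:  # "a" adds to the end
    for text in extra_lines:
        f.write(text + "\n")             # we add the newline ourselves

with open("file3_demo.txt") as f:        # "r" is the default mode
    content = f.read().splitlines()      # lines without the trailing \n

print(content)
# ['This is a New File created with W- mode', 'Grapes ,7, 8', 'Berries, 2, 4']
```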
Error Handling
Python's documentation includes a complete list of built-in exceptions.
We need to prevent our program from crashing when it finds an error, so we need to handle those errors in an elegant way (try-except).
Let's see an example that will give an error:
text1 = "Hi there"
x = int (text1)
print ("if all ok I get here")
Result =
x = int (text1)
ValueError: invalid literal for int() with base 10: 'Hi there'
So, to keep the program from crashing, we use try-except to handle the error and keep going
When I have some “dangerous” piece of code that could crash my program, I put it within a try block so the error can be handled
text1 = "Hi there"
try:
    # -> the dangerous code here
    x = int (text1)
except:
    # here the code that handles the exception, in our case a print
    print ("Something went wrong") # -> the message to show when an error occurs
print ("\nif all ok I get here")
Result =
Something went wrong
if all ok I get here
Now, if I want to see what kind of error the code raises, I add “Exception as variable_to_hold_error” to except:
text1 = "Hi there"
try:
    x = int (text1)
except Exception as err:
    # print our custom message "Something went wrong" + the Python error message
    print ("Something went wrong", err)
print ("\nif all ok I get here")
Result =
Something went wrong invalid literal for int() with base 10: 'Hi there'
if all ok I get here
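try-except has two more optional parts: else runs only when no error happened, and finally always runs. A small sketch that also catches only the error we expect (ValueError) instead of every possible error; to_int is a hypothetical helper name:

```python
def to_int(text):
    try:
        number = int(text)
    except ValueError as err:  # catch only the error we expect
        print("Something went wrong:", err)
        return None
    else:                      # runs only if no exception happened
        return number
    finally:                   # always runs, with or without an error
        print("Checked", repr(text))


print(to_int("42"))        # prints Checked '42', then 42
print(to_int("Hi there"))  # prints the error message, Checked 'Hi there', then None
```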
Classes and Objects
To understand the difference between Classes and Objects, we'll use a really nice example: let's suppose you are an architect building some houses.
In order to build them, you FIRST need drawings/designs regarding how the house will look, with its pipes, windows, floors, and more.
ONCE you have those plans, THEN you build the house or houses.
Then:
Plans/drawings -> CLASSES
Houses/Buildings -> OBJECTS
Classes are just concepts (abstractions) from which we create the Objects
Furthermore, for every class-plan, I can define multiple PROPERTIES (color, power consumption, and so on) that can be different for every object-house.
Finally, I can add some METHODS to the classes, so my Objects are able to perform some action (maybe ring a bell in a house)
So we end with this:
Plans/drawings -> CLASSES
Houses/Buildings -> OBJECTS
Attributes/Features -> PROPERTIES
Actions -> METHODS (functions that live WITHIN the Class and can only be used by the Objects of that class)
OK, enough theory, let's create a class
We have a reserved word for Classes called… you guessed it: class. Then comes the name of our abstraction/class, followed by “:”
After that we need a “constructor”: “__init__” is a special method in Python classes. It is called when an object is created from a class, and it lets the class initialize the object's attributes
The constructor allows us to add the properties we wish for our “house” (object)
Remember that a Method is just a function that works ONLY within the class, for the objects created from that class, so, you guessed again, the __init__ method is defined with… def.
def __init__ (self, parameters): -> remember that the first parameter is always “self”
def __init__ (self, color):
Then, to define the properties of my Object (house), I have to use self for every one of them. self works a bit like a dictionary (key-value pairs), BUT with this syntax: self.property = value (self.color = color) -> that color variable is the one in def __init__ (self, color):
So far we have: to build a house we assign it a color, and it starts with 0 water and electricity consumption, so the code is:
class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0
Then, EVERY house I build, will have 3 properties; color, electricity and water consumption.
These “self” properties are very important because other methods can use them to access the object's data
Now we can define the methods: what can I do with a house? Maybe paint it, so let's define a paint method
Remember that every method within a class MUST have self as its first parameter
def paint (self, color):
    self.color = color
Now we can define other methods, like… using the lights,
def lights_on (self):
    self.electricity += 10
the water
def use_water(self):
    self.water += 7
and the doorbell
def ring_doorbell(self):
    print ("rinnnnnnnnnggggggg!!!!!!")
    self.electricity += 1
Our code so far would look like this
class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0
    def paint (self, color):
        self.color = color
    def lights_on (self):
        self.electricity += 10
    def use_water(self):
        self.water += 7
    def ring_doorbell(self):
        print ("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1
Bear in mind that if we run this code, NOTHING happens (or shows) because we only DEFINED the Class: we have the plans, but we have not put a single brick of our house yet
Let's begin to create our objects
First I choose a variable to host the object, then write the class name and the parameters to “build the house”, in this case “color”, which we defined before in the constructor (__init__). Remember that in our example we defined only one parameter, color (electricity and water are initialized to zero, not passed as parameters)
So we “call” the constructor method with the parameters defined in the method
my_house = house ("red")
-> we call the constructor method to build the house with a red color
So this piece of code will build a RED house with ZERO water and electricity consumption
class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0
my_house = house ("red")
Now that the object is created, how do I access it? How can I see it?
It is as easy as invoking the variable name where we host the object DOT the property name
print (my_house.color)
print (my_house.electricity)
print (my_house.water)
Result =
red
0
0
Now I can start calling the METHODS we created for our Object, so let's try the doorbell
my_house.ring_doorbell()
print (my_house.electricity)
Result =
rinnnnnnnnnggggggg!!!!!!
1 -> note that before we had 0 consumption, but now we have 1, because calling my_house.ring_doorbell() performed the action += 1
Now I can paint my house again by calling the paint method, BUT passing a different color
my_house.paint ("green")
print (my_house.color)
Result = green
The full code so far with comments so it is easy to understand:
class house:
    # I create the class "house"
    def __init__(self, color): # the constructor, with only one parameter: color
        self.color = color # property color
        self.electricity = 0 # property electricity
        self.water = 0 # property water
    # so we have our class-plans created with the properties color, electricity and water
    # Now we start creating methods
    def paint (self, color): # method paint
        self.color = color
    def lights_on (self): # method lights_on, adds 10 to the electricity use
        self.electricity += 10
    def use_water(self): # method use_water, adds 7 to the water use
        self.water += 7
    def ring_doorbell(self): # method ring_doorbell, prints a message and adds 1 to the electricity use
        print ("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1
# so far we have only created the plans, all abstraction
# now let's create the object house
# I create a variable (my_house) that will host the object house with a
# color parameter (red)
my_house = house ("red")
print (my_house.color) # shows the house color
print (my_house.electricity) # shows electricity use
print (my_house.water) # shows water use
my_house.ring_doorbell() # we call the method ring_doorbell
print (my_house.electricity) # we show the electricity use, notice it is now 1 and not 0
my_house.paint ("green") # we call the paint method and our house changes from red to green
print (my_house.color)
Result =
red
0
0
rinnnnnnnnnggggggg!!!!!!
1
green
Brief: the Class is the abstraction; the object is only built when I call the class and pass it the parameters, so the constructor method can build it.
But what if I want to use the previous class-plans to build something else: not a house, but maybe a mansion?
Of course, I could write the class-plans from scratch, but it is wiser to reuse the useful parts of our class. This is called inheritance: the mechanism of deriving new classes from existing ones
The new (child) class will have the same properties and methods as the parent class
And how do we do that? Just by creating a new class and writing the name of the parent class between parentheses
class mansion (house):
So our mansion class will have the same properties and methods as “house”, but the reason to do this is to change some methods so they perform other actions.
Let's say that I want to change the power consumption, so my mansion spends more than 10 of electricity
class mansion (house):
    def lights_on (self):
        self.electricity += 38
Note that we did not write a constructor (__init__) because it is inherited from the parent (house)
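If the child class DOES need its own constructor (for example, to add a new property), it can still reuse the parent's one with super(). A sketch, where pools is a hypothetical extra property just for the mansion:

```python
class house:
    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0


class mansion(house):
    def __init__(self, color, pools):
        super().__init__(color)  # let house set color, electricity and water
        self.pools = pools       # hypothetical extra property, only for mansions


my_mansion = mansion("white", 2)
print(my_mansion.color, my_mansion.electricity, my_mansion.pools)  # white 0 2
```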
We can also modify the other methods too
class mansion (house):
    def lights_on (self):
        self.electricity += 38
    def use_water (self):
        self.water += 19
    def ring_doorbell (self):
        print ("Ding-Dong!!")
        self.electricity += 3
So we changed 3 methods.
Now, to create the object, let's assign it to a variable as we did before: the name of the class and, between parentheses, the parameter for the constructor (inherited from the parent)
my_mansion = mansion ("white")
So the constructor “def __init__(self, color):” will use the white color to build the mansion.
my_mansion = mansion ("white")
print (my_mansion.color)
Result = white
Now let's call the other methods to check that the mansion uses its own (overridden) methods, while paint still comes from the parent:
class mansion (house):
    def lights_on (self):
        self.electricity += 38
    def use_water (self):
        self.water += 19
    def ring_doorbell (self):
        print ("Ding-Dong!!")
        self.electricity += 3
my_mansion = mansion ("white")
print (my_mansion.color)#shows mansion color
print (my_mansion.electricity)#shows mansion electricity use
print (my_mansion.water)#shows mansion water use
my_mansion.ring_doorbell()#we call the method ring_doorbell
print (my_mansion.electricity)
my_mansion.paint ("gold")#we call the paint method FROM the parent
print (my_mansion.color)
Result =
red
0
0
rinnnnnnnnnggggggg!!!!!!
1
green
white
0
0
Ding-Dong!!
3
gold
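The house class itself is defined earlier in the course, so here is a self-contained sketch that runs the whole example in one file. The house body below is a hypothetical stand-in: the amounts in lights_on and use_water are assumptions, and only the behaviour visible in the results above (counters starting at 0, the doorbell adding 1, paint changing the color) is taken from the text.

```python
class house:
    """Hypothetical stand-in for the earlier house class (amounts assumed)."""

    def __init__(self, color):
        self.color = color
        self.electricity = 0
        self.water = 0

    def paint(self, color):
        self.color = color

    def lights_on(self):
        self.electricity += 10   # assumed amount

    def use_water(self):
        self.water += 1          # assumed amount

    def ring_doorbell(self):
        print("rinnnnnnnnnggggggg!!!!!!")
        self.electricity += 1


class mansion(house):
    # No __init__ here: the parent's constructor is inherited
    def lights_on(self):
        self.electricity += 38

    def use_water(self):
        self.water += 19

    def ring_doorbell(self):
        print("Ding-Dong!!")
        self.electricity += 3


my_mansion = mansion("white")
my_mansion.ring_doorbell()      # overridden: prints Ding-Dong!! and adds 3
my_mansion.paint("gold")        # inherited unchanged from house
print(my_mansion.color, my_mansion.electricity)  # gold 3
print(isinstance(my_mansion, house))             # True: a mansion is also a house
```

Python looks a method up on the child class first and only falls back to the parent when the child does not define it, which is why ring_doorbell behaves differently but paint works the same for both.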
Intro to Web Scraping
Web scraping is the process of retrieving, or “scraping”, data from a website automatically rather than manually. Web scraping uses intelligent automation to retrieve millions or even billions of data points from the internet’s websites.
If a site offers no API to download its data, I can use web scraping instead.
How does a website work?
To perform web scraping, we need to know exactly how a website works, so take a look at this awesome intro to HTML here
We need to understand some TAGS and their structure, meaning which tag is the parent and which are its children.
So let's say we have <div> </div>. This parent tag will have children… where? Just inside it, so ANYTHING within <div> </div> will be a child of that DIV tag.
<body>
  <div>
    <p>
      My message here
    </p>
  </div>
</body>

Now let´s take a look at this HTML
<body>
  <div>
    <p>
      My message here
    </p>
  </div>
  <span>
    I am here
  </span>
</body>

Here <div> and <span> are siblings. Why are they called siblings? Because they are at the same level: both are children of <body>, and siblings to each other.
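To see these relationships from Python, here is a small sketch using the standard library's xml.etree.ElementTree (not one of the scraping libraries used later). It only works here because the snippet is well-formed XML, and the parent_of map is a helper built just for illustration, since ElementTree elements do not know their own parent.

```python
import xml.etree.ElementTree as ET

snippet = """
<body>
  <div>
    <p>My message here</p>
  </div>
  <span>I am here</span>
</body>
""".strip()

body = ET.fromstring(snippet)

# Children: the elements directly inside <body>
children = [child.tag for child in body]
print(children)  # ['div', 'span']

# ElementTree has no parent pointers, so build a child -> parent map
parent_of = {child: parent for parent in body.iter() for child in parent}

# Siblings of <div>: the other children of its parent
div = body.find("div")
siblings = [el.tag for el in parent_of[div] if el is not div]
print(parent_of[div].tag, siblings)  # body ['span']

# <p> is a child of <div>, so a grandchild of <body>
p = div.find("p")
print(parent_of[p].tag)  # div
```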
To make a TAG easy to identify, we can give it ATTRIBUTES.
Common attributes are class and id, and each attribute is formed by a NAME and a VALUE:
<div class="main container">
  <p>
    My message here
  </p>
</div>
Now we can easily identify this <div> tag because of its class attribute, "main container" (strictly speaking, the space makes it two classes: main and container).
For a deeper look at HTML classes and ID attributes, please check this class tutorial and this ID one.
Client-Server Architecture
Take a look at this info to get an intro to it
URLs
When we type a site URL like https://www.google.com/, it is an easy-to-read URL, but if we do a search in Google we'll get something like this:

Why all that stuff? Because the URL can be used to pass information, in this example about our search.
Let's analyze this:
https -> protocol
www.google.com -> domain
/search -> Endpoint (identifies the action the server will perform, this time a SEARCH. We can concatenate several endpoints like search/users)
Then we have the parameters. They start with “?”, so everything AFTER the ? is a parameter that the server will use to answer our request. Keep in mind that parameters are name-value pairs separated by the “=” sign.
We can have several parameters, separated by the “&” symbol.
So in the example, q = web scrapping, sourceid = chrome and ie = UTF8.
All this info is received by the server and used to serve our request.
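Python's standard library can split a URL into exactly these pieces. The URL below is a hypothetical reconstruction built from the parameters just described, since the original screenshot is not available.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical search URL rebuilt from the parameters described above
url = "https://www.google.com/search?q=web+scrapping&sourceid=chrome&ie=UTF8"

parts = urlsplit(url)
print(parts.scheme)  # https           -> protocol
print(parts.netloc)  # www.google.com  -> domain
print(parts.path)    # /search         -> endpoint
print(parts.query)   # q=web+scrapping&sourceid=chrome&ie=UTF8

# "?" starts the parameters, "&" separates them, "=" joins each name to its value
params = parse_qs(parts.query)
print(params)  # {'q': ['web scrapping'], 'sourceid': ['chrome'], 'ie': ['UTF8']}

# The other way round: build a query string from a dict
print(urlencode({"q": "web scrapping", "sourceid": "chrome"}))  # q=web+scrapping&sourceid=chrome
```

parse_qs returns a list per name because a URL may repeat the same parameter; the "+" in the query is decoded back into a space.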
Types of Web Scraping and tools used to perform them
1 – Static scraping (one page): when ALL the info is on just one page and it does not load dynamic content.
Tools used: requests (to “ask” for data), Beautiful Soup to parse the XML and HTML we get, and Scrapy, which does both jobs (request and parse).
2 – Static scraping (several pages, same domain), also called horizontal scrolling (pagination) and vertical scrolling (product details).
Tools used: Scrapy
3 – Dynamic web scraping: we'll use some automation to fill in data, scroll and wait for the page to load its content before scraping what we need.
Tools used: Selenium
4 – API web scraping:
Steps to web scrapping
1 – Define a root or seed URL: the main one from which to START the data extraction. It may not be the page we extract data from, but the one from which we'll start “travelling” to find the info.
2 – Make a REQUEST to this URL
3 – Get the response from the previous request (it will be in HTML format)
4 – Parse the info to obtain what I am searching for
5 – Repeat from step 2 with another URL within the same domain (which may be obtained from the HTML response).
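The five steps above can be sketched as a small breadth-first crawl loop. This is an illustration, not the course's code: crawl and LinkCollector are hypothetical names, and fetch is any function you pass in that returns a page's HTML (for example a wrapper around requests.get(url).text), which keeps the sketch runnable without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=10):
    """Breadth-first crawl that stays on the seed URL's domain.

    fetch: any callable that takes a URL and returns that page's HTML text.
    Returns a dict {url: html} of the visited pages, in visiting order.
    """
    domain = urlsplit(seed_url).netloc        # step 1: the seed URL fixes the domain
    queue = deque([seed_url])
    visited = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        html = fetch(url)                     # steps 2-3: request the page, get the HTML
        visited[url] = html
        collector = LinkCollector()           # step 4: parse the response
        collector.feed(html)
        for href in collector.links:          # step 5: queue same-domain links and repeat
            absolute = urljoin(url, href)
            if urlsplit(absolute).netloc == domain and absolute not in visited:
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    # Fake, in-memory "website" so the sketch runs without network access
    pages = {
        "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">X</a>',
        "https://example.com/a": '<a href="/">home</a>',
    }
    for url in crawl("https://example.com/", lambda u: pages.get(u, "")):
        print(url)
```

The max_pages limit is a safety net so a sketch like this cannot wander through a whole site by accident.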
XPATH
To obtain the required info from the HTML response, we'll need XPATH.
XPATH is a language that allows us to build expressions to extract info from XML or HTML data, so I can search for and extract exactly what I need from all the gibberish we'll get from our requests. We can search within the DOM elements in a number of ways.
Take a look at this awesome tool to learn how to use XPATH
Now we must understand how XML is structured: it is made of LEVELS, these levels being the nodes (HTML tags), and these nodes have sub-levels, or nested levels, called “child nodes”.
Take a look at this piece of code: <body> is the ROOT level, and its children are <h1>, <h1>, <div> and <span>. The <div> tag has a child of its own: <p>.
<body>
  <h1>Main title</h1>
  <h1>Another main title</h1>
  <div class="main container">
    <p>
      My message here
    </p>
  </div>
  <span>
    I am here
  </span>
</body>
Now we can define our search axis to start a search; axes are parameters that filter the tags we are looking for.
If I use // (double slash), it will search within ALL levels of the document.

If I use a single slash (/), it will only search at the root of the document.

Note that it found nothing, because <p> is a child of a child, not the root. This document has only one tag at the root, and it is <body>.
Ok, after defining the search prefix (//, / or ./) we must add the node we are searching for; this is called a “step”. I can also add attributes to narrow the search even more.
This is done by adding [@attribute='value'] right after the node name. Let's say I want to find the <h1> tag with id title: //h1[@id='title']

Here is an awesome intro to XPath
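The standard library's xml.etree.ElementTree supports a limited subset of XPath, enough to try these ideas on the sample document above (with a hypothetical id="title" added to the first <h1>). Its paths are relative to an element, so .//p plays the role of //p, and a bare p only looks at the direct children of <body>, which, like the single-slash search, finds nothing. For full XPath (contains(), text() and so on) we'll use lxml later.

```python
import xml.etree.ElementTree as ET

# Sample document from above; id="title" is a hypothetical addition
doc = ET.fromstring("""<body>
  <h1 id="title">Main title</h1>
  <h1>Another main title</h1>
  <div class="main container">
    <p>My message here</p>
  </div>
  <span>I am here</span>
</body>""")

# Like //p: search every level below <body>
print([p.text for p in doc.findall(".//p")])  # ['My message here']

# Only the direct children of <body>: <p> is nested inside <div>, so nothing is found
print(doc.findall("p"))  # []

# Step plus attribute predicate, like //h1[@id='title']
print([h.text for h in doc.findall(".//h1[@id='title']")])  # ['Main title']
```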
TIP
We can run a “live” XPath request by opening the web browser's dev tools (usually F12 or right-click -> Inspect), then going to the “Console” tab and running this code:
$x("path expression")
Let's see an example by requesting all <div> elements at any level (//):
$x("//div")

Web Scraping
Remember that in order to get data from a website we need TWO separate procedures:
1 – REQUEST the page/server
2 – PARSE the data we received
We'll use some Python libraries to do this.
To extract info from a single static page we'll use 4 different libraries:
Requests -> to obtain the HTML
LXML and beautifulsoup4 to parse the received info
Scrapy to perform the two operations: request and parse
To install a library, just open a CMD (command prompt) in Windows or a terminal in Linux and run
pip install library-name
or pip3 install library-name
or sudo pip install library-name (Linux)
or pip install library-name --user (Windows)

We'll also install (for dynamic sites)
Selenium
Pillow (to extract images)
Pymongo (to store data in DB)
In case you need Twisted to make Scrapy work on Windows, use this link
Scraping Wikipedia
Goal: Extract the names that Wikipedia shows in its main page
Tools:
- Requests to get the HTML from the server and
- LXML to parse the tree and to get the desired info
Just to refresh, we need TWO steps: request the data and parse it to get the exact info.
Bear in mind that when I make a request, it also sends headers. One of the most useful is “user-agent”, which tells the server the browser and operating system the request comes from. If I DON'T define this user-agent, requests sends its own default (python-requests/ followed by its version), so our attempt may be seen as an automated web-scraping attack and be blocked.
So we need to overwrite that default “user-agent” value.
To do this, BEFORE making the request, I must create a dictionary to hold the new values:
new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}
After making the request, we can print the result with its “text” attribute.
So far we have this working code
"""
Goal: Extract the names that Wikipedia shows in its main page
Tools:
Requests to get the HTML from the server and
LXML to parse the tree and to get the desired info
"""
import requests
"""
change the user-agent to avoid being blocked
"""
new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}
"""
define my seed URL, in this case the wikipedia URL
"""
seed_url = "https://www.wikipedia.org/"
"""
now I make the request
"""
request_result = requests.get(seed_url, headers=new_header)
"""
now we can print the request_result
-> [:200] cuts the text to only 200 characters
"""
print (request_result.text[:200])
With this result (cropped to 200 characters):
<!DOCTYPE html>
<html lang="mul" class="no-js">
<head>
<meta charset="utf-8">
<title>Wikipedia</title>
<meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by
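To check what the header override actually changes without hitting the network, we can ask requests to build (prepare) the request and then inspect its headers. A minimal sketch, assuming a recent requests version; Session.prepare_request only builds the request and sends nothing.

```python
import requests

new_header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}
url = "https://www.wikipedia.org/"
session = requests.Session()

# prepare_request() builds the request as it would be sent, but sends nothing
default_request = session.prepare_request(requests.Request("GET", url))
custom_request = session.prepare_request(requests.Request("GET", url, headers=new_header))

print(default_request.headers["User-Agent"])  # python-requests/<version>
print(custom_request.headers["User-Agent"])   # the Mozilla/5.0 ... string above
```

Header names are case-insensitive, so our lowercase "user-agent" key replaces the default "User-Agent"; passing headers=new_header to requests.get performs the same merge right before sending.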
Now it's time to use LXML to parse this data into something more useful.
Let's import lxml -> from lxml import html
Now we'll create a variable to hold the parser (note that we pass the text of the response, not the response object):
parser = html.fromstring(request_result.text)
Now with this parser we have several useful methods to search within the HTML tree. BUT to extract data we first need to check the HTML structure to find out where the data is.
Ok, let's say we want to extract the ENGLISH text from the Wikipedia main page.

So we right-click on the element we want to check and then we select “inspect”

And we'll get this info

This “brings” the HTML tree so we can easily find the element we want to parse
As we can see, the text is within an <a> tag with the ID “js-link-box-en”. Remember that an ID is UNIQUE, so we can reach this text through this tag…

Back to our script: now that we have parsed the text, we have many methods to use, so let's try get_element_by_id:
parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show
Now we assign this to a variable
english = parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show
And now we print it
print (english)
So we have this last piece of code:
parser = html.fromstring(request_result.text)
english = parser.get_element_by_id("js-link-box-en") #it receives as parameter the element ID we want to show
print (english)
That returns…
<Element a at 0x31ecae0>
What is that?????? No worries, it is not text but an lxml element object, so we need to ask for its content like this:
print (english.text_content())
And now it works:
English
6 203 000+ articles
Ok, we did this with lxml's own methods, but we can also use XPath, remember? Let's see how to do it:
parser.xpath("expression")
And… what is the expression that will lead me to the element?
Back on the inspect page, we see that we have an <a> element with an ID, and the text is within a child tag (<strong>).

So, the XPath expression would be
"//a[@id='js-link-box-en']/strong/text()"
And the piece of code to call the element…
english = parser.xpath("//a[@id='js-link-box-en']/strong/text()")
print (english)
And it works, returning
['English']
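To practise get_element_by_id, text_content() and XPath without requesting the live page, here is a sketch on a small, hypothetical copy of the portal markup (the real Wikipedia HTML is larger and may change over time).

```python
from lxml import html

# Hypothetical, simplified copy of the portal markup (not the real page)
page = """<html><body>
<div class="central-featured-lang lang1">
  <a id="js-link-box-en" href="//en.wikipedia.org/">
    <strong>English</strong>
    <small>6 203 000+ articles</small>
  </a>
</div>
</body></html>"""

parser = html.fromstring(page)

english = parser.get_element_by_id("js-link-box-en")
print(english)  # an element object, e.g. <Element a at 0x...>

# text_content() gathers all text inside <a>; split/join just tidies the whitespace
print(" ".join(english.text_content().split()))  # English 6 203 000+ articles

# The same element reached with XPath, keeping only the <strong> text
print(parser.xpath("//a[@id='js-link-box-en']/strong/text()"))  # ['English']
```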
Ok, now let's focus on our goal: retrieve ALL languages from the home page, so we need to create an XPath expression to do that.
We need to find a pattern, something that wraps all languages. Remember that an ID is unique, but a CLASS is for groups, so maybe we can find a class that contains our languages.

As we can see, it all happens within <div> tags, and every language has a CLASS (class="central-featured-lang …") followed by lang1, lang2… lang n. So our XPath expression must use “contains”.
And within that <div> tag there are also <a> and <strong> tags:
languages = parser.xpath("//div[contains(@class, 'central-featured-lang')]//strong/text()")
print (languages)
and the result
['English', 'Español', '日本語', 'Deutsch', 'Русский', 'Français', 'Italiano', '中文', 'Português', 'Polski']
It works! We receive the result as a list, which we can easily iterate:
for language in languages:
    print (language)
result
English
Español
日本語
Deutsch
Русский
Français
Italiano
中文
Português
Polski
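Non-Latin names (Japanese, Russian, Chinese) can come out garbled, for example 日本語 printed as 'æ\x97¥æ\x9c¬èª\x9e'. One common cause is UTF-8 bytes being decoded as Latin-1 (ISO-8859-1), which requests falls back to when a text response does not declare its charset. The standard-library sketch below reproduces and reverses that mix-up; with requests, setting request_result.encoding = "utf-8" before reading .text, or passing the raw bytes in request_result.content to html.fromstring so lxml can use the page's own charset declaration, usually avoids it.

```python
name = "日本語"

raw = name.encode("utf-8")        # the UTF-8 bytes a server sends
wrong = raw.decode("latin-1")     # decoding them with the wrong codec
print(repr(wrong))                # 'æ\x97¥æ\x9c¬èª\x9e'

right = raw.decode("utf-8")       # decoding with the right codec
print(right)                      # 日本語

# Text that was already decoded the wrong way can be reversed
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired == name)           # True
```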
Well, now let's try it with another lxml method -> find_class
languages = parser.find_class('central-featured-lang')
for language in languages:
    print(language.text_content())
result
English
6 203 000+ articles
Español
1 645 000+ artículos
日本語
1 242 000+ 記事
Deutsch
2 508 000+ Artikel
Русский
1 681 000+ статей
Français
2 275 000+ articles
Italiano
1 656 000+ voci
中文
1 161 000+ 條目
Português
1 048 000+ artigos
Polski
1 442 000+ haseł
Note:
Remember that when working with CLASSES there is one thing to watch out for:
class="central-featured-lang lang1"
A space within the class attribute means there is ANOTHER class, so in the example we have TWO classes (this allows more flexible styling):
central-featured-lang
and
lang1
That's why in XPath @class is compared against the WHOLE attribute value (so we use contains() to match part of it), while lxml's find_class matches a single class name, so we only pass it the first class.
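Here is a small sketch of that note in action, on hypothetical markup shaped like the Wikipedia language boxes: an exact @class comparison finds nothing, contains() matches part of the attribute value, and find_class matches one whole class name. Keep in mind that contains() is a plain substring test, so it would also match a class such as central-featured-lang-old (a made-up name), while find_class would not.

```python
from lxml import html

# Hypothetical markup shaped like the Wikipedia language boxes
page = """<html><body>
<div class="central-featured-lang lang1"><a><strong>English</strong></a></div>
<div class="central-featured-lang lang2"><a><strong>Español</strong></a></div>
<div class="other-box"><a><strong>Not a language</strong></a></div>
</body></html>"""
parser = html.fromstring(page)

# 1. Exact comparison against the WHOLE attribute value: finds nothing,
#    because the attribute is "central-featured-lang lang1", not just the first class
exact = parser.xpath("//div[@class='central-featured-lang']//strong/text()")

# 2. contains(): substring test on the attribute value
partial = parser.xpath("//div[contains(@class, 'central-featured-lang')]//strong/text()")

# 3. find_class(): splits the attribute on spaces and matches one whole class name
by_class = [div.text_content() for div in parser.find_class("central-featured-lang")]

print(exact)     # []
print(partial)   # ['English', 'Español']
print(by_class)  # ['English', 'Español']
```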
Stack Overflow scraping using Beautiful Soup
As the title suggests, we no longer use lxml but a nice tool: Beautiful Soup. It works in a similar way, because it lets us apply functions to search by ID, class and so on.
Goal: extract Title and description of published questions within stack overflow site main page
Tools to use: Beautiful Soup
- First, we import the requests library as usual
- Then the header stuff to avoid being banned
- Then we add the seed URL https://stackoverflow.com/questions
- Now we make the request (get) to the URL to get the full tree
- We can print the result (the status code will be 200 if successful)
Nothing new so far; we have this code:
"""
Goal: extract Title and description of published
questions within stack overflow site main page
Tools to use: Beautiful Soup
"""
import requests
#change header to avoid being detected as a bot
new_header = {
    "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
}
#define seed URL
url = "https://stackoverflow.com/questions"
#now we make the request (get) to the url to get the full tree
#request to server
result = requests.get(url, headers = new_header)
#show if the site is reachable
print (result)
Result
<Response [200]>
Ok, time to import beautifulsoup (install it if it is not installed yet):
pip install beautifulsoup4 --user
from bs4 import BeautifulSoup
Remember that BeautifulSoup is another parser with a set of tools for retrieving/filtering info
So BeautifulSoup receives the result as a parameter (the text, i.e. the HTML tree), and we assign it to a variable as usual (naming the parser explicitly avoids a warning):
soup_info = BeautifulSoup(result.text, "html.parser")
Now inspect the page to look for some clues to scrape

We find a main <div> with an ID = “questions”
And within it, many <div> tags with class = “question-summary”

So, the path is clear enough
Maybe we can use find to retrieve the element by its ID?
Let´s check
main_questions_container = soup_info.find(id="questions")
Now, with this main element I can get the child ones. So I am not gonna search within soup_info, because it holds ALL the tree info, BUT within main_questions_container, because there I only have the child elements (question-summary).
Note:
find retrieves only ONE result (the first match)
find_all brings every match, so now we'll use the latter
main_questions_container.find_all(class_="question-summary")
Note: class is a reserved word in Python, so BeautifulSoup uses class_
I can also make sure I am searching within a specific tag (in this case <div>) by adding it at the very beginning, like this:
main_questions_container.find_all('div', class_="question-summary")
Now we just assign it to a variable
questions_list = main_questions_container.find_all('div', class_="question-summary")
It is a LIST, so I can iterate it. But what should we iterate? Maybe the title, which is within an <h3> tag.

for questions in questions_list:
    question_text = questions.find('h3').text
    print(question_text)
Let´s see the result
How to create a serializer for decimal in flutter
reference to submit is ambiguous: <T>submit(Callable<T>) in ExecutorService and method submit(Runnable) in ExecutorService match
How to grab a case in switch if its in another class using random
Android paging2 library : Network(PageKeyedDataSource) + Database idiomatic/expected way to implement
Why is my CAST not casting to float with 2 decimal places?
Question regarding slice assignment, deep copy and shallow copy in Python
How to access objects in S3 bucket, without making the object's folder public
Google Maps API - Const must be initialized
CS50_ Filter more: Blur
click on map doesn't get triggered when the click is on a polygon
How to generate UK postcode using Faker or by own function in Python?
Resize image to specific filesize python
C#: " 'The given path's format is not supported.'
why my code is in continuous loop for second part even if i have given correct user input what is the difference between (not in) and (!=)
Kompići C++ COCI 2011/2012 2nd round
And now to get the description we inspect and see that there is a class=excerpt

question_description = questions.find(class_='excerpt').text
print(question_description)
Result
Power BI - using Groups vs creating a new column using switch()
If I have a continuous numeric field and want to group it in Power BI, I can
Create a new column using SWITCH() to perform the grouping
numeric_value_grouped = SWITCH(TRUE(),
MyTable[...
Is there a way to run simple HTML code in Visual Studio?
Remember this is Visual Studio, the IDE, not Visual Studio Code, the text editor. Anyways, if I wanted to run some simple HTML code like <p>Hello world!</p>, how would I do it? This code ...
How do I sort an ArrayList of class objects?
I'm having trouble figuring out how to sort an ArrayList of objects. The objects are of a class CityTemp that implements the interface Comparable, and I have defined a compareTo() method. It works for ...
Ok, there are some weird spaces, so let's fix them:
question_description = question_description.replace('\n', '').replace('\r', '').strip()
replace() removes the newlines (\n and \r) by replacing them with an empty string (''), and strip() removes tabs or spaces before and after the text, so we get this result:
How can I fix CORS error while trying to access an Angular website hosted on github pages with custom domain?
As mentioned on the title, I'm hosting my angular website on gh-pages and pointing a custom domain to it. The website was loading before I added in the custom domain. Here's an example of the error I ...
The relationship between DataTables in DataSet: can we check if the parent is so and so?
I loaded a complicated XML file with lots of data where are complex level of nested elements. The DataSet.ReadXml() load all that nicely and I can loop through all the nodes.Essentially each node is ...
Module status keeps running after it has been disabled
I rebooted my linux machine and started noticing these odd requests in my Apache access log.::1 - - [16/Dec/2020:21:28:54 -0500] "GET /server-status?auto HTTP/1.1" 404 147 "-" &...
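Stack Overflow's real markup changes over time, so to practise find, find_all, class_ and the cleanup above offline, here is a sketch on a small hypothetical snippet shaped like the one described (a div with id questions holding question-summary divs).

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the listing described above
page = """
<div id="questions">
  <div class="question-summary">
    <h3><a href="/q/1">First question title</a></h3>
    <div class="excerpt">
        First excerpt
    </div>
  </div>
  <div class="question-summary">
    <h3><a href="/q/2">Second question title</a></h3>
    <div class="excerpt">Second excerpt</div>
  </div>
</div>
"""

soup_info = BeautifulSoup(page, "html.parser")
main_questions_container = soup_info.find(id="questions")  # find: first match only
questions_list = main_questions_container.find_all("div", class_="question-summary")  # all matches

rows = []
for questions in questions_list:
    question_text = questions.find("h3").text
    question_description = questions.find(class_="excerpt").text
    # same cleanup as above: drop newlines, then trim surrounding whitespace
    question_description = question_description.replace("\n", "").replace("\r", "").strip()
    rows.append((question_text, question_description))
    print(question_text, "|", question_description)
```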
Scrapy
Full docs:
https://docs.scrapy.org/en/0.14/intro/overview.html
Scrapy comes as a set of classes, so we must import a set of functions, modules and classes.
Scrapy is a full framework.
First class:
class Date(Item):
    text = Field()
So every element to search for has its own properties; let's say a product has a name, price, reviews and so on.
These are our Fields -> the info I want to extract from the product. I decide what to extract, so there can be many fields or just one.
Second Class:
The one that performs the extraction, our “spider”
First we define class variables (our spider's name, which can be anything, and our seed URL).
Here we can define some rules to guide the spider where to look for the data.
Function parse, where the magic happens
def parse(self, response):
parse receives a parameter, response, where the HTML tree will be stored (I don't need BeautifulSoup or lxml to parse the tree; Scrapy does it by itself).
Scrapy calls its parsers “selectors”. I can search using XPath or CSS expressions, which can match IDs, classes and so on.
To start loading data we use ItemLoader, which receives an instance of the item class I created and the selector (the part of the HTML tree where I'll search for the elements I want). In this case the text Field will be filled with an XPath expression:
item.add_xpath('text', './/h3/a/text()')
Ok, here is the code so far. No worries, this is just an intro; we'll go deeper in the next lectures.
"""
Scrapy
"""
#import modules
from scrapy.item import Field, Item
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.loader import ItemLoader
#class type of data to extract, article, image, name, user, product
class Date(Item):
    text = Field()
class SpiderData(Spider):
    name = "MySpider"
    start_urls = ['https://site-to-crawl']
    #redirection rules here
    def parse(self, response):
        sel = Selector(response)
        page_title = sel.xpath('//h1/text()').get()
        elements_list = sel.xpath('//div[@id="datos"]')
        for element in elements_list:
            item = ItemLoader(Date(), element)
            item.add_xpath('text', './/h3/a/text()')
            yield item.load_item()
Now the practice with Scrapy
Ok, remember that the first step is to define our classes with the items to extract; it is a class abstraction. Our items to extract (again from Stack Overflow) are the questions, and their properties are the title and description shown on the main page.
Now that we have defined our items and properties, let's start.
Let's define the abstraction of what we want to extract.
We create a class with any related name; the important thing is that this class inherits from Item (from scrapy.item import Item).
And then within the class I just define the properties I want to bring (question and description)
class Question (Item):
    question = Field()
    description = Field()
And that´s our Class definition
Now we need to define the CORE class for Scrapy
This is the one that performs our requests, parsing and more, BUT it must inherit from the Spider class.
Note: we use Spider because we want to extract data from ONE page; if we need to extract from multiple pages we'll need another class (we'll see it later).
class MainCoreScrapy(Spider):
Now we can define several things within this main class:
- Spider name
- Header to avoid being detected as a bot (Scrapy sets it within the “custom_settings” property). This object uses a key-value pair with USER_AGENT in CAPS as the key and the user-agent string we already know as the value
- URL (seed/starting URL)
class MainCoreScrapy(Spider):
    name = "MainSpider"
    custom_settings = {
        'USER_AGENT' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
    }
    start_urls = ['https://stackoverflow.com/questions']
Done (because it is just a one-page spider)
Now we need to define the function where the magic happens (the parse function):
def parse(self, response):
We don't need to add anything here because there is just one URL: Scrapy makes the request automatically and passes us the response with the HTML tree.
So I receive the HTML tree, but where?
Well, in the response parameter: def parse(self, response):
So, here we have the first step done (request).
Now we need to parse it. We won't use BeautifulSoup or lxml; we'll use Scrapy's way: the Selector class
sel = Selector(response)
Now we have this sel variable holding the selector that we'll use to ask the page for the useful info.
I can use XPath or CSS to make my way through the tree.
This is the same example that we used with BeautifulSoup, so we know that we need the question-summary <div> tags:

So we should build an expression that retrieves those <div> tags as a list we can iterate to get the data:
questions_list = sel.xpath('//div[@id="questions"]//div[@class="question-summary"]')
Now we can iterate this List
for question in questions_list:
The variable question will hold a different element on each iteration until the loop finishes.

So now we have the element shown in the image, and we are iterating over all of them, BUT we need to extract every item (the questions) from them.
Remember that when we started the program we imported some tools; one of them was
from scrapy.loader import ItemLoader. This class loads ITEMS.
ItemLoader receives as its first parameter an instance of my class containing the abstraction of what I need to extract (Question), and as its second parameter the HTML element (the selector) with the info we'll use to fill those fields:
class Question (Item):
    question = Field()
    description = Field()
So we have this so far:
for question in questions_list:
    item = ItemLoader(Question(), question)
Now I have to fill the fields question and description. Maybe we can try this (there are several ways to do it):
item.add_xpath('question', './/h3/a/text()')
So ‘question’ will be filled with the result of the XPath expression .//h3/a/text(), evaluated within the HTML element that I passed here:
item = ItemLoader(Question(), question)
Now we need to fill ‘description’, but this time our XPath expression should reach ‘excerpt’:

NOTE: when I use a dot before // it is because the search is RELATIVE to an element, in this case ‘question’ (item = ItemLoader(Question(), question)).
item.add_xpath('description', './/div[@class="excerpt"]/text()')
Well, now I need a special kind of return to finish:
yield item.load_item()
yield hands the loaded item back to Scrapy, which writes it to the output file.
I can fill a property not only via XPath but also by “value”. Let's add another field just to see how easy it is:
item.add_value('id', 1)
What is this? We just ADDED a VALUE (1) to the id field; instead of using XPath, we set it directly.
I only need to add a matching FIELD to our main abstraction class:
id = Field()
That piece of code goes inside our class:
class Question (Item):
    id = Field()
    question = Field()
    description = Field()
We have this working code so far:
"""
Scrapy
"""
#let's import the required modules and functions
from scrapy.item import Field
from scrapy.item import Item
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.loader import ItemLoader
#ABSTRACTION OF DATA TO EXTRACT
#define the data I have to fill in
#that will go to the results file
class Question (Item):
    id = Field()
    question = Field()
    description = Field()
#CLASS CORE - MainCoreScrapy
class MainCoreScrapy(Spider):
    name = "MainSpider"
    # configure the USER AGENT in Scrapy
    custom_settings = {
        'USER_AGENT' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
    }
    # URL (seed URL)
    start_urls = ['https://stackoverflow.com/questions']
    # Scrapy calls this function with the response to the seed URL request
    def parse(self, response):
        # Selectors: Scrapy's class to extract data
        sel = Selector(response)
        questions_list = sel.xpath('//div[@id="questions"]//div[@class="question-summary"]')
        for question in questions_list:
            # instantiate my ITEM with the selector that holds the data to fill it with
            item = ItemLoader(Question(), question)
            # Fill my ITEM's properties with XPath expressions searched within the "question" selector
            item.add_xpath('question', './/h3/a/text()')
            item.add_xpath('description', './/div[@class="excerpt"]/text()')
            item.add_value('id', 1)
            #Yield info to write data in file
            yield item.load_item()
#HOW TO RUN IN TERMINAL:
# scrapy runspider myfilename.py -o filename.csv -t csv
# scrapy runspider myfilename.py -o results.csv -t csv
Ok, if we just run this code from within our IDE or IDLE, it won't work… why?
Because it has to be run from the terminal with the scrapy command, which runs the spider:

BUT we also need to send the output to a file (-o filename -t format):

If we run that, we'll get a file -> scrapy_stackoverflow.csv (it can be JSON too; I just chose CSV) after some output in the terminal. Look how Scrapy shows the seed URL and the parse results in the terminal:

Now if we open our CSV file with Notepad or some other text editor, we'll have something like this:
description,id,question
"
I tried many ways to create service process with session1, but I couldn’t find a suitable way to create a service process.
Can someone help me?thanks.
",[1],CreateService with Session1
"
I wonder if I can somehow pass TInputQueryWizardPage, TInputOptionWizardPage, TInputDirWizardPage, TInputFileWizardPage, TOutputMsgWizardPage, TOutputMsgMemoWizardPage, TOutputProgressWizardPage pages …
",[1],Can I somehow pass TInput/TOutput pages to a function as one parameter in Inno Setup?
"
Envs: Ubuntu 18.04, Miniconda3, python=3.7(GCC=7.3.0), GCC -v (7.4.0)
The error occurs when I run the following command:
scons build/X86/gem5.opt -j8
The error is as follow:
[ LINK] -> X86/…
",[1],LTO compilation question when linking X86/marshal file
"
I have image of skin colour with repetitive pattern (Horizontal White Lines).
My Question is how to denoise the image effectively using FFT without affecting the quality of the image much, somebody …
",[1],How to remove repititve pattern from an image using FFT
We have the result, but it is cluttered with tabs and spaces; it is not “clean” yet (we'll fix that in later chapters).
But take a look at the [1] -> that is the ID we added with id = Field() and item.add_value('id', 1).
If we comment out those two lines and run the code again, we'll have no ID [1]:
I have a server that accept ssl connections on port 443. I am using the boost libraries for the server implementation. Below is the code snippet:
{
// Open the acceptor with the option to reuse the …
",Recv-Q has data pending as per netstat command and it never gets cleared
"
So I have an object which is moving in a circular path and enemy in the centre of this circle. I’m trying to find out how to calculate shotingDirection for bullets. Transform.position isn’t ennough …
","How to shoot an object, which is moving in a circle"
"
df = pd.read_csv(CITY_DATA[city])
def user_stats(df,city):
""""""Displays statistics of users.""""""
print('\nCalculating User Stats...\n')
start_time = time....
",How can I display five rows of data based on user in Python?
"
I want scope viewmodel by Fragment but activity,
interface MarsRepository {
suspend fun getProperties(): List
}
@Module
@InstallIn(FragmentComponent::class)
class MarsRepositoryModule …
",How install view model in fragmentComponent with Hilt injection?
"
im trying to set the placeholder for the v-select
<v-select
item-value=""id""
item-text=""name""
:placeholder=""holderValue""
v-model=""selectedDM""
label=""…
If we also comment out the lines that bring the “description”, we'll get a nice list of headlines with no spaces at all:
class Question (Item):
    #id = Field()
    question = Field()
    #description = Field()
.....
        for question in questions_list:
            # instantiate my ITEM with the selector that holds the data to fill it with
            item = ItemLoader(Question(), question)
            # Fill my ITEM's properties with XPath expressions searched within the "question" selector
            item.add_xpath('question', './/h3/a/text()')
            #item.add_xpath('description', './/div[@class="excerpt"]/text()')
            #item.add_value('id', 1)
The result is much better
question
how to avoid invalid characters ? JSON
How do I make my require statement wait for it to finish before continueing?
GetEntityMetadata returns 0 attributes
Centralizing a TField’s Size value
GLSL Fit Fragment Shader Mask Into Vertex Dimensions
How can I create rounded corners button in WPF?
Return list of strings for query buffer with StatisticDefinition
How can I quickly groupby a large sparse dataframe?
How to remove space in the input box
Need a help REGEX php preg_match [duplicate]
How to draw a .obj file in pyqtgraph?
I have problem with converting MATLAB code to Python
Get the OpenSSL::PKey::EC key size in Ruby
how should the advanced contact page mysql scheme be?
Why is ZooKeeper LeaderElection Agent not being called by Spark Master?
Finally, instead of setting a fixed ID value ([1]), we can generate an auto-numbered ID by adding a counter within the code, like this (with id = Field() enabled again in the Question class):
        i = 0
        for question in questions_list:
            # instantiate my ITEM with the selector that holds the data to fill it with
            item = ItemLoader(Question(), question)
            # Fill my ITEM's properties with XPath expressions searched within the "question" selector
            item.add_xpath('question', './/h3/a/text()')
            #item.add_xpath('description', './/div[@class="excerpt"]/text()')
            item.add_value('id', i)
            i +=1
            yield item.load_item()
Result
id,question
[0],Failure to clone risc-v tools (failure with newlib-cygwin.git)
[1],How would you represent musical notes in JavaScript?
[2],Do I need to call DeleteObject() on font retrieved from SystemParametersInfo()?
[3],Automaticaly positioning rectangles estheticaly on a canvas with D3.js
[4],Paging in virtual memory
[5],How to run program that connects to another machine in C++?
[6],Rsnapshot filepermission problem with network hdd over raspberry pi
[7],Adding reaction if message contains certain content
[8],”Batch – Findstr with error level condition, quotes?”
[9],Implementing microprofile health checks with EJB application
[10],“Add [name] to fillable property to allow mass assignment on [Illuminate\Foundation\Auth\User].”
[11],Vscode platform specific shortcuts
[12],Problem filling an array of objects. The result is always null
[13],how to get the name instead of reference field in mongoengine and flask -admin
[14],extract email attachment from AWS SES mail in S3 with Python on AWS Lambda
IMPORTANT: if you get a blank file (0 bytes) when running the code, it is most likely because you made a mistake in one of the XPath expressions.
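As a side note, Python's built-in enumerate() can produce the same auto-numbered IDs without a manual counter. The sketch below uses a plain list of titles as a hypothetical stand-in for questions_list, so it runs without Scrapy. It builds (id, question) pairs in the same way the i counter does:

```python
# Hypothetical sample data standing in for the scraped questions_list
questions_list = [
    "Paging in virtual memory",
    "Vscode platform specific shortcuts",
    "How would you represent musical notes in JavaScript?",
]

rows = []
# enumerate() yields (index, element) pairs starting at 0 by default,
# so it replaces the manual "i = 0 ... i += 1" counter
for i, question in enumerate(questions_list):
    rows.append((i, question))

for row_id, title in rows:
    print(row_id, title)
```

In the Scrapy loop, this would presumably become `for i, question in enumerate(questions_list):` with `item.add_value('id', i)`. The `i += 1` line would then be dropped.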
How to Program Your Own Password Generator With Python
Let's find out how easy it is to create a strong password generator with Python in just a few lines.
We're going to take the original code from Zsecurity and optimize it a bit to handle user input.
I highly recommend Zsecurity if you want to dig deeper into ethical hacking, security, Python programming and more.
OK, back to the code. It looks something like this:
"""
strong password generator
"""
# we import RANDINT to generate a random number between 2 given values
from random import randint
# we add some variables that will store our "dictionary"
# from which we'll create the password
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
# we join the variables together to get a very strong password
password_creation = lowerCase + upperCase + numbers + special
# we ask the user to enter the password length
password_length = int(input("How many characters do you wish the password? Minimum 8 maximum 1024: "))
password = ""
length = 0
if password_length < 8 or password_length > 1024:
    print("Minimun 8 characters - maximum 1024, try again")
else:
    # here the magic happens, creating the final password
    while length < password_length:  # while the password length does not reach the value entered by the user
        # the password is created by selecting a random (randint) character from password_creation
        password = password + password_creation[randint(0, len(password_creation) - 1)]
        length += 1
    print("Generated password: ", password)  # we show the result
If we run it, we'll be asked to enter the length we want the password to have (I entered 24):
How many characters do you wish the password? Minimum 8 maximum 1024: 24
Generated password: %5VcsbRU!B.k949h#w/kUZXG
OK, it looks like a secure password. Maybe we can check how secure it is with an online checker such as My1login, or any other one you prefer; they all seem to work.
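One caveat is worth knowing here. The Python documentation warns that the random module is not suitable for security purposes, and it points to the secrets module instead. Below is a minimal sketch of the same generator that uses secrets.choice() in place of randint. The function name generate_password is only an illustrative name that is not part of the original code. The character sets are the same ones used above:

```python
import secrets

lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
password_creation = lowerCase + upperCase + numbers + special


def generate_password(length):
    # secrets.choice() picks one character using a cryptographically
    # secure source of randomness (unlike random.randint)
    return "".join(secrets.choice(password_creation) for _ in range(length))


print("Generated password: ", generate_password(24))
```

secrets.choice(seq) plays the same role as `password_creation[randint(0, len(password_creation) - 1)]`: it returns one random element of the sequence. The difference is that its randomness is meant for security-sensitive uses such as passwords.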
Now we have a problem, though.
Just to have some boundaries, I asked the user to enter a number between 8 and 1024. There's no real need for a limit, but a password shorter than 8 characters is hardly a password, and one longer than 1024 is maybe... a bit long. You can set your own limits by editing the code.
But the problem is that there is NO real user-input validation... yet.
Let's run the code again and enter a value OUTSIDE the requested range, say five characters:
>>> %Run z_strong_password_gen.py
How many characters do you wish the password? Minimum 8 maximum 1024: 5
Minimun 8 characters - maximum 1024, try again
What happened here? Well, the program stopped because we only coded a message saying that the input was not valid, BUT we did nothing after that. And if we type letters instead of a number, int() raises a ValueError and the script crashes. So let's fix it.
There are SEVERAL ways to do it. They range from poor practices (like repeating yourself) to slightly more advanced ones, like creating a function to check the user input.
Let's pick a medium-level one to keep things easy.
Goal: Request User Input Until Valid
What are we gonna use: try/except statements and If/break
If you don't know what these are, you can read a nice intro to try/except (W3Schools).
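In short, int() raises a ValueError when the text is not a whole number. Wrapping it in try/except lets us react to that instead of crashing. The else branch runs only when no exception happened. Here is a tiny sketch of just that part; parse_length is a hypothetical helper name that is not part of the original script:

```python
def parse_length(text):
    try:
        value = int(text)  # raises ValueError for input like "abc" or "12.5"
    except ValueError:
        print("Please enter a valid NUMBER...")
        return None
    else:
        # only reached when int() succeeded
        return value


print(parse_length("24"))
print(parse_length("abc"))
```

The first call returns 24. The second prints the warning and returns None, which is the same try/except/else shape the password script uses inside its loop.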
Well, here's the code again, with some comments to make it easier to understand:
"""
strong password generator
"""
# we import RANDINT to generate a random number between 2 given values
from random import randint
# we add some variables that will store our "dictionary"
# from which we'll create the password
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
# we join the variables together to get a very strong password
password_creation = lowerCase + upperCase + numbers + special
password = ""
length = 0
while True:
    try:
        password_length = int(input("How many characters do you wish the password? Minimum 8 maximum 100: "))
        if password_length < 8 or password_length > 100:
            print("Minimun 8 characters - maximum 100, try again")
    except ValueError:
        print("Please enter a valid NUMBER...")
        continue
    else:
        while length < password_length:  # while the password length does not reach the value entered by the user
            password = password + password_creation[randint(0, len(password_creation) - 1)]
            length += 1
        # the password is created by selecting a random (randint) character
        # from password_creation on every pass of the loop; the number of
        # passes is the number entered by the user, so each time the loop
        # runs it adds one random character to the string it is building
        # until it reaches the value entered by the user
        print("Generated password: ", password)  # we show the result
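That version still has a couple of problems. When the number is out of range, it prints the warning but then generates a password anyway, because nothing stops the else branch from running. Also, password and length are never reset, so every new round keeps adding to the previous password instead of starting over. Let's improve the input validation to handle "bad" input, like non-digit values or numbers out of range, properly: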
from random import randint
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
password_creation = lowerCase + upperCase + numbers + special
while True:
    # pulled password and length in here to reset them on each loop
    password = ""
    length = 0
    try:
        password_length = int(input("How many characters do you wish the password? Minimum 8 maximum 1024: "))
        if password_length < 8 or password_length > 1024:
            print("Minimun 8 characters - maximum 1024, try again")
            # add continue, so we don't try to create a password if validation fails
            continue
    except ValueError:
        # this prevents the user from entering non-numerical values
        print("Please enter a valid NUMBER...")
        continue
    else:
        while length < password_length:
            password = password + password_creation[randint(0, len(password_creation) - 1)]
            length += 1
        print("Generated password: ", password)
        # there is no break, so the loop keeps asking for new passwords until you stop the program
That was OK, but what about optimizing the code to make it shorter and more readable?
Let's try it:
from random import randint
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
mega_pass = lowerCase + upperCase + numbers + special
length = 0
password_length = 0
password = ""
while True:
    try:
        password_length = int(input("How many characters long do you want the password? (min 8 - max 100; \n"))
    except ValueError:  # if the user enters a non-numerical value, it will ask again
        print("\nNumbers please...")
        # no continue here: the range check below still runs with the previous
        # (invalid) password_length, which is why "Enter a digit..." also appears
    if password_length not in range(8, 101):  # if the user enters a number OUT of range, it asks again
        print("\nEnter a digit between 8 and 100...")
    else:
        while length < password_length:
            password = password + mega_pass[randint(0, len(mega_pass) - 1)]
            length += 1
        print("\nYou choose", password_length, "characters long\n")
        print("Here is your PASSWORD:\n ", password)
        break  # it ends the program after delivering one password
Let's test it:
…
Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
0 -> it must be 8 or more
Enter a digit between 8 and 100…-> so it asks again
How many characters long do you want the password? (min 8 – max 100;
5000 -> it is greater than 100, so it asks again
Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
74747gggg -> we enter digits AND letters
Numbers please…-> it detects them and asks again for just numbers
Enter a digit between 8 and 100…
How many characters long do you want the password? (min 8 – max 100;
80 -> it is between 8 and 100
You choose 80 characters long
Here is your PASSWORD: -> so it generates our password and shows it below
Q$lK=TjJ8lCSx=9_mqkcEXzuVt[W#.p9DrB.FtAexuAXE)z]5_GdhI)G[tV!$YI_NZIz<fRe!T(Fvlnv
That was nice! We managed to ask the user for a value within a given range using only digits (no letters), check the input, ask again if the user failed to enter valid characters, and generate the strong password.
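A slightly more advanced option mentioned earlier is to move the input check into its own function. Here is one possible sketch of that idea. The names ask_length and make_password, and the ask parameter, are hypothetical. Passing the input function as a parameter is an assumption made so that the function can be exercised with simulated answers instead of real typing:

```python
from random import randint

lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()
numbers = "1234567890"
special = "!#$%&/()=?¡@[]_<>,."
mega_pass = lowerCase + upperCase + numbers + special


def ask_length(minimum=8, maximum=100, ask=input):
    # keep asking until the user enters a whole number inside the range
    while True:
        try:
            value = int(ask(f"How many characters long do you want the password? (min {minimum} - max {maximum}): "))
        except ValueError:
            print("\nNumbers please...")
            continue  # skip the range check and ask again
        if minimum <= value <= maximum:
            return value
        print(f"\nEnter a digit between {minimum} and {maximum}...")


def make_password(length):
    # same technique as above: pick a random index into mega_pass each time
    return "".join(mega_pass[randint(0, len(mega_pass) - 1)] for _ in range(length))


# simulated answers instead of real typing; in a real script you would just call ask_length()
answers = iter(["abc", "5", "12"])
password_length = ask_length(ask=lambda prompt: next(answers))
print("\nYou chose", password_length, "characters long\n")
print("Here is your PASSWORD:\n ", make_password(password_length))
```

With the simulated answers, "abc" triggers the "Numbers please..." message and "5" triggers the range message. "12" is accepted, so a 12-character password is printed. Because the ValueError branch uses continue, only one message appears per bad answer.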