##################################################
#* *#
#* READ ME *#
#* *#
#* *#
#* Luigi Franco Tedesco *#
#* Jamal Hammoud *#
##################################################
All of our code was developed from scratch.
The programming work was divided in two, following
the two experiments proposed by the article studied:
Intrinsically Motivated Reinforcement Learning:
An Evolutionary Perspective
To run the experiments, type:
$ python Experience#.py
where # is the number of the experiment you want
to run.
The first experiment:
The agent interacts with a 6x6 grid Markovian
environment, where the agent can only walk
through single-line divisions.
Its goal, encoded in its reward function, is to eat
as much food as possible; the food is located
inside boxes.
state :
: agent position
: boxes condition
: agent hunger
: food in place
_______________________________________________
| | | ||| | | |
| box1 | | ||| | | |
|_______|_______|______|||______|_______|_______|
| | | | | | |
| | | | | | |
|_______|_______|_______|_______|_______|_______|
| | | ||| | | |
| | | ||| |agent | |
|=======|=======|======|||======|_______|=======|
|=======|=======|======|||======| |=======|
| | | ||| | | |
| | | ||| | | |
|_______|_______|______|||______|_______|_______|
| | | | | | |
| | | | | | |
|_______|_______|_______|_______|_______|_______|
| | | ||| | | |
| | | ||| | | box2 |
|_______|_______|______|||______|_______|_______|
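The four state components listed above could, for instance, be packed into a hashable tuple to serve as a Q-table key. The sketch below is only illustrative; the field names and encodings are assumptions, not the actual code in Experience1.py:

```python
# Hypothetical encoding of the first experiment's state
# (names and value types are illustrative assumptions).
agent_pos = (2, 4)                            # (row, col) on the 6x6 grid
boxes = {"box1": "closed", "box2": "open"}    # condition of each box
hungry = True                                 # whether the agent is hungry
food_in_place = {"box1": True, "box2": False} # food still inside each box

# Dicts are not hashable, so sort their items into tuples before
# combining everything into one hashable state key.
state = (agent_pos,
         tuple(sorted(boxes.items())),
         hungry,
         tuple(sorted(food_in_place.items())))
```

A hashable state like this can be used directly as a dictionary key in a tabular Q-learning implementation.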
To learn from its experience, the agent uses
epsilon-greedy Q-learning.
This experiment is fully implemented, and the
results presented were verified.
The output of the experiment is a txt file with
the mean history of each reward function tested,
to be plotted later.
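For reference, epsilon-greedy Q-learning can be sketched as below. This is a generic minimal version, not the code in Experience1.py; the constants and function names are assumptions:

```python
# Minimal sketch of epsilon-greedy Q-learning
# (hyperparameter values and names are illustrative assumptions).
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration probability
ACTIONS = ["up", "down", "left", "right"]

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def choose_action(state):
    # With probability EPSILON explore uniformly at random;
    # otherwise exploit the best currently known action.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # One-step Q-learning bootstrap update.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```

Looping choose_action and update over episodes of environment interaction yields the learned Q-table.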
The second experiment:
The agent moves in a non-Markovian environment
described as a 3x3 grid. The agent's goal is to
eat as many worms as possible.
After being eaten, the worm reappears randomly in
one of the two other rightmost areas available.
state :
: agent position
: agent hunger
: worm in place
_______________________
| | | |
| Agent | | |
| | | |
|-------|=======|=======|
| | | |
| | | |
|-------|=======|=======|
| | | |
| | | |
|-------|=======|=======|
| | | |
| | | worm |
|_______|_______|_______|
The code for this experiment is not completely
finished, so it was not possible to obtain the
final results. However, the problem is already
structured.
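The worm respawn rule described above (the worm reappears randomly in one of the two other rightmost cells after being eaten) could be sketched as follows. The cell coordinates and function name are assumptions for illustration:

```python
# Hypothetical sketch of the second experiment's worm respawn rule:
# after being eaten, the worm reappears uniformly at random in one
# of the two other rightmost cells (coordinates are illustrative).
import random

RIGHT_COLUMN = [(0, 2), (1, 2), (2, 2)]  # rightmost cells of the 3x3 grid

def respawn_worm(eaten_cell):
    # The worm never reappears in the cell where it was just eaten.
    candidates = [c for c in RIGHT_COLUMN if c != eaten_cell]
    return random.choice(candidates)
```

Because the agent cannot observe where the worm respawned until it visits that cell, this respawn rule is one source of the environment's non-Markovian character.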