﻿WEBVTT

1
00:00:01.216 --> 00:00:03.759
<v Male Narrator>Pity not Brett the robot.</v>

2
00:00:03.759 --> 00:00:06.348
While it may look like a bit of a buffoon trying

3
00:00:06.348 --> 00:00:07.857
to get the block in the hole,

4
00:00:07.857 --> 00:00:10.597
it's actually doing something fascinating.

5
00:00:10.597 --> 00:00:13.210
It's teaching itself to master a child's game like a child

6
00:00:13.210 --> 00:00:15.671
would, with trial and error.

7
00:00:15.671 --> 00:00:18.213
Emphasis on the error.

8
00:00:18.213 --> 00:00:21.255
Brett is a new breed of robot, no mere hunk of metal

9
00:00:21.255 --> 00:00:24.076
that humans programmed to follow strict commands.

10
00:00:24.076 --> 00:00:26.944
And that's a big deal, 'cause if humans expect robots to

11
00:00:26.944 --> 00:00:29.765
ever be truly useful, we'll have to let the machines

12
00:00:29.765 --> 00:00:31.611
find their own way.

13
00:00:31.611 --> 00:00:34.305
[computer music]

14
00:00:34.305 --> 00:00:36.685
Robots have a problem: they're good at working in

15
00:00:36.685 --> 00:00:38.891
structured environments like factories, but terrible

16
00:00:38.891 --> 00:00:42.385
at navigating an unpredictable world.

17
00:00:42.385 --> 00:00:45.323
It used to be that you had to tediously program a robot

18
00:00:45.323 --> 00:00:47.622
to handle different objects.

19
00:00:47.622 --> 00:00:50.083
But that's changing with robots like Brett here

20
00:00:50.083 --> 00:00:53.078
at UC Berkeley, because Brett can teach itself skills

21
00:00:53.078 --> 00:00:55.412
from scratch using artificial intelligence.

22
00:00:55.412 --> 00:00:57.827
<v ->A child could be able to learn this, but for the robot</v>

23
00:00:57.827 --> 00:00:59.882
it's actually quite difficult because it has to learn

24
00:00:59.882 --> 00:01:02.529
really fine dexterity in order to be able to actually get

25
00:01:02.529 --> 00:01:04.131
the block into the correct hole.

26
00:01:04.131 --> 00:01:06.755
And it has to be able to learn precision so that it can

27
00:01:06.755 --> 00:01:09.379
do this over and over again really reliably.

28
00:01:09.379 --> 00:01:12.850 line:15% 
And so learning these kinds of skills is just really

29
00:01:12.850 --> 00:01:15.718 line:15% 
useful for a lot of different manipulation tasks.

30
00:01:15.718 --> 00:01:17.703
<v Male Narrator>When Brett starts out, it doesn't</v>

31
00:01:17.703 --> 00:01:19.619
actually know how its arm works.

32
00:01:19.619 --> 00:01:22.010
At the beginning, it just kind of flails.

33
00:01:22.010 --> 00:01:24.135
But every so often it flails closer to the hole

34
00:01:24.135 --> 00:01:26.341
it's supposed to put the peg in.

35
00:01:26.341 --> 00:01:29.383
Brett's AI tallies this as a victory.

36
00:01:29.383 --> 00:01:31.020
So attempt after attempt, the robot

37
00:01:31.020 --> 00:01:33.020
gets closer to its goal.

38
00:01:34.038 --> 00:01:36.952
This is called reinforcement learning.

39
00:01:36.952 --> 00:01:41.132
The robot is essentially failing its way toward victory.

40
00:01:41.132 --> 00:01:44.534
In this way, Brett can teach itself a new task.

41
00:01:44.534 --> 00:01:47.076
No one told it how to put the peg in the hole,

42
00:01:47.076 --> 00:01:50.246
just that it needed to somehow do so.

43
00:01:50.246 --> 00:01:52.672
You've probably heard about AI using reinforcement

44
00:01:52.672 --> 00:01:56.771
learning to teach itself to do things in a virtual space.

45
00:01:56.771 --> 00:01:59.452
Like here, through trial and error, this creepy thing

46
00:01:59.452 --> 00:02:02.610
taught itself to walk and eventually run.

47
00:02:02.610 --> 00:02:05.443
That's right, it invented running.

48
00:02:06.337 --> 00:02:08.822
That's relatively easy, since in the virtual world you

49
00:02:08.822 --> 00:02:12.270
can try and fail rapidly over and over and over.

50
00:02:12.270 --> 00:02:15.230 line:15% 
<v ->And if you think about something like reinforcement</v>

51
00:02:15.230 --> 00:02:18.098 line:15% 
learning, where you learn from trial and error,

52
00:02:18.098 --> 00:02:21.093
the challenge is that often you need a lot of trial

53
00:02:21.093 --> 00:02:23.299
and error before you get somewhere.

54
00:02:23.299 --> 00:02:26.524
And so, if you run it all on the real robot,

55
00:02:26.524 --> 00:02:29.404
it's not always that easy to do.

56
00:02:29.404 --> 00:02:32.446
<v Male Narrator>So physical robots can learn, but a</v>

57
00:02:32.446 --> 00:02:34.660
machine like Brett learns nowhere near

58
00:02:34.660 --> 00:02:36.605
as quickly as a human.

59
00:02:36.605 --> 00:02:39.055
Sure, a programmer could keep tweaking the algorithms to

60
00:02:39.055 --> 00:02:41.098
make this process more efficient.

61
00:02:41.098 --> 00:02:43.432
<v ->But what if you could let the computer itself change</v>

62
00:02:43.432 --> 00:02:44.907
its own algorithm?

63
00:02:44.907 --> 00:02:47.403
So it says, hey I'm gonna make a tweak to my algorithm,

64
00:02:47.403 --> 00:02:48.668
see what happens now.

65
00:02:48.668 --> 00:02:50.828
If you can automate that process of tweaking your

66
00:02:50.828 --> 00:02:53.196
algorithm, you can run it in parallel over many many

67
00:02:53.196 --> 00:02:56.888
machines, you could hope that maybe as a consequence

68
00:02:56.888 --> 00:02:59.640
you end up with a better algorithm than ones

69
00:02:59.640 --> 00:03:01.404
that humans can design.

70
00:03:01.404 --> 00:03:03.841
And now you might have a reinforcement learning algorithm

71
00:03:03.841 --> 00:03:06.216
that can have a robot learning to walk in a few hours

72
00:03:06.216 --> 00:03:07.757
rather than two weeks.

73
00:03:07.757 --> 00:03:09.160
Maybe even faster.

74
00:03:09.160 --> 00:03:11.238
<v Male Narrator>This is known as learning to learn.</v>

75
00:03:11.238 --> 00:03:13.978
And robots will need it to make sense of new environments.

76
00:03:13.978 --> 00:03:16.567
<v ->When a robot is deployed in the real world,</v>

77
00:03:16.567 --> 00:03:18.901
you can't just deploy it with a fixed set of skills.

78
00:03:18.901 --> 00:03:21.234
It needs to acquire the ability to continue to learn

79
00:03:21.234 --> 00:03:22.906
once it's deployed.

80
00:03:22.906 --> 00:03:25.042
And without that ability it's just not going to be able

81
00:03:25.042 --> 00:03:27.829
to function in realistic environments, so it needs to also

82
00:03:27.829 --> 00:03:32.229
have acquired the ability to learn as it lives.

83
00:03:32.229 --> 00:03:34.191
<v Male Narrator>So one day, Brett's descendants may</v>

84
00:03:34.191 --> 00:03:36.536
well learn as quickly as we humans do.

85
00:03:36.536 --> 00:03:40.036
And this will all look like child's play.

