How To Pull Data into S3 using AWS SageMaker

Hi, this is J. Weathers from The Deep Learning Team, AWS Solutions Architect Professional and MCSA: Cloud Platform. I'm really excited, because today we're entering the 2018 Data Science Bowl. We're using AWS SageMaker to pull the data down to S3, and we're going to run all of our algorithms from AWS SageMaker. Let's get started.

First things first: we went to the Kaggle website and registered for the 2018 Data Science Bowl. The goal is to find nuclei in divergent images to advance medical discovery. This is awesome, this is exciting, this is groundbreaking, and this is why we're entering the Data Science Bowl. We have a Slack channel (it's linked here) and would love for you to be a part of it and help us win this challenge. We're so excited.
Okay, so, Data Science Bowl: let's get into how to pull the data, which is here on Kaggle. You've got your sample submission CSV, your test zip, your train zip, and your train labels zip. These are extremely large images, and I'm going to show you how to get them from here to Amazon S3 without downloading them to your computer.
Okay, so to hit the ground running, what I did was pull up this Keras U-Net starter created by Kjetil Åmdal-Sævik. It's a really quick notebook; you can fork it (I believe I've already forked it), then download it, and it gives you a good starting point for your algorithm. Once it comes up, you're going to click the download button and pull it down to your hard drive. From there, you go to the SageMaker notebook.
I've already done this, but let me show you how to do it. I go to AWS, click SageMaker in the service list, and open my SageMaker notebook instance. I click Upload, select the notebook that I downloaded, and click Upload again, and then I have that notebook. I can rename it, so I'm going to rename it Science Bowl. Awesome. If you click on it, here's the Science Bowl notebook, and you select the kernel; this one is Python 3. Set the kernel and we're good to go.
Okay, since we already have this notebook here, first things first: what I did was go to a pre-baked AWS example, the image classification on Caltech that uses the transfer learning model. I copied the cell that gives us access to AWS: the execution role, the bucket, and the Docker images for AWS.
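For reference, that setup cell looks roughly like this. It's a minimal sketch of what I copied from the AWS example; the bucket name videodeep-learning is the one you'll see later in the video, and your own execution role and bucket will differ.

    import boto3
    from sagemaker import get_execution_role

    # IAM role attached to this SageMaker notebook instance;
    # it is what lets the notebook read and write S3.
    role = get_execution_role()

    # S3 bucket that will hold the Kaggle data (use your own bucket name)
    bucket = 'videodeep-learning'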
Next, what we're going to do is copy the code that tells AWS where to download the packages from. Now let's go back to the Kaggle website; I'll have the download link highlighted right here, and we're going to paste the Kaggle link in. So, back on the Kaggle website, I click the train download button, copy the link address, go back to the notebook, and paste in the train data. But first I need to add an actual download call for this particular URL, so I paste that in. Now I go back for the train labels CSV file and click copy link address for that one. Sorry, the train zip first: copy that link, then create another download call for it. So I have four packages I have to download from the Kaggle website; add a download and paste the link in for the train labels CSV as well.
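The download cell I keep duplicating is essentially the AWS example's small download helper, called once per Kaggle link. Here is a sketch; the URL shown is just a placeholder for whatever "copy link address" gives you, and, as one commenter points out below, Kaggle normally wants you signed in, so an unauthenticated request may only fetch a redirect page.

    import os
    import urllib.request

    def download(url, filename):
        """Pull a file from `url` onto the notebook instance's local disk."""
        if not os.path.exists(filename):
            urllib.request.urlretrieve(url, filename)

    # One call per package, pasting in the link address copied from Kaggle
    # (placeholder URL; paste your own copied link here)
    download('https://www.kaggle.com/c/data-science-bowl-2018/download/stage1_train.zip',
             'stage1_train.zip')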
Now I'm creating the bucket and uploading to S3. I have to name the buckets accordingly, and those buckets are actually going to be stored inside of my S3 folder structure; S3 is my storage. Okay, great, typing that all in and getting it ready. Sample results... okay, so I'm going to call this one samplesubmission, because this is creating the bucket; I'm going to call this bucket test, and call this bucket train. Where the Caltech example had its name I will use stage; this is where stage1_test.zip goes, since you have to actually name the file that it's going to be. I'm going to name this one stage1_train.zip, and this one I'm going to name stage1_train_labels.csv.zip. Okay, so I have four columns, so I need to add another upload to S3 here. Hmm, yeah, I made a mistake: I should actually put this one underneath, because I want them to line up with the data. So I take this out and put my test here. This one is going to be my test folder, this is going to be my train folder, this is going to be my train labels, and this is my sample submission. Okay, that looks good. All right, now that we have that done... okay, stage1... all right, everything was good.
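Put together, the upload cell is roughly the AWS example's upload_to_s3 helper plus one call per package. This is a sketch that assumes the bucket variable from the setup cell; the folder names and zip file names follow what I typed in the video, while the sample submission file name is a placeholder.

    import boto3

    def upload_to_s3(channel, filename):
        """Copy a local file to s3://<bucket>/<channel>/<filename>."""
        s3 = boto3.resource('s3')
        key = channel + '/' + filename
        with open(filename, 'rb') as data:
            s3.Bucket(bucket).put_object(Key=key, Body=data)

    # One folder per package so the data lines up in S3
    upload_to_s3('samplesubmission', 'stage1_sample_submission.csv')  # placeholder name
    upload_to_s3('test', 'stage1_test.zip')
    upload_to_s3('train', 'stage1_train.zip')
    upload_to_s3('train_labels', 'stage1_train_labels.csv.zip')

One caveat: put_object sends a single PUT, which S3 caps at 5 GB, and one commenter below hit an EntityTooLarge error because of it. For bigger archives, s3.Bucket(bucket).upload_file(filename, key) does a multipart upload and avoids that limit.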
Okay, so now I'm going to go ahead and run the cell and see what happens. The bucket we identified was videodeep-learning; we ran that and it executed. Then we run this one and see if we can get it working. Okay, it ran a little too fast, but let's see if we got our files... YES! Our folders are there. It was fast, like lightning fast. Okay, so we got it: our folders are there, we have our data in S3, and we have our notebook. That was awesome, so freaking awesome.
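If you'd rather confirm from the notebook instead of clicking around the S3 console, a quick listing works too. This is just a sketch that assumes the same bucket variable from the setup cell.

    import boto3

    # List everything we just uploaded, with sizes in bytes
    s3_client = boto3.client('s3')
    response = s3_client.list_objects_v2(Bucket=bucket)
    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])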
Okay, all right, so we're going to start our competition. If you like our videos, please subscribe and like our channel, and most importantly, join our Slack channel; you'll see the link below. Until next time, I hope you enjoyed this video, How to Pull Data into AWS S3 with AWS SageMaker. Have a great day!

Daniel Ostrander


8 thoughts on “How To Pull Data into S3 using AWS SageMaker”

  1. 이성운 says:

    Best lecture. Thank you.

  2. Igor Shkarin says:

    Guys, I'm new to ML, can you tell me what the reason is that you are doing this? Why do we need to download stuff into S3 through the Jupyter notebook? Sorry, my question could be stupid, just trying to understand.
    Thanks in advance!

  3. Igor Shkarin says:

    And can you please record more videos about SageMaker? :)

  4. ll ll says:

    curb your music track video editing.

  5. Lawrence Choo says:

    Does anyone know whether SageMaker is capable of installing or running a Keras model?

  6. Freddie Karlbom says:

    I don't understand how you got this to work, as Kaggle doesn't allow access to the datasets unless you are logged in so your download function should just have downloaded the redirect page for unauthenticated users?

  7. vyvian somaya says:

    ClientError: An error occurred (EntityTooLarge) when calling the PutObject operation: Your proposed upload exceeds the maximum allowed size

    I get this error. I wrote the exact code shown here to download the COCO dataset from their website. Please help.

  8. veda vinothini says:

    Useful video.

    Need a process flow for Image Classification using SageMaker
