Excel 2013 Statistical Analysis #11: Power Query Import Multiple Text Files, Grade Histogram by Year

Excel 2013 Statistical Analysis #11: Power Query Import Multiple Text Files, Grade Histogram by Year


Welcome to Excel 2013,
statistical analysis video number 11. Hey, if you want to download
this workbook and the grade zipped folder that we’re
going to use for this video, then click on the
link below the video, and you can download these. Hey, in this video,
we want to talk about importing grade
data to make a histogram, and we’re going to use the
new feature Power Query. Now this is something
you have to download. Earlier in the class, I
showed you how to download it. Our computer labs and classroom
have this as a download. Now I’m going to show
you this example. We’re going to take grade
data from various classes and build a histogram
and look at it. Now this is not something
that’s going to be on a test, but this is the
kind of thing that’s going to happen to you
out in the working world. You’re going to have
a bunch of data, and if you know how
to use power queries, it’s a few clicks
instead of hours of work. Now let’s go look at our folder. And the way it’s
going to download it’s going to say grade zip. And I’m simply going to unzip it
by a right-click, extract all. And instantly, it’ll ask me
where I’m going to say extract, and there’s the folder. If I double-click
and open it up, double-click it and open it up,
what we have our text files. And again, this is grade data
from a large span of years for various classes, and we want
to build a histogram from this. Now the thing about text
files dot TXT or dot CSV, these are text
files and databases. And systems that
analyze data like Excel have to communicate
with each other. And so oftentimes, the way
they send data back and forth are these text files. Now we could open them or import
them one at a time in Excel. But the beauty of Power
Query is we just tell Power Query where
this folder is, and it will go get
all the files for us. It’s quite remarkable. Now let’s go back over to Excel. We need to get the data,
so we go to Power Query. And we go to get external
data, and strangely enough, we go from file. But down at the bottom,
it’s from folder, browse. There is you browse to
wherever your location is. Mine’s on the desktop. I’m going to select this
grades and click OK, click OK. Now it imports all three files. There’s a bunch of
information about the files, but we don’t want it. The only column we
want is content. So I’m going to
right-click– remember this is Power Query–
right-click the content and remove other columns. Boop, they’re all gone. Now I can come and
click this double arrow, and it will totally
expand the files. Now there’s one strange
thing about this. When we import it, well, the
field names are at the top. That’s not strange class. And then there is the class. We’ll use that as our row
label on the pivot table. Grade will actually count
based on classes and year. So those are field names. That’s not strange,
but I’m going to click the dropdown
this is a filter. And it always shows
a unique list. Now on big data sets, it
says list may be incomplete, so I’m going to say load more. And sure enough, it says Class. That’s the actual
name of the field. And the reason why is because
when you import imported multiple files, way down
at the bottom somewhere, it mixed it up, and it
put the field name there. And there’s actually a
grade and a year too. After you import, you come and
you filter out the field name. Now once we filter
out for this column, it’ll filter out
the other ones too because it’ll remove
the whole record. Now this is really cool. This is a filter that
Power Query is using. And when we click OK, it
totally cleaned up the data. Now we have this table
that we can close and load to our spreadsheet. But I want to come
over here first and name this just in case
we ever have to look at this or use this again– create import, enter. And now we close and load to. It’s going to come as a table. We don’t want to
write a new worksheet. We want on our
existing worksheet. And I actually don’t want A2. I want A1, click OK, click Load. You got to be kidding me. Look over here in the workbooks
query window, 75,000 records. That is just amazing. We can close this. Lets control down arrow. Wow, 75,000 records–
Control Home. And now we want to build our
frequency distribution insert pivot table. We’re going to use a
pivot table or Alt NV. And I don’t want to put
it on a new worksheet. I want it on this existing
worksheet location. Let’s say E1– E1, click OK. Now I’m going to drag
grade down to rows. Right-click group and
this goes from 0 to 4. I’m going to say increments
of 0.5, click OK. Ambiguous labels because
we have decimals– we talked about that in
the last couple of videos– highlight. We go to home, editing, replace
or Control-H. I want to– and this is from
the last video– I have a dash. And I want to very carefully
put a space up space to and a space. Now I’m going to say replace
all, click OK, click Close. Now we can drag
grade down to values. That is a lot of counts. Now what’s so amazing about
this is Excel and Power Query and pivot tables make
this large data analysis quick and easy. Now notice we have
a year variable. I’m going to close
this field list. And actually, before
we add the year back, I want to name this
grade category frequency. And we’ll change the
column right there. Now what I’d like to do is
since there’s so much data here, I want to be able to see
how the frequency table will change based on year. So this is a pivot
table– amazing feature. Analyze over to filter
and insert slicer– it’s just a beautiful
fancy filter. So I’m going to click
on this, year, click OK. Now before we make this look
better, just check this out. You click on any
one of the years, and you got to be kidding me. A slicer to instantly
filter this data set. Now there is a slicer
tools option ribbon. I’m going to come up here
and say maybe four columns, point down and pull to the edge. That’s looking pretty good. That is amazing. So we can select
any particular year. Now let’s make our histogram
Insert Column cluster, click on the title,
grade histogram by year, delete the legend that’s chart
junk, delete the horizontal lines, the vertical
axis, dad labels– let’s say data labels
up at the top– click on the columns,
right-click Format, Data series or control 1. We can change the gap width. I’m going to go
over to our fill. We’re getting the hang of this. We’ve done this a few
times– vary colors by point. I don’t need the
task pane anymore. Right-click, hide all field
buttons, click and drag. Right at the top there, now I
can instantly select any year. And check out– the histogram
is totally changing. So or query from
file from folder to import a bunch of
files, pivot table to create the
frequency distribution chart to create the
histogram and even this amazing slicer all to
do statistical data analysis. How much fun is that? Now we’ll actually use this
exact same data set again in other chapters
when we’re learning other statistical techniques. All right, we’ll
see you next video.

Danny Hutson

6 thoughts on “Excel 2013 Statistical Analysis #11: Power Query Import Multiple Text Files, Grade Histogram by Year

  1. Excel 2013 Statistical Analysis #11: Power Query Import Multiple Text Files, Grade Histogram by Year
    Download files: http://people.highline.edu/mgirvin/excelisfun.htm
    Topics in this video:
    1. (00:16) Over View of File Import and Histogram Creation
    2. (00:56) Look at Zipped Folder from class download then unzip it with Right-click, “Extract All”
    3. (01:15) Text Files for communication between databases and data analysis programs like Excel
    4. (02:06) Use Power Query to Import Multiple Files
    5. (02:10) Get External Data Tab in Power Query, From File Button, From Folder Button
    6. (02:33) We only need to keep “Content” Column, so right-click “Content” Field Name and point to “Remove Other Columns”
    7. (02:51) To reveal data in imported tables, click the button with the Two Downward Point Arrows.
    8. (02:58) Filter out Field Name.
    9. (04:10) Name Query
    10. (04:17) Close and Load To a cell in our worksheet (this brings table of data from the Power Query editor window into our worksheet)
    11. (04:51) Build Frequency Distribution with a PivotTable
    12. (05:28) Use Find and Replace feature to create non-ambiguous labels in a Grouped Decimal Number PivotTable.
    13. (06:31) Add a Slicer for the Year Variable to the PivotTable
    14. (07:26) Create Histogram

Leave a Reply

Your email address will not be published. Required fields are marked *