An Introduction to Data Science

Data Science: Many Skills

Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information. Although the name Data Science seems to connect most strongly with areas such as databases and computer science, many different kinds of skills – including non-mathematical skills – are needed.

Section 1

Overview 

  1. Data analysis is an important component of the skill set required for many jobs in data science, but it is not the only necessary skill.
  2. A brief case study of a supermarket point of sale system illustrates the many challenges involved in data science work.
  3. Data scientists play active roles in the design and implementation work of four related areas: data architecture, data acquisition, data analysis, and data archiving.
  4. Key skills highlighted by the brief case study include communication skills, data analysis skills, and ethical reasoning skills.

For some, the term “Data Science” evokes images of statisticians in white lab coats staring fixedly at blinking computer screens filled with scrolling numbers. Nothing could be further from the truth. First of all, statisticians do not wear lab coats: this fashion statement is reserved for biologists, doctors, and others who have to keep their clothes clean in environments filled with unusual fluids. Second, much of the data in the world is non-numeric and unstructured. In this context, unstructured means that the data are not arranged in neat rows and columns. Think of a web page full of photographs and short messages among friends: very few numbers to work with there. While it is certainly true that companies, schools, and governments use plenty of numeric information – sales of products, grade point averages, and tax assessments are a few examples – there is lots of other information in the world that mathematicians and statisticians look at and cringe. So, while it is always useful to have great math skills, there is much to be accomplished in the world of data science for those of us who are presently more comfortable working with words, lists, photographs, sounds, and other kinds of information.

In addition, data science is much more than simply analyzing data. There are many people who enjoy analyzing data and who could happily spend all day looking at histograms and averages, but for those who prefer other activities, data science offers a range of roles and requires a range of skills. Let’s consider this idea by thinking about some of the data involved in buying a box of cereal.

Whatever your cereal preferences – fruity, chocolaty, fibrous, or nutty – you prepare for the purchase by writing “cereal” on your grocery list. Already your planned purchase is a piece of data, albeit a pencil scribble on the back of an envelope that only you can read. When you get to the grocery store, you use your data as a reminder to grab that jumbo box of FruityChocoBoms off the shelf and put it in your cart. At the checkout line the cashier scans the barcode on your box and the cash register logs the price. Back in the warehouse, a computer tells the stock manager that it is time to request another order from the distributor, as your purchase was one of the last boxes in the store. You also have a coupon for your big box and the cashier scans that, giving you a predetermined discount. At the end of the week, a report of all the scanned manufacturer coupons gets uploaded to the cereal company so that they can issue a reimbursement to the grocery store for all of the coupon discounts they have handed out to customers. Finally, at the end of the month, a store manager looks at a colorful collection of pie charts showing all of the different kinds of cereal that were sold, and on the basis of strong sales of fruity cereals, decides to offer more varieties of these on the store’s limited shelf space next month.
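To make the flow of data in this story a little more concrete, here is a minimal Python sketch of the same steps: a scan at the register, a stock check that can trigger a reorder, a log of coupons for the weekly reimbursement report, and a count of sales by cereal type for the monthly charts. Everything in it is an assumption made for illustration, not a description of how any real point of sale system works: the `Product` fields, the `reorder_level` threshold, the barcodes, the prices, and the second cereal, NuttyFlakes, are all invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    category: str         # e.g. "fruity" or "nutty"
    price_cents: int      # prices kept in integer cents to avoid rounding surprises
    stock: int            # boxes currently in the store
    reorder_level: int    # hypothetical threshold that triggers a new order


def checkout(barcode, inventory, sales_log, coupon_log, coupon_cents=0):
    """Record one scanned item and return (amount_due_cents, needs_reorder)."""
    product = inventory[barcode]
    product.stock -= 1
    sales_log.append((barcode, product.category))
    if coupon_cents > 0:
        coupon_log.append((barcode, coupon_cents))
    needs_reorder = product.stock <= product.reorder_level
    return product.price_cents - coupon_cents, needs_reorder


def coupon_reimbursement(coupon_log):
    """Total the store would ask the manufacturer to reimburse (weekly report)."""
    return sum(cents for _, cents in coupon_log)


def sales_by_category(sales_log):
    """Units sold per cereal category (the numbers behind the monthly charts)."""
    return Counter(category for _, category in sales_log)


# Invented example data: barcodes, prices, and stock levels are placeholders.
inventory = {
    "0001": Product("FruityChocoBoms", "fruity", 499, stock=2, reorder_level=1),
    "0002": Product("NuttyFlakes", "nutty", 399, stock=10, reorder_level=2),
}
sales_log, coupon_log = [], []

# One purchase with a coupon, one without.
amount_due, needs_reorder = checkout(
    "0001", inventory, sales_log, coupon_log, coupon_cents=100
)
checkout("0002", inventory, sales_log, coupon_log)
```

Even in this toy version, a single scan feeds three different consumers of the data: the stock manager, the cereal company, and the store manager.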

Attribution

Jeffrey Stanton (2012, 13), An Introduction to Data Science, URL: http://jsresearch.net/groups/teachdatascience

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (https://creativecommons.org/licenses/by-nc-sa/3.0/).
