
Data Processing and Analysis in Programming

The document provides a scoring guide for the AP Computer Science Principles exam, focusing on various programming concepts and data handling. It includes multiple-choice questions with correct answers and explanations regarding programming efficiency, data representation, metadata analysis, and algorithm development. The guide emphasizes understanding how to process data, the implications of data structures, and the importance of metadata in various contexts.

AP COMPUTER SCIENCE PRINCIPLES Scoring Guide

Big idea 2

1. A programmer is writing a program that is intended to be able to process large amounts of data. Which of the
following considerations is LEAST likely to affect the ability of the program to process larger data sets?
(A) How long the program takes to run
(B) How many programming statements the program contains
(C) How much memory the program requires as it runs
(D) How much storage space the program requires as it runs

Answer B

This option is correct. The number of statements in a program is not likely to affect how a program will
handle larger data sets. The efficiency of a program is independent of the number of statements it
contains. There are some programs with very few statements that take a long time to execute, as well as
programs with many statements that take little time to execute.

2. A cable television company stores information about movie purchases made by subscribers. Each day, the following
information is summarized and stored in a publicly available database.

• The day and date each movie was purchased

• The title of each movie purchased

• The cities where subscribers purchased each movie

• The number of times each movie was purchased by subscribers in a given city

A sample portion of the database is shown below. The database is sorted by date and movie title.

Which of the following CANNOT be determined using only the information in the database?


(A) The date when a certain movie was purchased the greatest number of times
(B) The number of movies purchased by an individual subscriber for a particular month
(C) The total number of cities in which a certain movie was purchased
(D) The total number of movies purchased in a certain city during a particular month

Answer B

This option is correct. It is not possible to determine the number of movies purchased by an individual
subscriber for a particular month. In this database, information about individual subscribers, such as their
ID number, is not stored.

3. A certain social media Web site allows users to post messages and to comment on other messages that have been
posted. When a user posts a message, the message itself is considered data. In addition to the data, the site stores the
following metadata.

• The time the message was posted

• The name of the user who posted the message

• The names of any users who comment on the message and the times the comments were made

For which of the following goals would it be more useful to analyze the data instead of the metadata?
(A) To determine the users who post messages most frequently
(B) To determine the time of day that the site is most active
(C) To determine the topics that many users are posting about
(D) To determine which posts from a particular user have received the greatest number of comments

4. A computer program performs the operation and represents the result as the value . Which
of the following best explains this result?
(A) An overflow error occurred.
(B) The precision of the result is limited due to the constraints of using a floating-point representation.
(C) The program attempted to execute the operation with the arguments in reverse order.
(D) The program attempted to represent a floating-point number as an integer.

Answer B
This option is correct. The fixed number of bits used to represent real numbers (as floating-point numbers) limits
the range and precision of floating-point values, so some results can only be stored as approximations.
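
The effect of limited precision can be seen in a minimal Python sketch. The values below are illustrative assumptions and are not the operation from the question; the sketch also assumes Python floats are IEEE 754 double-precision values, as on typical platforms.

```python
# Illustrative values only; not the operation from the question.
one_third = 1 / 3      # stored as the nearest representable binary fraction
total = 0.1 + 0.2      # neither 0.1 nor 0.2 is exactly representable in binary

print(one_third)                  # a finite approximation of 0.333...
print(total == 0.3)               # False on IEEE 754 doubles
print(abs(total - 0.3) < 1e-9)    # True: the error is tiny but nonzero
```

Comparing floating-point results with a small tolerance, rather than with exact equality, is a common way to allow for this kind of round-off.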


5. A computer program uses 4 bits to represent nonnegative integers. Which of the following statements describe a
possible result when the program uses this number representation?

I. The operation will result in an overflow error.

II. The operation will result in an overflow error.

III. The operation will result in an overflow error.


(A) I only
(B) II only
(C) II and III only
(D) I, II, and III

Answer B

This option is correct. With a 4-bit integer representation, 2^4 = 16 values can be represented, which allows for the
values between 0 and 15, inclusive. If an operation results in a value greater than 15, an overflow error will
occur. Of the operations given in the options, only the operation in statement II gives a result larger than 15.

6. A certain programming language uses 4-bit binary sequences to represent nonnegative integers. For example, the
binary sequence 0101 represents the corresponding decimal value 5. Using this programming language, a
programmer attempts to add the decimal values 14 and 15 and assign the sum to the variable total. Which
of the following best describes the result of this operation?
(A) The correct sum of 29 will be assigned to the variable total.
(B) An overflow error will occur because 4 bits is not large enough to represent either of the
values 14 or 15.
(C) An overflow error will occur because 4 bits is not large enough to represent 29, the sum
of 14 and 15.
(D) A round-off error will occur because the decimal values 14 and 15 are represented as
approximations due to the fixed number of bits used to represent numbers.

Answer C

Correct. The largest binary value that can be represented using 4 bits is 1111, which is equal to the
decimal value 15. Since the sum is larger than the largest representable value, an overflow error will
occur.
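
The check described above can be sketched in Python. The function name add_unsigned and the choice to raise OverflowError are assumptions made for illustration; the question does not say how the programming language reports the error.

```python
def add_unsigned(a: int, b: int, bits: int = 4) -> int:
    """Add two nonnegative integers, failing if the sum needs more than `bits` bits."""
    max_value = 2**bits - 1          # 1111 in binary is 15 when bits == 4
    total = a + b
    if total > max_value:
        raise OverflowError(f"{total} exceeds the {bits}-bit maximum of {max_value}")
    return total


print(add_unsigned(7, 8))            # 15 still fits in 4 bits
try:
    add_unsigned(14, 15)             # 29 does not fit in 4 bits
except OverflowError as err:
    print("overflow:", err)
```

Note that 14 and 15 are each representable on their own; only their sum is out of range.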


7. Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?
(A) A lossless compression algorithm can guarantee that compressed information is kept secure, while a
lossy compression algorithm cannot.
(B) A lossless compression algorithm can guarantee reconstruction of original data, while a lossy
compression algorithm cannot.
(C) A lossless compression algorithm typically allows for faster transmission speeds than does a lossy
compression algorithm.
(D) A lossless compression algorithm typically provides a greater reduction in the number of bits stored or
transmitted than does a lossy compression algorithm.

Answer B

Correct. Lossless compression algorithms are guaranteed to be able to reconstruct the original data, while
lossy compression algorithms are not.
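
A minimal sketch of the round-trip guarantee, using Python's standard zlib module as one example of a lossless compressor. The sample text is an arbitrary assumption.

```python
import zlib

# Arbitrary, highly repetitive sample data (an assumption for illustration).
original = b"AP Computer Science Principles " * 20

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))   # the compressed form is smaller for this input
print(restored == original)             # True: every byte is recovered
```

A lossy format makes no such guarantee, because decoding yields only an approximation of the original data.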

8. A large data set contains information about all students majoring in computer science in colleges across the United
States. The data set contains the following information about each student.

• The student’s gender

• The state in which the student attends college

• The student’s grade point average on a 4.0 scale

Which of the following questions could be answered by analyzing only information in the data set?
(A) Do students majoring in computer science tend to have higher grade point averages than students
majoring in other subjects?
(B) How many states have a higher percentage of female computer science majors than male computer
science majors attending college in that state?
(C) What percent of students attending college in a certain state are majoring in computer science?
(D) Which college has the highest number of students majoring in computer science?

Answer B

This option is correct. The data set stores information about an individual student’s gender and state.
This information can be aggregated to extract information about the percentage of female majors in each
state.
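
The aggregation can be sketched as follows. The records are hypothetical and the field order (gender, state, grade point average) is an assumption; only the three fields listed in the question are used.

```python
from collections import Counter

# Hypothetical records: (gender, state, grade point average)
students = [
    ("F", "CA", 3.6), ("M", "CA", 3.1), ("F", "CA", 3.9),
    ("M", "NY", 3.4), ("M", "NY", 2.8), ("F", "NY", 3.7),
    ("F", "TX", 3.2), ("M", "TX", 3.5),
]

female = Counter(state for gender, state, _ in students if gender == "F")
male = Counter(state for gender, state, _ in students if gender == "M")

# Within one state both percentages share a denominator,
# so comparing percentages is the same as comparing counts.
states_with_more_women = [
    state for state in sorted(set(female) | set(male))
    if female[state] > male[state]
]
print(states_with_more_women, len(states_with_more_women))
```

The other three questions cannot be answered this way because the data set has no records for other majors, for non-majors, or for individual colleges.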


A programmer is developing a word game. The programmer wants to create an algorithm that will take a list of words and
return a list containing the first letter of all words that are palindromes (words that read the same backward or forward).
The returned list should be in alphabetical order. For example, if the list contains the words
, the returned list would contain
(because , , and are palindromes).

The programmer knows that the following steps are necessary for the algorithm but is not sure in which order they should
be executed.

9. Executing which of the following sequences of steps will enable the algorithm to work as intended?

I. First shorten, then keep palindromes, then sort

II. First keep palindromes, then shorten, then sort

III. First sort, then keep palindromes, then shorten


(A) I only
(B) II only
(C) I and III
(D) II and III

Answer D
This option is correct. Options II and III perform the steps in a correct order. In order to generate the desired list,
the algorithm must perform the "shorten" step after the "keep palindromes" step, otherwise the "keep palindromes"
step would not be able to determine whether the original word was a palindrome.
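
The three orderings can be compared with a short Python sketch. The word list is hypothetical, and the helper definitions are assumptions based on the explanation above: "shorten" is taken to mean replacing each word with its first letter, and "keep palindromes" to mean discarding words that are not palindromes.

```python
def keep_palindromes(words):
    """Keep only the words that read the same backward and forward."""
    return [w for w in words if w == w[::-1]]


def shorten(words):
    """Replace each word with its first letter."""
    return [w[0] for w in words]


words = ["banana", "kayak", "mom", "apple", "level"]  # hypothetical input

order_i = sorted(keep_palindromes(shorten(words)))    # shorten, keep, sort
order_ii = sorted(shorten(keep_palindromes(words)))   # keep, shorten, sort
order_iii = shorten(keep_palindromes(sorted(words)))  # sort, keep, shorten

print(order_i)    # ['a', 'b', 'k', 'l', 'm']
print(order_ii)   # ['k', 'l', 'm']
print(order_iii)  # ['k', 'l', 'm']
```

Sequence I fails because every single letter is trivially a palindrome, so no words are filtered out once they have been shortened.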

10. A library of e-books contains metadata for each book. The metadata are intended to help a search feature find books
that users are interested in. Which of the following is LEAST likely to be contained in the metadata of each e-book?


(A) An archive containing previous versions of the e-book
(B) The author and title of the e-book
(C) The date the e-book was first published
(D) The genre of the e-book (e.g., comedy, fantasy, romance, etc.)

Answer A
This option is correct. Metadata for an e-book would typically be used to provide descriptive information about
the book. Previous versions of the e-book would likely be considered data, not metadata.

A media librarian at a movie studio is planning to save digital video files for archival purposes. The movie studio would
like to be able to access full-quality videos if they are needed for future projects.

11. Which of the following actions is LEAST likely to support the studio’s goal?
(A) Using video file formats that conform to published standards and are supported across many different
devices
(B) Using lossy compression software to reduce the size requirements of the data being stored
(C) Using storage media that can be expanded for additional data capacity
(D) Using a system that incorporates redundancy to handle disk failure

Answer B
This option is correct. Using lossy compression will provide only an approximation of the original video data.
The full-quality original versions of the videos will be lost if lossy compression is used.

12. A digital photo file contains data representing the level of red, green, and blue for each pixel in the photo. The file
also contains metadata that describes the date and geographic location where the photo was taken. For which of the
following goals would analyzing the metadata be more appropriate than analyzing the data?
(A) Determining the likelihood that the photo is a picture of the sky
(B) Determining the likelihood that the photo was taken at a particular public event
(C) Determining the number of people that appear in the photo
(D) Determining the usability of the photo for projection onto a particular color background

A group of students take hundreds of digital photos for a science project about weather patterns. Each photo file contains
data representing the level of red, green, and blue for each pixel in the photo. The file also contains metadata that
describes the date, time, and geographic location where the photo was taken.


13. For which of the following goals would analyzing the metadata be more appropriate than analyzing the data?

Select two answers.

A Determining the chronological order of the photos

B Determining the number of clouds in a particular photo

C Determining whether a photo is suitable for printing in black-and-white

D Determining whether two photos were taken at the same location on different days

Answer A

Correct. The time and date at which a photo is taken are considered metadata about the image. This information
can be used to determine the chronological order of the images.

Answer D

Correct. The location and date at which a photo is taken are considered metadata about the image. This
information can be used to determine whether two pictures were taken at the same location on different
dates.

14. An Internet service provider (ISP) is considering an update to its servers that would save copies of the Web pages
most frequently visited by each user.

Which of the following is LEAST likely to occur as a result of the update?


(A) Average response time for user requests might decrease.
(B) Privacy of users might be negatively affected.
(C) Storage requirements for the servers might increase.
(D) Web sites that are not visited frequently might no longer be accessible to users.


Answer D

This option is correct. The actions of the ISP will only affect how frequently visited pages are loaded
into Web browsers. Pages not saved by the ISP are still accessed as they were before.

15. An online store uses 6-bit binary sequences to identify each unique item for sale. The store plans to increase the
number of items it sells and is considering using 7-bit binary sequences. Which of the following best describes the
result of using 7-bit sequences instead of 6-bit sequences?
(A) 2 more items can be uniquely identified.
(B) 10 more items can be uniquely identified.
(C) 2 times as many items can be uniquely identified.
(D) 10 times as many items can be uniquely identified.

Answer C

This option is correct. Using 6-bit binary sequences allows for 2^6, or 64, different items to be identified.
Using 7-bit binary sequences allows for 2^7, or 128, different items to be identified. Thus there are two
times as many items that can be uniquely identified.
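
The doubling can be verified with a small Python sketch. The helper name distinct_ids is an assumption introduced for illustration.

```python
def distinct_ids(bits: int) -> int:
    """Number of distinct binary sequences of the given length."""
    return 2**bits


for bits in (6, 7):
    print(bits, distinct_ids(bits))           # 6 64, then 7 128

print(distinct_ids(7) // distinct_ids(6))     # 2: one extra bit doubles the count
```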


Grades in a computer science course are based on total points earned on a midterm exam and a final exam. The teacher
provides a way for students to improve their course grades if they receive high scores on the final exam: if a student’s
final exam score is greater than the student’s midterm exam score, the final exam score replaces the midterm exam score
in the calculation of total points.

The table below shows two students’ scores on the midterm and final exams and the calculated total points each student
earns.

• Khalil does better on the midterm exam than on the final exam, so his original midterm and final exam scores are
added to compute his total points.
• Josefina does better on the final exam than on the midterm exam, so her final exam score replaces her midterm
exam score in the total points calculation.

A programmer is writing a procedure to calculate a student’s final grade in the course using the score replacement policy
described. The student’s exam scores are stored in the variables and . The procedure
returns the larger of and .

16. Which of the following could be used in the procedure to calculate a student’s total points earned in the course and
store the result in the variable ?

(A)

(B)

(C)

(D)

Answer B

This option is correct. This expression uses the procedure to replace the midterm score with the higher of
the two scores. The selected value is then added to the final exam score and assigned to .
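
The replacement policy described in this explanation can be sketched in Python. The names total_points, midterm, and final, and the sample scores, are hypothetical; the variable and procedure names used in the original question are not reproduced here.

```python
def total_points(midterm: int, final: int) -> int:
    """Apply the score replacement policy and return the total points."""
    # The larger of the two scores stands in for the midterm score,
    # and the final exam score is always counted as is.
    return max(midterm, final) + final


print(total_points(92, 80))   # 172: midterm is higher, so both original scores are added
print(total_points(70, 85))   # 170: final is higher, so it is counted twice
```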


17. A retailer that sells footwear maintains a single database containing records with the following information about
each item for sale in the retailer’s store.

• Item identification number

• Footwear type (sneakers, boots, sandals, etc.)

• Selling price (in dollars)

• Size

• Color

• Quantity available

Using only the database, which of the following can be determined?

(A) Which items listed in the database are not currently in the store
(B) Which colors are more popular among men than women
(C) Which type of footwear is most popular among adults
(D) The total number of shoes sold in a particular month

Answer A

This option is correct. Because the database stores information on item identification numbers and
quantities available, the retailer can search for all item identification numbers that have a quantity of 0.


18. ASCII is a character-encoding scheme that uses 7 bits to represent each character. The decimal (base 10) values 65
through 90 represent the capital letters A through Z, as shown in the table below.

What ASCII character is represented by the binary (base 2) number 1001010 ?


(A) H
(B) I
(C) J
(D) K

Answer C

This option is correct. The table shows that the letter J is represented by the decimal value 74, which in
binary (base 2) is 1001010.
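
The conversion can be checked with Python's built-in int and chr functions, as a quick sketch:

```python
code = int("1001010", 2)    # 64 + 8 + 2 = 74
print(code, chr(code))      # 74 J
```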


19. ASCII is a character-encoding scheme that uses a numeric value to represent each character. For example, the
uppercase letter “G” is represented by the decimal (base 10) value 71. A partial list of characters and their
corresponding ASCII values is shown in the table below.

ASCII characters can also be represented by hexadecimal numbers. According to ASCII character encoding, which
of the following letters is represented by the hexadecimal (base 16) number 56?
(A) A
(B) L
(C) V
(D) Y

20. Directions: The question or incomplete statement below is followed by four suggested answers or
completions. Select the one that is best in each case.

A search engine has a trend-tracking feature that provides information on how popular a search term is. The
data can be filtered by geographic region, date, and category. Categories include arts and entertainment, computers
and electronics, games, news, people and society, shopping, sports, and travel. Which of the following questions
is LEAST likely to be answerable using the trends feature?
(A) In what month does a particular sport receive the most searches?
(B) In which political candidates are people interested?
(C) What is the cost of a certain electronics product?
(D) Which region of the country has the greatest number of people searching for opera performances?

Answer C
This option is correct. The cost of a given product is not tracked by the described search engine.


A smartphone stores the following data for each photo that is taken using the phone.

• The file name of the photo
• The date and time the photo was taken
• The geographic location where the photo was taken

Assume that all of the photos that have been taken on the phone are accessible.

21. Which of the following can be determined using the photo data described?

I. The number of photos that were taken at a particular geographic location
II. The number of photos that were taken in the last year
III. The name of the person who took the most recent photo
(A) III only
(B) I and II only
(C) I and III only
(D) I, II, and III

Answer B
This option is correct. The number of photos taken at a particular geographic location can be determined from the
geographic data stored with each photo. The number of photos taken in the last year can be determined from the
date and time data stored with each photo. The name of the person who took a photo is not captured in the photo
data.

22. A student is recording a song on her computer. When the recording is finished, she saves a copy on her computer.
The student notices that the saved copy is of lower sound quality than the original recording. Which of the
following could be a possible explanation for the difference in sound quality?

(A) The song was saved using fewer bits per second than the original song.
(B) The song was saved using more bits per second than the original song.
(C) The song was saved using a lossless compression technique.
(D) Some information is lost every time a file is saved from one location on a computer to another location.

Answer A

This option is correct. The representation of sound as data involves the computational manipulation of
information. For one copy of a song to have a lower sound quality than another copy, a lower ratio of bits
per second must have been used.

23. A text-editing application uses binary sequences to represent each of 200 different characters. What is the minimum
number of bits needed to assign a unique bit sequence to each of the possible characters?
(A) 4
(B) 6
(C) 7
(D) 8

Answer D

This option is correct. Using 8 bits allows for up to 2^8 = 256 characters, while 7 bits allows for only
2^7 = 128, which is fewer than the 200 needed.
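
The minimum bit count can be computed with a short Python sketch. The helper name min_bits is an assumption introduced for illustration.

```python
def min_bits(count: int) -> int:
    """Smallest number of bits b such that 2**b >= count (for count >= 1)."""
    # The largest index to encode is count - 1; its bit length is the answer.
    return (count - 1).bit_length()


print(min_bits(200))   # 8
print(min_bits(128))   # 7: exactly 2**7 characters still fit in 7 bits
```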


A file storage application allows users to save their files on cloud servers. A group of researchers gathered user data for
the first eight years of the application’s existence. Some of the data are summarized in the following graphs. The line
graph on the left shows the number of registered users each year. The line graph on the right shows the total amount of
data stored by all users each year. The circle graph shows the distribution of file sizes currently stored by all users.

(note: 1 MB = 1,000 KB)

24. Which of the following best describes the average amount of data stored per user for the first eight years of the
application’s existence?
(A) Across all eight years, the average amount of data stored per user was about 10 GB.
(B) Across all eight years, the average amount of data stored per user was about 100 GB.
(C) The average amount of data stored per user appears to increase by about 10 GB each year.
(D) The average amount of data stored per user appears to increase by about 100 GB each year.


Answer A

Correct. The two line graphs are roughly the same shape. Each value on the right line graph is about 10
times the corresponding value on the left line graph. Therefore, the average amount of data stored per
user is about 10 GB.

25. Which of the following observations is most consistent with the information in the circle graph?
(A) Over 75% of the files stored are 1 MB in size or less.
(B) Over 75% of the files stored are 10 MB in size or less.
(C) Over 75% of the files stored are at least 100 KB in size.
(D) Over 75% of the files stored are at least 1 MB in size.

Answer B

Correct. The files that are up to 10 MB represent 17% + 24% + 25% + 10%, or 76%.

26. Which of the following best describes the growth in the number of registered users for the first eight years of the
application’s existence?
(A) The number of registered users increased at about a constant rate each year for all eight years.
(B) The number of registered users increased at about a constant rate for years 1 to 5 and then about doubled
each year after that.
(C) The number of registered users about doubled each year for years 1 to 5 and then increased at about a
constant rate after that.
(D) The number of registered users about doubled each year for all eight years.

Answer C

Correct. From years 1 to 5, the number of registered users roughly doubled each year. From years 5 to 8,
the number of registered users increased by about 100 million each year.

27. A video-streaming Web site uses 32-bit integers to count the number of times each video has been played. In
anticipation of some videos being played more times than can be represented with 32 bits, the Web site is planning
to change to 64-bit integers for the counter. Which of the following best describes the result of using 64-bit integers
instead of 32-bit integers?


(A) 2 times as many values can be represented.
(B) 32 times as many values can be represented.
(C) 2^32 times as many values can be represented.
(D) 32^2 times as many values can be represented.

A color in a computing application is represented by an RGB triplet that describes the amount of red, green, and blue,
respectively, used to create the desired color. A selection of colors and their corresponding RGB triplets are shown in the
following table. Each value is represented in decimal (base 10).

Color Name      RGB Triplet
indigo          (75, 0, 130)
ivory           (255, 255, 240)
light pink      (255, 182, 193)
light yellow    (255, 255, 224)
magenta         (255, 0, 255)
neutral gray    (127, 127, 112)
pale yellow     (255, 255, 160)
vivid yellow    (255, 255, 14)

28. What is the binary RGB triplet for the color indigo?
(A) (00100101, 00000000, 10000010)
(B) (00100101, 00000000, 01000001)

(C) (01001011, 00000000, 10000010)

(D) (01001011, 00000000, 01000001)

Answer C

Correct. The decimal value 75 is equal to 64 + 8 + 2 + 1, which is equal to 2^6 + 2^3 + 2^1 + 2^0, which is
equal to the binary number 01001011. The decimal value 0 is equal to the binary number 00000000. The
decimal value 130 is equal to 128 + 2, which is equal to 2^7 + 2^1, which is equal to the binary number
10000010.
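
The same conversion can be reproduced in Python with the built-in format function, which pads each value to 8 binary digits. This is a quick check, not part of the original explanation.

```python
indigo = (75, 0, 130)

# "08b" means: binary digits, zero-padded to a width of 8.
binary_triplet = tuple(format(value, "08b") for value in indigo)
print(binary_triplet)   # ('01001011', '00000000', '10000010')
```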

29. According to information in the table, what color is represented by the binary RGB triplet (11111111,
11111111, 11110000) ?


(A) Ivory
(B) Light yellow
(C) Neutral gray
(D) Vivid yellow

Answer A

Correct. The binary number 11111111 is equal to 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0, or
255. The binary number 11110000 is equal to 2^7 + 2^6 + 2^5 + 2^4, or 240. Therefore, the given
binary triplet represents the color ivory.
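
Decoding in the other direction can be sketched in Python. The dictionary below copies the color table from this section; the variable names are assumptions introduced for illustration.

```python
colors = {
    "indigo": (75, 0, 130),
    "ivory": (255, 255, 240),
    "light pink": (255, 182, 193),
    "light yellow": (255, 255, 224),
    "magenta": (255, 0, 255),
    "neutral gray": (127, 127, 112),
    "pale yellow": (255, 255, 160),
    "vivid yellow": (255, 255, 14),
}

binary_triplet = ("11111111", "11111111", "11110000")
decoded = tuple(int(bits, 2) for bits in binary_triplet)

# Look up the color whose decimal triplet matches the decoded values.
name = next(color for color, rgb in colors.items() if rgb == decoded)
print(decoded, name)   # (255, 255, 240) ivory
```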

30. Biologists often attach tracking collars to wild animals. For each animal, the following geolocation data is collected
at frequent intervals.

• The time

• The date

• The location of the animal

Which of the following questions about a particular animal could NOT be answered using only the data collected
from the tracking collars?
(A) Approximately how many miles did the animal travel in one week?
(B) Does the animal travel in groups with other tracked animals?
(C) Do the movement patterns of the animal vary according to the weather?
(D) In what geographic locations does the animal typically travel?

31. A video game character can face toward one of four directions: north, south, east, and west. Each direction is stored
in memory as a sequence of four bits. A new version of the game is created in which the character can face toward
one of eight directions, adding northwest, northeast, southwest, and southeast to the original four possibilities.
Which of the following statements is true about how the eight directions must be stored in memory?
(A) Four bits are not enough to store the eight directions. Five bits are needed for the new version of the
game.
(B) Four bits are not enough to store the eight directions. Eight bits are needed for the new version of the
game.
(C) Four bits are not enough to store the eight directions. Sixteen bits are needed for the new version of the
game.
(D) Four bits are enough to store the eight directions.


Answer D

Correct. Four bits can represent 2^4, or 16, pieces of information.

A bookstore has a database containing information about each book for sale in the store. A sample portion of the database
is shown below.

Author                Title                                    Selling Price    Genre              Quantity Available
J. M. Barrie          Peter and Wendy                          $6.99            Fantasy            3
L. Frank Baum         The Wonderful Wizard of Oz               $7.99            Fantasy            2
Arthur Conan Doyle    The Hound of the Baskervilles            $7.49            Mystery            4
Mary Shelley          Frankenstein                             $7.99            Horror             4
Jules Verne           Twenty Thousand Leagues Under the Sea    $6.99            Science Fiction    3
H. G. Wells           The War of the Worlds                    $4.99            Science Fiction    3

32. A store employee wants to calculate the total amount of money the store will receive if they sell all of the available
science fiction books. Which columns in the database can be ignored and still allow the employee to perform this
calculation?

Select two answers.

A Author

B Title

C Genre

D Quantity Available

Answer A

Correct. In order to perform the desired calculation, the selling price, the genre, and the quantity
available are needed. The author is not needed.


Answer B

Correct. In order to perform the desired calculation, the selling price, the genre, and the quantity
available are needed. The title is not needed.
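
The calculation can be sketched in Python using only the three needed columns. The rows copy the sample portion of the database; storing prices as integer cents is an assumption made here to avoid floating-point round-off.

```python
# (selling price in cents, genre, quantity available); author and title are omitted.
books = [
    (699, "Fantasy", 3),
    (799, "Fantasy", 2),
    (749, "Mystery", 4),
    (799, "Horror", 4),
    (699, "Science Fiction", 3),
    (499, "Science Fiction", 3),
]

total_cents = sum(
    price * quantity
    for price, genre, quantity in books
    if genre == "Science Fiction"
)
print(f"${total_cents / 100:.2f}")   # $35.94
```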

33. A large spreadsheet contains the following information about the books at a bookstore. A sample portion of the
spreadsheet is shown below.

     A                                B                     C          D                            E
     Book Title                       Author                Genre      Number of Copies in Stock    Cost (in dollars)
1    Little Women                     Louisa May Alcott     Fiction    3                            13.95
2    The Secret Adversary             Agatha Christie       Mystery    2                            12.95
3    A Study in Scarlet               Arthur Conan Doyle    Mystery    0                            8.99
4    The Hound of the Baskervilles    Arthur Conan Doyle    Mystery    1                            8.95
5    Les Misérables                   Victor Hugo           Fiction    1                            12.99
6    Frankenstein                     Mary Shelley          Horror     2                            11.95

An employee wants to count the number of books that meet all of the following criteria.

• Is a mystery book
• Costs less than $10.00
• Has at least one copy in stock

For a given row in the spreadsheet, suppose genre contains the genre as a string, num contains the number of
copies in stock as a number, and cost contains the cost as a number. Which of the following expressions will
evaluate to true if the book should be counted and evaluates to false otherwise?

(A) (genre = "mystery") AND ((1 ≤ num) AND (cost < 10.00))

(B) (genre = "mystery") AND ((1 < num) AND (cost < 10.00))
(C) (genre = "mystery") OR ((1 ≤ num) OR (cost < 10.00))
(D) (genre = "mystery") OR ((1 < num) OR (cost < 10.00))


Answer A

Correct. For a book to be counted, the value of genre must be "mystery" so that only mystery
books are counted. The value of num must be greater than or equal to 1 so that only books that have
at least one copy in stock are counted. The value of cost must be less than 10.00 so that only
books that cost less than $10 are counted. All three conditions must be true, so the AND operator is
used between them.
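
The expression in option (A) can be tried against the sample rows with a short Python sketch. The function name should_count is hypothetical. Lowercasing the genre before comparing is also an assumption, made because the sample spreadsheet capitalizes "Mystery" while the expression compares against "mystery".

```python
# (genre, number of copies in stock, cost) for sample rows 1 through 6
rows = [
    ("Fiction", 3, 13.95),
    ("Mystery", 2, 12.95),
    ("Mystery", 0, 8.99),
    ("Mystery", 1, 8.95),
    ("Fiction", 1, 12.99),
    ("Horror", 2, 11.95),
]


def should_count(genre: str, num: int, cost: float) -> bool:
    """Python form of option (A): all three criteria must hold."""
    return (genre == "mystery") and ((1 <= num) and (cost < 10.00))


count = sum(should_count(genre.lower(), num, cost) for genre, num, cost in rows)
print(count)   # 1: only row 4 meets all three criteria
```

Row 3 is excluded because it has no copies in stock, and row 2 is excluded because it costs more than $10.00.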

34. In a certain computer program, two positive integers are added together, resulting in an overflow error. Which of the
following best explains why the error occurs?
(A) The program attempted to perform an operation that is considered an undecidable problem.
(B) The precision of the result is limited due to the constraints of using a floating-point representation.
(C) The program can only use a fixed number of bits to represent integers; the computed sum is greater than
the maximum representable value.
(D) The program cannot represent integers; the integers are converted into decimal approximations, leading
to rounding errors.

Answer C

Correct. Overflow errors occur when an arithmetic operation results in a value outside the range of
numbers that can be represented by a fixed number of bits.

A video-streaming Web site keeps count of the number of times each video has been played since it was first added to the
site. The count is updated each time a video is played and is displayed next to each video to show its popularity.

At one time, the count for the most popular video was about two million. Sometime later, the same video displayed a
seven-digit negative number as its count, while the counts for the other videos displayed correctly.

35. Which of the following is the most likely explanation for the error?
(A) The count for the video became larger than the maximum value allowed by the data type used to store
the count.
(B) The mathematical operations used to calculate the count caused a rounding error to occur.
(C) The software used to update the count failed when too many videos were played simultaneously by too
many users.
(D) The software used to update the count contained a sampling error when using digital data to approximate
the analog count.


Answer A

Correct. This situation is consistent with the behavior of an overflow error. When the value of the count
exceeded the maximum value that can be represented by a fixed number of bits, the count overflowed
and wrapped around to a negative number.
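
The wraparound can be sketched by reinterpreting a value as a signed two's-complement integer. The bit width is not given in the question; 32 bits is assumed here purely for illustration, and `wrap_signed` is a hypothetical helper.

```python
def wrap_signed(value: int, bits: int = 32) -> int:
    """Interpret `value` as a two's-complement signed integer of width `bits`."""
    mask = (1 << bits) - 1
    value &= mask                     # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # sign bit set, so the value reads as negative
        value -= 1 << bits
    return value


max_count = 2**31 - 1                 # largest 32-bit signed value
print(wrap_signed(max_count))         # still positive
print(wrap_signed(max_count + 1))     # one more play wraps to a negative number
```

Only the video whose count passed the maximum is affected, which matches the observation that the other counts displayed correctly.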

36. A researcher is analyzing data about students in a school district to determine whether there is a relationship
between grade point average and number of absences. The researcher plans on compiling data from several sources
to create a record for each student.

The researcher has access to a database with the following information about each student.

• Last name
• First name
• Grade level (9, 10, 11, or 12)
• Grade point average (on a 0.0 to 4.0 scale)

The researcher also has access to another database with the following information about each student.

• First name
• Last name
• Number of absences from school
• Number of late arrivals to school

Upon compiling the data, the researcher identifies a problem due to the fact that neither data source uses a unique
ID number for each student. Which of the following best describes the problem caused by the lack of unique ID
numbers?
(A) Students who have the same name may be confused with each other.
(B) Students who have the same grade point average may be confused with each other.
(C) Students who have the same grade level may be confused with each other.
(D) Students who have the same number of absences may be confused with each other.

Answer A

Correct. A unique identifier would be required in order to distinguish between two students with the
same first and last names.
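
A small sketch of the problem, using invented records: two different students share a name, so a join on (first name, last name) cannot tell them apart. The field names and the helper `merge_by_name` are assumptions for illustration.

```python
grades = [
    {"first": "Alex", "last": "Kim", "grade_level": 9, "gpa": 3.8},
    {"first": "Alex", "last": "Kim", "grade_level": 11, "gpa": 2.1},
]
attendance = [
    {"first": "Alex", "last": "Kim", "absences": 2, "late": 0},
    {"first": "Alex", "last": "Kim", "absences": 14, "late": 5},
]


def merge_by_name(grades, attendance):
    """Join the two sources on (first, last); later rows overwrite earlier ones."""
    by_name = {}
    for row in attendance:
        by_name[(row["first"], row["last"])] = row
    merged = []
    for row in grades:
        match = by_name.get((row["first"], row["last"]))
        if match is not None:
            merged.append({**row, "absences": match["absences"]})
    return merged


merged = merge_by_name(grades, attendance)
print([(r["gpa"], r["absences"]) for r in merged])
```

Both merged records receive the same absence count, so at least one student is paired with the wrong data.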

37. A team of researchers wants to create a program to analyze the amount of pollution reported in roughly 3,000
counties across the United States. The program is intended to combine county data sets and then process the data.
Which of the following is most likely to be a challenge in creating the program?


(A) A computer program cannot combine data from different files.


(B) Different counties may organize data in different ways.
(C) The number of counties is too large for the program to process.
(D) The total number of rows of data is too large for the program to process.

Answer B

Correct. It will be a challenge to clean the data from the different counties to make the data uniform. The
way pollution data is captured and organized may vary significantly from county to county.

38. A student is creating a Web site that is intended to display information about a city based on a city name that a user
enters in a text field. Which of the following are likely to be challenges associated with processing city names that
users might provide as input?

Select two answers.

A Users might attempt to use the Web site to search for multiple cities.

B Users might enter abbreviations for the names of cities.

C Users might misspell the name of the city.

D Users might be slow at typing a city name in the text field.

Answer B

Correct. Different users may abbreviate city names differently. This may require the student to clean the
data to make it uniform before it can be processed.

Answer C

Correct. Misspelled city names will not be an exact match to information stored by the Web site. This
may require the student to clean the data to make it uniform before it can be processed.
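
One way the cleaning step might look is sketched below. The abbreviation table, the list of known cities, and the similarity cutoff of 0.8 are all assumptions chosen for illustration; a real site would need its own data.

```python
import difflib

# Hypothetical lookup tables for the sketch.
ABBREVIATIONS = {"nyc": "new york", "la": "los angeles", "sf": "san francisco"}
KNOWN_CITIES = ["new york", "los angeles", "san francisco", "chicago"]


def normalize_city(raw: str) -> str:
    """Lowercase, drop periods, collapse whitespace, and expand abbreviations."""
    cleaned = " ".join(raw.strip().lower().replace(".", "").split())
    return ABBREVIATIONS.get(cleaned, cleaned)


def match_city(raw: str):
    """Return the known city that best matches the input, or None."""
    name = normalize_city(raw)
    if name in KNOWN_CITIES:
        return name
    # Tolerate small misspellings by picking the closest known name.
    close = difflib.get_close_matches(name, KNOWN_CITIES, n=1, cutoff=0.8)
    return close[0] if close else None


print(match_city(" N.Y.C. "))
print(match_city("Chicgo"))
```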


39. The owner of a clothing store records the following information for each transaction made at the store during a
7-day period.

• The date of the transaction


• The method of payment used in the transaction
• The number of items purchased in the transaction
• The total amount of the transaction, in dollars

Customers can pay for purchases using cash, check, a debit card, or a credit card.

Using only the data collected during the 7-day period, which of the following statements is true?
(A) The average amount spent per day during the 7-day period can be determined by sorting the data by the
total transaction amount, then adding the 7 greatest amounts, and then dividing the sum by 7.
(B) The method of payment that was used in the greatest number of transactions during the 7-day period can
be determined by sorting the data by the method of payment, then adding the number of items purchased
for each type of payment method, and then finding the maximum sum.
(C) The most expensive item purchased on a given date can be determined by searching the data for all
items purchased on the given date and then sorting the matching items by item price.
(D) The total number of items purchased on a given date can be determined by searching the data for all
transactions that occurred on the given date and then adding the number of items purchased for each
matching transaction.

Answer D

Correct. For each transaction, the data includes the date of the transaction and the number of items
purchased in the transaction. By searching the data to find all transactions that occurred on the given
date, and then adding the number of items purchased in each of those transactions, the total number of
items purchased on a given date can be determined.
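
The search-then-add approach can be sketched as follows. The transaction rows, field names, and the function `items_purchased_on` are invented for illustration.

```python
transactions = [
    {"date": "2024-03-01", "method": "cash", "items": 3, "total": 42.50},
    {"date": "2024-03-01", "method": "credit card", "items": 1, "total": 19.99},
    {"date": "2024-03-02", "method": "check", "items": 5, "total": 87.25},
]


def items_purchased_on(transactions, date):
    """Add the item counts of every transaction that occurred on `date`."""
    return sum(t["items"] for t in transactions if t["date"] == date)


print(items_purchased_on(transactions, "2024-03-01"))
```

The rows hold no per-item prices, which is why the most expensive item on a given date cannot be recovered from this data.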


40. Two lists, list1 and list2, contain the names of books found in two different collections. A librarian wants to create
newList, which will contain the names of all books found in either list, in alphabetical order, with duplicate entries
removed.

For example, if list1 contains

["Macbeth", "Frankenstein", "Jane Eyre"]

and list2 contains

["Frankenstein", "Dracula", "Macbeth", "Hamlet"],

then newList will contain

["Dracula", "Frankenstein", "Hamlet", "Jane Eyre", "Macbeth"].

The following procedures are available to create newList.

Which of the following code segments will correctly create newList ?


(A)
newList ← Combine (list1, list2)
newList ← Sort (newList)
newList ← RemoveDuplicates (newList)

(B)
list1 ← Sort (list1)
list2 ← Sort (list2)
newList ← Combine (list1, list2)
newList ← RemoveDuplicates (newList)

(C)
list1 ← RemoveDuplicates (list1)
list2 ← RemoveDuplicates (list2)
newList ← Combine (list1, list2)
newList ← Sort (newList)

(D)
list1 ← RemoveDuplicates (list1)
list1 ← Sort (list1)
list2 ← RemoveDuplicates (list2)
list2 ← Sort (list2)
newList ← Combine (list1, list2)

Answer A

This option is correct. When list1 and list2 are combined, the newList may have duplicates and will
likely not be sorted. Performing the Sort and then the RemoveDuplicates procedures will result in a list
that is sorted, has no duplicates, and contains the names of all the books found in either list1 or list2.
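
The three procedures are not defined in the surviving text, so the Python stand-ins below are assumptions: `combine` appends one list to the other, `sort` returns an alphabetized copy, and `remove_duplicates` keeps the first occurrence of each entry.

```python
def combine(first, second):
    """Return a new list with the entries of `first` followed by `second`."""
    return first + second


def sort(items):
    """Return a new list sorted in alphabetical order."""
    return sorted(items)


def remove_duplicates(items):
    """Return a new list keeping only the first occurrence of each entry."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


list1 = ["Macbeth", "Frankenstein", "Jane Eyre"]
list2 = ["Frankenstein", "Dracula", "Macbeth", "Hamlet"]

# Option (A): combine, then sort, then remove duplicates.
new_list = remove_duplicates(sort(combine(list1, list2)))
print(new_list)
```

Under these assumptions, option (C) leaves "Frankenstein" and "Macbeth" in the result twice, because duplicates are removed before the lists are combined.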

Each student at a school has a unique student ID number. A teacher has the following spreadsheets available.

• Spreadsheet I contains information on all students at the school. For each entry in this spreadsheet, the student
name, the student ID, and the student’s grade point average are included.
• Spreadsheet II contains information on only students who play at least one sport. For each entry in this
spreadsheet, the student ID and the names of the sports the student plays are included.
• Spreadsheet III contains information on only students whose grade point average is greater than 3.5. For each
entry in this spreadsheet, the student name and the student ID are included.
• Spreadsheet IV contains information on only students who play more than one sport. For each entry in this
spreadsheet, the student name and the student ID are included.

The teacher wants to determine whether students who play a sport are more or less likely to have higher grade point
averages than students who do not play any sports.

41. Which of the following pairs of spreadsheets can be combined and analyzed to determine the desired information?

(A) Spreadsheets I and II


(B) Spreadsheets I and IV
(C) Spreadsheets II and III
(D) Spreadsheets III and IV


Answer A

Correct. The desired information can be determined by using the student IDs in spreadsheet II to identify
the students who play a sport. Once the students who play a sport are identified, the grade point averages
of students who play sports in spreadsheet I can be compared to the grade point averages of all other
students in spreadsheet I.
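
A sketch of that comparison, with invented rows standing in for spreadsheets I and II. The tuple layouts and the function `average_gpa_by_group` are assumptions, and the sketch assumes both groups are non-empty.

```python
# Spreadsheet I: (name, student ID, grade point average) for all students.
spreadsheet_1 = [
    ("Ana", 101, 3.9),
    ("Ben", 102, 2.8),
    ("Cy", 103, 3.4),
    ("Dee", 104, 3.0),
]
# Spreadsheet II: (student ID, sports played) for students who play a sport.
spreadsheet_2 = [
    (101, ["soccer"]),
    (103, ["tennis", "track"]),
]


def average_gpa_by_group(all_students, athletes):
    """Return (average GPA of athletes, average GPA of everyone else)."""
    athlete_ids = {student_id for student_id, _ in athletes}
    athlete_gpas = [gpa for _, sid, gpa in all_students if sid in athlete_ids]
    other_gpas = [gpa for _, sid, gpa in all_students if sid not in athlete_ids]
    return (sum(athlete_gpas) / len(athlete_gpas),
            sum(other_gpas) / len(other_gpas))


athlete_avg, other_avg = average_gpa_by_group(spreadsheet_1, spreadsheet_2)
print(round(athlete_avg, 2), round(other_avg, 2))
```

The student ID is the only field the two spreadsheets share, so it serves as the key that links them.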

Participants in a survey were asked how many hours per day they spend reading, how many hours per day they spend
using a smartphone, and whether or not they would be interested in a smartphone application that lets users share book
reviews.

The data from the survey are represented in the graph below. One type of marker represents a survey participant
who said he or she was interested in the application, and a second type of marker represents a participant who
said he or she was not interested.

42. Which of the following hypotheses is most consistent with the data in the graph?

(A) Participants who read more were generally more likely to say they are interested in the application.
(B) Participants who read more were generally less likely to say they are interested in the application.
(C) Participants who use a smartphone more were generally more likely to say they read more.
(D) Participants who use a smartphone more were generally less likely to say they read more.

Answer A

This option is correct. The markers indicating participants who are interested in the application are clustered
toward the top of the graph. This indicates that participants who read more were generally more likely to say
they are interested in the application.

43. A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible,
and wants to be able to completely restore the file to its original version. Which of the following actions best
supports the user’s needs?
(A) Compressing the file using a lossless compression algorithm before uploading it
(B) Compressing the file using a lossy compression algorithm before uploading it
(C) Compressing the file using both lossy and lossless compression algorithms before uploading it
(D) Uploading the original file without using any compression algorithm

Answer A

Correct. Lossless compression algorithms allow for complete reconstruction of the original data and
typically reduce the size of the data.
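
A quick round trip with Python's standard `zlib` module illustrates the idea. The sample data is invented, and the amount of size reduction depends heavily on how repetitive the data is.

```python
import zlib

original = b"the quick brown fox jumps over the lazy dog " * 200

compressed = zlib.compress(original)     # lossless compression
restored = zlib.decompress(compressed)   # exact reconstruction

print(len(original), len(compressed))
print(restored == original)
```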

Byte pair encoding is a data encoding technique. The encoding algorithm looks for pairs of characters that appear in the
string more than once and replaces each instance of that pair with a corresponding character that does not appear in the
string. The algorithm saves a list containing the mapping of character pairs to their corresponding replacement characters.

For example, the string can be encoded as by


replacing all instances of with and replacing all instances of with .

44. Which of the following statements about byte pair encoding is true?
(A) Byte pair encoding is an example of a lossy transformation because it discards some of the data in the
original string.
(B) Byte pair encoding is an example of a lossy transformation because some pairs of characters are
replaced by a single character.
(C) Byte pair encoding is an example of a lossless transformation because an encoded string can be restored
to its original version.
(D) Byte pair encoding is an example of a lossless transformation because it can be used to transmit
messages securely.

Answer C
This option is correct. The transformation is lossless because an encoded string can be restored to its original


version. For example, can be restored to by


replacing all instances of with and by replacing all instances of with .
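
The following is a minimal sketch of the technique as described in the passage, not a standard implementation. The pool of replacement symbols, the rule for choosing which pair to replace first, and the sample string are all assumptions made for illustration.

```python
def bpe_encode(text, symbols="%#@&"):
    """Replace repeated character pairs with symbols that do not appear in `text`.

    Returns the encoded text and the saved mapping as (symbol, pair) tuples.
    """
    mapping = []
    unused = [s for s in symbols if s not in text]
    while unused:
        pairs = sorted({text[i:i + 2] for i in range(len(text) - 1)})
        # Pick the pair with the most non-overlapping occurrences.
        best = max(pairs, key=text.count, default=None)
        if best is None or text.count(best) < 2:
            break  # no pair appears more than once
        symbol = unused.pop(0)
        text = text.replace(best, symbol)
        mapping.append((symbol, best))
    return text, mapping


def bpe_decode(text, mapping):
    """Undo the replacements in reverse order to restore the original string."""
    for symbol, pair in reversed(mapping):
        text = text.replace(symbol, pair)
    return text


sample = "abababcdcdcd"  # hypothetical input
encoded, mapping = bpe_encode(sample)
print(encoded, mapping)
print(bpe_decode(encoded, mapping) == sample)
```

Because the mapping is saved and each replacement symbol is absent from the original string, decoding is unambiguous, which is what makes the transformation lossless.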

45. Directions: The question or incomplete statement below is followed by four suggested answers or
completions. Select the one that is best in each case.

Computers are often used to search through large sets of data to find useful patterns in the data. Which of
the following tasks is NOT an example where searching for patterns is needed to produce useful information?
(A) A credit card company analyzing credit card purchases to identify potential fraudulent charges
(B) A grocery store analyzing customers’ past purchases to suggest new products the customer may be
interested in
(C) A high school analyzing student grades to identify the students with the top ten highest grade point
averages
(D) An online retailer analyzing customers’ viewing habits to suggest other products based on the
purchasing history of other customers

Answer C
This option is correct. By current standards, a single high school’s list of student grades is not considered a large
set of data. Furthermore, identifying 10 students with the highest grade point averages is not an example of finding
patterns.


Grades in a computer science course are based on total points earned on a midterm exam and a final exam. The teacher
provides a way for students to improve their course grades if they receive high scores on the final exam: if a student’s
final exam score is greater than the student’s midterm exam score, the final exam score replaces the midterm exam score
in the calculation of total points.

The table below shows two students’ scores on the midterm and final exams and the calculated total points each student
earns.

• Khalil does better on the midterm exam than on the final exam, so his original midterm and final exam scores are
added to compute his total points.
• Josefina does better on the final exam than on the midterm exam, so her final exam score replaces her midterm
exam score in the total points calculation.

The teacher has data representing the scores of thousands of students. For each student, the data contain the student name,
the midterm exam score, the final exam score, and the result of the total points calculation.

46. Which of the following could be determined from the data?

I. The average total points earned per student


II. The average increase in total points per student as a result of the score replacement policy
III. The proportion of students who improved their total points as a result of the score replacement policy
(A) III only
(B) I and II only
(C) I and III only
(D) I, II, and III

Answer D
This option is correct. The average total points earned per student can be determined using the result of the total
points calculation for each student. The average increase in total points per student as a result of the score
replacement policy can be determined by calculating the difference between each student’s total points before and
after the replacement policy was applied. The proportion of students who improved their total points as a result of
the score replacement policy can be determined by comparing the midterm and final scores for each student with the
result of the total points calculation.
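
All three quantities can be computed from the four stored fields, as the sketch below shows. The student labels and scores are invented for illustration, and the function name `summarize` is an assumption.

```python
# Hypothetical rows: (name, midterm score, final score, total points).
records = [
    ("Student A", 90, 80, 170),   # final not higher, so no replacement
    ("Student B", 70, 85, 170),   # final replaces midterm: 85 + 85
    ("Student C", 60, 60, 120),   # equal scores, so no replacement
    ("Student D", 50, 90, 180),   # final replaces midterm: 90 + 90
]


def summarize(records):
    """Return (average total, average increase, proportion improved)."""
    count = len(records)
    totals = [total for _, _, _, total in records]
    # Increase = stored total minus what the total would be without the policy.
    increases = [total - (mid + fin) for _, mid, fin, total in records]
    improved = sum(1 for inc in increases if inc > 0)
    return sum(totals) / count, sum(increases) / count, improved / count


average_total, average_increase, proportion_improved = summarize(records)
print(average_total, average_increase, proportion_improved)
```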


The player controls in a particular video game are represented by numbers. The controls and their corresponding binary
values are shown in the following table.

Control Binary Value


← 01000
↑ 01001
→ 01011
↓ 01111
Jump 11000
Run 11001
Pause 11011
Reset 11111

The numeric values for the controls can also be represented in decimal (base 10).

47. What control is represented by the decimal value 15 ?


(A) ←
(B) ↑
(C) →
(D) ↓

Answer D

Correct. The decimal number 15 is equal to 8 + 4 + 2 + 1, which is represented in binary as
01111. Therefore, the decimal value 15 represents the ↓ control.

48. What is the decimal value for the jump control?


(A) 3
(B) 12
(C) 24
(D) 48


Answer C

Correct. The binary value 11000 is equal to 2^4 + 2^3 = 16 + 8, which is equal to 24.
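
Both conversions can be checked with Python's built-in `int` and `format`. The dictionary below copies the table from the question; the helper names `control_for` and `decimal_for` are assumptions for illustration.

```python
CONTROLS = {
    "←": "01000", "↑": "01001", "→": "01011", "↓": "01111",
    "Jump": "11000", "Run": "11001", "Pause": "11011", "Reset": "11111",
}


def control_for(decimal_value: int):
    """Return the control whose 5-bit code equals `decimal_value`, or None."""
    bits = format(decimal_value, "05b")   # decimal to 5-digit binary string
    for name, code in CONTROLS.items():
        if code == bits:
            return name
    return None


def decimal_for(control: str) -> int:
    """Return the decimal value of a control's binary code."""
    return int(CONTROLS[control], 2)      # binary string to decimal


print(control_for(15))
print(decimal_for("Jump"))
```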

Two different schools maintain data sets about their currently enrolled students. No individual student is enrolled at both
schools. Each line of data contains information, separated by commas, about one student.

East High School stores the data in the following format.

West High School stores the data in the following format.

The two schools would like to combine their data to make a single data set.

49. Which of the following can be done with the combined data?

Select two answers.

A The schools can create a single list of student names, sorted by last name.

B The schools can determine the average number of days students are absent.

C The schools can determine which ZIP code is represented by the most students.

D The schools can determine the student ID of the student with the greatest number of absences.

Answer A
This option is correct. It is possible to create a single list of student names, sorted by last name. Both data formats
provide the first and last names of each student.

Answer B
This option is correct. It is possible to determine the average number of days students are absent. Both data
formats provide the number of absences for each student.


50. Which of the following is a true statement about data compression?


(A) Data compression is only useful for files being transmitted over the Internet.
(B) Regardless of the compression technique used, once a data file is compressed, it cannot be restored to its
original state.
(C) Sending a compressed version of a file ensures that the contents of the file cannot be intercepted by an
unauthorized user.
(D) There are trade-offs involved in choosing a compression technique for storing and transmitting data.

51. A company uses the following data files.

File Name   Description                                 Contents
Customers   A list of customers                         Customer ID
                                                        Customer address
                                                        Customer e-mail address
                                                        Customer phone number
Products    A list of products available for purchase   Product ID
            from the company                            Product name
                                                        Type of battery used by the product, if any
Purchases   A list of customer purchases                Product ID
                                                        Product serial number
                                                        Customer ID

A new rechargeable battery pack is available for products that use AA batteries. Which of the following best
explains how the data files in the table can be used to send a targeted e-mail to only those customers who have
purchased products that use AA batteries to let them know about the new accessory?
(A) Use the customer IDs in the purchases file to search the customers file to generate a list of e-mail
addresses
(B) Use the product IDs in the purchases file to search the products file to generate a list of product names
that use AA batteries
(C) Use the customers file to generate a list of customer IDs, then use the list of customer IDs to search the
products file to generate a list of product names that use AA batteries
(D) Use the products file to generate a list of product IDs that use AA batteries, then use the list of product
IDs to search the purchases file to generate a list of customer IDs, then use the list of customer IDs to
search the customers file to generate a list of e-mail addresses

Answer D

Correct. The information in the products list can be used to create a list of the product IDs of all products
that use AA batteries. Since the products list and the purchases list have the product ID information in
common, the list of product IDs can be used in the purchases list to create a list of customer IDs of all the
customers who purchased products that use AA batteries. Finally, since the purchases list and the
customers list have the customer ID in common, the list of customer IDs can be used to generate a list of
e-mail addresses of the customers who purchased products that use AA batteries.
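
The three-step lookup can be sketched as follows. All rows, field names, and the function `emails_for_battery` are invented for illustration.

```python
products = [
    {"product_id": "P1", "name": "Flashlight", "battery": "AA"},
    {"product_id": "P2", "name": "Clock", "battery": "AAA"},
    {"product_id": "P3", "name": "Radio", "battery": "AA"},
    {"product_id": "P4", "name": "Mug", "battery": None},
]
purchases = [
    {"product_id": "P1", "serial": "S100", "customer_id": "C1"},
    {"product_id": "P2", "serial": "S200", "customer_id": "C2"},
    {"product_id": "P3", "serial": "S300", "customer_id": "C3"},
    {"product_id": "P1", "serial": "S101", "customer_id": "C3"},
]
customers = [
    {"customer_id": "C1", "email": "c1@example.com"},
    {"customer_id": "C2", "email": "c2@example.com"},
    {"customer_id": "C3", "email": "c3@example.com"},
]


def emails_for_battery(products, purchases, customers, battery):
    """Products file -> product IDs -> purchases file -> customer IDs -> e-mails."""
    product_ids = {p["product_id"] for p in products if p["battery"] == battery}
    customer_ids = {p["customer_id"] for p in purchases
                    if p["product_id"] in product_ids}
    return sorted(c["email"] for c in customers
                  if c["customer_id"] in customer_ids)


print(emails_for_battery(products, purchases, customers, "AA"))
```

Using sets for the intermediate ID lists means a customer who bought several AA products still receives only one e-mail.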

52. Which of the following can be represented by a sequence of bits?

I. An integer
II. An alphanumeric character
III. A machine language instruction
(A) I only
(B) III only
(C) I and II only
(D) I, II, and III

Answer D

Correct. At the lowest level, all digital data (including integers, alphanumeric characters, and machine
language instructions) are represented with sequences of bits.

53. Which of the following are true statements about the data that can be represented using binary sequences?

I. Binary sequences can be used to represent strings of characters.


II. Binary sequences can be used to represent colors.
III. Binary sequences can be used to represent audio recordings.
(A) I only
(B) I and II only
(C) II and III only
(D) I, II, and III

Answer D

Correct. All digital data is represented at the lowest level as sequences of bits. Statement I is true because
strings of characters can be represented by sequences of bits. Statement II is true because colors can be
encoded as sequences of bits. Statement III is true because sequences of bits can be used to represent
sound.

54. Consider the 4-bit binary numbers 0011, 0110, and 1111. Which of the following decimal values is NOT equal to
one of these binary numbers?
(A) 3
(B) 6
(C) 9
(D) 15

Answer C

Correct. Binary 0011 is equivalent to 2 + 1, or decimal 3. Binary 0110 is equivalent to 4 + 2, or
decimal 6. Binary 1111 is equivalent to 8 + 4 + 2 + 1, or decimal 15. Decimal 9 is not equivalent
to any of the given binary numbers.

55. A database of information about shows at a concert venue contains the following information.

• Name of artist performing at the show


• Date of show
• Total dollar amount of all tickets sold

Which of the following additional pieces of information would be most useful in determining the artist with the
greatest attendance during a particular month?
(A) Average ticket price
(B) Length of the show in minutes
(C) Start time of the show
(D) Total dollar amount of food and drinks sold during the show

Answer A

Correct. The attendance for a particular show can be calculated by dividing the total dollar amount of all
tickets sold by the average ticket price.
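
As a sketch of that calculation, the code below estimates attendance per show and adds it up per artist for one month. The rows, the date format, and the choice to sum an artist's shows within the month are assumptions for illustration.

```python
# Hypothetical rows: (artist, date, total ticket revenue, average ticket price).
shows = [
    ("Artist A", "2024-05-03", 50000.0, 50.0),
    ("Artist B", "2024-05-10", 60000.0, 100.0),
    ("Artist A", "2024-06-01", 90000.0, 45.0),
]


def top_artist_for_month(shows, month):
    """Return the artist with the greatest estimated attendance in `month`."""
    attendance = {}
    for artist, date, revenue, average_price in shows:
        if date.startswith(month):
            # attendance = total ticket revenue / average ticket price
            attendance[artist] = attendance.get(artist, 0) + revenue / average_price
    return max(attendance, key=attendance.get) if attendance else None


print(top_artist_for_month(shows, "2024-05"))
```

Artist B has the larger ticket revenue in May, but the higher average price means fewer people attended.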


56. Directions: The question or incomplete statement below is followed by four suggested answers or
completions. Select the one that is best in each case.

Digital images are often represented by the red, green, and blue values (an RGB triplet) of each individual pixel in
the image. A photographer is manipulating a digital image and overwriting the original image. Which of the
following describes a lossless transformation of the digital image?
(A) Compressing the image in a way that may lose information but will suffer only a small loss of image
quality.
(B) Creating the gray scale of an image by averaging the amounts of red, green, and blue in each pixel and
assigning this new value to the corresponding pixel in the new image. The new value of each pixel
represents a shade of gray, ranging from white to black.
(C) Creating the negative of an image by creating a new RGB triplet for each pixel in which each value is
calculated by subtracting the original value from 255. The negative of an image is reversed from the
original; light areas appear dark, and colors are reversed.
(D) Modifying part of the image by taking the pixels in one part of the picture and copying them to the
pixels in another part of the picture.

Answer C
This option is correct. If a negative of the original image is made, each RGB triplet value will be computed by
subtracting the original value from 255. The original value can then be restored by subtracting the new value from
255. This process is lossless because the exact original can be restored.
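
The difference between the two transformations can be sketched on a list of RGB triplets. The pixel values are invented, and the use of integer division for the gray-scale average is an assumption.

```python
def negative(pixels):
    """Subtract each color value from 255; applying it twice restores the input."""
    return [(255 - r, 255 - g, 255 - b) for r, g, b in pixels]


def grayscale(pixels):
    """Replace each pixel with the (integer) average of its three color values."""
    return [((r + g + b) // 3,) * 3 for r, g, b in pixels]


image = [(10, 20, 30), (20, 20, 20), (255, 0, 128)]

print(negative(negative(image)) == image)   # the original is fully recovered
print(grayscale(image)[:2])                 # two different pixels become identical
```

Once two different pixels map to the same gray value, there is no way to tell which original produced it, so the gray-scale transformation is lossy.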

57. Which of the following best explains how an analog audio signal is typically represented by a computer?
(A) An analog audio signal is measured as input parameters to a program or procedure. The inputs are
represented at the lowest level as a collection of variables.
(B) An analog audio signal is measured at regular intervals. Each measurement is stored as a sample, which
is represented at the lowest level as a sequence of bits.
(C) An analog audio signal is measured as a sequence of operations that describe how the sound can be
reproduced. The operations are represented at the lowest level as programming instructions.
(D) An analog audio signal is measured as text that describes the attributes of the sound. The text is
represented at the lowest level as a string.

Answer B

Correct. Analog signals are sampled digitally at discrete intervals over time. These samples, like all
digital data, are represented at the lowest level as a sequence of bits.

58. The position of a runner in a race is a type of analog data. The runner’s position is tracked using sensors. Which of
the following best describes how the position of the runner is represented digitally?


(A) The position of the runner is determined by calculating the time difference between the start and the end
of the race and making an estimation based on the runner’s average speed.
(B) The position of the runner is measured and rounded to either 0 or 1 depending on whether the runner is
closer to the starting line or closer to the finish line.
(C) The position of the runner is predicted using a model based on performance data captured from previous
races.
(D) The position of the runner is sampled at regular intervals to approximate the real-world position, and a
sequence of bits is used to represent each sample.

Answer D

Correct. Analog data, like the runner’s position, have values that change smoothly, rather than in discrete
intervals. Analog data can be approximated digitally by measuring values of the analog signal at regular
intervals called samples. The samples are represented digitally as sequences of bits.
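
Sampling can be sketched by measuring a continuous function at regular intervals and storing each measurement as a fixed number of bits. The sine wave, the sampling rate, the 8-bit resolution, and the function `sample_signal` are assumptions chosen for illustration.

```python
import math


def sample_signal(signal, duration, rate, bits=8):
    """Sample `signal(t)` (values in [-1, 1]) `rate` times per second.

    Each sample is quantized to one of 2**bits levels and returned as a bit string.
    """
    levels = 2**bits - 1
    samples = []
    for n in range(int(duration * rate)):
        t = n / rate                                   # time of this sample
        value = signal(t)                              # analog measurement
        quantized = round((value + 1) / 2 * levels)    # map [-1, 1] to 0..levels
        samples.append(format(quantized, f"0{bits}b"))
    return samples


samples = sample_signal(lambda t: math.sin(2 * math.pi * t), duration=1.0, rate=8)
print(samples)
```

A higher sampling rate or more bits per sample gives a closer approximation of the analog value, at the cost of more data.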

59. Historically, it has been observed that computer processing speeds tend to double every two years. Which of the
following best describes how technology companies can use this observation for planning purposes?
(A) Technology companies can accurately predict the dates when new computing innovations will be
available to use.
(B) Technology companies can plan to double the costs of new products each time advances in processing
speed occur.
(C) Technology companies can set research and development goals based on anticipated processing speeds.
(D) Technology companies can spend less effort developing new processors because processing speed will
always improve at the observed rate.

Answer C

This option is correct. If it is assumed that computer processing speeds will double every two years,
then companies can design new products with this assumption.

A binary number is to be transformed by appending three 0s to the end of the number. For example, 11101 is transformed
to 11101000.

60. Which of the following correctly describes the relationship between the transformed number and the original
number?


(A) The transformed number is 3 times the value of the original number.
(B) The transformed number is 4 times the value of the original number.
(C) The transformed number is 8 times the value of the original number.
(D) The transformed number is 1,000 times the value of the original number.

Answer C

Correct. Appending a 0 to the end of a binary number multiplies the number by 2. Therefore, appending
three 0s to the end of a binary number multiplies the number by 2 three times, which is the same as
multiplying the number by 8.
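
The relationship can be verified directly in Python using the example from the question. The helper name `append_zeros` is an assumption for illustration.

```python
def append_zeros(binary: str, count: int) -> str:
    """Append `count` zeros to the end of a binary string."""
    return binary + "0" * count


original = "11101"
transformed = append_zeros(original, 3)

print(int(original, 2), int(transformed, 2))
print(int(transformed, 2) == int(original, 2) * 2**3)
```

Appending zeros is the same operation as a left shift, so `int(original, 2) << 3` gives the same value.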

A large spreadsheet contains the following information about local restaurants. A sample portion of the spreadsheet is
shown below.

   A                         B            C                 D                E
   Restaurant Name           Price Range  Number of         Average          Accepts
                                          Customer Ratings  Customer Rating  Credit Cards
1  Joey Calzone’s Pizzeria   lo           182               3.5              false
2  78th Street Bistro        med          41                4.5              false
3  Seaside Taqueria          med          214               4.5              true
4  Delicious Sub Shop II     lo           202               4.0              false
5  Rustic Farm Tavern        hi           116               4.5              true
6  ABC Downtown Diner        med          0                 -1.0             true

In column B, the price range represents the typical cost of a meal, where "lo" indicates under $10, "med" indicates
$11 to $30, and "hi" indicates over $30.
In column D, the average customer rating is set to -1.0 for restaurants that have no customer ratings.

61. A student wants to count the number of restaurants in the spreadsheet whose price range is $30 or less and whose
average customer rating is at least 4.0. For a given row in the spreadsheet, suppose prcRange contains the
price range as a string and avgRating contains the average customer rating as a decimal number.

Which of the following expressions will evaluate to true if the restaurant should be counted and evaluate
to false otherwise?


(A) (avgRating ≥ 4.0) AND ((prcRange = "lo") AND (prcRange = "med"))

(B) (avgRating ≥ 4.0) AND ((prcRange = "lo") OR (prcRange = "med"))

(C) (avgRating ≥ 4.0) OR ((prcRange = "lo") AND (prcRange = "med"))


(D) (avgRating ≥ 4.0) OR ((prcRange = "lo") OR (prcRange = "med"))

Answer B

Correct. This expression evaluates to true only for restaurants with the correct price range
(when prcRange equals "lo" or "med") and the correct customer rating (when avgRating
≥ 4.0).
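
Applied to the six sample rows from the spreadsheet, option (B) can be sketched as shown below. The function names are assumptions; `option_a` is included only to show why the AND version can never be true.

```python
# (restaurant name, price range, average customer rating) from the sample rows.
rows = [
    ("Joey Calzone's Pizzeria", "lo", 3.5),
    ("78th Street Bistro", "med", 4.5),
    ("Seaside Taqueria", "med", 4.5),
    ("Delicious Sub Shop II", "lo", 4.0),
    ("Rustic Farm Tavern", "hi", 4.5),
    ("ABC Downtown Diner", "med", -1.0),
]


def should_count(prc_range: str, avg_rating: float) -> bool:
    """Option (B): rating of at least 4.0 and a price range of "lo" or "med"."""
    return (avg_rating >= 4.0) and ((prc_range == "lo") or (prc_range == "med"))


def option_a(prc_range: str, avg_rating: float) -> bool:
    """Option (A): a string cannot equal both "lo" and "med", so this is never true."""
    return (avg_rating >= 4.0) and ((prc_range == "lo") and (prc_range == "med"))


count = sum(1 for _, prc, rating in rows if should_count(prc, rating))
print(count)
```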

62. A student is developing an algorithm to determine which of the restaurants that accept credit cards has the greatest
average customer rating. Restaurants that have not yet received any customer ratings and restaurants that do not
accept credit cards are to be ignored.

Once the algorithm is complete, the desired restaurant will appear in the first row of the spreadsheet. If there are
multiple entries that fit the desired criteria, it does not matter which of them appears in the first row.

The student has the following actions available but is not sure of the order in which they should be executed.

Action Explanation
Filter by number of ratings Remove entries for restaurants with no customer ratings
Filter by payment type Remove entries for restaurants that do not accept credit cards
Sort by rating Sort the rows in the spreadsheet on column D from greatest to least

Assume that applying either of the filters will not change the relative order of the rows remaining in the spreadsheet.

Which of the following sequences of steps can be used to identify the desired restaurant?

I. Filter by number of ratings, then filter by payment type, then sort by rating
II. Filter by number of ratings, then sort by rating, then filter by payment type
III. Sort by rating, then filter by number of ratings, then filter by payment type
(A) I and II only
(B) I and III only
(C) II and III only
(D) I, II, and III


Answer D

Correct. Because the relative order of the rows is not changed when the filters are applied, the order in
which the actions are performed does not matter. The filtering can occur either before or after the
spreadsheet is sorted by rating.

Byte pair encoding is a data encoding technique. The encoding algorithm looks for pairs of characters that appear in the
string more than once and replaces each instance of that pair with a corresponding character that does not appear in the
string. The algorithm saves a list containing the mapping of character pairs to their corresponding replacement characters.

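For example, the string [string omitted] can be encoded as [string omitted] by replacing all instances of [pair omitted]
with [character omitted] and replacing all instances of [pair omitted] with [character omitted].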

63. For which of the following strings is it NOT possible to use byte pair encoding to shorten the string’s length?

(A) [string omitted]
(B) [string omitted]
(C) [string omitted]
(D) [string omitted]

Answer B

This option is correct. It is not possible to use byte pair encoding to shorten this string because no pair
of characters appears in the string more than once.
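
The repeated-pair check and a single encoding step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the exam's reference algorithm: the function names, the sample string "ABABCD", and the replacement character "#" are all hypothetical choices made for the example.

```python
def repeated_pairs(s):
    """Return the two-character pairs that occur at least twice in s.

    str.count counts non-overlapping occurrences, so "AAA" is not
    treated as containing "AA" twice.
    """
    pairs = {s[i:i + 2] for i in range(len(s) - 1)}
    return {p for p in pairs if s.count(p) >= 2}


def encode_once(s, pair, symbol):
    """Replace every instance of pair with symbol; return (encoded, mapping)."""
    if symbol in s:
        raise ValueError("replacement character must not appear in the string")
    return s.replace(pair, symbol), {symbol: pair}


def decode(encoded, mapping):
    """Undo a single encoding step using the saved mapping."""
    for symbol, pair in mapping.items():
        encoded = encoded.replace(symbol, pair)
    return encoded


text = "ABABCD"                        # hypothetical sample string
encoded, mapping = encode_once(text, "AB", "#")
print(repeated_pairs(text))            # {'AB'}
print(encoded)                         # ##CD
print(decode(encoded, mapping) == text)
print(repeated_pairs("ABCDE"))         # set(): no pair repeats, so nothing can be shortened
```

Because the mapping is saved alongside the encoded string, the original can be rebuilt exactly. A string with no repeated pair gives the encoder nothing to replace.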

Internet protocol version 4 (IPv4) represents each IP address as a 32-bit binary number. Internet protocol version 6 (IPv6)
represents each IP address as a 128-bit binary number.

64. Which of the following best describes the result of using 128-bit addresses instead of 32-bit addresses?
(A) 4 times as many addresses are available.
(B) 96 times as many addresses are available.
(C) $2^{4}$ times as many addresses are available.
(D) $2^{96}$ times as many addresses are available.

Answer D

This option is correct. With 32-bit addressing, IPv4 has $2^{32}$ possible addresses. With 128-bit addressing, IPv6
has $2^{128}$ possible addresses. Since $2^{128} / 2^{32} = 2^{96}$, IPv6 has $2^{96}$ times as many possible addresses as IPv4.
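
The ratio can be checked directly with Python's arbitrary-precision integers. This is only a quick arithmetic sketch; the variable names are chosen for illustration.

```python
ipv4_addresses = 2 ** 32     # 32-bit addresses
ipv6_addresses = 2 ** 128    # 128-bit addresses

ratio = ipv6_addresses // ipv4_addresses
print(ratio == 2 ** 96)      # True
print(ipv4_addresses)        # 4294967296
```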

Delivery trucks enter and leave a depot through a controlled gate. At the depot, each truck is loaded with packages, which
will then be delivered to one or more customers. As each truck enters and leaves the depot, the following information is
recorded and uploaded to a database.

• The truck’s identification number
• The truck’s weight
• The date and time the truck passes through the gate
• Whether the truck is entering or leaving the depot

65. Using only the information in the database, which of the following questions CANNOT be answered?
(A) On which day in a particular range of dates did the greatest number of trucks enter and leave the depot?
(B) What is the average number of customer deliveries made by each truck on a particular day?
(C) What is the change in weight of a particular truck between when it entered and left the depot?
(D) Which truck has the shortest average time spent at the depot on a particular day?

Answer B

Correct. The data captured each time a truck enters or leaves the depot do not include any information
about the number of customers or deliveries associated with the truck.

A library system contains information for each book that was borrowed. Each time a person borrows or returns a book
from the library, the following information is recorded in a database.

• Name and the unique ID number of the person who was borrowing the book
• Author, title, and the unique ID number of the book that was borrowed
• Date that the book was borrowed
• Date that the book was due to be returned
• Date that the book was returned (or 0 if the book has not been returned yet)

66. Which of the following CANNOT be determined from the information collected by the system?

(A) The total number of books borrowed in a given year
(B) The total number of books that were never borrowed in a given year
(C) The total number of books that were returned past their due date in a given year
(D) The total number of people who borrowed at least one book in a given year

Answer B

Correct. The system only has information for books that were borrowed. Books that have never been
borrowed are not represented in the data.

67. A camera mounted on the dashboard of a car captures an image of the view from the driver’s seat every second.
Each image is stored as data. Along with each image, the camera also captures and stores the car’s speed, the date
and time, and the car’s GPS location as metadata. Which of the following can best be determined using only the
data and none of the metadata?
(A) The average number of hours per day that the car is in use
(B) The car’s average speed on a particular day
(C) The distance the car traveled on a particular day
(D) The number of bicycles the car passed on a particular day

Answer D

Correct. Determining the number of bicycles the car encountered would require the use of image
recognition software to examine the images collected by the camera. The images are the data collected
and no metadata would be required.

68. A teacher sends students an anonymous survey in order to learn more about the students’ work habits. The survey
contains the following questions.

• On average, how long does homework take you each night (in minutes)?
• On average, how long do you study for each test (in minutes)?
• Do you enjoy the subject material of this class (yes or no)?

Which of the following questions about the students who responded to the survey can the teacher answer by
analyzing the survey results?

I. Do students who enjoy the subject material tend to spend more time on homework each night than the
other students do?
II. Do students who spend more time on homework each night tend to spend less time studying for tests than
the other students do?
III. Do students who spend more time studying for tests tend to earn higher grades in the class than the other
students do?
(A) I only
(B) III only
(C) I and II
(D) I and III

Answer C

Correct. Question I can be answered because the teacher can detect a correlation between responses to
questions 1 and 3 on the survey. Question II can be answered because the teacher can detect a correlation
between responses to questions 1 and 2 on the survey. Question III cannot be answered because the
survey is anonymous and the teacher cannot compare student grades with the responses to the survey
questions.

69. A person wants to transmit an audio file from a device to a second device. Which of the following scenarios best
demonstrates the use of lossless compression of the original file?
(A) A device compresses the audio file before transmitting it to a second device. The second device restores
the compressed file to its original version before playing it.
(B) A device compresses the audio file by removing details that are not easily perceived by the human ear.
The compressed file is transmitted to a second device, which plays it.
(C) A device transmits the original audio file to a second device. The second device removes metadata from
the file before playing it.
(D) A device transmits the original audio file to a second device. The second device plays the transmitted
file as is.

Answer A

Correct. Lossless compression is a technique that allows for complete reconstruction of the original data.
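
A lossless round trip can be illustrated with Python's standard zlib module. This is a sketch only: the byte string below is a hypothetical stand-in for an audio file, and zlib is used simply because it is a readily available lossless compressor, not because the question refers to it.

```python
import zlib

# Hypothetical stand-in for the bytes of an audio file (highly repetitive on purpose).
original = bytes(range(256)) * 64

compressed = zlib.compress(original)      # what the first device would transmit
restored = zlib.decompress(compressed)    # what the second device reconstructs

print(len(compressed) < len(original))    # True: fewer bytes to send
print(restored == original)               # True: every byte is recovered
```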

70. In which of the following situations would it be most appropriate to choose lossy compression over lossless
compression?
(A) Storing digital photographs to be printed and displayed in a large format in an art gallery
(B) Storing a formatted text document to be restored to its original version for a print publication
(C) Storing music files on a smartphone in order to maximize the number of songs that can be stored
(D) Storing a video file on an external device in order to preserve the highest possible video quality

Answer C

Correct. In situations where minimizing data size or transmission time is maximally important, lossy
compression algorithms are typically chosen.

71. When a cellular telephone user places a call, the carrier transmits the caller’s voice as well as the voice of the person
who is called. The encoded voices are the data of the call. In addition to transmitting the data, the carrier also stores
metadata. The metadata of the call include information such as the time the call is placed and the phone numbers of
both participants. For which of the following goals would it be more useful to computationally analyze the metadata
instead of the data?

I. To determine if a caller frequently uses a specific word

II. To estimate the number of phone calls that will be placed next Monday between 10:30 A.M. and noon.

III. To generate a list of criminal suspects when given the telephone number of a known criminal
(A) I only
(B) II only
(C) II and III only
(D) I, II, and III

Answer C

This option is correct. Statement I is incorrect because determining whether a caller uses a specific word
requires analyzing the encoded voices, which are the data of the call. Statement II is correct because the
repository of stored metadata includes time, so information about the time of calls can be analyzed to make
predictions about future calls. Statement III is correct because the metadata stores the phone numbers of the
two parties of a call. Given one phone number, the metadata can be processed to provide all phone numbers
that were called by or placed calls to that person.
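
Both metadata analyses can be sketched in a few lines of Python. The record layout, the phone numbers, and the function name below are all hypothetical; a carrier's real metadata store would look different.

```python
from collections import Counter

# Hypothetical metadata records: (caller, callee, hour of day the call was placed).
calls = [
    ("555-0100", "555-0111", 10),
    ("555-0122", "555-0100", 11),
    ("555-0133", "555-0144", 10),
    ("555-0100", "555-0155", 15),
]


def contacts_of(number, records):
    """Return every number that called, or was called by, the given number."""
    contacts = set()
    for caller, callee, _hour in records:
        if caller == number:
            contacts.add(callee)
        elif callee == number:
            contacts.add(caller)
    return contacts


# Statement III: list the contacts of a known number.
print(sorted(contacts_of("555-0100", calls)))

# Statement II: past call counts per hour are the raw material for a forecast.
calls_per_hour = Counter(hour for _caller, _callee, hour in calls)
print(calls_per_hour[10])    # 2
```

Neither analysis touches the encoded voices, which is why the metadata alone is sufficient for these two goals.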

72. A list of binary values (0 or 1) is used to represent a black-and-white image. Which of the following is LEAST
likely to be stored as metadata associated with the image?
(A) Copyright information for the image
(B) The date and time the image was created
(C) The dimensions (number of rows and columns of pixels) of the image
(D) A duplicate copy of the data

Answer D

Correct. Metadata typically consists of descriptive information about the data, not a copy of the data
itself.

73. A large spreadsheet contains information about the photographs in a museum’s collection. A sample portion of the
spreadsheet is shown below.

     A                  B                    C      D
     Photographer       Subject              Year   Publicly Available
1    Steven Greene      Geyser Eruption      2004   true
2    Linda James        Giant Sloth Fossil   -1     true
3    Yajaira Lopez      Diplodocus Skull     1997   false
4    Masahiro Higashi   Sea Turtle           1989   true
5    (unknown)          Solar Eclipse        2011   false
6    (unknown)          Giant Sequoia        -1     true

• In column A, each unknown photographer is set to "(unknown)".
• In column C, each unknown year is set to -1.

A student is developing an algorithm to determine the name of the photographer who took the oldest photograph in
the collection. Photographs whose photographer or year is unknown are to be ignored.

Once the algorithm is complete, the desired entry will appear in the first row of the spreadsheet. If there are multiple
entries that meet the desired criteria, then any of them can appear in the first row.

The student has the following actions available.

Action                   Explanation
Filter by photographer   Removes entries whose photographer is "(unknown)"
Filter by year           Removes entries whose year is -1
Sort by subject          Sorts the rows in the spreadsheet on column B alphabetically from A to Z
Sort by year             Sorts the rows in the spreadsheet on column C from least to greatest

Assume that applying either of the filters will not change the relative order of the rows remaining in the spreadsheet.

Which of the following sequences of steps can be used to identify the desired entry?

Select two answers.

A Filter by photographer, then filter by year, then sort by subject

B Filter by photographer, then filter by year, then sort by year

C Sort by subject, then sort by year, then filter by photographer

D Sort by year, then filter by year, then filter by photographer

Answer B

Correct. Filtering by photographer will remove any entries with unknown photographers. Filtering by
year will remove any entries with unknown years. Sorting by year will sort the remaining entries in
column C from least to greatest, putting the photograph with the lowest year value in the first row of the
spreadsheet.

Answer D

Correct. Sorting by year will sort the spreadsheet on column C from least to greatest. Filtering by year
will remove any entries with unknown years. Filtering by photographer will remove any entries with
unknown photographers. Since the order of the entries is not affected by the filters, the photograph with
the lowest year value will be in the first row of the spreadsheet.
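
The two correct sequences, and one incorrect one, can be traced on the six sample rows with a short Python sketch. The function names are hypothetical, and Python's stable `sorted` stands in for the spreadsheet's sort.

```python
# Sample rows from the spreadsheet: (photographer, subject, year, publicly_available)
rows = [
    ("Steven Greene", "Geyser Eruption", 2004, True),
    ("Linda James", "Giant Sloth Fossil", -1, True),
    ("Yajaira Lopez", "Diplodocus Skull", 1997, False),
    ("Masahiro Higashi", "Sea Turtle", 1989, True),
    ("(unknown)", "Solar Eclipse", 2011, False),
    ("(unknown)", "Giant Sequoia", -1, True),
]


def filter_by_photographer(table):
    return [r for r in table if r[0] != "(unknown)"]


def filter_by_year(table):
    return [r for r in table if r[2] != -1]


def sort_by_subject(table):
    return sorted(table, key=lambda r: r[1])


def sort_by_year(table):
    return sorted(table, key=lambda r: r[2])


option_b = sort_by_year(filter_by_year(filter_by_photographer(rows)))
option_c = filter_by_photographer(sort_by_year(sort_by_subject(rows)))
option_d = filter_by_photographer(filter_by_year(sort_by_year(rows)))

print(option_b[0][0])    # Masahiro Higashi
print(option_d[0][0])    # Masahiro Higashi
print(option_c[0][0])    # Linda James (year -1 was never filtered out)
```

Option C fails on this sample because an entry with the placeholder year -1 sorts ahead of every real year and is never removed.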

74. Each student who enrolls at a school is assigned a unique ID number, which is stored as a binary number. The
ID numbers increase sequentially by 1 with each newly enrolled student. If the ID number assigned to the last
student who enrolled was the binary number 1001 0011, what binary number will be assigned to the next student
who enrolls?
(A) 1001 0100
(B) 1001 0111
(C) 1101 0100
(D) 1101 0111
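
Adding 1 to a binary number can be checked with a small Python sketch. The helper name `next_id` is hypothetical, and the sketch assumes the result still fits in the same number of bits.

```python
def next_id(bits):
    """Add 1 to a binary string (spaces ignored) and regroup into 4-bit blocks."""
    digits = bits.replace(" ", "")
    incremented = format(int(digits, 2) + 1, "0{}b".format(len(digits)))
    return " ".join(incremented[i:i + 4] for i in range(0, len(incremented), 4))


print(next_id("1001 0011"))    # 1001 0100
```

The two trailing 1 bits carry: 0011 plus 1 becomes 0100, and the upper four bits are unchanged.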

75. A store uses binary numbers to assign a unique binary sequence to each item in its inventory. What is the minimum
number of bits required for each binary sequence if the store has between 75 and 100 items in its inventory?
(A) 5
(B) 6
(C) 7
(D) 8

Answer C

Correct. Using 6 bits will only allow for up to 64 sequences because $2^{6} = 64$. Using 7 bits will allow for
up to 128 sequences because $2^{7} = 128$. Therefore, a minimum of 7 bits are needed.
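
The same reasoning generalizes to any inventory size. The following is a minimal Python sketch; the function name `min_bits` is an assumption made for illustration.

```python
def min_bits(item_count):
    """Smallest number of bits that yields at least item_count distinct sequences."""
    if item_count < 1:
        raise ValueError("item_count must be at least 1")
    # (n - 1).bit_length() is the smallest b with 2**b >= n, for n >= 2.
    return max(1, (item_count - 1).bit_length())


for n in (64, 75, 100, 128):
    print(n, min_bits(n))    # 64 -> 6, and 75, 100, 128 -> 7
```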

76. Consider the following numeric values.

• Binary 1011
• Binary 1101
• Decimal 5
• Decimal 12

Which of the following lists the values in order from least to greatest?
(A) Decimal 5, binary 1011, decimal 12, binary 1101
(B) Decimal 5, decimal 12, binary 1011, binary 1101
(C) Decimal 5, binary 1011, binary 1101, decimal 12
(D) Binary 1011, binary 1101, decimal 5, decimal 12

Answer A

Correct. Binary 1011 is equivalent to $2^{3} + 2^{1} + 2^{0}$, or decimal 11, and binary 1101 is equivalent to
$2^{3} + 2^{2} + 2^{0}$, or decimal 13. The order of the numbers (written in their equivalent decimal format) is 5,
11, 12, 13.
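
The conversion and ordering can be confirmed with Python's built-in `int(text, 2)`. The dictionary labels are just illustrative names.

```python
values = {
    "binary 1011": int("1011", 2),    # 8 + 2 + 1 = 11
    "binary 1101": int("1101", 2),    # 8 + 4 + 1 = 13
    "decimal 5": 5,
    "decimal 12": 12,
}

ordered = sorted(values, key=values.get)
print(ordered)
# ['decimal 5', 'binary 1011', 'decimal 12', 'binary 1101']
```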

77. A large spreadsheet contains information about the schedule for a college radio station. A sample portion of the
spreadsheet is shown below.

     A                    B         C          D            E
     Show Name            Genre     Day        Start Time   End Time
1    Dot Dot Dash         rock      Sunday     11:00 A.M.   1:00 P.M.
2    New Afternoon Show   talk      Sunday     1:00 P.M.    3:00 P.M.
3    Thursday Beats       hip-hop   Thursday   7:00 P.M.    9:00 P.M.
4    Gossip Time          talk      Friday     4:00 P.M.    6:00 P.M.
5    Campus Chat          talk      Saturday   6:00 P.M.    8:00 P.M.
6    Jazz Brunch          jazz      Saturday   12:00 P.M.   3:00 P.M.

A student wants to count the number of shows that meet both of the following criteria.

• Is a talk show
• Is on Saturday or Sunday

For a given row in the spreadsheet, suppose genre contains the genre as a string and day contains the day as a
string. Which of the following expressions will evaluate to true if the show should be counted and evaluate
to false otherwise?
(A) (genre = "talk") AND ((day = "Saturday") AND (day = "Sunday"))

(B) (genre = "talk") AND ((day = "Saturday") OR (day = "Sunday"))

(C) (genre = "talk") OR ((day = "Saturday") AND (day = "Sunday"))


(D) (genre = "talk") OR ((day = "Saturday") OR (day = "Sunday"))

Answer B

Correct. For a show to be counted, the value of genre must be "talk" and the value
of day must be "Saturday" or "Sunday".
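
Expression (B) can be applied to the six sample rows with a short Python sketch. The function name `should_count` and the tuple layout are assumptions; only the show name, genre, and day columns are needed.

```python
# Sample rows from the spreadsheet: (show name, genre, day)
shows = [
    ("Dot Dot Dash", "rock", "Sunday"),
    ("New Afternoon Show", "talk", "Sunday"),
    ("Thursday Beats", "hip-hop", "Thursday"),
    ("Gossip Time", "talk", "Friday"),
    ("Campus Chat", "talk", "Saturday"),
    ("Jazz Brunch", "jazz", "Saturday"),
]


def should_count(genre, day):
    """Expression (B): a talk show that airs on Saturday or Sunday."""
    return (genre == "talk") and ((day == "Saturday") or (day == "Sunday"))


counted = [name for name, genre, day in shows if should_count(genre, day)]
print(counted)         # ['New Afternoon Show', 'Campus Chat']
print(len(counted))    # 2
```

Replacing the inner OR with AND, as in option (A), would count nothing, because a single row's day cannot equal both "Saturday" and "Sunday".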

A social media site allows users to send messages to each other. A group of researchers gathered user data for the first 10
years of the site’s existence. Some of the data are summarized in the table below, along with some of the company
milestones.

78. The researchers noticed that the total number of registered users appears to be increasing at about a constant rate. If
this pattern continues, which of the following best approximates the total number of registered users, in millions, in
year 12 (two years after the last entry in the table)?
(A) 30.6
(B) 31.2
(C) 31.8
(D) 32.4

Answer B
This option is correct. The total number of registered users appears to be increasing by about 0.5 million each
year, so in year 12, the number of users can be approximated at 31.2 million (30.2 + 0.5 + 0.5).
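
The constant-rate estimate is a simple linear extrapolation, sketched below in Python. The function name is hypothetical, and the inputs (30.2 million in year 10, about 0.5 million per year) are taken from the explanation above because the table itself is not reproduced here.

```python
def extrapolate(last_value, yearly_increase, years_ahead):
    """Project a value forward assuming a constant yearly increase."""
    return last_value + yearly_increase * years_ahead


estimate = extrapolate(30.2, 0.5, 2)    # year 10 -> year 12, in millions
print(round(estimate, 1))               # 31.2
```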

79. Which of the following hypotheses is most consistent with the data in the table?
(A) The mobile app release did not have any effect on the average number of daily messages sent per user.
(B) The mobile app release discouraged new user registration on the site.
(C) The mobile app release led to users being less frequently active on the site.
(D) The mobile app release led to users tending to write shorter messages.

Answer D
This option is correct. The average number of characters per message appears to decrease after the mobile app
was released.

80. Directions: The question or incomplete statement below is followed by four suggested answers or
completions. Select the one that is best in each case.

Some programming languages use constants, which are variables that are initialized at the beginning of a
program and never changed. Which of the following are good uses for a constant?

I. To represent the mathematical value π (pi) as 3.14
II. To represent the current score in a game
III. To represent a known value such as the number of days in a week

(A) I and II only
(B) I and III only
(C) II and III only
(D) I, II, and III

Answer B
This option is correct. A constant is a good choice for statement I and statement III because the value of pi and
the number of days in a standard calendar week never change.

A program developed for a Web store represents customer account balances using a format that approximates real
numbers. While testing the program, a software developer discovers that some values appear to be mathematically
imprecise.

81. Which of the following is the most likely cause of the imprecision?
(A) The account balances are represented using a fixed number of bits, resulting in overflow errors.
(B) The account balances are represented using a fixed number of bits, resulting in round-off errors.
(C) The account balances are represented using an unlimited number of bits, resulting in overflow errors.
(D) The account balances are represented using an unlimited number of bits, resulting in round-off errors.

Answer B

Correct. The fixed number of bits used to represent real numbers limits the range and precision of these values;
the limited precision can result in round-off errors. Round-off errors typically result in imprecise values or results.
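
Round-off error is easy to reproduce with binary floating-point numbers. The Python sketch below uses illustrative amounts; the `decimal` module is shown only as one common alternative for money values, not as something the question prescribes.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)    # False
print(float_sum)           # 0.30000000000000004

# Decimal arithmetic on string inputs keeps these amounts exact.
exact_sum = Decimal("0.10") + Decimal("0.20")
print(exact_sum == Decimal("0.30"))    # True
```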

82. The table below shows the time a computer system takes to complete a specified task on the customer data of
different-sized companies.

Based on the information in the table, which of the following tasks is likely to take the longest amount of time when
scaled up for a very large company of approximately 100,000 customers?
(A) Backing up data
(B) Deleting entries from data
(C) Searching through data
(D) Sorting data

A city maintains a database of all traffic tickets that were issued over the past ten years. The tickets are divided into the
following two categories.

• Moving violations
• Nonmoving violations

The data recorded for each ticket include only the following information.

• The month and year in which the ticket was issued
• The category of the ticket

83. Which of the following questions CANNOT be answered using only the information in the database?
(A) Has the total number of traffic tickets per year increased each year over the past ten years?
(B) In the past ten years, were nonmoving violations more likely to occur on a weekend than on a weekday?
(C) In the past ten years, were there any months when moving violations occurred more often than
nonmoving violations?
(D) In how many of the past ten years were there more than one million moving violations?

Answer B

Correct. The database only tracks the month and year that each ticket was issued. There is no information
about whether the tickets were issued on weekends or weekdays.

84. A programmer is developing software for a social media platform. The programmer is planning to use compression
when users send attachments to other users. Which of the following is a true statement about the use of
compression?
(A) Lossless compression of video files will generally save more space than lossy compression of video
files.
(B) Lossless compression of an image file will generally result in a file that is equal in size to the original
file.
(C) Lossy compression of an image file generally provides a greater reduction in transmission time than
lossless compression does.
(D) Sound clips compressed with lossy compression for storage on the platform can be restored to their
original quality when they are played.

Answer C

Correct. Lossy compression discards some information, so it can generally reduce file size more than
lossless compression can. A smaller file takes less time to transmit.

An office uses an application to assign work to its staff members. The application uses a binary sequence to represent each
of 100 staff members.

85. What is the minimum number of bits needed to assign a unique bit sequence to each staff member?
(A) 5
(B) 6
(C) 7
(D) 8

Answer C

Correct. Using 7 bits will allow for up to 128 employees because $2^{7} = 128$. Using 6 bits would allow for only 64.

86. A wildlife preserve is developing an interactive exhibit for its guests. The exhibit is intended to allow guests to
select the name of an animal on a touch screen and display various facts about the selected animal.

For example, if a guest selects the animal name “wolf,” the exhibit is intended to display the following information.

• Classification: mammal
• Skin type: fur
• Thermoregulation: warm-blooded
• Lifestyle: pack
• Average life span: 10–12 years
• Top speed: 75 kilometers/hour

The preserve has two databases of information available to use for the exhibit. The first database contains
information for each animal’s name, classification, skin type, and thermoregulation. The second database contains
information for each animal’s name, lifestyle, average life span, and top speed.

Which of the following explains how the two databases can be used to develop the interactive exhibit?
(A) Only the first database is needed. It can be searched by animal name to find all the information to be
displayed.
(B) Only the second database is needed. It can be searched by animal name to find all the information to be
displayed.
(C) Both databases are needed. Each database can be searched by animal name to find all information to be
displayed.
(D) The two databases are not sufficient to display all the necessary information because the intended
display information does not include the animal name.

Answer C

Correct. The information to be displayed comes from both databases. The animal name can be used
to search the first database to find the classification, skin type, and thermoregulation information. The
animal name can be used to search the second database to find the lifestyle, average life span, and top speed
information.
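
The two lookups can be sketched with Python dictionaries keyed by animal name. The dictionary names and the function `facts_for` are hypothetical stand-ins for the preserve's databases; the wolf values come from the example in the question.

```python
# Hypothetical in-memory stand-ins for the two databases, keyed by animal name.
physical_traits = {
    "wolf": {
        "Classification": "mammal",
        "Skin type": "fur",
        "Thermoregulation": "warm-blooded",
    },
}
behavior_traits = {
    "wolf": {
        "Lifestyle": "pack",
        "Average life span": "10–12 years",
        "Top speed": "75 kilometers/hour",
    },
}


def facts_for(name):
    """Search both databases by the same name and combine the results."""
    return {**physical_traits[name], **behavior_traits[name]}


for label, value in facts_for("wolf").items():
    print(f"{label}: {value}")
```

The animal name acts as the shared key: neither database alone holds all six facts, but the name links the matching records.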

A student’s overall course grade in a certain class is based on the student’s scores on individual assignments. The course
grade is calculated by dropping the student’s lowest individual assignment score and averaging the remaining scores.

For example, if a particular student has individual assignment scores of 85, 75, 90, and 95, the lowest score (75) is
dropped. The calculated course grade is $(85 + 90 + 95) / 3 = 90$.
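
The drop-lowest calculation can be written as a short Python function. The name `course_grade` is an assumption, and the sketch drops exactly one score even when the lowest value is tied.

```python
def course_grade(scores):
    """Drop the single lowest score and average the remaining scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    return (sum(scores) - min(scores)) / (len(scores) - 1)


print(course_grade([85, 75, 90, 95]))    # 90.0
```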

87. An administrator at the school has data about hundreds of students in a particular course. While the administrator
does not know the values of each student’s individual assignment scores, the administrator does have the following
information for each student.

• The student name
• A unique student ID number
• The number of assignments for the course
• The average assignment score before the lowest score was dropped
• The course grade after the lowest score was dropped

Which of the following CANNOT be determined from this data alone?


(A) For a given student, the value of the highest assignment score
(B) For a given student, the value of the lowest assignment score
(C) For a given student, the change in course grade as a result of dropping the lowest score
(D) The proportion of students who improved their course grade as a result of dropping the lowest score

Answer A

Correct. Without knowing the individual assignment scores, the administrator is unable to determine any
of the student’s individual scores other than the lowest score.
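
The lowest score is recoverable because both totals can be rebuilt from the averages: the total before dropping is the number of assignments times the original average, and the total after dropping is one fewer assignment times the course grade. A minimal Python sketch follows; the function name is hypothetical, and the 86.25 average is computed from the earlier example scores (85, 75, 90, 95) rather than given in the question.

```python
def lowest_score(num_assignments, average_before, grade_after):
    """Recover the dropped score from the two averages."""
    total_before = num_assignments * average_before
    total_after = (num_assignments - 1) * grade_after
    return total_before - total_after


# Example scores 85, 75, 90, 95: average before is 345 / 4 = 86.25, grade after is 90.
print(lowest_score(4, 86.25, 90))    # 75.0
```

No similar identity isolates the highest score, since the remaining scores only appear as a sum.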

88. Directions: For the question or incomplete statement below, two of the suggested answers are correct. For
this question, you must select both correct choices to earn credit. No partial credit will be earned if only one
correct choice is selected. Select the two that are best in each case.

Which of the following can be represented by a single binary digit?

Select two answers.

A The position of the minute hand of a clock

B The remainder when dividing a whole number by 2

C The value of a Boolean variable

D The volume of a car radio

Answer B
This option is correct. When dividing a whole number (0, 1, 2, 3, …) by 2, the remainder will always be 0 or 1. A
binary digit, by its definition, stores 0 or 1.

Answer C
This option is correct. The value of a Boolean variable is either “true” or “false.” These two possible values can
be represented by the binary digits 0 or 1.
