Undergraduate Major in Information and Data Sciences
The information and data sciences are concerned with the acquisition, storage, communication, processing, and analysis of data. These intellectual activities have a long history, and Caltech has traditionally occupied a position of strength in them, with faculty spread across applied mathematics, electrical engineering, computer science, mathematics, physics, astronomy, economics, and many other disciplines. In the last decade, there has been a rapid increase in the rate at which data are acquired for the purpose of extracting actionable knowledge in the form of scientific models and predictions, business decisions, and public policies. From a technological perspective, this rapid increase in the availability of data creates numerous challenges in acquisition, storage, and subsequent analysis. More fundamentally, humans cannot deal with such a volume of data directly, so it is increasingly essential that we automate the pipeline of information processing and analysis. All areas of human endeavor are affected: science, medicine, engineering, manufacturing, logistics, the media, and entertainment. The range of scenarios that concern a scientist in this domain is very broad. It extends from situations in which the available data are nearly infinite (big data) to those in which the data are sparse and precious; from situations in which computation is, for all practical purposes, an infinite resource to those in which a rapid response is critical and computation must therefore be treated as a precious resource; and from situations in which the data are all available at once to those in which they arrive as a stream.
As such, the information and data sciences now draw not just upon traditional areas spanning computer science, applied mathematics, and electrical engineering (signal processing, information and communication theory, control and decision theory, probability and statistics, and algorithms), but also upon a range of contemporary topics such as machine learning, network science, distributed systems, and neuroscience. The result is a field that is new, fundamentally different from related areas like computer science and statistics, and crucial to modern applications in the physical sciences, social sciences, and engineering.
The Information and Data Sciences (IDS) option is unabashedly mathematical. It focuses on the foundations of the information and data sciences, which are rooted in probability, statistics, linear algebra, and signal processing. Each of these fields contributes crucial components of data science today. The option also takes advantage of Caltech's interdisciplinary nature by requiring a set of application courses in which students learn how data are used across science and engineering. The flexibility of this sequence allows students to see data science in action in biology, economics, chemistry, and beyond.
In addition to the major, the IDS option offers a minor. The minor also focuses on the mathematical foundations of the information and data sciences, but it recognizes that many students in other options across campus need to supplement their studies with practical training in data science.
- Computer Science Fundamentals. CS 1; CS 2; and CS 38.
- Mathematical Fundamentals. Ma 2; Ma 3; Ma 108a; and Ma/CS 6ab or Ma 121ab. The analytical tracks of Ma 1bc are required.
- Scientific Fundamentals. 18 units selected from the following courses: Bi 8, Bi 9, Ch 21abc, Ch 24, Ch 25, Ch 41abc, Ph 2abc, or Ph 12abc. Advanced courses (numbered 100 or above) in Bi, Ch, or Ph with a strong scientific component can be used to satisfy this requirement with approval from the option administrator, but they cannot simultaneously be used to satisfy the “Applications of Data Science” requirement or the “Advanced Electives” requirement.
- Communication Fundamentals. E 10; E 11.
Information and Data Science Core Requirements.
- Linear Algebra: ACM/IDS 104; ACM 106a.
- Probability: ACM/EE/IDS 116.
- Statistics: ACM/CS/IDS 157.
- Machine Learning: CMS/CS/CNS/EE/IDS 155 or CS/CNS/EE 156a.
- Signal Processing: EE/IDS 111.
- Information Theory: EE/CS/IDS 160.
- Applications of Data Science. At least 18 units from the following list: Ay 119, BE/Bi 103, Bi/CNS/NB 153, Bi/CNS/NB 162, Bi/BE/CS 183, BEM/Ec 150, CNS/Bi/EE/CS/NB 186, CS/EE/ME 134, EE/CNS/CS 148, Ec/SS 124, ESE 136, FS/Ay 3, FS/Ph 4, Ge/Ay 117, Ge 165, HPS/Pl/CS 110, SS 228. Other courses that include applications of data science may be substituted with approval from the option coordinator. Courses used to fulfill this requirement may not also be used to fulfill any requirement above.
- Advanced Electives. At least 54 units from the following list: IDS courses numbered 100 or above, CS/CNS/EE 156ab, ACM 106b, ACM 95/100ab. Courses used to fulfill this requirement may not also be used to fulfill any requirement above.
Courses used to fulfill the “Applications of Data Science” and “Advanced Electives” requirements cannot be used to fulfill the Institute humanities and social sciences requirements.
Units used to fulfill the Institute Core requirements do not count toward any of the option requirements. Pass/fail grading cannot be elected for courses taken to satisfy option requirements. Passing grades must be earned in a total of 486 units, including all courses used to satisfy the above requirements.
Double Major Requirements
Students interested in simultaneously pursuing a degree in a second option must fulfill all the requirements of the Information and Data Sciences option. Courses may be used to fulfill requirements in both options simultaneously. However, students must complete at least 54 units of “Advanced Electives” and 18 units of “Applications of Data Science” that are not also used to fulfill a requirement of the second option. Any proposal to replace these courses must be discussed with the option administrator. To enroll in the program, the student should meet with the option representative to discuss their plans. In general, approval is contingent on good academic performance and a demonstrated ability to handle the heavier course load.
Typical Course Schedule
| Course | Title | Units, 1st term | Units, 2nd term | Units, 3rd term |
|---|---|---|---|---|
| CS 1 | Intro. to Computer Programming | | | |
| CS 2 | Intro. to Programming Methods | | | |
| Ma 2 | Differential Equations | 9 | - | - |
| Ma 3 | Intro. to Probability and Statistics | - | 9 | - |
| Ma/CS 6 ab | Intro. to Discrete Methods | 9 | 9 | - |
| ACM/IDS 104 | Applied Linear Algebra | 9 | - | - |
| E 10 | Technical Seminar Presentations | - | 3 | - |
| CMS/CS/CNS/EE/IDS 155 | Machine Learning & Data Mining | - | 12 | - |
| E 11 | Written Technical Communication in Engineering and Applied Science | - | - | 3 |
| Ma 108 a | Classical Analysis | 9 | - | - |
| EE/IDS 111 | Signal-Processing Systems and Transforms | 9 | - | - |
| ACM/CS/IDS 157 | Statistical Inference | - | - | 9 |
| ACM/EE/IDS 116 | Intro. to Probability Models | 9 | - | - |
| ACM/EE 106 a | Intro. Methods of Computational Math. | 12 | - | - |
| EE/CS/IDS 160 | Fundamentals of Information Transmission and Storage | - | 9 | - |
Starting in the sophomore year, IDS students are assigned a faculty advisor with whom they should meet regularly, typically once per quarter. Students in the program are advised by faculty from across the Institute who are interested in the information and data sciences. These include all CMS faculty, as well as the following faculty who pursue data-science-related research and participate in IDS advising: Justin Bois, Fernando Brandao, Shuki Bruck, George Djorgovski, Laura Doval, Frederick Eberhardt, Federico Echenique, Babak Hassibi, Jonathan Katz, Victoria Kostina, Heather Knutson, Tom Miller, Pietro Perona, Antonio Rangel, Mark Simons, Omer Tamuz, Andrew Thompson, Matt Thomson, Victor Tsai, David Van Valen, Zhongwen Zhan. Students seeking an IDS advisor should contact the undergraduate option secretary at firstname.lastname@example.org.