Principal Data Scientist
About this role
Work Schedule
Standard (Mon-Fri)
Environmental Conditions
Office
Job Description
At Thermo Fisher’s PPD clinical research business, we’re using digital innovation, data science, and AI to reimagine how life-changing therapies reach patients. Our teams combine deep scientific expertise with advanced analytics, automation, and digital platforms to make research smarter, faster, and more connected.
We know that innovation happens when diverse minds meet. Our Digital Science, Data, and AI professionals collaborate closely with scientists, clinicians, and operational experts to solve real-world challenges in clinical research. Alongside our partnership with OpenAI, you can be part of the collaboration that will help to improve the speed and success of drug development, enabling customers to get medicines to patients faster and more cost-effectively.
You’ll join a culture that values experimentation, learning, and collaboration, where your ideas can help shape how we deliver life-saving solutions. Whether you’re a data engineer, product manager, software developer, or AI scientist, you’ll find opportunities here to apply your skills to work that truly matters: improving global health outcomes.
Principal Data Scientist - Patient Analytical Services Division (PASD)
The Principal Data Scientist is a senior individual contributor and the deepest technical voice on the PASD data science team, focused on applying machine learning, advanced analytics, and modern AI to patient-level healthcare data. This role partners closely with epidemiologists, statisticians, RWE scientists, data engineers, and consulting teams to build scalable analytical and AI solutions that power evidence generation and decision support for biopharmaceutical, biotech, and medical device clients.
The role balances three areas: rigorous ML and advanced analytics on complex patient data (claims, EHR, registries, linked datasets), responsible adoption of generative AI and agentic solutions for analytics productivity and client-facing workflows, and fluent collaboration within RWE and patient analytics contexts. This is a hands-on technical leadership role; influence is exercised through technical depth, mentorship, and setting engineering and modeling standards, not through people or portfolio management.
Key Responsibilities
Technical Leadership (Individual Contributor)
• Serve as a senior technical expert across the full analytics lifecycle, including problem framing, data strategy, model development, validation, deployment, and monitoring.
• Set and uphold high standards for modeling rigor, reproducibility, and engineering quality across the data science team.
• Mentor data scientists and engineers, review code and modeling approaches, and raise the technical bar on projects without owning delivery management.
• Evaluate emerging methods, tools, and frameworks, and guide adoption where they add measurable value.
Machine Learning & Advanced Analytics for Patient Data
• Build predictive and descriptive models on patient-level healthcare data to support use cases such as patient stratification, risk prediction, text analytics, workflow prioritization, and decision support.
• Apply appropriate methods across classical statistical modeling, machine learning, and deep learning, including survival analysis, causal inference, propensity scoring, and longitudinal modeling where relevant.
• Design feature engineering, evaluation, and validation approaches suited to the complexities of real-world healthcare data, including missingness, censoring, bias, and longitudinal structure.
• Develop reproducible, well-tested pipelines using modern data science tooling, experiment tracking, and scalable compute.
Generative AI & Agentic Solutions
• Identify and implement high-value applications of generative AI to improve analytics productivity, scientific review, knowledge retrieval, and internal and client-facing workflows.
• Design and evaluate LLM-powered assistants, retrieval workflows, and agentic applications with appropriate human oversight, traceability, and quality controls.
• Partner with platform and engineering teams to operationalize AI applications using enterprise tooling for experimentation, tracing, evaluation, and monitoring, ensuring responsible deployment in regulated and client-facing environments.
Cross-Functional Partnership
• Partner with RWE scientists, epidemiologists, statisticians, data engineers, product owners, and consulting teams to translate scientific and business questions into sound analytical approaches.
• Communicate methods, assumptions, limitations, and findings clearly to both technical and non-technical audiences, including client-facing contexts.
• Translate technical outputs into scientific and business value for internal teams and client stakeholders.
Key Technologies
Languages and Analytics: Python, SQL, R
ML / AI: scikit-learn, XGBoost, PyTorch, TensorFlow, NLP libraries, LLM APIs
Statistics for RWD: survival analysis, causal inference, propensity scoring, longitudinal modeling
Data Platforms: Databricks, Spark, Delta Lake, Snowflake, AWS, Azure
LLMOps / Agentic AI: MLflow, prompt and version tracking, tracing, evaluation frameworks, RAG architectures, vector search, agent orchestration frameworks
Engineering and Delivery: Git, CI/CD, notebooks, APIs
Qualifications
• Bachelor's degree in data science, computer science, statistics, biostatistics, epidemiology, mathematics, bioinformatics, or a related quantitative field; or
• Master's degree with significant progressive experience in data science, machine learning, or healthcare analytics (preferred).
• Previous experience in data science that provides the knowledge, skills, and abilities to perform the job (comparable to 8-10 years’ experience).
• Hands-on experience applying ML and advanced analytics to real-world healthcare data such as claims, EHR, registries, or other patient-level longitudinal datasets.
• Strong programming skills in Python and SQL; working proficiency in R.
• Solid grounding in statistical modeling, machine learning, and model evaluation.
• Experience working in modern cloud and data platforms such as Databricks, Spark, AWS, Azure, or Snowflake.
• Strong software engineering fundamentals, including version control, modular code, testing, documentation, and reproducibility.
• Strong written and verbal communication skills, with the ability to present methods and findings clearly to diverse audiences.
Preferred Qualifications
• Experience applying ML and advanced analytics within RWE, HEOR, epidemiology, pharmacoepidemiology, or patient analytics at a pharma, biotech, CRO, medical device, or healthcare analytics organization.
• Domain familiarity with oncology, immunology, rare disease, or therapeutic-area-specific patient analytics.
• Experience with survival analysis, causal inference, propensity scoring, and longitudinal modeling applied to real-world data.
• Experience with NLP, unstructured clinical text, knowledge retrieval, LLM applications, prompt evaluation, and agentic workflows.
• Practical experience with MLOps / LLMOps capabilities such as experiment tracking, tracing, evaluation frameworks, model monitoring, and deployment governance.
• Experience mentoring data scientists and contributing to technical standards in a matrixed environment.
At Thermo Fisher Scientific, we are committed to fostering a healthy and harmonious workplace for our employees. We understand the importance of creating an environment that allows individuals to excel. Please see below for the required qualifications for this position, which also include the possibility of equivalent experience:
• Able to communicate, receive, and understand information and ideas with diverse groups of people in a comprehensible and reasonable manner.
• Able to work upright and stationary for typical working hours.
• Able to learn and use standard office equipment and technology proficiently.
• Able to perform successfully under pressure while prioritizing and handling multiple projects or activities.
• May require as-needed travel (0-20%).
*Location: Remote US (East Coast preferred). Relocation assistance is NOT provided.
*Must be legally authorized to work in the United States without sponsorship.
*Must be able to pass a comprehensive background check, which includes a drug screening.
The annual salary range estimated for this position in North Carolina is $185,000-$215,000 USD. This position may also be eligible to receive a variable annual bonus based on company, team, and/or individual performance results in accordance with company policy.