Job Title: Data Engineer

Location: San Francisco Bay Area / Remote US

Who We Are: Truthset is a venture-backed SaaS startup solving the multibillion-dollar problem of data quality for the entire marketing industry. Our platform enables brands and publishers such as Paramount, Procter & Gamble, and TransUnion to optimize consumer data quality, improving marketing ROI. In a fast-paced and collaborative environment, we are committed to excellence and innovation.

Our Tech Stack: AWS (EMR, EC2, S3, Athena, SageMaker), Spark, dbt, Snowflake, Databricks, Airflow, Terraform, GitHub, Tableau.

Our Programming Languages: Scala, Python, SQL, and Bash.

Who You Are:

A driven individual excited about joining a small but growing Data Science and Engineering team. You’ll report to the Head of Data Science and work alongside a Data Scientist and a Principal ML Engineer. You have a deep understanding of data engineering principles and past work experience designing, building, and maintaining data pipelines in cloud environments.

Responsibilities:

Design, build, and maintain scalable data pipelines that supply big data to internal and external teams.
Automate the delivery of terabytes of structured data to a growing group of enterprise clients.
Automate the ingestion of terabytes of external data sources into internal data warehouses in different environments (e.g., AWS, Snowflake, Databricks).
Write, test, debug, and optimize custom Scala code for ETL workflows and other one-off tasks.
Deploy ETL code in the cloud (using batch orchestration tools, like Airflow).
Work closely with the Head of Data Science and Principal ML Engineer to test and deploy new infrastructure for data processing.
Create an internal toolkit (KPIs, testing programs, dashboards) to monitor the health of data pipelines.
Maintain documentation about generated datasets (data dictionaries, feed specs, etc.) for internal and external use.
Advise the Head of Data Science on future tooling upgrades.

Core Qualifications:

Bachelor’s degree in Computer Science, Mathematics, Statistics, or a related field.
3+ years of relevant work experience.
Proficiency in one or more programming languages such as Python, Scala, Java, or other languages commonly used in data engineering.
Experience with cloud/distributed computing tools, including Spark, AWS EMR, and cloud-based data warehouse platforms such as Snowflake, Databricks or Redshift.
A strong background in at least one of the following: distributed data processing, software engineering of data services, or data modeling.
Experience with relational (SQL) databases and graph databases.
Experience with version control software, such as GitHub.
Excellent communication and collaboration skills.
Strong problem-solving skills and attention to detail.

Ideal Qualifications:

Industry experience programming in Scala.
Familiarity with a scripting language like Python or R.
Familiarity with Terraform and Airflow.
Familiarity with dbt.

Compensation:

The compensation package will include full health benefits, 401k, and the potential for an equity stake.

Contact:

To apply, please email a CV and (optional) cover letter to [email protected]

Salary:

$150,000 - $180,000 per year
