Job Title: Data Engineer
Location: San Francisco Bay Area / Remote US
Who We Are: Truthset is a venture-backed SaaS startup solving the multibillion-dollar problem of data quality for the entire marketing industry. Our platform enables brands and publishers such as Paramount, Procter & Gamble, and TransUnion to optimize consumer data quality, improving marketing ROI. In a fast-paced and collaborative environment, we are committed to excellence and innovation.
Our Tech Stack: AWS (EMR, EC2, S3, Athena, SageMaker), Spark, dbt, Snowflake, Databricks, Airflow, Terraform, GitHub, Tableau.
Our Programming Languages: Scala, Python, SQL, and Bash.
Who You Are:
A driven individual excited about joining a small but growing Data Science and Engineering team. You’ll report to the Head of Data Science and work alongside a Data Scientist and a Principal ML Engineer. You have a deep understanding of data engineering principles and prior work experience designing, building, and maintaining data pipelines in cloud environments.
Responsibilities:
- Design, build, and maintain scalable data pipelines that supply big data to internal and external teams.
- Automate the delivery of terabytes of structured data to a growing group of enterprise clients.
- Automate the ingestion of terabytes of external data sources into internal data warehouses in different environments (e.g., AWS, Snowflake, Databricks).
- Write, test, debug, and optimize custom Scala code for ETL workflows and other one-off tasks.
- Deploy ETL code in the cloud using batch orchestration tools such as Airflow.
- Work closely with the Head of Data Science and Principal ML Engineer to test and deploy new infrastructure for data processing.
- Create an internal toolkit (KPIs, testing programs, dashboards) to monitor the health of data pipelines.
- Maintain documentation about generated datasets (data dictionaries, feed specs, etc.) for internal and external use.
- Advise the Head of Data Science on future tooling upgrades.
Core Qualifications:
- Bachelor’s degree in Computer Science, Mathematics, Statistics, or a related field.
- 3+ years of relevant work experience.
- Proficiency in one or more programming languages such as Python, Scala, Java, or other languages commonly used in data engineering.
- Experience with cloud/distributed computing tools, including Spark, AWS EMR, and cloud-based data warehouse platforms such as Snowflake, Databricks, or Redshift.
- A strong background in at least one of the following: distributed data processing, software engineering of data services, or data modeling.
- Experience with relational (SQL) databases and graph databases.
- Experience with version control software, such as GitHub.
- Excellent communication and collaboration skills.
- Strong problem-solving skills and attention to detail.
Ideal Qualifications:
- Industry experience programming in Scala.
- Familiarity with a scripting language like Python or R.
- Familiarity with Terraform and Airflow.
- Familiarity with dbt.
Compensation:
The compensation package will include full health benefits, 401k, and the potential for an equity stake.
Contact:
To apply, please email a CV and (optional) cover letter to [email protected]
Salary:
$150,000 - $180,000 per year