Data Engineer | Backend Developer | AI-Driven Automation Specialist
Hello! I’m Johana, a versatile Data Engineer with a background in Electronic Engineering and extensive experience in project management.
My passion lies in turning raw information into scalable, production-grade data products on Google Cloud Platform (GCP). I design serverless architectures, orchestrate complex automations, and apply NLP & AI to extract insights from massive legislative document collections.
- Cloud-Native Pipelines: End-to-end design and deployment of data pipelines on GCP (Cloud Functions, Cloud Run, Workflows, Cloud Storage, Pub/Sub & managed PostgreSQL) handling millions of records daily.
- Automation Orchestration: Coordinated large-scale web scraping, bulk PDF downloads, OCR, and database loading with n8n and Google Workflows.
- OCR & AI Services: Integrated Tesseract (deployed on Cloud Run) and the OpenAI API for text extraction, automatic summarization, embedding generation, and topic classification.
- Secure Microservices: Built Docker-containerized FastAPI microservices secured with short-lived ID tokens (Cloud Run → Cloud Run), enabling authenticated calls from n8n and Workflows without stored service-account keys.
- Knowledge Graphs: Created semantic graphs linking legislative bills, authors, and topics using OpenAI text-embedding-3-small embeddings and graph visualization libraries.
- Cost & Security Optimization: Implemented fine-grained IAM, VPC connectors, and storage-class tuning to cut GCP costs by ~40% while meeting compliance requirements.
- Domain Expertise: Specialized in large-scale processing of government & legislative documents across LATAM.
| Area | Tech & Tools |
|---|---|
| Languages | Python (Advanced), SQL (Advanced) |
| Data Engineering | GCP (Cloud Run, Cloud Functions, Workflows, Pub/Sub, Cloud Storage), Docker, dbt, Apache Airflow, CrateDB |
| Automation / Orchestration | n8n, Google Workflows |
| NLP & AI | OpenAI GPT, text-embedding-3-small, spaCy, LangChain |
| OCR | Tesseract (custom wrapper on Cloud Run) |
| Data Viz / BI | Metabase, Looker Studio, Power BI, Kepler.gl |
| Dev Tools | Git & GitHub, FastAPI, Poetry, VS Code |
| Databases | PostgreSQL (Managed / Self-hosted), Snowflake, MySQL, SQLite |
- Advanced LangGraph / CopilotKit patterns for agentic workflows.
- Vertex AI pipelines for scalable model serving on GCP.
Oct 2023 – Present
- Architected a serverless GCP data platform powering legislative intelligence products.
- Implemented multi-stage ETL/ELT pipelines with Cloud Run and Workflows, reducing manual processing time by 80%.
- Deployed OCR and NLP microservices (Tesseract + OpenAI) that generate rich metadata, summaries, and embeddings for >50K documents.
- Led cost-optimization initiative: storage tiering & idle-instance scheduling cut monthly spend from $1.2k → $700.
Apr 2022 – May 2023
- Managed electronic-security projects for public- and private-sector clients (including Ecopetrol).
- Introduced data-driven KPIs (Excel + Power BI), boosting SLA adherence by 15%.
Feb 2012 – Mar 2022
- Oversaw maintenance of >1,000 surveillance devices, reducing mean time to repair (MTTR) by 25%.
- Championed a root-cause-analysis culture that improved system reliability.
- Big Data Certified Professional – Talento Tech MINTIC (Oct 2024)
- Data Analytics Certified Professional – Talento Tech MINTIC (Oct 2024)
- Master's in Project Management – ENEB (May 2023)
- Data Analysis with Python – Platzi (Jun 2023)
- PyLadies Bogotá – Active Member
- Python Colombia – Contributor
- Volunteer – JS Conf CO 2023, PyCon CO 2024
- Creator – FastAPI Workshop Chapter
- Email: [email protected]
- Location: Bogotá, Colombia
- Phone: +57 317 292 1350